Build a powerful & robust automation pipeline using modern data extraction practices.
Level up your data wrangling skills with a Python-based automation pipeline in this course.
Automatically find and track topics you care about across Reddit posts. From camping to the latest in AI news, this course will show you how to build a powerful and resilient system in Python.
The goal of this course is to help you develop the skills you need to build a resilient data extraction platform using only a handful of tools and the latest LLMs from Google. Beyond the new skills, you'll also come away with rich data that reflects what real people are discussing all around the world.
Topics
✅ Easily download the latest Reddit conversations around topics you care about (see the trigger sketch below)
✅ AI-powered Google search to find relevant Reddit communities (aka SERP)
✅ Build & ingest data through public webhooks (notifications that work software-to-software or app-to-app; see the receiver sketch below)
✅ Rapidly prototype data scraping/extraction with Python & Jupyter Notebooks
✅ Use Gemini to run your Python functions from plain English (aka Tool Calling; sketched below)
✅ Store extracted data with the Django ORM and PostgreSQL (example model below)
✅ Strict & structured data outputs for LLMs with Pydantic (example schema below)
✅ Fault-tolerant data downloads using background tasks & webhooks (example task below)
✅ Configure serverless and serverful worker managers (django-qstash & Celery)
✅ and much more
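To give you a taste, here are a few minimal sketches of the patterns above. All names, endpoints, and schemas in these snippets are illustrative placeholders, not the course's exact code.

First, triggering a download. The same trigger-and-webhook pattern drives both the SERP discovery step and the Reddit post extraction. The endpoint URL, payload shape, and response field below are placeholders, not Bright Data's actual API; in the course you'll use the real trigger details from your Bright Data dashboard.

```python
import os
import requests

# Placeholder endpoint -- NOT Bright Data's real API. Swap in the trigger
# URL and payload shape from your Bright Data dashboard.
TRIGGER_URL = "https://api.example.com/scrape/trigger"

def trigger_reddit_scrape(subreddit: str, webhook_url: str) -> str:
    """Kick off an asynchronous scrape job and ask the provider to POST
    the results back to our webhook when the job finishes."""
    response = requests.post(
        TRIGGER_URL,
        headers={"Authorization": f"Bearer {os.environ['SCRAPER_API_KEY']}"},
        json={
            "url": f"https://www.reddit.com/r/{subreddit}/new/",
            "notify": webhook_url,  # provider calls this URL when data is ready
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["job_id"]  # hypothetical response field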
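Receiving the results. A minimal Django webhook receiver; the URL name and payload handling are assumptions for illustration.

```python
# views.py -- a minimal Django webhook receiver
import json

from django.http import HttpResponse, HttpResponseBadRequest
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST

@csrf_exempt  # external services can't send Django's CSRF token
@require_POST
def scrape_webhook(request):
    try:
        payload = json.loads(request.body)
    except json.JSONDecodeError:
        return HttpResponseBadRequest("invalid JSON")
    # hand the heavy lifting to a background task (see the Celery sketch below)
    # process_scraped_posts.delay(payload)
    return HttpResponse(status=200)
```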
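Tool Calling from plain English. A minimal LangChain + Gemini sketch; the model name and the tool itself are illustrative stand-ins for the tools you'll build in the course.

```python
from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI

@tool
def search_reddit(topic: str) -> str:
    """Search stored Reddit posts for a topic."""
    return f"Top posts about {topic}..."  # stub for illustration

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")  # needs GOOGLE_API_KEY
llm_with_tools = llm.bind_tools([search_reddit])

# Plain English in, a structured tool call out:
msg = llm_with_tools.invoke("Find the latest camping discussions")
print(msg.tool_calls)  # e.g. [{"name": "search_reddit", "args": {"topic": "camping"}, ...}]
```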
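Storing the data. An assumed shape for a Django model backed by PostgreSQL; the course's actual schema may differ.

```python
# models.py -- field names here are illustrative
from django.db import models

class RedditPost(models.Model):
    subreddit = models.CharField(max_length=100)
    title = models.CharField(max_length=300)
    body = models.TextField(blank=True)
    url = models.URLField(unique=True)  # dedupe repeat downloads
    scraped_at = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.title
```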
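Structured LLM outputs. A sketch of Pydantic-validated output via LangChain's with_structured_output; the schema fields are an illustrative guess.

```python
from pydantic import BaseModel, Field
from langchain_google_genai import ChatGoogleGenerativeAI

class PostSummary(BaseModel):
    topic: str = Field(description="Primary topic of the post")
    sentiment: str = Field(description="positive, negative, or neutral")
    key_points: list[str]

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
structured_llm = llm.with_structured_output(PostSummary)

result = structured_llm.invoke("Summarize: 'My new tent survived 60mph winds!'")
print(result.sentiment)  # a validated PostSummary instance, not free-form text
```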
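Fault-tolerant processing. A Celery task sketch with manual retries and exponential backoff; django-qstash offers a similar decorator-driven pattern for serverless workers. The save_posts helper is hypothetical.

```python
# tasks.py -- fault-tolerant background processing with Celery
from celery import shared_task

@shared_task(bind=True, max_retries=3)
def process_scraped_posts(self, payload: dict):
    try:
        save_posts(payload["posts"])  # hypothetical helper that writes via the ORM
    except Exception as exc:
        # transient failure (network, DB lock, etc.) -- retry with backoff
        raise self.retry(exc=exc, countdown=2 ** self.request.retries)
```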
Stack
‣ Python
‣ Jupyter (rapid prototyping)
‣ Django (web app & automation coordinator)
‣ Postgres (database)
‣ Redis (caching & queues)
‣ Celery (background tasks)
‣ Django QStash (serverless background tasks)
‣ Bright Data Search Engine AI (SERP)
‣ Bright Data Crawl API (extract Reddit posts)
‣ LangChain (integration with Google's Gemini LLM)
‣ LangGraph (easily unlock Tool Calling)
‣ Cloudflare Tunnels (expose your local project on a public domain so it can accept webhooks)
Resources
Ready to begin?