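Hand-Coding a RAG System from Scratch: A Hands-On Python Course

No framework magic: build a production-grade RAG system by hand with plain Python and OpenAI. The course covers streaming web chat, hybrid retrieval, multi-turn memory, and image support, and digs into the low-level engineering details of LLM applications.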

Building a RAG application in Python

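"Build a production-ready RAG system with Python & OpenAI" is an in-depth, hands-on Udemy course on building AI-native LLM applications, designed for intermediate and advanced Python developers.

The course's core principle is "no framework magic, write everything by hand from scratch" (No LangChain, No LlamaIndex, No Magic). It starts from a completely empty directory and has students hand-write every line of the Python backend, the SQL database logic, and the front-end web code. The goal is to help developers fully understand the low-level engineering details of LLM applications and build a production-ready, multimodal, streaming RAG (Retrieval-Augmented Generation) system.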

Published 6/2026
MP4 | Video: h264, 1920×1080 | Audio: AAC, 44.1 KHz, 2 Ch
Language: English | Duration: 9h 50m | Size: 7 GB

Build a streaming web chat with hybrid retrieval, multi-turn memory, and image support — from scratch

What you’ll learn
Build a complete Retrieval-Augmented Generation pipeline in Python, from document ingestion to streaming chat output
Run Postgres with the pgvector extension via Docker Compose, including HNSW indexing for fast approximate-nearest-neighbour vector search
Chunk documents with paragraph-aware splitting and overlap, and explain why each chunking choice affects retrieval quality
Implement idempotent, atomic document ingestion using SHA-256 content hashes and transactional upserts
Use the OpenAI SDK to call local Ollama models and OpenAI’s hosted API through the same code path
Implement hybrid retrieval that combines dense vector search with Postgres full-text BM25, fused with Reciprocal Rank Fusion
Build a query rewriter that turns follow-up questions like “what does it eat?” into standalone search queries that actually retrieve useful chunks
Build a directory watcher with watchdog, including per-path debouncing so editor saves never trigger reads of half-written files
Apply the Strategy/Adapter pattern to swap a Postgres backend for Weaviate via a single environment variable, with zero changes to the rest of the code
Build a streaming chat web UI with FastAPI, Server-Sent Events, and vanilla JavaScript — no React, no build step
Ingest images using a “describe-then-embed” vision-model pipeline, including format normalization for vision backends
Render LLM markdown output safely in the browser with marked + DOMPurify, including inline images
Apply standard software-engineering patterns — Dependency Injection, Factory, Strategy/Adapter, context managers, lazy imports, etc.
Diagnose RAG failures empirically (cosine scores, full-text ranks, fused output) instead of guessing at prompts
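
As an illustration of the paragraph-aware chunking with overlap listed above, here is a minimal sketch. It is not the course's implementation: the function name `chunk_text`, the character-based size limit, and the default values are assumptions made for this example. Paragraphs are packed into a chunk until the limit would be exceeded, and each new chunk is seeded with the tail of the previous one so that context spanning a boundary is not lost.

```python
import re


def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Pack whole paragraphs into chunks of roughly max_chars characters.

    Hypothetical helper for illustration. A single paragraph longer than
    max_chars is kept whole, and the overlap tail is added on top of the
    new paragraph, so a chunk can exceed max_chars.
    """
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")

    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_chars:
            current = candidate
            continue
        # The paragraph does not fit: close the chunk and start a new one
        # seeded with the last `overlap` characters of the closed chunk.
        chunks.append(current)
        tail = current[-overlap:] if overlap else ""
        current = f"{tail}\n\n{para}" if tail else para
    if current:
        chunks.append(current)
    return chunks
```

The trade-off this sketch exposes is the usual one: smaller chunks give more precise matches but less context per hit, while a larger overlap reduces the chance that an answer is split across two chunks at the cost of storing and embedding more text.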

Requirements
Basic Python skills, basic SQL, comfort with the command line and Docker. No prior LLM or vector-database experience needed.

Description
Build a working Retrieval-Augmented Generation (RAG) application in Python, from an empty directory to a streaming web chat with multi-turn memory, hybrid retrieval, image ingestion, and two interchangeable vector-store backends. No LangChain, no LlamaIndex, no magic. You write every line yourself, and by the end you understand exactly what each one does.

Most RAG tutorials wrap everything in a single high-level library and stop at “it works.” This course goes the other way. You’ll build the pipeline from scratch (chunking, embeddings, idempotent ingestion, hybrid semantic-plus-lexical retrieval with Reciprocal Rank Fusion, a query rewriter for follow-up questions, server-sent token streaming, a vision-model branch for images) on top of plain Postgres (with pgvector) and a local Ollama server. No API bills while you learn. No black boxes. When you later reach for a framework like LangChain, you’ll actually understand what it’s doing under the hood.
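
To make the Reciprocal Rank Fusion step concrete, here is a minimal sketch of how two ranked result lists (one from dense vector search, one from lexical search) could be merged. The function name and the document IDs are hypothetical, and `k = 60` is the constant conventionally used with RRF, not necessarily the value the course uses. Each document scores `1 / (k + rank)` in every list it appears in, and the scores are summed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranking.

    Hypothetical helper for illustration. Ranks start at 1, so the top
    hit of each list contributes 1 / (k + 1).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["a", "b", "c"]    # assumed output of a vector search
lexical_hits = ["c", "a", "d"]  # assumed output of a full-text search
fused = reciprocal_rank_fusion([dense_hits, lexical_hits])
```

Because RRF uses only ranks, it sidesteps the problem that cosine similarities and full-text scores live on different scales. A document that appears in both lists, such as "a" and "c" here, outranks one that appears in only one.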

What you’ll build, in one project
– Runs entirely locally against Ollama, or transparently against the OpenAI API by changing one environment variable

– Stores embeddings in Postgres + pgvector with HNSW indexing, or in Weaviate — backends swappable via a single config setting

– Hybrid retrieval: dense vector search and Postgres full-text BM25, fused with Reciprocal Rank Fusion — fixing the cases where pure semantic search silently fails on rare terms, names, and identifiers

– A directory watcher that ingests new files automatically, with editor-save debouncing so it never reads a half-written file

– A streaming web chat UI built on FastAPI + Server-Sent Events + vanilla JavaScript — no React, no build step — with multi-turn memory, query rewriting for follow-ups, source citations, and inline image rendering

– Image ingestion through a vision model with a “describe-then-embed” pipeline — multimodal in the same chunks table, no schema change required
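
The editor-save debouncing mentioned in the list above can be sketched without any file-system code. The class below is a hypothetical illustration, not the course's watcher: it only tracks, per path, when the last event arrived, and reports a path as ready once it has been quiet for `delay` seconds. In a real watcher, a watchdog event handler would presumably call `touch()` and a polling loop would call `pop_ready()`; the injectable `clock` is an assumption added to keep the example testable.

```python
import threading
import time
from typing import Callable


class PathDebouncer:
    """Report a path only after it has received no events for `delay` seconds."""

    def __init__(
        self, delay: float = 0.5, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._delay = delay
        self._clock = clock
        self._last_event: dict[str, float] = {}
        # Events and polling may come from different threads.
        self._lock = threading.Lock()

    def touch(self, path: str) -> None:
        """Record a file-system event; restarts the quiet period for this path."""
        with self._lock:
            self._last_event[path] = self._clock()

    def pop_ready(self) -> list[str]:
        """Return and forget every path that has been quiet long enough."""
        now = self._clock()
        with self._lock:
            ready = sorted(
                path
                for path, seen in self._last_event.items()
                if now - seen >= self._delay
            )
            for path in ready:
                del self._last_event[path]
        return ready
```

Keeping one timestamp per path means a burst of saves to one file delays only that file, while other files that have settled can still be ingested.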

Along the way you’ll work through real software-design patterns in real code: Dependency Injection, Strategy/Adapter, Factory, lifespans, context managers, thread-safety boundaries, atomic transactions, defensive coding against external services that quietly don’t work the way their docs claim. The course’s recurring theme is the payoff of good abstractions: the vector-store interface designed early lets you bolt on a second backend in one file; the same retrieval pipeline serves both the CLI and the web app; the chunk-metadata field that seemed academic early in the course is what makes image support a simple change later on.
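
The vector-store abstraction described above might look roughly like the following sketch. Everything here is an assumption for illustration: the `VectorStore` protocol, the `VECTOR_BACKEND` variable name, and the two stand-in classes, which use an in-memory brute-force search instead of talking to Postgres or Weaviate. The point is only the shape of the pattern: callers depend on the interface, and a factory picks the concrete backend from configuration.

```python
import math
import os
from collections.abc import Mapping
from typing import Optional, Protocol


class VectorStore(Protocol):
    """The only interface the rest of the pipeline would depend on."""

    def add(self, chunk_id: str, embedding: list[float]) -> None: ...
    def search(self, query: list[float], top_k: int = 3) -> list[str]: ...


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Brute-force cosine search; a stand-in for a real backend adapter."""

    def __init__(self) -> None:
        self._rows: dict[str, list[float]] = {}

    def add(self, chunk_id: str, embedding: list[float]) -> None:
        self._rows[chunk_id] = embedding

    def search(self, query: list[float], top_k: int = 3) -> list[str]:
        ranked = sorted(
            self._rows, key=lambda cid: _cosine(query, self._rows[cid]), reverse=True
        )
        return ranked[:top_k]


class FakePostgresStore(InMemoryStore):
    """Placeholder: a real adapter would issue SQL against pgvector."""


class FakeWeaviateStore(InMemoryStore):
    """Placeholder: a real adapter would call a Weaviate client."""


_BACKENDS = {"postgres": FakePostgresStore, "weaviate": FakeWeaviateStore}


def make_vector_store(env: Optional[Mapping[str, str]] = None) -> VectorStore:
    """Factory: choose the backend from the (hypothetical) VECTOR_BACKEND variable."""
    env = os.environ if env is None else env
    backend = env.get("VECTOR_BACKEND", "postgres").lower()
    if backend not in _BACKENDS:
        raise ValueError(f"unknown VECTOR_BACKEND: {backend!r}")
    return _BACKENDS[backend]()
```

Retrieval code written against `VectorStore` never names a concrete class, so adding a backend means adding one adapter and one registry entry.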

You’ll finish with a codebase you can extend — add a reranker, try a different embedder, swap the chat model, point it at a corpus of your own docs — and the engineering vocabulary to talk about RAG as production software, not a notebook demo.

Who this course is for
Python developers who want to integrate LLMs into their projects and add RAG functionality.
