AI-Ready Data Engineering: Pipelines for RAG & Agents

RAG and Agentic AI Data Pipelines in Practice

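This advanced course teaches you to build an enterprise-grade AI data platform from scratch, covering incremental vector ETL, multimodal parsing, and row-level access control, and closing the last mile between model development and production.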

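This advanced, hands-on course aims to bridge the gap between model development and production deployment, helping developers build from scratch an enterprise-grade AI data platform based on RAG, agents, and governed text-to-SQL. It covers incremental vector ETL, multimodal parsing, row-level access control, and the operation of secure, evaluable AI infrastructure.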

Build the data systems behind RAG, agents, and governed text-to-SQL — retrieval, evaluation, MCP, cost, and governance.

What you’ll learn
⚡ Build incremental, idempotent embedding ETL pipelines that stay fresh and survive model swaps
⚡ Demystify embeddings and choose chunking strategies that actually preserve meaning
⚡ Parse the messy real world — PDFs, scanned tables, semi-structured docs — into clean, chunked, retrievable data
⚡ Enforce ingestion freshness and data contracts so your AI never silently serves stale or broken data
⚡ Design vector and hybrid retrieval with reranking, metadata filtering, and row-level security
⚡ Ship governed text-to-SQL with semantic layers on Snowflake Cortex Analyst and Databricks Genie
⚡ Prove text-to-SQL accuracy with an evaluation harness instead of hoping it’s right
⚡ Understand the RAG-to-agents mental model and when each is the right tool
⚡ Feed agents safely with context engineering, the Model Context Protocol (MCP), and durable memory
⚡ Build multi-source agentic retrieval across warehouses, vector stores, and APIs
⚡ Build an evaluation + observability + tracing harness that gates every change before it ships
⚡ Engineer cost, guardrails, governance, lineage, and access-aware retrieval for AI data in production
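
To make the first objective above concrete, here is a minimal sketch of an incremental, idempotent embedding sync. It is an illustration under stated assumptions, not the course's own implementation: `fake_embed`, `VectorStore`, and `sync` are hypothetical names, the store is an in-memory dictionary, and the embedding function is a deterministic stand-in for a real model call. The idea it shows is fingerprinting each chunk by its text plus the model name, so unchanged chunks are skipped on re-runs and a model swap forces a full re-embed.

```python
import hashlib
from dataclasses import dataclass, field


def fake_embed(text: str, model: str) -> list[float]:
    """Stand-in for a real embedding call (hypothetical); deterministic for the demo."""
    digest = hashlib.sha256(f"{model}:{text}".encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]


def fingerprint(text: str, model: str) -> str:
    """Hash of chunk text plus model name: changes when either one changes."""
    return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()


@dataclass
class VectorStore:
    """Toy in-memory store keyed by chunk id (assumes the real store can upsert by id)."""
    rows: dict[str, dict] = field(default_factory=dict)

    def upsert(self, chunk_id: str, fp: str, vector: list[float]) -> None:
        self.rows[chunk_id] = {"fingerprint": fp, "vector": vector}


def sync(chunks: dict[str, str], store: VectorStore, model: str) -> dict[str, int]:
    """Embed only new or changed chunks and delete vanished ones; safe to re-run."""
    stats = {"embedded": 0, "skipped": 0, "deleted": 0}
    for chunk_id, text in chunks.items():
        fp = fingerprint(text, model)
        existing = store.rows.get(chunk_id)
        if existing is not None and existing["fingerprint"] == fp:
            stats["skipped"] += 1  # same text, same model: nothing to do
            continue
        store.upsert(chunk_id, fp, fake_embed(text, model))
        stats["embedded"] += 1
    # Remove vectors whose source chunk no longer exists.
    for chunk_id in list(store.rows.keys() - chunks.keys()):
        del store.rows[chunk_id]
        stats["deleted"] += 1
    return stats


store = VectorStore()
chunks = {"doc1#0": "Refund policy: 30 days.", "doc1#1": "Shipping takes 5 days."}
first = sync(chunks, store, model="model-a")    # everything is new
second = sync(chunks, store, model="model-a")   # re-run: nothing re-embedded
chunks["doc1#1"] = "Shipping takes 3 days."
third = sync(chunks, store, model="model-a")    # only the edited chunk
swapped = sync(chunks, store, model="model-b")  # model swap: re-embed all
print(first, second, third, swapped)
```

Keying the fingerprint on the model name as well as the text is one simple way to survive a model swap; a production pipeline would more likely write the new model's vectors to a separate index and cut over once the backfill completes.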

Requirements
❗ Comfortable writing SQL and Python — this is an advanced, build-heavy course
❗ Familiarity with at least one warehouse (Snowflake or Databricks) and basic data-pipeline concepts (ETL/ELT)
❗ An API key for an LLM provider (for the RAG/agent/text-to-SQL labs) — a small amount of usage spend is expected
❗ A laptop with Python 3.11+, Git, and a code editor
❗ Optional but useful: a free Snowflake trial (Cortex Analyst) and/or Databricks trial (Genie, Vector Search) for the production modules
❗ No prior RAG/agent experience required — we start from embeddings and build to a full AI data platform

Description

Everyone is building AI features. Almost no one is building the data systems that make them actually work. That gap is the data engineer’s to own — and this course teaches you exactly how.
Across 25 modules and 138 lessons you build the pipelines behind real AI: retrieval, RAG, agents, and governed text-to-SQL. These are not toy demos but the incremental, evaluated, secured, cost-controlled systems that survive production. You follow one engineer, Maya, whose mandate is “make our data AI-ready,” and you finish by shipping the capstone, AskTheData, an end-to-end AI data platform, yourself.

What makes this course different
✨ Data-engineering-first, not prompt-first. Embeddings, chunking, ingestion freshness, data contracts, vector stores, hybrid search: the substrate AI runs on, built properly.

✨ Code-first with a lab every module. Every module ships runnable code and a hands-on lab; over 40% of slides are real code or demos.

✨ Production war stories. Every hard idea lands with a story and an analogy, and real failure drills show you exactly how these systems break and how to stop them.

✨ Governed text-to-SQL done right. Semantic layers on Snowflake Cortex Analyst and Databricks Genie, with an evaluation harness that proves accuracy instead of hoping for it.

✨ Agents fed safely. Context engineering, the Model Context Protocol (MCP), durable memory, and access-aware retrieval: the parts most “agent” courses skip.

✨ Evaluate, observe, cost-control. An eval + observability harness that gates every change, plus real cost benchmarks: the discipline that separates a demo from a platform.
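
As a rough illustration of what "an eval harness that gates every change" can mean, the sketch below scores a candidate system against a small golden set and blocks it if accuracy falls below a floor or regresses against the current baseline. Everything here is an assumption made for the example: the golden set is invented toy data, `make_system` fakes a question-answering system with a lookup table, and exact-match scoring is only one of many possible metrics.

```python
from typing import Callable

# Invented toy golden set: (question, expected answer).
GOLDEN_SET = [
    ("What is the refund window?", "30 days"),
    ("Which region had the most orders?", "EMEA"),
    ("How many active customers are there?", "1200"),
    ("What is the shipping time?", "5 days"),
]


def accuracy(answer_fn: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Fraction of cases answered with a case-insensitive exact match."""
    hits = sum(
        1 for question, expected in cases
        if answer_fn(question).strip().lower() == expected.lower()
    )
    return hits / len(cases)


def gate(candidate: float, baseline: float, floor: float = 0.75,
         max_drop: float = 0.0) -> bool:
    """Pass only if the candidate clears the floor and does not regress."""
    return candidate >= floor and candidate >= baseline - max_drop


def make_system(answers: dict[str, str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a RAG or text-to-SQL system under test."""
    return lambda question: answers.get(question, "")


correct = dict(GOLDEN_SET)
baseline_system = make_system({**correct, "What is the shipping time?": "7 days"})
good_candidate = make_system(correct)
bad_candidate = make_system({**correct,
                             "What is the shipping time?": "7 days",
                             "Which region had the most orders?": "APAC"})

baseline_score = accuracy(baseline_system, GOLDEN_SET)
good_score = accuracy(good_candidate, GOLDEN_SET)
bad_score = accuracy(bad_candidate, GOLDEN_SET)

print("good candidate ships:", gate(good_score, baseline_score))
print("bad candidate ships:", gate(bad_score, baseline_score))
```

In a CI setting the boolean from `gate` would typically decide whether the pipeline step fails; the thresholds here are placeholders, not recommendations.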

What you’ll build, module by module: embeddings demystified, ingestion + freshness + data contracts for AI, document parsing of the messy real world, chunking strategies that actually work, the embedding ETL pipeline (incremental, idempotent, model-swap-safe), vector stores with row-level security, hybrid search + reranking, text-to-SQL and its traps, semantic layers, Cortex Analyst and Databricks Genie in production, text-to-SQL evaluation + hardening, the RAG-to-agents mental model, context engineering for data agents, tools + MCP, agent memory and state, multi-source agentic retrieval, evaluation harnesses, observability + tracing, guardrails + hallucination control, cost engineering with real benchmarks, governance + lineage + access-aware retrieval, orchestration + testing + productionizing — and the AskTheData capstone that ties it all together.

The capstone — AskTheData: you architect and build an end-to-end AI data platform: ingestion → embedding ETL → governed retrieval → agentic text-to-SQL → evaluation → observability — then run a production failure drill and self-score against a rubric. It’s the portfolio piece that proves you can build AI data systems, not just call an API.
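
The "governed retrieval" stage of that flow can be sketched in a few lines. The example below is a minimal illustration under assumptions, not the capstone's actual design: it merges a keyword ranking and a vector ranking with reciprocal rank fusion (RRF), after dropping any document the caller's groups are not allowed to see. The function names, the `groups` metadata field, and the sample ids are all hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def allowed(doc_meta: dict, user_groups: set[str]) -> bool:
    """A document is visible if it shares at least one group with the user."""
    return bool(doc_meta["groups"] & user_groups)


def retrieve(keyword_hits: list[str], vector_hits: list[str],
             metadata: dict[str, dict], user_groups: set[str],
             top_n: int = 3) -> list[str]:
    """Filter each ranking by access first, then fuse the visible results."""
    def visible(ids: list[str]) -> list[str]:
        return [d for d in ids if allowed(metadata[d], user_groups)]

    return rrf_fuse([visible(keyword_hits), visible(vector_hits)])[:top_n]


# Invented sample data for the demo.
metadata = {
    "a": {"groups": {"sales"}},
    "b": {"groups": {"finance"}},
    "c": {"groups": {"sales", "finance"}},
    "d": {"groups": {"sales"}},
}
keyword_hits = ["b", "a", "c"]   # e.g. from a keyword (BM25-style) index
vector_hits = ["c", "b", "d"]    # e.g. from a vector index

sales_view = retrieve(keyword_hits, vector_hits, metadata, {"sales"})
finance_view = retrieve(keyword_hits, vector_hits, metadata, {"finance"})
print(sales_view, finance_view)
```

Filtering in application code keeps the sketch short; in a real platform the access predicate is usually pushed down into the store or warehouse so restricted rows never leave it.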

Who this is for
✨ Data engineers who want to own the AI layer instead of handing it to someone else

✨ Analytics engineers and warehouse practitioners (Snowflake / Databricks) moving into RAG, agents, and text-to-SQL

✨ ML / platform engineers who need the data substrate behind retrieval and agents done right

✨ Backend engineers building AI features who keep hitting data-quality, retrieval, and cost walls

✨ Anyone who can write SQL and Python and wants to build evaluated, governed, production AI data systems

By the end of this course, you will be able to architect, evaluate, secure, and cost-control an end-to-end AI data platform — retrieval, RAG, agents, and governed text-to-SQL — and prove it works before it ships.
Enrol now. The AI features are easy. The data systems behind them are the moat — and they’re yours to build.

Who this course is for
⭐ Data engineers who want to own the AI layer
⭐ Analytics / warehouse engineers (Snowflake, Databricks) moving into RAG, agents, and text-to-SQL
⭐ ML and platform engineers who need the data substrate behind AI done right
⭐ Backend engineers building AI features hitting data, retrieval, and cost walls
⭐ Engineers who can write SQL and Python and want evaluated, governed, production AI data systems
