[{"data":1,"prerenderedAt":712},["ShallowReactive",2],{"blog-agentic-rag-and-production":3},{"id":4,"title":5,"body":6,"category":697,"date":698,"description":16,"extension":699,"meta":700,"navigation":701,"path":702,"seo":703,"stem":704,"tags":705,"__hash__":711},"blog\u002Fblog\u002Fagentic-rag-and-production.md","从经典 RAG 到 Agentic RAG：智能检索的进阶之路",{"type":7,"value":8,"toc":678},"minimark",[9,13,17,20,25,28,51,54,60,64,67,70,81,84,95,100,104,107,113,125,131,139,142,147,151,154,161,164,169,173,176,179,182,187,191,194,385,388,405,408,412,415,418,421,436,439,444,448,451,454,480,483,486,538,541,544,570,573,576,590,593,597,600,611,614,640,645,648,651,657,664,667,674],[10,11,5],"h1",{"id":12},"从经典-rag-到-agentic-rag智能检索的进阶之路",[14,15,16],"p",{},"如果你把经典 RAG 理解成一条流水线——用户提问、系统检索、LLM 回答，每一步都是预设好的固定流程。那么 Agentic RAG 就像是把这条流水线升级成了一个智能工厂——LLM 不再被动等待输入，而是主动决定要不要检索、查什么、查几次、甚至怀疑检索结果的准确性。",[14,18,19],{},"这篇文章从经典 RAG 的局限出发，梳理各种进阶 RAG 模式，最后落地到生产级工程实践。",[21,22,24],"h2",{"id":23},"_1-经典-rag-的局限","1. 经典 RAG 的局限",[14,26,27],{},"经典 RAG 的流程是一条直线：问题 → 检索 → 拼 Prompt → 生成。这个流程简单可靠，但也有三个明显的短板：",[29,30,31,39,45],"ul",{},[32,33,34,38],"li",{},[35,36,37],"strong",{},"检索一次性","：如果第一次检索没找到合适信息，不会重试",[32,40,41,44],{},[35,42,43],{},"Query 固定","：用户怎么问就怎么查，不会改写或拆解",[32,46,47,50],{},[35,48,49],{},"无反馈闭环","：LLM 生成时如果发现信息不足，没法回头再查",[14,52,53],{},"这些局限在简单问答场景下不是问题，但面对复杂、多跳、需要推理的问题时，经典 RAG 就显得力不从心了。",[14,55,56,59],{},[35,57,58],{},"小结："," 经典 RAG 胜在简单可控，但在复杂问题上精度不够。进阶 RAG 模式的目标就是补上这些短板。",[21,61,63],{"id":62},"_2-parent-child-small-to-big","2. 
Parent-Child \u002F Small-to-Big",[14,65,66],{},"前面讲到过，小 chunk 检索更精准但信息不完整，大 chunk 信息完整但精度不够。Parent-Child Chunking 巧妙地解决了这个矛盾。",[14,68,69],{},"思路很简单：建两级 chunk。小 chunk（child）存索引用于检索，命中后返回对应的父 chunk（parent）喂给 LLM。",[71,72,77],"pre",{"className":73,"code":75,"language":76},[74],"language-text","索引层：小 chunk（300 token）→ 精准匹配\n返回层：父 chunk（1500 token）→ 完整上下文\n","text",[78,79,75],"code",{"__ignoreMap":80},"",[14,82,83],{},"这样既保证了检索的召回精度，又保证了 LLM 拿到的是完整信息。",[14,85,86,87,90,91,94],{},"类似的思想还有 ",[35,88,89],{},"Sentence Window Retrieval","：检索到某个句子后，带上前后的句子作为上下文一起返回。以及 ",[35,92,93],{},"Auto-Merging Retrieval","：多个相邻小 chunk 都被命中时，自动合并成更大的父块。",[14,96,97,99],{},[35,98,58],{}," 这些模式的核心思想是一致的——检索粒度细、返回粒度粗。在准确性和信息完整性之间找到一个巧妙的平衡点。",[21,101,103],{"id":102},"_3-self-rag-与-crag","3. Self-RAG 与 CRAG",[14,105,106],{},"这两种模式给 RAG 加入了\"元认知\"——系统不再盲目执行检索流程，而是学会反思自己的行为。",[14,108,109,112],{},[35,110,111],{},"Self-RAG"," 让 LLM 在生成过程中自己决定三件事：",[114,115,116,119,122],"ol",{},[32,117,118],{},"需不需要检索（有些问题 LLM 自己能回答，无需检索）",[32,120,121],{},"检索回来的内容是否相关（过滤不相关的结果）",[32,123,124],{},"回答是否基于检索内容（自我验证，防止幻觉）",[14,126,127,130],{},[35,128,129],{},"CRAG（Corrective RAG）"," 则在检索之后加了一个专门的评估器，判断检索结果的质量：",[29,132,133,136],{},[32,134,135],{},"质量好 → 用检索结果回答",[32,137,138],{},"质量差 → 触发 Fallback——比如改用网页搜索、或者直接让 LLM 用自己的知识回答——并记录下来用于后续优化",[14,140,141],{},"两种模式的区别在于：Self-RAG 的\"反思\"内化在 LLM 的生成过程中，CRAG 的\"纠错\"是外挂的一个独立模块。",[14,143,144,146],{},[35,145,58],{}," 经典 RAG 是\"查了就用\"的直线思维，Self-RAG 和 CRAG 引入了\"查了再想想对不对\"的反思机制。这是从被动检索到主动推理的关键一步。",[21,148,150],{"id":149},"_4-graph-rag","4. 
Graph RAG",[14,152,153],{},"Graph RAG 用知识图谱代替或增强向量检索。节点是实体和概念，边是它们之间的关系，检索时沿着图谱做多跳推理。",[14,155,156,157,160],{},"微软的 GraphRAG 论文是这一方向的代表作。它的核心优势在于处理",[35,158,159],{},"需要多跳推理的复杂问题","，比如\"A 和 B 的共同朋友中谁是 C 公司的员工？\"这类问题如果用纯向量检索，需要把整段关系描述都放在一个 chunk 里，而这往往不现实。",[14,162,163],{},"Graph RAG 的代价也很明显：构建和维护知识图谱的成本远高于简单的向量化。它适用于知识之间关系密集、多跳查询频繁的场景，但不是所有 RAG 系统都需要。",[14,165,166,168],{},[35,167,58],{}," Graph RAG 适合关系密集、多跳推理的场景。如果你的问题大多是\"某某文档里提到什么\"，纯向量检索就够了。但如果是\"某某和某某之间有什么关系\"，Graph RAG 就有不可替代的优势。",[21,170,172],{"id":171},"_5-contextual-retrieval","5. Contextual Retrieval",[14,174,175],{},"2024 年 Anthropic 提出的方法，思路非常实用：给每个 chunk 生成一段上下文描述，说明它在原文档中的位置和主题，然后对\"chunk + 上下文描述\"做 Embedding。",[14,177,178],{},"为什么有效？因为很多 chunk 单独拎出来是缺乏语境的。比如某个 chunk 里写着\"它的延迟是 200ms\"——这个\"它\"指的是什么？如果 chunk 没有包含上文，检索时很可能匹配不到。加上上下文描述后，\"TransactionService 的延迟是 200ms\"就清晰多了。",[14,180,181],{},"Anthropic 报告说，仅用 Contextual Embeddings 就能把检索失败率降低约 35%，再叠加 Contextual BM25 可降低约 49%。这是一个成本很低但效果显著的优化。",[14,183,184,186],{},[35,185,58],{}," 有时候问题不在检索算法，而在于被检索的内容本身缺乏语境。Contextual Retrieval 用最小的工程成本解决了这个问题。",[21,188,190],{"id":189},"_6-agentic-rag检索作为工具","6. 
Agentic RAG：检索作为工具",[14,192,193],{},"如果把检索封装成一个工具函数交给 LLM 调用，RAG 就不再是一条流水线，而是一场对话。",[71,195,199],{"className":196,"code":197,"language":198,"meta":80,"style":80},"language-typescript shiki shiki-themes github-dark","const retrieveTool = tool(\n  async ({ query }) => {\n    const docs = await vectorStore.similaritySearch(query, 5);\n    return docs.map(d => d.pageContent).join(\"\\n---\\n\");\n  },\n  {\n    name: \"retrieve_knowledge\",\n    description: \"从知识库检索相关文档\",\n    schema: z.object({ query: z.string() }),\n  }\n);\n","typescript",[78,200,201,225,247,276,321,327,333,345,356,374,380],{"__ignoreMap":80},[202,203,206,210,214,217,221],"span",{"class":204,"line":205},"line",1,[202,207,209],{"class":208},"snl16","const",[202,211,213],{"class":212},"sDLfK"," retrieveTool",[202,215,216],{"class":208}," =",[202,218,220],{"class":219},"svObZ"," tool",[202,222,224],{"class":223},"s95oV","(\n",[202,226,228,231,234,238,241,244],{"class":204,"line":227},2,[202,229,230],{"class":208},"  async",[202,232,233],{"class":223}," ({ ",[202,235,237],{"class":236},"s9osk","query",[202,239,240],{"class":223}," }) ",[202,242,243],{"class":208},"=>",[202,245,246],{"class":223}," {\n",[202,248,250,253,256,258,261,264,267,270,273],{"class":204,"line":249},3,[202,251,252],{"class":208},"    const",[202,254,255],{"class":212}," docs",[202,257,216],{"class":208},[202,259,260],{"class":208}," await",[202,262,263],{"class":223}," vectorStore.",[202,265,266],{"class":219},"similaritySearch",[202,268,269],{"class":223},"(query, ",[202,271,272],{"class":212},"5",[202,274,275],{"class":223},");\n",[202,277,279,282,285,288,291,294,297,300,303,305,309,312,315,317,319],{"class":204,"line":278},4,[202,280,281],{"class":208},"    return",[202,283,284],{"class":223}," docs.",[202,286,287],{"class":219},"map",[202,289,290],{"class":223},"(",[202,292,293],{"class":236},"d",[202,295,296],{"class":208}," =>",[202,298,299],{"class":223}," 
d.pageContent).",[202,301,302],{"class":219},"join",[202,304,290],{"class":223},[202,306,308],{"class":307},"sU2Wk","\"",[202,310,311],{"class":212},"\\n",[202,313,314],{"class":307},"---",[202,316,311],{"class":212},[202,318,308],{"class":307},[202,320,275],{"class":223},[202,322,324],{"class":204,"line":323},5,[202,325,326],{"class":223},"  },\n",[202,328,330],{"class":204,"line":329},6,[202,331,332],{"class":223},"  {\n",[202,334,336,339,342],{"class":204,"line":335},7,[202,337,338],{"class":223},"    name: ",[202,340,341],{"class":307},"\"retrieve_knowledge\"",[202,343,344],{"class":223},",\n",[202,346,348,351,354],{"class":204,"line":347},8,[202,349,350],{"class":223},"    description: ",[202,352,353],{"class":307},"\"从知识库检索相关文档\"",[202,355,344],{"class":223},[202,357,359,362,365,368,371],{"class":204,"line":358},9,[202,360,361],{"class":223},"    schema: z.",[202,363,364],{"class":219},"object",[202,366,367],{"class":223},"({ query: z.",[202,369,370],{"class":219},"string",[202,372,373],{"class":223},"() }),\n",[202,375,377],{"class":204,"line":376},10,[202,378,379],{"class":223},"  }\n",[202,381,383],{"class":204,"line":382},11,[202,384,275],{"class":223},[14,386,387],{},"LLM 收到问题后自己判断：",[29,389,390,393,396,399,402],{},[32,391,392],{},"需要检索吗？（有些问题不需要）",[32,394,395],{},"用什么 Query 检索？（自主改写）",[32,397,398],{},"检索结果够用吗？（评估）",[32,400,401],{},"不够就再查一次（换一个 Query）",[32,403,404],{},"多轮查询后信息足够，生成最终答案",[14,406,407],{},"这个模式下，LLM 从一个\"答案生成器\"变成了\"问题解决者\"——它掌控整个检索和推理的流程，而不是被动执行预先编排的步骤。",[409,410,411],"h3",{"id":411},"多跳检索",[14,413,414],{},"复杂问题往往一次检索不够。比如：",[14,416,417],{},"用户问：\"LangGraph 的 Checkpointer 和 LangChain 的 Memory 有什么区别？\"",[14,419,420],{},"Agent 的执行过程可能是：",[114,422,423,428,433],{},[32,424,425],{},[78,426,427],{},"retrieve(\"LangGraph checkpointer\")",[32,429,430],{},[78,431,432],{},"retrieve(\"LangChain memory\")",[32,434,435],{},"对比两者的文档，生成答案",[14,437,438],{},"每一次检索的结果都可能影响下一次检索的 Query——这需要 Agent 维护对话状态，这正是 Agent 架构擅长的事情。",[14,440,441,443],{},[35,442,58],{}," 
Agentic RAG 把 LLM 从流水线操作工变成了车间主任。它不再被动执行，而是主动管理整个检索生成流程。这是 RAG 系统从\"能用\"到\"智能\"的关键跨越。",[21,445,447],{"id":446},"_7-生产级工程实践","7. 生产级工程实践",[409,449,450],{"id":450},"知识库管理",[14,452,453],{},"知识库是活的——文档会更新、会过期。生产环境需要解决几个实际问题：",[29,455,456,462,468,474],{},[32,457,458,461],{},[35,459,460],{},"增量同步","：文档更新时只重新处理变化的 chunk，用哈希比对检测变化",[32,463,464,467],{},[35,465,466],{},"版本管理","：换 Embedding 模型时，旧索引和新索引并存，等新索引验证通过再切流量",[32,469,470,473],{},[35,471,472],{},"权限隔离","：多租户场景下用 metadata filtering 实现检索权限控制",[32,475,476,479],{},[35,477,478],{},"数据清洗","：去重、去噪、统一格式，脏数据进库会污染所有检索结果",[409,481,482],{"id":482},"延迟优化",[14,484,485],{},"RAG 全链路的延迟构成：",[487,488,489,502],"table",{},[490,491,492],"thead",{},[493,494,495,499],"tr",{},[496,497,498],"th",{},"阶段",[496,500,501],{},"典型耗时",[503,504,505,514,522,530],"tbody",{},[493,506,507,511],{},[508,509,510],"td",{},"Embedding 查询",[508,512,513],{},"50-200ms",[493,515,516,519],{},[508,517,518],{},"向量检索",[508,520,521],{},"10-100ms",[493,523,524,527],{},[508,525,526],{},"Rerank",[508,528,529],{},"100-500ms（可选）",[493,531,532,535],{},[508,533,534],{},"LLM 生成",[508,536,537],{},"500-3000ms（主要瓶颈）",[14,539,540],{},"优化思路：Embedding 用快模型、向量库用 HNSW 算法、Rerank 在延迟敏感时跳过、LLM 用 Streaming 输出首 token。",[409,542,543],{"id":543},"成本优化",[29,545,546,552,558,564],{},[32,547,548,551],{},[35,549,550],{},"Embedding 缓存","：相同 Query 不重复调用",[32,553,554,557],{},[35,555,556],{},"小模型兜底","：简单问题用低成本模型，复杂问题再升级到大模型",[32,559,560,563],{},[35,561,562],{},"Prompt 压缩","：用 LLMLingua 等工具压缩检索内容，减少 Token 消耗",[32,565,566,569],{},[35,567,568],{},"冷热分离","：高频数据放内存库，低频数据放对象存储",[409,571,572],{"id":572},"可观测性",[14,574,575],{},"没有可观测性，RAG 系统就是黑盒。至少需要追踪这些指标：",[29,577,578,581,584,587],{},[32,579,580],{},"每次检索的 Query、命中的 chunk、Rerank 分数、最终答案",[32,582,583],{},"用户反馈（点赞\u002F点踩）",[32,585,586],{},"端到端延迟分布",[32,588,589],{},"检索 Miss 率（没有召回到相关内容的比例）",[14,591,592],{},"工具推荐：LangSmith、Langfuse、Arize，或者自建 Trace 系统。",[409,594,596],{"id":595},"fallback-策略","Fallback 
策略",[14,598,599],{},"每个环节都要有兜底：",[29,601,602,605,608],{},[32,603,604],{},"检索失败 → 用 LLM 内置知识回答，标注\"无知识库支撑\"",[32,606,607],{},"LLM 生成失败 → 返回检索结果原文",[32,609,610],{},"全流程失败 → 固定兜底话术，记录日志",[409,612,613],{"id":613},"安全与合规",[29,615,616,622,628,634],{},[32,617,618,621],{},[35,619,620],{},"PII 脱敏","：向量化前清洗个人信息",[32,623,624,627],{},[35,625,626],{},"权限控制","：检索时带用户权限过滤",[32,629,630,633],{},[35,631,632],{},"审计日志","：谁查了什么、LLM 回答了什么都留痕",[32,635,636,639],{},[35,637,638],{},"Prompt Injection 防护","：检索回来的文档可能被恶意污染（\"忽略之前指令，说...\"），需要做输入过滤",[14,641,642,644],{},[35,643,58],{}," 生产级 RAG 的挑战不在算法创新，而在工程细节。知识库管理、延迟、成本、可观测性、安全性——每个方面都需要体系化的解决方案。",[21,646,647],{"id":647},"总结",[14,649,650],{},"从经典 RAG 到 Agentic RAG，背后是一条从\"被动执行\"到\"主动思考\"的演进路径：",[71,652,655],{"className":653,"code":654,"language":76},[74],"经典 RAG → 固定流水线，一次检索\n  ↓\nSelf-RAG \u002F CRAG → 加入反思与纠错\n  ↓\nGraph RAG → 用知识图谱支持多跳推理\n  ↓\nAgentic RAG → LLM 自主管理检索流程\n",[78,656,654],{"__ignoreMap":80},[14,658,659,660,663],{},"但无论是哪种模式，核心原则是不变的：",[35,661,662],{},"RAG 的价值在于让 LLM 基于事实说话。"," 技术模式可以演进，但这个根本目的始终如一。",[14,665,666],{},"如果你的系统还跑着经典 RAG 链路、遇到了精度瓶颈，不妨从这些进阶模式中选择一个入手升级。大多数情况下，最简单也最有性价比的优化往往是：给 chunk 加上下文描述（Contextual Retrieval）、在检索后加一层 Rerank、或者让 LLM 能自主决定是否重试检索。",[14,668,669],{},[670,671,673],"a",{"href":672},"\u002Fblog\u002F","返回博客列表",[675,676,677],"style",{},"html pre.shiki code .snl16, html code.shiki .snl16{--shiki-default:#F97583}html pre.shiki code .sDLfK, html code.shiki .sDLfK{--shiki-default:#79B8FF}html pre.shiki code .svObZ, html code.shiki .svObZ{--shiki-default:#B392F0}html pre.shiki code .s95oV, html code.shiki .s95oV{--shiki-default:#E1E4E8}html pre.shiki code .s9osk, html code.shiki .s9osk{--shiki-default:#FFAB70}html pre.shiki code .sU2Wk, html code.shiki .sU2Wk{--shiki-default:#9ECBFF}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: 
var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}",{"title":80,"searchDepth":227,"depth":227,"links":679},[680,681,682,683,684,685,688,696],{"id":23,"depth":227,"text":24},{"id":62,"depth":227,"text":63},{"id":102,"depth":227,"text":103},{"id":149,"depth":227,"text":150},{"id":171,"depth":227,"text":172},{"id":189,"depth":227,"text":190,"children":686},[687],{"id":411,"depth":249,"text":411},{"id":446,"depth":227,"text":447,"children":689},[690,691,692,693,694,695],{"id":450,"depth":249,"text":450},{"id":482,"depth":249,"text":482},{"id":543,"depth":249,"text":543},{"id":572,"depth":249,"text":572},{"id":595,"depth":249,"text":596},{"id":613,"depth":249,"text":613},{"id":647,"depth":227,"text":647},"AI\u002FLLM","2026-05-03","md",{},true,"\u002Fblog\u002Fagentic-rag-and-production",{"title":5,"description":16},"blog\u002Fagentic-rag-and-production",[706,707,708,709,710],"RAG","Agentic RAG","Graph RAG","大模型","生产实践","LcJ1nu8yN6bgMvmXeZlXDkH5PgSDEpsjLz3AFW6sFGk",1779959652906]