RAG Patterns
RAG is not one architecture; it is a family of them. These patterns are the recurring shapes, ordered from simple to complex. Match the pattern to the problem and resist climbing higher than you need.
Pattern 1 — Simple Q&A RAG
The baseline. One knowledge source, one retrieval step, one generation step.
Use when: a single, reasonably uniform knowledge base; questions answerable from one retrieval. Strengths: simple, fast, cheap, debuggable. Start here for almost everything — and add complexity only when evaluation proves you need it.
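The retrieve-then-generate flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: `DOCS`, `retrieve`, `build_prompt`, and `generate` are hypothetical names, word overlap stands in for vector search, and `generate` is a placeholder for a real LLM call.

```python
from collections import Counter

# Hypothetical toy knowledge base; a real system would query an embedding index.
DOCS = [
    "Refunds are issued within 14 days of purchase.",
    "Shipping takes 3 to 5 business days.",
    "Support is available by email on weekdays.",
]

def tokenize(text: str) -> list[str]:
    """Lowercase words with trailing punctuation removed."""
    return [w.strip(".,?!").lower() for w in text.split()]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (stand-in for vector search)."""
    q = Counter(tokenize(query))
    scored = [(sum((q & Counter(tokenize(d))).values()), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query.
    return [d for score, d in scored[:k] if score > 0]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Put the retrieved chunks in front of the question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Placeholder for the LLM call; it only echoes the prompt."""
    return prompt

def answer(query: str) -> str:
    """One retrieval step followed by one generation step."""
    return generate(build_prompt(query, retrieve(query, DOCS)))
```

Because there is only one retrieval step and one generation step, each can be inspected separately when an answer is wrong, which is much of what makes this pattern debuggable.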
Pattern 2 — Query routing
Different question types need different handling. A router classifies the query first and dispatches it.
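Use when: queries are heterogeneous, or some shouldn’t trigger retrieval at all. Watch: the router is a failure point. A misrouted query searches the wrong place (or skips retrieval it needed), so the answer fails however good generation is; evaluate the router’s accuracy on its own, separately from end-to-end answer quality.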
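A rough sketch of classify-then-dispatch, including a standalone accuracy check for the router. Everything here is an assumption for illustration: the route names, the keyword rules, and the handlers are hypothetical, and in practice the classifier is often a small model or an LLM call rather than keyword matching.

```python
from typing import Callable

def route(query: str) -> str:
    """Toy rule-based router (hypothetical rules); returns a route name."""
    q = query.lower()
    if any(g in q for g in ("hello", "thanks", "thank you")):
        return "no_retrieval"  # small talk needs no retrieval
    if any(w in q for w in ("error", "crash", "bug")):
        return "tickets"
    return "docs"

# Hypothetical handlers; each would call a different index or prompt.
HANDLERS: dict[str, Callable[[str], str]] = {
    "no_retrieval": lambda q: "direct answer, no retrieval",
    "tickets": lambda q: f"search ticket index for: {q}",
    "docs": lambda q: f"search docs index for: {q}",
}

def dispatch(query: str) -> str:
    """Classify first, then hand the query to the matching handler."""
    return HANDLERS[route(query)](query)

def router_accuracy(labelled: list[tuple[str, str]]) -> float:
    """Evaluate the router alone against (query, expected_route) pairs."""
    correct = sum(route(q) == expected for q, expected in labelled)
    return correct / len(labelled)
```

Keeping `router_accuracy` separate from answer-quality evaluation makes it possible to tell a routing failure apart from a retrieval or generation failure.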
Pattern 3 — Multi-source retrieval
One question genuinely needs several sources at once. Retrieve from each in parallel, merge, rerank, then generate.
Use when: complete answers span multiple repositories. Watch: merging needs a reranker so the best chunks survive regardless of source; cost and latency rise with each source.
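The parallel retrieve, merge, and rerank steps might look like the sketch below. The source names, `search_source`, and the overlap-based `rerank` are hypothetical stand-ins; a production system would typically call real indexes and a dedicated reranking model.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sources; each could be a separate index or API.
SOURCES = {
    "wiki": ["Each deploy runs on Tuesday.", "The office closes at 6pm."],
    "runbooks": ["Rollback a deploy with the rollback script.", "Rotate keys monthly."],
}

def overlap(query: str, text: str) -> int:
    """Count distinct shared words (stand-in for a relevance score)."""
    def words(s: str) -> set[str]:
        return {w.strip(".,?!").lower() for w in s.split()}
    return len(words(query) & words(text))

def search_source(name: str, query: str, k: int = 2) -> list[tuple[str, str]]:
    """Top-k chunks from one source, tagged with the source name."""
    ranked = sorted(SOURCES[name], key=lambda d: overlap(query, d), reverse=True)
    return [(name, d) for d in ranked[:k]]

def retrieve_all(query: str) -> list[tuple[str, str]]:
    """Query every source in parallel and merge the hits into one list."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda n: search_source(n, query), SOURCES)
        return [hit for hits in results for hit in hits]

def rerank(query: str, hits: list[tuple[str, str]], keep: int = 2) -> list[tuple[str, str]]:
    """Score all merged hits on one scale so the best survive regardless of source."""
    return sorted(hits, key=lambda h: overlap(query, h[1]), reverse=True)[:keep]
```

Reranking after the merge matters because per-source scores are usually not comparable; a single scoring pass over the merged hits puts them on one scale.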
Pattern 4 — Query transformation RAG
The raw question is a poor search query, so transform it before retrieving: rewrite, expand to multiple queries, or decompose. See Chunking & Retrieval.
Use when: conversational follow-ups (“what about the other one?”), vague questions, or multi-part questions. Watch: this adds an LLM call before retrieval, trading latency for recall.
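A sketch of two of these transformations, rewriting a follow-up and decomposing a multi-part question, followed by retrieval over the resulting queries. The string rules below are hypothetical stand-ins for what would normally be an LLM call, and `retrieve_union` assumes a caller-supplied `search` function.

```python
from typing import Callable

# Words that suggest the question refers back to an earlier turn (assumed list).
REFERENTS = {"it", "that", "one", "they"}

def rewrite(question: str, history: list[str]) -> str:
    """Stand-in for an LLM rewrite: attach the previous turn to a follow-up."""
    words = {w.strip("?.,!").lower() for w in question.split()}
    if history and words & REFERENTS:
        return f"{question} (context: {history[-1]})"
    return question

def decompose(question: str) -> list[str]:
    """Stand-in for LLM decomposition: split a multi-part question on ' and '."""
    parts = [p.strip(" ?") for p in question.split(" and ")]
    return [p + "?" for p in parts if p]

def transform(question: str, history: list[str]) -> list[str]:
    """Decompose first, then make each part self-contained."""
    return [rewrite(part, history) for part in decompose(question)]

def retrieve_union(queries: list[str], search: Callable[[str], list[str]]) -> list[str]:
    """Run every transformed query and merge results, dropping duplicates in order."""
    seen: set[str] = set()
    merged: list[str] = []
    for q in queries:
        for chunk in search(q):
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged
```

Running several queries raises recall at the cost of more retrieval calls, so the merged list usually still needs trimming before generation.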
Pattern 5 — Agentic RAG
Retrieval becomes a tool an agent decides to use. The agent chooses whether to retrieve, from where, judges if results suffice, and can search again.
Use when: unpredictable queries, multi-hop questions, results that need iterative refinement. Watch: multiple LLM calls — the most expensive and slowest pattern, and the hardest to make predictable. Adopt last.
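The decide, retrieve, judge, repeat loop can be sketched as below for a two-hop question. This is illustrative only: `FACTS`, `search`, and especially `decide` are hypothetical, and the hard-coded policy in `decide` stands in for an LLM choosing the next action. The `max_steps` budget is one simple way to bound cost.

```python
from typing import Optional

# Hypothetical knowledge the retrieval tool can look up.
FACTS = {
    "billing": "The billing service is owned by the Payments team.",
    "payments": "The Payments team is managed by Dana.",
}

def search(term: str) -> Optional[str]:
    """The retrieval tool: returns a fact or None."""
    return FACTS.get(term.lower())

def decide(question: str, notes: list[str]) -> tuple[str, str]:
    """Hard-coded policy standing in for an LLM; returns (action, argument)."""
    if not notes:
        return ("search", "billing")   # first hop
    if len(notes) == 1 and "Payments" in notes[0]:
        return ("search", "payments")  # results insufficient, search again
    return ("answer", notes[-1])       # results judged sufficient

def run_agent(question: str, max_steps: int = 4) -> tuple[str, int]:
    """Loop until the policy answers or the step budget runs out."""
    notes: list[str] = []
    for _ in range(max_steps):
        action, arg = decide(question, notes)
        if action == "answer":
            return arg, len(notes)
        result = search(arg)
        if result:
            notes.append(result)
    return "I could not find enough information.", len(notes)
```

Each loop iteration corresponds to at least one model call in a real agent, which is where the extra cost and latency of this pattern come from.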
Cross-cutting components
Most production RAG, whatever the pattern, also includes:
- Hybrid search (vector + keyword) — so exact terms aren’t lost.
- Reranking — retrieve broad, rerank, keep the best few.
- Metadata filtering — permissions, recency, tenant isolation.
- Caching — for repeated or similar queries.
- Citations — every claim traceable to a source.
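Two of these components, hybrid search and metadata filtering, can be sketched together. The fusion method used here is reciprocal rank fusion, which is one common way to combine vector and keyword rankings and is an assumption of this example rather than something the text prescribes. The document IDs, `TENANT_OF`, and `hybrid_search` are hypothetical.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Hypothetical metadata: which tenant owns each document.
TENANT_OF = {"a": "acme", "b": "acme", "c": "globex", "d": "acme"}

def hybrid_search(vector_hits: list[str], keyword_hits: list[str], tenant: str) -> list[str]:
    """Filter both result lists by tenant, then fuse them into one ranking."""
    def allowed(hits: list[str]) -> list[str]:
        return [d for d in hits if TENANT_OF.get(d) == tenant]
    return rrf([allowed(vector_hits), allowed(keyword_hits)])
```

Filtering before fusion means documents from another tenant never enter the ranking at all, which is the safer default for permission and isolation filters.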
Choosing a pattern
| Situation | Pattern |
|---|---|
| One knowledge base, direct questions | Simple Q&A RAG |
| Distinct query types / some need no retrieval | Query routing |
| Answers span multiple sources | Multi-source retrieval |
| Vague, conversational, or multi-part queries | Query transformation |
| Unpredictable or multi-hop, need iteration | Agentic RAG |
Key takeaways
RAG is a family of patterns of rising complexity: simple Q&A, query routing, multi-source, query transformation, and agentic RAG. Almost every system should begin with simple Q&A. Most production RAG layers in hybrid search, reranking, metadata filtering, caching, and citations regardless of pattern. Choose by the problem’s actual shape, and move to a more complex pattern only when evaluation proves the simpler one is failing.