Production Case Studies

Patterns in isolation are abstract. These four case studies assemble them into whole systems — each showing the architecture, the key decisions, and the trade-offs. They are illustrative blueprints, not prescriptions.

Case study 1 — Documentation assistant

Goal: let users ask natural-language questions against a large product documentation site.

Architecture: Simple Q&A RAG, with hybrid search and reranking.

Key decisions:

Chunking — split on document headings so chunks stay topically coherent; prepend the page title to each.
Hybrid search — docs are full of exact terms (API names, flags, error codes) that pure vector search misses.
Citations — every answer links its sources; users verify, and the team spots hallucinations.
“I don’t know” — when retrieval is weak, say so rather than guess.
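The chunking decision above can be sketched as follows. This is a minimal illustration, not the team's actual implementation: it assumes the docs are Markdown with `#`-style headings, and the function name `chunk_by_headings` is hypothetical.

```python
import re


def chunk_by_headings(page_title: str, markdown_text: str) -> list[str]:
    """Split a Markdown page at its headings and prepend the page title to each chunk."""
    # Zero-width split just before any line that starts with 1-6 '#' and a space.
    # Assumption: headings use this style; a '#' comment at the start of a line
    # inside a fenced code block would also split, which a real splitter must handle.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown_text)
    chunks = []
    for section in sections:
        body = section.strip()
        if body:  # skip the empty piece produced when the page starts with a heading
            chunks.append(f"{page_title}\n\n{body}")
    return chunks
```

Prepending the title gives each chunk some page-level context when it is embedded and retrieved on its own.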

Trade-offs: chose a strong model for accuracy over a cheaper one; reranking adds latency but sharply lifts answer quality. Lesson: most of the quality gain came from retrieval work (chunking and hybrid search), not from prompt wording.
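One common way to combine keyword and vector results is reciprocal rank fusion (RRF). The case study does not say which fusion method was used, so treat this as one plausible sketch. The `should_abstain` helper and its threshold are hypothetical stand-ins for the "I don't know" rule, assuming the reranker returns a relevance score between 0 and 1.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); k dampens the effect of top ranks.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def should_abstain(top_rerank_score: float, threshold: float = 0.3) -> bool:
    """Hypothetical rule: answer "I don't know" when the best reranked hit is weak."""
    return top_rerank_score < threshold
```

A document that ranks well in both lists rises to the top, while an exact-match hit that only keyword search finds still survives into the candidates passed to the reranker.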

Case study 2 — Customer support automation

Goal: resolve common support tickets automatically; escalate the rest.

Architecture: Routing plus RAG plus guarded actions.

Key decisions:

Routing first — not every ticket should hit the LLM; classification is cheap and keeps each path simple.
Actions are gated — refunds and account changes run as validated tool calls behind human approval; read-only by default.
Conservative escalation — low retrieval confidence escalates. A wrong answer costs more than a human-handled ticket.
Evaluation — a labeled ticket set scores accuracy and escalation rate.
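The routing, gating, and escalation decisions above can be sketched as a single decision function. The category names, action names, and the 0.7 confidence threshold are all hypothetical; a real system would get the category from a trained classifier and the confidence from its retriever.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names for illustration only.
AUTOMATABLE_CATEGORIES = {"password_reset", "order_status", "refund_request"}
READ_ONLY_ACTIONS = {"lookup_order"}
GATED_ACTIONS = {"issue_refund", "change_account"}


@dataclass
class Decision:
    route: str   # "auto_reply", "pending_approval", or "escalate"
    reason: str


def decide(category: str, retrieval_confidence: float,
           action: Optional[str] = None, min_confidence: float = 0.7) -> Decision:
    # Routing first: tickets outside known categories never reach the LLM path.
    if category not in AUTOMATABLE_CATEGORIES:
        return Decision("escalate", "category not automatable")
    # Conservative escalation: weak retrieval goes to a human.
    if retrieval_confidence < min_confidence:
        return Decision("escalate", "low retrieval confidence")
    # Read-only by default: no action, or a known read-only one, may proceed.
    if action is None or action in READ_ONLY_ACTIONS:
        return Decision("auto_reply", "read-only path")
    # State-changing actions wait for human approval.
    if action in GATED_ACTIONS:
        return Decision("pending_approval", "state-changing action needs approval")
    # Anything unrecognized is treated as unsafe.
    return Decision("escalate", "unknown action")
```

Every branch that is not explicitly allowed ends in escalation, so a missing rule fails toward a human rather than toward an automated action.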

Trade-offs: accepted a lower automation rate to keep accuracy high — the correct call for support. Lesson: the design is mostly about what the AI won’t do alone.
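The trade-off between automation rate and accuracy can be made visible by sweeping the escalation threshold over a labeled ticket set. This is a sketch under assumptions: each labeled ticket is reduced to a hypothetical `(confidence, was_correct)` pair, and tickets below the threshold count as escalated.

```python
from typing import Optional


def sweep_thresholds(scored: list[tuple[float, bool]],
                     thresholds: list[float]) -> list[tuple[float, float, Optional[float]]]:
    """Return (threshold, automation_rate, accuracy_on_automated) for each threshold."""
    rows = []
    total = len(scored)
    for t in thresholds:
        # Tickets at or above the threshold are answered automatically.
        automated = [ok for conf, ok in scored if conf >= t]
        automation_rate = len(automated) / total if total else 0.0
        # Accuracy is measured only over tickets the system answered itself.
        accuracy = sum(automated) / len(automated) if automated else None
        rows.append((t, automation_rate, accuracy))
    return rows
```

Raising the threshold typically lowers the automation rate and raises accuracy on what remains; the labeled set shows where that exchange stops being worthwhile.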

Case study 3 — Code migration agent

Goal: migrate a large codebase from one framework version to another.

Architecture: Evaluator-optimizer, an agentic pattern, justified here because the steps needed for each file can’t be known in advance.

Key decisions:

Tests are the evaluator — the existing suite is an objective pass/fail signal, so the agent self-corrects. Without tests, this pattern wouldn’t work.
Bounded scope — one file at a time; small, reviewable diffs.
Hard limits — capped retries per file; flag for a human on repeated failure.
Human review — every change is a reviewed PR. The agent drafts; humans approve.
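The loop described above (propose a change, run the tests, retry with the failure output, stop at a cap) can be sketched as follows. The `propose` and `run_tests` callables are hypothetical stand-ins for the model call and the project's test runner; `run_tests` is assumed to return `None` on success and the failure output otherwise.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MigrationResult:
    status: str    # "ready_for_review" or "needs_human"
    code: str      # migrated code, or the untouched original on failure
    attempts: int


def migrate_file(source: str,
                 propose: Callable[[str, Optional[str]], str],
                 run_tests: Callable[[str], Optional[str]],
                 max_attempts: int = 3) -> MigrationResult:
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        # Optimizer step: draft a migration, using the last failure as feedback.
        candidate = propose(source, feedback)
        # Evaluator step: the existing test suite is the pass/fail signal.
        feedback = run_tests(candidate)
        if feedback is None:
            # Passing tests only earn a pull request; a human still approves it.
            return MigrationResult("ready_for_review", candidate, attempt)
    # Hard limit reached: keep the original file and flag it for a human.
    return MigrationResult("needs_human", source, max_attempts)
```

Keeping the loop per file means a failure costs at most `max_attempts` model calls and never blocks the rest of the migration.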

Trade-offs: agentic cost and unpredictability were warranted because the task is genuinely open-ended and has a clean verification signal. Lesson: agents shine when an objective check closes the loop — see agentic coding.

Case study 4 — Internal knowledge search

Goal: one assistant answering across wikis, tickets, code, and a metrics database.

Architecture: Query routing plus multi-source retrieval.

Key decisions:

Routing — “how many deploys last week” is a SQL query, not a vector search. The router separates retrieval questions from analytical ones.
Permissions — metadata filtering enforces per-user access; retrieval must never leak restricted content.
Per-source evaluation — each source’s retrieval is measured separately to locate weak spots.
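The routing and permission decisions above can be sketched as two small functions. Both are deliberately simplified: the keyword pattern is a toy stand-in for whatever classifier a real router would use, and the `allowed_groups` metadata field is a hypothetical schema.

```python
import re

# Toy heuristic: phrases that suggest an aggregate over structured data.
ANALYTICAL_PATTERN = re.compile(
    r"\b(how many|count|average|total|per (?:day|week|month))\b", re.IGNORECASE
)


def route_query(question: str) -> str:
    """Return "sql" for analytical questions, "retrieval" for everything else."""
    return "sql" if ANALYTICAL_PATTERN.search(question) else "retrieval"


def filter_by_permission(docs: list[dict], user_groups: set[str]) -> list[dict]:
    """Keep only documents that share at least one group with the user.

    Assumes each document carries an "allowed_groups" set in its metadata.
    """
    return [d for d in docs if d["allowed_groups"] & user_groups]
```

In practice the permission filter belongs inside the retrieval query itself, so restricted content is never fetched, rather than being applied to results afterwards as in this sketch.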

Trade-offs: more moving parts, so heavier evaluation and observability; justified by genuinely heterogeneous sources. Lesson: access control is an architecture requirement in RAG, not an afterthought.
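Per-source evaluation can be as simple as computing a hit rate for each source over a labeled query set. The record shape below (`source`, `relevant_id`, `retrieved_ids`) is an assumption for illustration, with one known-relevant document per query.

```python
from collections import defaultdict


def hit_rate_by_source(examples: list[dict], k: int = 5) -> dict[str, float]:
    """Fraction of queries per source whose relevant document appears in the top k."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for ex in examples:
        totals[ex["source"]] += 1
        # Count a hit when the labeled document is among the first k results.
        if ex["relevant_id"] in ex["retrieved_ids"][:k]:
            hits[ex["source"]] += 1
    return {source: hits[source] / totals[source] for source in totals}
```

A source that scores well below the others points to the index, chunking, or query translation that needs attention, which an aggregate score would hide.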

Cross-cutting lessons

Across all four:

Start simple; add complexity only when evaluation demands it. None began as their final shape.
Retrieval quality usually beats prompt cleverness for RAG systems.
Define what the AI won’t do — escalate, require approval, stay read-only.
Evaluation and observability are designed in, not bolted on.
The pattern follows the problem. An agentic design was right for the code migration and wrong for the docs assistant: same toolkit, different shape.

Key takeaways

Real AI systems assemble the patterns: a docs assistant is simple RAG done well; support automation is routing plus guarded actions; a code migrator is an evaluator-optimizer agent; knowledge search is routing across multiple sources. Every one starts simple, leans on retrieval quality and evaluation, explicitly bounds what the AI does alone, and matches the pattern to the problem. AI engineering is disciplined software engineering applied to an unreliable part.