Production Case Studies

Patterns in isolation are abstract. These four case studies assemble them into whole systems — each showing the architecture, the key decisions, and the trade-offs. They are illustrative blueprints, not prescriptions.

Case study 1 — Docs assistant

Goal: let users ask natural-language questions against a large product documentation site.

Architecture: Simple Q&A RAG, with hybrid search and reranking.

Question → Hybrid search → Rerank → Top-5 chunks → Grounded prompt → Answer + citations

Key decisions:

  • Chunking — split on document headings so chunks stay topically coherent; prepend the page title to each.
  • Hybrid search — docs are full of exact terms (API names, flags, error codes) that pure vector search misses.
  • Citations — every answer links its sources; users verify, and the team spots hallucinations.
  • “I don’t know” — when retrieval is weak, say so rather than guess.
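
The chunking decision above can be sketched as follows. This is a minimal illustration, assuming the docs are stored as Markdown with `#`-style headings; the `Chunk` type and the `chunk_by_headings` function are hypothetical names, not part of any particular library.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^#{1,6}\s+(.*)$")  # assumes Markdown-style headings


@dataclass
class Chunk:
    heading: str
    text: str  # the string that would be embedded and indexed


def chunk_by_headings(page_title: str, markdown: str) -> list[Chunk]:
    """Split a page at its headings and prepend the page title to each chunk."""
    # Text before the first heading is filed under the page title itself.
    sections: list[tuple[str, list[str]]] = [(page_title, [])]
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    chunks: list[Chunk] = []
    for heading, lines in sections:
        body = "\n".join(lines).strip()
        if not body:
            continue  # skip headings with no content of their own
        prefix = page_title if heading == page_title else f"{page_title} > {heading}"
        chunks.append(Chunk(heading, f"{prefix}\n{body}"))
    return chunks


page = "Intro text.\n\n## Flags\n--verbose prints more.\n\n## Errors\nE42 means timeout.\n"
for chunk in chunk_by_headings("CLI reference", page):
    print(repr(chunk.text))
```

A real splitter would also need to ignore `#` lines inside fenced code samples and to cap very long sections; both are left out here to keep the idea visible.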

Trade-offs: chose a strong model for accuracy over a cheaper one; reranking adds latency but sharply lifts answer quality. Lesson: most quality came from retrieval work — chunking and hybrid search — not from prompt wording.
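
The case study does not say how the keyword and vector results are combined. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the system's actual method. The second function illustrates the "I don't know" rule: if even the best reranked chunk scores below a threshold, no context is returned and the caller should decline to answer. The function names, document ids, and the threshold value are all hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked id lists; k=60 is a commonly used default for RRF."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_context(reranked: list[tuple[str, float]], min_score: float, top_k: int = 5) -> list[str]:
    """Return up to top_k ids, or [] when retrieval is too weak to answer from.

    Expects (id, score) pairs already sorted best-first by the reranker.
    """
    if not reranked or reranked[0][1] < min_score:
        return []  # caller should reply "I don't know" instead of guessing
    return [doc_id for doc_id, _ in reranked[:top_k]]


keyword_hits = ["err-503", "retry-guide", "faq"]      # e.g. from a keyword index
vector_hits = ["retry-guide", "overview", "err-503"]  # e.g. from an embedding index
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print([doc_id for doc_id, _ in fused])
# → ['retry-guide', 'err-503', 'overview', 'faq']
```

Documents that both retrievers rank highly rise to the top, while an exact-match hit that only keyword search finds still survives into the candidate set for the reranker.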

Case study 2 — Customer support automation

Goal: resolve common support tickets automatically; escalate the rest.

Architecture: Routing plus RAG plus guarded actions.

Ticket → Classifier, then one of three paths:

  • Simple FAQ → RAG answer
  • Account action → Tool call + approval
  • Complex / unsure → Escalate to human

Key decisions:

  • Routing first — not every ticket should hit the LLM; classification is cheap and keeps each path simple.
  • Actions are gated — refunds and account changes run as validated tool calls behind human approval; read-only by default.
  • Conservative escalation — low retrieval confidence escalates. A wrong answer costs more than a human-handled ticket.
  • Evaluation — a labeled ticket set scores accuracy and escalation rate.
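
A rough sketch of the routing and gating decisions above, under stated assumptions: the classifier returns a route with a confidence, low confidence always escalates, and an action runs only if it is on an allow-list, within a limit, and carries a human approval. Every name here (`Route`, `ProposedAction`, `ALLOWED_ACTIONS`, the 0.8 threshold, the refund limit) is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Route(Enum):
    FAQ = "faq"
    ACCOUNT_ACTION = "account_action"
    ESCALATE = "escalate"


@dataclass
class Classification:
    route: Route
    confidence: float


def route_ticket(result: Classification, min_confidence: float = 0.8) -> Route:
    """Send uncertain tickets to a human instead of trusting a weak classification."""
    if result.confidence < min_confidence:
        return Route.ESCALATE
    return result.route


ALLOWED_ACTIONS = {"refund": 100.0}  # hypothetical action name -> maximum amount


@dataclass
class ProposedAction:
    name: str
    amount: float
    approved_by: Optional[str] = None  # set by a human reviewer, never by the model


def execute_action(action: ProposedAction, run: Callable[[ProposedAction], None]) -> None:
    """Validate a proposed tool call, then require approval before running it."""
    if action.name not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {action.name}")
    if not 0 < action.amount <= ALLOWED_ACTIONS[action.name]:
        raise ValueError(f"amount out of range: {action.amount}")
    if action.approved_by is None:
        raise PermissionError("human approval required")
    run(action)
```

Validation comes before the approval check so that a reviewer is only ever asked to approve requests that are already well-formed.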

Trade-offs: accepted a lower automation rate to keep accuracy high — the correct call for support. Lesson: the design is mostly about what the AI won’t do alone.
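
The trade-off between automation and accuracy is easier to discuss with numbers from the labeled ticket set. The sketch below assumes each labeled ticket records the correct outcome and what the system actually did, with `"escalate"` as one possible outcome; the field names and outcome strings are hypothetical.

```python
from dataclasses import dataclass

ESCALATE = "escalate"


@dataclass
class LabeledResult:
    expected: str   # labeled correct outcome, e.g. "faq:reset-password" or "escalate"
    predicted: str  # what the system did on this ticket


def score(results: list[LabeledResult]) -> dict[str, float]:
    """Compute accuracy, escalation rate, and the rate of confidently wrong answers."""
    if not results:
        raise ValueError("need at least one labeled ticket")
    total = len(results)
    correct = sum(r.predicted == r.expected for r in results)
    escalated = sum(r.predicted == ESCALATE for r in results)
    # Wrong answers are the costly case: the system acted alone and was mistaken.
    wrong = sum(r.predicted != ESCALATE and r.predicted != r.expected for r in results)
    return {
        "accuracy": correct / total,
        "escalation_rate": escalated / total,
        "wrong_answer_rate": wrong / total,
    }


results = [
    LabeledResult("faq:reset-password", "faq:reset-password"),
    LabeledResult("faq:billing-date", ESCALATE),
    LabeledResult(ESCALATE, ESCALATE),
    LabeledResult("faq:export-data", "faq:delete-data"),
]
print(score(results))
# → {'accuracy': 0.5, 'escalation_rate': 0.5, 'wrong_answer_rate': 0.25}
```

Raising the escalation threshold should lower `wrong_answer_rate` at the cost of a higher `escalation_rate`, which is the trade the case study chose to make.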

Case study 3 — Code migration

Goal: migrate a large codebase from one framework version to another.

Architecture: Evaluator-optimizer — agentic, justified because per-file steps can’t be known in advance.

For each file → Agent edits → Run tests → Pass?

  • yes → Next file
  • no → read the failure, retry (capped), back to Agent edits

Key decisions:

  • Tests are the evaluator — the existing suite is an objective pass/fail signal, so the agent self-corrects. Without tests, this pattern wouldn’t work.
  • Bounded scope — one file at a time; small, reviewable diffs.
  • Hard limits — capped retries per file; flag for a human on repeated failure.
  • Human review — every change is a reviewed PR. The agent drafts; humans approve.
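
The loop described above can be sketched in a few lines. This is an illustration under assumptions, not the system's code: `propose_edit` stands in for the agent call and `run_tests` for the project's test runner, both passed in as plain callables so the control flow is visible on its own. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    passed: bool
    output: str  # test runner output, fed back to the agent on failure


@dataclass
class FileOutcome:
    path: str
    status: str  # "passed" or "needs_human"
    attempts: int


def migrate_file(
    path: str,
    propose_edit: Callable[[str, str], None],
    run_tests: Callable[[str], CheckResult],
    max_attempts: int = 3,
) -> FileOutcome:
    """Edit one file, using the test suite as the evaluator, with a hard retry cap."""
    feedback = ""  # empty on the first attempt
    for attempt in range(1, max_attempts + 1):
        propose_edit(path, feedback)  # the agent edits the file in place
        result = run_tests(path)
        if result.passed:
            return FileOutcome(path, "passed", attempt)
        feedback = result.output  # let the next attempt read the failure
    return FileOutcome(path, "needs_human", max_attempts)


def migrate(paths: list[str], propose_edit, run_tests) -> list[FileOutcome]:
    """Process files one at a time so each diff stays small and reviewable."""
    return [migrate_file(p, propose_edit, run_tests) for p in paths]
```

Files that end as `needs_human` are flagged rather than retried indefinitely, and files that pass still go through review as a PR, so the loop never merges anything on its own.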

Trade-offs: agentic cost and unpredictability were warranted because the task is genuinely open-ended and has a clean verification signal. Lesson: agents shine when an objective check closes the loop — see agentic coding.

Case study 4 — Internal knowledge search

Goal: one assistant answering across wikis, tickets, code, and a metrics database.

Architecture: Query routing plus multi-source retrieval.

Query → Router → [Docs + wiki | Past tickets | Code search | SQL: metrics] → Merge + rerank → Answer

Key decisions:

  • Routing — “how many deploys last week” is a SQL query, not a vector search. The router separates retrieval questions from analytical ones.
  • Permissions — metadata filtering enforces per-user access; retrieval must never leak restricted content.
  • Per-source evaluation — each source’s retrieval is measured separately to locate weak spots.
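
To make the routing and permissions decisions concrete, here is a deliberately simple sketch. The keyword-based router is a stand-in for whatever classifier the real system uses, and the in-memory filter stands in for a metadata filter applied inside the search index query. The `Doc` type, group names, and the keyword pattern are all hypothetical.

```python
import re
from dataclasses import dataclass

# Crude signal for analytical questions; a production router would likely be a trained classifier.
ANALYTICAL = re.compile(r"\b(how many|count|average|total|per (day|week|month))\b", re.IGNORECASE)


def route_query(query: str) -> str:
    """Return "sql" for analytical questions and "retrieval" for everything else."""
    return "sql" if ANALYTICAL.search(query) else "retrieval"


@dataclass(frozen=True)
class Doc:
    doc_id: str
    source: str                    # e.g. "wiki", "tickets", "code"
    allowed_groups: frozenset      # access metadata stored alongside the chunk


def filter_by_access(docs: list[Doc], user_groups: set) -> list[Doc]:
    """Keep only documents that share at least one group with the user."""
    return [doc for doc in docs if doc.allowed_groups & user_groups]


docs = [
    Doc("wiki-1", "wiki", frozenset({"everyone"})),
    Doc("hr-9", "wiki", frozenset({"hr"})),
]
print(route_query("How many deploys last week?"))
print([d.doc_id for d in filter_by_access(docs, {"everyone", "eng"})])
```

Filtering must happen before any text reaches the prompt. Filtering the final answer is too late, because restricted content may already have shaped it.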

Trade-offs: more moving parts, so heavier evaluation and observability; justified by genuinely heterogeneous sources. Lesson: access control is an architecture requirement in RAG, not an afterthought.
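
Per-source evaluation can be as simple as computing one retrieval metric per source and comparing. The case study does not name a metric; the sketch below assumes recall@k over a small labeled set, with hypothetical source names and document ids.

```python
from collections import defaultdict


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant ids that appear in the top k retrieved ids."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def per_source_recall(cases: list[tuple[str, list[str], set[str]]], k: int = 5) -> dict[str, float]:
    """Average recall@k separately for each source.

    Each case is (source, retrieved ids in rank order, labeled relevant ids).
    """
    by_source: dict[str, list[float]] = defaultdict(list)
    for source, retrieved, relevant in cases:
        by_source[source].append(recall_at_k(retrieved, relevant, k))
    return {source: sum(values) / len(values) for source, values in by_source.items()}


cases = [
    ("wiki", ["w1", "w2"], {"w1"}),
    ("wiki", ["w3", "w9"], {"w3", "w4"}),
    ("tickets", ["t7"], {"t1"}),
]
print(per_source_recall(cases))
# → {'wiki': 0.75, 'tickets': 0.0}
```

A single blended score would hide the fact that ticket retrieval is failing here while wiki retrieval is mostly fine, which is the reason for measuring each source on its own.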

Across all four:

  • Start simple; add complexity only when evaluation demands it. None began as their final shape.
  • Retrieval quality usually beats prompt cleverness for RAG systems.
  • Define what the AI won’t do — escalate, require approval, stay read-only.
  • Evaluation and observability are designed in, not bolted on.
  • The pattern follows the problem. Agentic was right for the code migration and wrong for the docs assistant — same toolkit, different shape.

Real AI systems assemble the patterns: a docs assistant is simple RAG done well; support automation is routing plus guarded actions; a code migrator is an evaluator-optimizer agent; knowledge search is routing across multiple sources. Every one starts simple, leans on retrieval quality and evaluation, explicitly bounds what the AI does alone, and matches the pattern to the problem. AI engineering is disciplined software engineering applied to an unreliable part.