Similarity Search & Indexing
You have a corpus of embedding vectors and a query vector. The task: find the k stored vectors closest to the query. Doing that fast at scale is the entire engineering problem of a vector database.
Exact search and why it doesn’t scale
Section titled “Exact search and why it doesn’t scale”The straightforward approach — k-nearest-neighbor (kNN) — compares the query against every stored vector:
def exact_search(query, vectors, k): scored = [(cosine(query, v), i) for i, v in enumerate(vectors)] return sorted(scored, reverse=True)[:k]This is 100% accurate and perfectly fine for thousands of vectors. But it’s
O(n) per query — at millions or billions of vectors, comparing against all
of them per request is far too slow. You need an index.
Approximate nearest neighbor (ANN)
Section titled “Approximate nearest neighbor (ANN)”The breakthrough idea: give up a tiny bit of accuracy for an enormous speedup. ANN algorithms return vectors that are almost certainly the nearest — say the true top-10 with 98% reliability — in a fraction of the time.
The quality measure is recall: of the true top-k neighbors, what fraction did the index actually return? 0.98 recall means it found 98% of them. Nearly every production vector search is ANN; the lost 2% rarely matters, and the speedup is the difference between viable and not.
HNSW — the dominant index
Section titled “HNSW — the dominant index”HNSW (Hierarchical Navigable Small World) is the most widely used ANN index. Picture a multi-layer graph of vectors:
A search drops in at the sparse top layer, greedily hops toward the query, descends a layer, refines, and repeats. It reaches the neighborhood in roughly logarithmic time instead of linear.
- Strengths — excellent recall and speed; great for live, changing data.
- Costs — high memory use; the graph must be held largely in RAM.
Two knobs you’ll meet: build-time graph connectivity (often M) and the
search-time effort (often ef_search) — raising the latter trades latency for
recall.
IVF — partition then search
Section titled “IVF — partition then search”IVF (Inverted File Index) clusters vectors into buckets. A query is matched to the nearest few bucket centers, and only those buckets are scanned.
- Strengths — lower memory than HNSW; fast to build.
- Costs — recall depends on how many buckets you probe; less friendly to constant updates.
To go bigger, IVF is often paired with quantization — compressing vectors into a compact, lossy form so far more fit in memory, at some recall cost.
The trade-off triangle
Section titled “The trade-off triangle”Every ANN index balances three things, and you can’t max all of them:
- Push recall up → slower queries or more memory.
- Push speed up → lower recall or more memory.
- Cut memory (quantization) → lower recall.
Tune to the use case: a user-facing search bar wants speed and decent recall; an offline analytics job can spend latency to get recall near 1.0.
Practical guidance
Section titled “Practical guidance”- Default to HNSW — it’s the best all-rounder and what most engines use out of the box.
- Choose IVF + quantization when memory or cost is the binding constraint at very large scale.
- Measure recall on your own data with a known-answer set. Don’t trust
defaults blindly; tune
ef_search(or probe count) until recall and latency both meet your bar.
Key takeaways
Section titled “Key takeaways”Exact search is accurate but O(n) — fine small, hopeless at scale. ANN trades
a sliver of recall for a massive speedup and powers nearly all production vector
search. HNSW (a navigable graph) is the high-performance default; IVF
(clustering, often with quantization) wins when memory is tight. Every index
balances recall, speed, and memory — tune that triangle to your workload, and
skip indexing entirely for small corpora.