Similarity Search & Indexing

You have a corpus of embedding vectors and a query vector. The task: find the k stored vectors closest to the query. Doing that fast at scale is the entire engineering problem of a vector database.

Exact search and why it doesn’t scale

The straightforward approach — k-nearest-neighbor (kNN) — compares the query against every stored vector:

def exact_search(query, vectors, k):
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    return sorted(scored, reverse=True)[:k]

This is 100% accurate and perfectly fine for thousands of vectors. But it’s O(n) per query — at millions or billions of vectors, comparing against all of them per request is far too slow. You need an index.

Approximate nearest neighbor (ANN)

The breakthrough idea: give up a tiny bit of accuracy for an enormous speedup. ANN algorithms return vectors that are almost certainly the nearest — say the true top-10 with 98% reliability — in a fraction of the time.

The quality measure is recall: of the true top-k neighbors, what fraction did the index actually return? 0.98 recall means it found 98% of them. Nearly every production vector search is ANN; the lost 2% rarely matters, and the speedup is the difference between viable and not.

HNSW — the dominant index

HNSW (Hierarchical Navigable Small World) is the most widely used ANN index. Picture a multi-layer graph of vectors:

A search drops in at the sparse top layer, greedily hops toward the query, descends a layer, refines, and repeats. It reaches the neighborhood in roughly logarithmic time instead of linear.

Strengths — excellent recall and speed; great for live, changing data.
Costs — high memory use; the graph must be held largely in RAM.

Two knobs you’ll meet: build-time graph connectivity (often M) and the search-time effort (often ef_search) — raising the latter trades latency for recall.

IVF — partition then search

IVF (Inverted File Index) clusters vectors into buckets. A query is matched to the nearest few bucket centers, and only those buckets are scanned.

Strengths — lower memory than HNSW; fast to build.
Costs — recall depends on how many buckets you probe; less friendly to constant updates.

To go bigger, IVF is often paired with quantization — compressing vectors into a compact, lossy form so far more fit in memory, at some recall cost.

The trade-off triangle

Every ANN index balances three things, and you can’t max all of them:

Push recall up → slower queries or more memory.
Push speed up → lower recall or more memory.
Cut memory (quantization) → lower recall.

Tune to the use case: a user-facing search bar wants speed and decent recall; an offline analytics job can spend latency to get recall near 1.0.

Practical guidance

Default to HNSW — it’s the best all-rounder and what most engines use out of the box.
Choose IVF + quantization when memory or cost is the binding constraint at very large scale.
Measure recall on your own data with a known-answer set. Don’t trust defaults blindly; tune ef_search (or probe count) until recall and latency both meet your bar.

Key takeaways

Exact search is accurate but O(n) — fine small, hopeless at scale. ANN trades a sliver of recall for a massive speedup and powers nearly all production vector search. HNSW (a navigable graph) is the high-performance default; IVF (clustering, often with quantization) wins when memory is tight. Every index balances recall, speed, and memory — tune that triangle to your workload, and skip indexing entirely for small corpora.