Embeddings

An embedding is a list of numbers — a vector — that represents the meaning of a piece of content. It is the data type that makes semantic search possible.

Meaning as coordinates

An embedding model maps text to a point in a high-dimensional space, arranged so that similar meanings land near each other — regardless of shared words.

The first two cluster despite sharing almost no words; the weather queries form their own cluster. The model learned this geometry through self-supervised training on language, not from hand-written rules.
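To make the idea of clusters concrete, here is a toy sketch. The four sentences and their 2-D coordinates are made up for illustration (three of the sentences are hypothetical, and none of the numbers come from a model); the helper `nearest` is likewise an assumption of this example. It only shows that "similar meaning" becomes "small distance" once text is placed at coordinates.

```python
import math

# Hand-picked 2-D coordinates for illustration only. A real model would
# produce the positions itself, in far more dimensions.
points = {
    "How do I reset my password?": (0.9, 0.8),
    "I forgot my login credentials": (1.0, 0.7),
    "Will it rain tomorrow?": (-0.8, 0.6),
    "What's the forecast for this weekend?": (-0.9, 0.5),
}

def nearest(query: str) -> str:
    """Return the other sentence whose point lies closest to the query's point."""
    others = (s for s in points if s != query)
    return min(others, key=lambda s: math.dist(points[query], points[s]))

for sentence in points:
    print(f"{sentence!r} -> {nearest(sentence)!r}")
```

Each account-related sentence picks the other account-related sentence as its neighbor, and the same holds for the two weather sentences, even though the paired sentences share almost no words.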

A real embedding isn’t 2-D — it has hundreds or thousands of dimensions. Each dimension is a learned axis of meaning; you can’t name them, but together they position text precisely.

Embedding models

Embedding models are separate from chat/generation models, and specialized for this job.

from openai import OpenAI
client = OpenAI()

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I reset my password?",
)
vector = resp.data[0].embedding   # e.g. a list of 1536 floats

When picking one, weigh:

Dimensions: vector length (commonly 384–3072). More dimensions can capture more nuance but cost more storage and compute. Bigger is not automatically better.
Max input length: how much text the model can embed in one call, usually counted in tokens; this caps your chunk size.
Quality: task-relevant benchmarks (e.g. the MTEB leaderboard) beat vendor marketing.
API vs. self-hosted: managed APIs are simplest; open models run locally can cut cost and keep data in-house.
Domain fit: general models can struggle with legal, medical, or code text; test on your own data, or use a domain-tuned model.
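Two of these tradeoffs are easy to put numbers on. The sketch below is a rough estimate under stated assumptions: vectors are stored as float32 (4 bytes per value), index overhead is ignored, and chunk size is counted in words as a crude stand-in for tokens. The function names `index_size_bytes` and `chunk_words` are hypothetical helpers, not part of any library.

```python
def index_size_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone, assuming float32 (4 bytes per value)."""
    return num_vectors * dimensions * bytes_per_value

def chunk_words(text: str, max_words: int) -> list[str]:
    """Split text into chunks of at most max_words words.

    Real model limits are counted in tokens, not words; word count is
    only a rough approximation used to keep this sketch dependency-free.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

# Storage for one million vectors at three common dimension counts.
for dims in (384, 1536, 3072):
    gib = index_size_bytes(1_000_000, dims) / 2**30
    print(f"{dims} dims x 1M vectors: about {gib:.2f} GiB")

print(chunk_words("one two three four five", max_words=2))
```

For one million vectors, going from 384 to 3072 dimensions takes the raw vector data from roughly 1.4 GiB to roughly 11.4 GiB, which is why a larger model has to earn its place on quality.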

Similarity metrics

To find “nearby” vectors you need a way to score how close two vectors are. Three measures are common:

Metric	Measures	Notes
Cosine similarity	Angle between vectors	The default for text; ignores magnitude
Dot product	Angle and magnitude	Fast; equals cosine if vectors are normalized
Euclidean (L2)	Straight-line distance	Common for image and spatial data

For text embeddings, cosine similarity is almost always the right choice: it compares direction (meaning) and ignores vector magnitude. Most embedding models are trained with it in mind. Whatever you choose, use the same metric for indexing and querying.
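The three metrics are short enough to write out by hand. The following is a minimal pure-Python sketch for illustration; in practice a vector database or a numerical library would compute these for you. The vectors `a` and `b` are toy values chosen so that they point in the same direction but differ in length.

```python
import math

def dot(a, b):
    """Dot product: sensitive to both angle and magnitude."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Vector magnitude (Euclidean length)."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores magnitude."""
    return dot(a, b) / (norm(a) * norm(b))

def euclidean_distance(a, b):
    """Straight-line (L2) distance between the two points."""
    return math.dist(a, b)

def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the magnitude

print("cosine:    ", cosine_similarity(a, b))
print("dot:       ", dot(a, b))
print("euclidean: ", euclidean_distance(a, b))
print("dot (unit):", dot(normalize(a), normalize(b)))
```

Cosine treats `a` and `b` as identical in direction, while the dot product and Euclidean distance both react to the difference in magnitude. After normalizing, the dot product agrees with cosine up to floating-point rounding, which is the equivalence noted in the table.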

Multimodal embeddings

Embeddings aren’t limited to text. Multimodal models (CLIP-style) embed text and images into one shared space, so a text query can retrieve relevant images. The same idea extends to audio and code. The mechanics in this section — vectors, similarity, indexing — are identical regardless of what was embedded.
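Because the mechanics do not depend on modality, one search routine can rank images and text side by side. The sketch below assumes the vectors were already produced by a CLIP-style model; the 3-D values, file names, and the `search` helper are all hypothetical stand-ins chosen to keep the example small.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical stored items: (kind, name, vector). The vectors are made up;
# a real shared-space model would output hundreds of dimensions.
index = [
    ("image", "beach_sunset.jpg", [0.9, 0.1, 0.0]),
    ("image", "city_at_night.jpg", [0.1, 0.9, 0.1]),
    ("text", "Notes from a trip to the coast", [0.8, 0.2, 0.1]),
]

def search(query_vector, kind=None, k=2):
    """Rank stored items by cosine similarity, optionally restricted to one kind."""
    candidates = [item for item in index if kind is None or item[0] == kind]
    ranked = sorted(
        candidates,
        key=lambda item: cosine_similarity(query_vector, item[2]),
        reverse=True,
    )
    return [name for _, name, _ in ranked[:k]]

# Stand-in for the embedding of a text query such as "sunset over the ocean".
query = [0.85, 0.15, 0.05]
print(search(query, kind="image"))
```

The ranking code never inspects what kind of content a vector came from; the `kind` field is only used as an optional filter.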

What embeddings are not

Not human-readable: the individual numbers mean nothing on their own, and you can’t reverse a vector back into the exact text.
Not reasoning: they capture similarity, not logic or truth.
Not free: generating embeddings is a model call with real cost and latency; embed at ingestion time and store the result.
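The "embed at ingestion time and store the result" advice can be sketched as a small cache keyed by a hash of the content. Everything here is an assumption for illustration: `embed` is a placeholder that returns fake values instead of calling a real model, and the in-memory dict `store` stands in for a database or vector store.

```python
import hashlib

calls = 0  # counts how often the (pretend) model is actually invoked

def embed(text: str) -> list[float]:
    """Placeholder for a real, slow, billed embedding call; returns fake values."""
    global calls
    calls += 1
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:8]]

store: dict[str, list[float]] = {}

def embed_once(text: str) -> list[float]:
    """Return the stored vector for this exact text, embedding only on first sight."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in store:
        store[key] = embed(text)
    return store[key]

docs = ["How do I reset my password?", "Billing FAQ", "How do I reset my password?"]
vectors = [embed_once(d) for d in docs]
print(f"{len(docs)} documents, {calls} model calls")
```

The duplicate document is served from the store, so three documents cost two model calls. A cache like this is only valid for a single model: vectors from different models are not comparable, so switching models means re-embedding.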

Key takeaways

An embedding is a vector encoding the meaning of content, positioned so similar meanings are geometrically close. Embedding models are specialized; choose by dimensions, input length, quality, and domain fit — and embed queries and documents with the same model. Compare text vectors with cosine similarity. Multimodal models put text and images in one shared space.