Embeddings
An embedding is a list of numbers — a vector — that represents the meaning of a piece of content. It is the data type that makes semantic search possible.
Meaning as coordinates
An embedding model maps text to a point in a high-dimensional space, arranged so that similar meanings land near each other, regardless of shared words.
The first two cluster despite sharing almost no words; the weather queries form their own cluster. The model learned this geometry from the self-supervised structure of language.
A real embedding isn’t 2-D — it has hundreds or thousands of dimensions. Each dimension is a learned axis of meaning; you can’t name them, but together they position text precisely.
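The idea of meaning as coordinates can be sketched with a toy example. The 2-D coordinates below are hand-placed purely for illustration and are not the output of any model; the sentences and the `nearest` helper are likewise assumptions made for this sketch.

```python
import math

# Hand-placed 2-D coordinates, chosen only for illustration (NOT model output).
# A real embedding would have hundreds or thousands of dimensions.
toy_points = {
    "How do I reset my password?": (0.9, 0.8),
    "I forgot my login credentials": (1.0, 0.9),
    "What's the weather today?": (-0.8, 0.7),
    "Will it rain tomorrow?": (-0.9, 0.6),
}


def nearest(query: str, points: dict[str, tuple[float, float]]) -> str:
    """Return the text whose point lies closest to the query's point."""
    q = points[query]
    others = (text for text in points if text != query)
    return min(others, key=lambda text: math.dist(q, points[text]))


print(nearest("How do I reset my password?", toy_points))
print(nearest("Will it rain tomorrow?", toy_points))
```

With these made-up coordinates, each sentence's nearest neighbour is the other sentence on the same topic, even though the paired sentences share almost no words.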
Embedding models
Embedding models are separate from chat/generation models, and specialized for this job.
```python
from openai import OpenAI

client = OpenAI()

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I reset my password?",
)
vector = resp.data[0].embedding  # e.g. a list of 1536 floats
```

When picking one, weigh:
- Dimensions — vector length (commonly 384–3072). More can capture more nuance but costs more storage and compute. Bigger is not automatically better.
- Max input length — how much text the model embeds at once; sets your chunk size.
- Quality — task-relevant benchmarks (e.g. the MTEB leaderboard) beat vendor marketing.
- API vs. self-hosted — managed APIs are simplest; open models (running locally) cut cost and keep data in-house.
- Domain fit — general models can struggle with legal, medical, or code text; check, or use a domain-tuned model.
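To make the storage side of the dimensions trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes each value is stored as a 32-bit float (4 bytes) and ignores index overhead and metadata; the function name `index_size_bytes` is hypothetical.

```python
def index_size_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of the stored vectors, ignoring index overhead and metadata.

    Assumes float32 storage (4 bytes per value) unless told otherwise.
    """
    return num_vectors * dimensions * bytes_per_value


for dims in (384, 1536, 3072):
    size_gb = index_size_bytes(1_000_000, dims) / 1e9
    print(f"{dims:>5} dims: {size_gb:.2f} GB per million vectors")
```

Under these assumptions, one million vectors take roughly 1.54 GB at 384 dimensions, 6.14 GB at 1536, and 12.29 GB at 3072. Real vector stores add their own overhead, so treat these as lower bounds.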
Similarity metrics
To find “nearby” vectors you need a distance measure. Three are common:
| Metric | Measures | Notes |
|---|---|---|
| Cosine similarity | Angle between vectors | The default for text; ignores magnitude |
| Dot product | Angle and magnitude | Fast; equals cosine if vectors are normalized |
| Euclidean (L2) | Straight-line distance | Common for image and spatial data |
For text embeddings, cosine similarity is almost always the right choice — it compares direction (meaning) and ignores length. Most embedding models are trained with it in mind. Whatever you choose, use the same metric for indexing and querying.
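The three metrics in the table can be written out in a few lines of plain Python. This is a minimal sketch using only the standard library; the helper names and the small example vectors are chosen for illustration, and production code would typically rely on a vectorized library or the vector store's built-in metric.

```python
import math


def dot(a, b):
    """Dot product: sensitive to both angle and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores magnitude."""
    return dot(a, b) / (norm(a) * norm(b))


def euclidean_distance(a, b):
    """Straight-line (L2) distance between a and b."""
    return math.dist(a, b)


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


a = [3.0, 4.0]
b = [6.0, 8.0]   # same direction as a, twice the length
c = [4.0, -3.0]  # perpendicular to a

print(cosine_similarity(a, b))   # 1.0: same direction, length ignored
print(cosine_similarity(a, c))   # 0.0: perpendicular
print(dot(a, a), dot(a, b))      # 25.0 50.0: magnitude changes the score
print(euclidean_distance(a, b))  # 5.0

# On unit-length vectors, dot product and cosine similarity agree.
print(math.isclose(dot(normalize(a), normalize(b)), cosine_similarity(a, b)))
```

The last line shows why many systems normalize vectors once at indexing time and then use the cheaper dot product at query time.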
Multimodal embeddings
Embeddings aren’t limited to text. Multimodal models (CLIP-style) embed text and images into one shared space, so a text query can retrieve relevant images. The same idea extends to audio and code. The mechanics in this section (vectors, similarity, indexing) are identical regardless of what was embedded.
What embeddings are not
- Not human-readable: you can’t reverse a vector back into exact text.
- Not reasoning: they capture similarity, not logic or truth.
- Not free: generating embeddings is a model call with real cost and latency; embed at ingestion time and store the result.
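The advice to embed once and store the result can be sketched as a small cache keyed by a hash of the text. Everything here is an assumption made for illustration: `EmbeddingCache` is a hypothetical helper, the in-memory dict stands in for a database or vector store, and `fake_embed` is a placeholder so the sketch runs offline rather than a real embedding model.

```python
import hashlib


class EmbeddingCache:
    """Stores one vector per unique text so each text is embedded only once."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # hypothetical callable: str -> list[float]
        self._store = {}           # in-memory stand-in for a database or vector store
        self.model_calls = 0       # counts how often the (costly) model was invoked

    def get(self, text):
        # Key on a content hash so identical text maps to the same entry.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._embed_fn(text)
            self.model_calls += 1
        return self._store[key]


def fake_embed(text):
    # Placeholder so the sketch runs offline; NOT a real embedding.
    return [float(len(text)), float(text.count(" "))]


cache = EmbeddingCache(fake_embed)
first = cache.get("How do I reset my password?")
second = cache.get("How do I reset my password?")  # served from the store
print(cache.model_calls)  # 1
```

In a real pipeline, `embed_fn` would wrap the embedding model call, and the cache key would usually include the model name as well, since vectors from different models are not comparable.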
Key takeaways
An embedding is a vector encoding the meaning of content, positioned so similar meanings are geometrically close. Embedding models are specialized; choose by dimensions, input length, quality, and domain fit, and embed queries and documents with the same model. Compare text vectors with cosine similarity. Multimodal models put text and images in one shared space.