Skip to content
About

Glossary

Every term the guide uses, defined in one or two plain sentences. It’s a reference, not a reading order — skim it, or jump in with your browser’s find (Ctrl/Cmd + F).

TermDefinition
Artificial intelligence (AI)Software that performs tasks associated with human cognition. In practice today, software whose behavior is learned from data rather than hand-coded.
Machine learning (ML)A subset of AI: systems that improve at a task by learning patterns from data instead of following written rules.
Deep learning (DL)A subset of ML that uses neural networks with many layers, which learn their own features from raw data.
Generative AIModels whose output is new content — text, images, audio, code — as opposed to a label or score.
Discriminative modelA model that outputs a label or number about an input (e.g. spam / not spam), the counterpart of a generative model.
Foundation modelA single large model pretrained on broad data, then adapted to many downstream tasks. LLMs are foundation models for text.
Supervised learningLearning from labeled examples — inputs paired with correct outputs.
Unsupervised learningLearning structure from unlabeled data, e.g. clustering.
Reinforcement learning (RL)Learning by acting in an environment and receiving rewards; the basis of RLHF.
Self-supervised learningLearning from unlabeled data that supplies its own labels (e.g. predict the next word). It made internet-scale pretraining possible.
OverfittingWhen a model memorizes its training data, including noise, and fails to generalize to new inputs.
GeneralizationA model’s performance on new, unseen data — the real goal of training.
BaselineA trivial reference result (e.g. predict the most common class); a model’s score is only meaningful as a delta over it.
TermDefinition
Parameters (weights)The adjustable numbers inside a model that training tunes; parameter count (“7B”, “70B”) is a rough proxy for capacity.
Neural networkA function built from layers of simple units (“neurons”); the basis of deep learning.
TransformerThe neural network architecture behind virtually all modern AI, built on self-attention and highly parallelizable.
AttentionThe mechanism that lets a model weigh how much every position in a sequence should draw from every other.
Gradient descentThe optimization process that nudges parameters to reduce the loss, step by step.
Loss functionA single number measuring how wrong a model’s predictions are; training minimizes it.
BackpropagationThe algorithm that computes how much each parameter contributed to the loss, enabling the update.
HyperparameterA setting chosen before training (learning rate, batch size, layers), as opposed to a learned parameter.
Epoch / batchA batch is a small group of examples processed together; an epoch is one full pass over the training data.
QuantizationStoring model weights at lower numerical precision (e.g. 4-bit) to cut memory, for a small quality cost.
Fine-tuningContinuing training of a pretrained model on a small, task-specific dataset to adapt its behavior.
LoRALow-Rank Adaptation — a parameter-efficient fine-tuning method that trains tiny add-on adapters instead of the whole model.
TermDefinition
Large language model (LLM)A foundation model trained to predict the next token of text; the engine behind chat, coding, and most generative text AI.
TokenA chunk of text (~¾ of a word) — the unit of billing, context limits, and latency.
TokenizationSplitting text into tokens from a fixed vocabulary before the model processes it.
Context windowThe maximum number of tokens a model can consider at once — input and output combined.
EmbeddingA vector of numbers representing the meaning of content, positioned so similar meanings are close together.
TemperatureA decoding setting controlling randomness — 0 is near-deterministic, higher is more varied.
Top-p (nucleus sampling)A decoding setting that restricts token choices to the smallest set covering probability p.
PretrainingThe large, expensive phase where a model learns general knowledge from a vast corpus, self-supervised.
Post-trainingTurning a base model into an assistant via instruction tuning (SFT) and preference alignment (RLHF / DPO).
HallucinationFluent, confident, wrong output — intrinsic to LLMs, mitigated but never eliminated.
In-context learningAn LLM performing a task from examples given in the prompt, with no retraining — what makes prompting work.
Structured outputModel output constrained to a machine-readable schema (usually JSON), so code can safely consume it.
InferenceRunning a trained model to produce output (as opposed to training it).
TermDefinition
Retrieval-augmented generation (RAG)Fetching relevant information at request time and inserting it into the prompt, so the model answers from trusted data.
Vector databaseA store optimized for finding items by similarity of their embedding vectors.
Semantic searchSearch by meaning rather than keyword overlap, powered by embeddings.
ChunkingSplitting documents into passages small enough to embed and retrieve usefully.
Cosine similarityA measure of how similar two vectors are by the angle between them; the default for text embeddings.
Approximate nearest neighbor (ANN)Search that trades a little accuracy for a large speed gain when finding similar vectors at scale.
HNSWA widely used ANN index structured as a navigable multi-layer graph.
RerankingA second-stage model that re-scores retrieved candidates for relevance, keeping only the best few.
Hybrid searchCombining semantic (vector) and keyword (lexical) search and fusing the rankings.
GroundingTying a model’s answer to retrieved source material so claims are supported and verifiable.
TermDefinition
AI agentAn LLM placed in a loop with tools, allowed to decide its own next step until a task is done.
Agent loop / ReActThe think → act → observe cycle an agent repeats; ReAct interleaves explicit reasoning with actions.
Tool use / function callingThe mechanism by which a model requests an action (a function with arguments) that your code then executes.
Prompt engineeringDesigning the input that makes an LLM produce the desired output reliably.
System promptThe instruction that sets a model’s durable role, rules, and default behavior, separate from the user’s message.
Chain-of-thought (CoT)Prompting a model to reason step by step before answering, improving accuracy on multi-step problems.
Zero-shot / few-shotPrompting with no examples (zero-shot) or with a few worked examples (few-shot).
OrchestrationThe application code that decides the steps of an AI workflow — when to retrieve, call a tool, or loop.
MemoryAn agent’s state: working memory (the current task, in the context window) and long-term memory (stored, retrieved across sessions).
Workflow vs. agentA workflow has control flow defined by your code; an agent has control flow decided by the LLM at runtime.
TermDefinition
GPU / VRAMThe hardware that runs neural network math in parallel; VRAM (GPU memory) is usually the binding constraint.
Inference serverSoftware (e.g. vLLM) that serves a model efficiently with batching and KV-cache management.
KV cacheStored intermediate values that speed up token-by-token generation, at the cost of GPU memory.
Throughput vs. latencyThroughput is requests served per second; latency is how fast one request completes — the two trade off.
Time to first token (TTFT)How long before a model’s response starts streaming — a key interactive-latency metric.
MLOps / LLMOpsThe practices for deploying, monitoring, and continuously improving ML models (MLOps) and LLM systems (LLMOps).
Model registryVersion control for trained models — tracking versions, metrics, stages, and enabling rollback.
DriftThe gradual divergence of live data from training data, which silently degrades model quality over time.
Evaluation set (eval)A curated set of test cases with expected outcomes, used to measure quality and catch regressions.
LLM-as-judgeUsing a strong LLM to score other models’ outputs against a rubric, for scalable evaluation.
Observability / tracingLogging the full detail of each request — prompts, retrieved context, tool calls, cost, latency — to debug and monitor.
StreamingSending output tokens as they are generated, cutting perceived latency for interactive use.
TermDefinition
Prompt injectionAn attack where untrusted text carries instructions the model follows; indirect injection hides them in retrieved or browsed content.
JailbreakCoaxing a model past its built-in safety training, distinct from injecting your application’s instructions.
GuardrailsIndependent checks around a model — classifiers or code — that screen input and output for unsafe or off-policy content.
Least privilegeGranting tools and agents the minimum permissions needed, so a successful attack has a small blast radius.
PIIPersonally identifiable information — data that must be handled, minimized, and often redacted before reaching a model.
AlignmentTraining (e.g. via RLHF) that makes a model helpful, harmless, and honest by human standards.
BiasSkewed or unfair model behavior learned from skewed training data; critical wherever AI decides about people.
Red-teamingActively attacking your own AI system before launch to find safety and security failures.

This glossary distills the full guide. To go deeper on any term, start from the relevant section — AI Fundamentals, LLM Engineering, RAG, AI Agents, or AI Safety & Security.