Skip to content

Glossary

Every term the guide uses, defined in one or two plain sentences. It’s a reference, not a reading order — skim it, or jump in with your browser’s find (Ctrl/Cmd + F).

Foundations

Term	Definition
Artificial intelligence (AI)	Software that performs tasks associated with human cognition. In practice today, software whose behavior is learned from data rather than hand-coded.
Machine learning (ML)	A subset of AI: systems that improve at a task by learning patterns from data instead of following written rules.
Deep learning (DL)	A subset of ML that uses neural networks with many layers, which learn their own features from raw data.
Generative AI	Models whose output is new content — text, images, audio, code — as opposed to a label or score.
Discriminative model	A model that outputs a label or number about an input (e.g. spam / not spam), the counterpart of a generative model.
Foundation model	A single large model pretrained on broad data, then adapted to many downstream tasks. LLMs are foundation models for text.
Supervised learning	Learning from labeled examples — inputs paired with correct outputs.
Unsupervised learning	Learning structure from unlabeled data, e.g. clustering.
Reinforcement learning (RL)	Learning by acting in an environment and receiving rewards; the basis of RLHF.
Self-supervised learning	Learning from unlabeled data that supplies its own labels (e.g. predict the next word). It made internet-scale pretraining possible.
Overfitting	When a model memorizes its training data, including noise, and fails to generalize to new inputs.
Generalization	A model’s performance on new, unseen data — the real goal of training.
Baseline	A trivial reference result (e.g. predict the most common class); a model’s score is only meaningful as a delta over it.

Models & training

Term	Definition
Parameters (weights)	The adjustable numbers inside a model that training tunes; parameter count (“7B”, “70B”) is a rough proxy for capacity.
Neural network	A function built from layers of simple units (“neurons”); the basis of deep learning.
Transformer	The neural network architecture behind virtually all modern AI, built on self-attention and highly parallelizable.
Attention	The mechanism that lets a model weigh how much every position in a sequence should draw from every other.
Gradient descent	The optimization process that nudges parameters to reduce the loss, step by step.
Loss function	A single number measuring how wrong a model’s predictions are; training minimizes it.
Backpropagation	The algorithm that computes how much each parameter contributed to the loss, enabling the update.
Hyperparameter	A setting chosen before training (learning rate, batch size, layers), as opposed to a learned parameter.
Epoch / batch	A batch is a small group of examples processed together; an epoch is one full pass over the training data.
Quantization	Storing model weights at lower numerical precision (e.g. 4-bit) to cut memory, for a small quality cost.
Fine-tuning	Continuing training of a pretrained model on a small, task-specific dataset to adapt its behavior.
LoRA	Low-Rank Adaptation — a parameter-efficient fine-tuning method that trains tiny add-on adapters instead of the whole model.

LLMs & generation

Term	Definition
Large language model (LLM)	A foundation model trained to predict the next token of text; the engine behind chat, coding, and most generative text AI.
Token	A chunk of text (~¾ of a word) — the unit of billing, context limits, and latency.
Tokenization	Splitting text into tokens from a fixed vocabulary before the model processes it.
Context window	The maximum number of tokens a model can consider at once — input and output combined.
Embedding	A vector of numbers representing the meaning of content, positioned so similar meanings are close together.
Temperature	A decoding setting controlling randomness — 0 is near-deterministic, higher is more varied.
Top-p (nucleus sampling)	A decoding setting that restricts token choices to the smallest set covering probability p.
Pretraining	The large, expensive phase where a model learns general knowledge from a vast corpus, self-supervised.
Post-training	Turning a base model into an assistant via instruction tuning (SFT) and preference alignment (RLHF / DPO).
Hallucination	Fluent, confident, wrong output — intrinsic to LLMs, mitigated but never eliminated.
In-context learning	An LLM performing a task from examples given in the prompt, with no retraining — what makes prompting work.
Structured output	Model output constrained to a machine-readable schema (usually JSON), so code can safely consume it.
Inference	Running a trained model to produce output (as opposed to training it).

Retrieval & RAG

Term	Definition
Retrieval-augmented generation (RAG)	Fetching relevant information at request time and inserting it into the prompt, so the model answers from trusted data.
Vector database	A store optimized for finding items by similarity of their embedding vectors.
Semantic search	Search by meaning rather than keyword overlap, powered by embeddings.
Chunking	Splitting documents into passages small enough to embed and retrieve usefully.
Cosine similarity	A measure of how similar two vectors are by the angle between them; the default for text embeddings.
Approximate nearest neighbor (ANN)	Search that trades a little accuracy for a large speed gain when finding similar vectors at scale.
HNSW	A widely used ANN index structured as a navigable multi-layer graph.
Reranking	A second-stage model that re-scores retrieved candidates for relevance, keeping only the best few.
Hybrid search	Combining semantic (vector) and keyword (lexical) search and fusing the rankings.
Grounding	Tying a model’s answer to retrieved source material so claims are supported and verifiable.

Agents & prompting

Term	Definition
AI agent	An LLM placed in a loop with tools, allowed to decide its own next step until a task is done.
Agent loop / ReAct	The think → act → observe cycle an agent repeats; ReAct interleaves explicit reasoning with actions.
Tool use / function calling	The mechanism by which a model requests an action (a function with arguments) that your code then executes.
Prompt engineering	Designing the input that makes an LLM produce the desired output reliably.
System prompt	The instruction that sets a model’s durable role, rules, and default behavior, separate from the user’s message.
Chain-of-thought (CoT)	Prompting a model to reason step by step before answering, improving accuracy on multi-step problems.
Zero-shot / few-shot	Prompting with no examples (zero-shot) or with a few worked examples (few-shot).
Orchestration	The application code that decides the steps of an AI workflow — when to retrieve, call a tool, or loop.
Memory	An agent’s state: working memory (the current task, in the context window) and long-term memory (stored, retrieved across sessions).
Workflow vs. agent	A workflow has control flow defined by your code; an agent has control flow decided by the LLM at runtime.

Infrastructure & operations

Term	Definition
GPU / VRAM	The hardware that runs neural network math in parallel; VRAM (GPU memory) is usually the binding constraint.
Inference server	Software (e.g. vLLM) that serves a model efficiently with batching and KV-cache management.
KV cache	Stored intermediate values that speed up token-by-token generation, at the cost of GPU memory.
Throughput vs. latency	Throughput is requests served per second; latency is how fast one request completes — the two trade off.
Time to first token (TTFT)	How long before a model’s response starts streaming — a key interactive-latency metric.
MLOps / LLMOps	The practices for deploying, monitoring, and continuously improving ML models (MLOps) and LLM systems (LLMOps).
Model registry	Version control for trained models — tracking versions, metrics, stages, and enabling rollback.
Drift	The gradual divergence of live data from training data, which silently degrades model quality over time.
Evaluation set (eval)	A curated set of test cases with expected outcomes, used to measure quality and catch regressions.
LLM-as-judge	Using a strong LLM to score other models’ outputs against a rubric, for scalable evaluation.
Observability / tracing	Logging the full detail of each request — prompts, retrieved context, tool calls, cost, latency — to debug and monitor.
Streaming	Sending output tokens as they are generated, cutting perceived latency for interactive use.

Safety & security

Term	Definition
Prompt injection	An attack where untrusted text carries instructions the model follows; indirect injection hides them in retrieved or browsed content.
Jailbreak	Coaxing a model past its built-in safety training, distinct from injecting your application’s instructions.
Guardrails	Independent checks around a model — classifiers or code — that screen input and output for unsafe or off-policy content.
Least privilege	Granting tools and agents the minimum permissions needed, so a successful attack has a small blast radius.
PII	Personally identifiable information — data that must be handled, minimized, and often redacted before reaching a model.
Alignment	Training (e.g. via RLHF) that makes a model helpful, harmless, and honest by human standards.
Bias	Skewed or unfair model behavior learned from skewed training data; critical wherever AI decides about people.
Red-teaming	Actively attacking your own AI system before launch to find safety and security failures.

See also

This glossary distills the full guide. To go deeper on any term, start from the relevant section — AI Fundamentals, LLM Engineering, RAG, AI Agents, or AI Safety & Security.