Every term the guide uses, defined in one or two plain sentences. It’s a
reference, not a reading order — skim it, or jump in with your browser’s find
(Ctrl/Cmd + F).
| Term | Definition |
|---|
| Artificial intelligence (AI) | Software that performs tasks associated with human cognition. In practice today, software whose behavior is learned from data rather than hand-coded. |
| Machine learning (ML) | A subset of AI: systems that improve at a task by learning patterns from data instead of following written rules. |
| Deep learning (DL) | A subset of ML that uses neural networks with many layers, which learn their own features from raw data. |
| Generative AI | Models whose output is new content — text, images, audio, code — as opposed to a label or score. |
| Discriminative model | A model that outputs a label or number about an input (e.g. spam / not spam), the counterpart of a generative model. |
| Foundation model | A single large model pretrained on broad data, then adapted to many downstream tasks. LLMs are foundation models for text. |
| Supervised learning | Learning from labeled examples — inputs paired with correct outputs. |
| Unsupervised learning | Learning structure from unlabeled data, e.g. clustering. |
| Reinforcement learning (RL) | Learning by acting in an environment and receiving rewards; the basis of RLHF. |
| Self-supervised learning | Learning from unlabeled data that supplies its own labels (e.g. predict the next word). It made internet-scale pretraining possible. |
| Overfitting | When a model memorizes its training data, including noise, and fails to generalize to new inputs. |
| Generalization | A model’s performance on new, unseen data — the real goal of training. |
| Baseline | A trivial reference result (e.g. predict the most common class); a model’s score is only meaningful as a delta over it. |
| Term | Definition |
|---|
| Parameters (weights) | The adjustable numbers inside a model that training tunes; parameter count (“7B”, “70B”) is a rough proxy for capacity. |
| Neural network | A function built from layers of simple units (“neurons”); the basis of deep learning. |
| Transformer | The neural network architecture behind virtually all modern AI, built on self-attention and highly parallelizable. |
| Attention | The mechanism that lets a model weigh how much every position in a sequence should draw from every other. |
| Gradient descent | The optimization process that nudges parameters to reduce the loss, step by step. |
| Loss function | A single number measuring how wrong a model’s predictions are; training minimizes it. |
| Backpropagation | The algorithm that computes how much each parameter contributed to the loss, enabling the update. |
| Hyperparameter | A setting chosen before training (learning rate, batch size, layers), as opposed to a learned parameter. |
| Epoch / batch | A batch is a small group of examples processed together; an epoch is one full pass over the training data. |
| Quantization | Storing model weights at lower numerical precision (e.g. 4-bit) to cut memory, for a small quality cost. |
| Fine-tuning | Continuing training of a pretrained model on a small, task-specific dataset to adapt its behavior. |
| LoRA | Low-Rank Adaptation — a parameter-efficient fine-tuning method that trains tiny add-on adapters instead of the whole model. |
| Term | Definition |
|---|
| Large language model (LLM) | A foundation model trained to predict the next token of text; the engine behind chat, coding, and most generative text AI. |
| Token | A chunk of text (~¾ of a word) — the unit of billing, context limits, and latency. |
| Tokenization | Splitting text into tokens from a fixed vocabulary before the model processes it. |
| Context window | The maximum number of tokens a model can consider at once — input and output combined. |
| Embedding | A vector of numbers representing the meaning of content, positioned so similar meanings are close together. |
| Temperature | A decoding setting controlling randomness — 0 is near-deterministic, higher is more varied. |
| Top-p (nucleus sampling) | A decoding setting that restricts token choices to the smallest set covering probability p. |
| Pretraining | The large, expensive phase where a model learns general knowledge from a vast corpus, self-supervised. |
| Post-training | Turning a base model into an assistant via instruction tuning (SFT) and preference alignment (RLHF / DPO). |
| Hallucination | Fluent, confident, wrong output — intrinsic to LLMs, mitigated but never eliminated. |
| In-context learning | An LLM performing a task from examples given in the prompt, with no retraining — what makes prompting work. |
| Structured output | Model output constrained to a machine-readable schema (usually JSON), so code can safely consume it. |
| Inference | Running a trained model to produce output (as opposed to training it). |
| Term | Definition |
|---|
| Retrieval-augmented generation (RAG) | Fetching relevant information at request time and inserting it into the prompt, so the model answers from trusted data. |
| Vector database | A store optimized for finding items by similarity of their embedding vectors. |
| Semantic search | Search by meaning rather than keyword overlap, powered by embeddings. |
| Chunking | Splitting documents into passages small enough to embed and retrieve usefully. |
| Cosine similarity | A measure of how similar two vectors are by the angle between them; the default for text embeddings. |
| Approximate nearest neighbor (ANN) | Search that trades a little accuracy for a large speed gain when finding similar vectors at scale. |
| HNSW | A widely used ANN index structured as a navigable multi-layer graph. |
| Reranking | A second-stage model that re-scores retrieved candidates for relevance, keeping only the best few. |
| Hybrid search | Combining semantic (vector) and keyword (lexical) search and fusing the rankings. |
| Grounding | Tying a model’s answer to retrieved source material so claims are supported and verifiable. |
| Term | Definition |
|---|
| AI agent | An LLM placed in a loop with tools, allowed to decide its own next step until a task is done. |
| Agent loop / ReAct | The think → act → observe cycle an agent repeats; ReAct interleaves explicit reasoning with actions. |
| Tool use / function calling | The mechanism by which a model requests an action (a function with arguments) that your code then executes. |
| Prompt engineering | Designing the input that makes an LLM produce the desired output reliably. |
| System prompt | The instruction that sets a model’s durable role, rules, and default behavior, separate from the user’s message. |
| Chain-of-thought (CoT) | Prompting a model to reason step by step before answering, improving accuracy on multi-step problems. |
| Zero-shot / few-shot | Prompting with no examples (zero-shot) or with a few worked examples (few-shot). |
| Orchestration | The application code that decides the steps of an AI workflow — when to retrieve, call a tool, or loop. |
| Memory | An agent’s state: working memory (the current task, in the context window) and long-term memory (stored, retrieved across sessions). |
| Workflow vs. agent | A workflow has control flow defined by your code; an agent has control flow decided by the LLM at runtime. |
| Term | Definition |
|---|
| GPU / VRAM | The hardware that runs neural network math in parallel; VRAM (GPU memory) is usually the binding constraint. |
| Inference server | Software (e.g. vLLM) that serves a model efficiently with batching and KV-cache management. |
| KV cache | Stored intermediate values that speed up token-by-token generation, at the cost of GPU memory. |
| Throughput vs. latency | Throughput is requests served per second; latency is how fast one request completes — the two trade off. |
| Time to first token (TTFT) | How long before a model’s response starts streaming — a key interactive-latency metric. |
| MLOps / LLMOps | The practices for deploying, monitoring, and continuously improving ML models (MLOps) and LLM systems (LLMOps). |
| Model registry | Version control for trained models — tracking versions, metrics, stages, and enabling rollback. |
| Drift | The gradual divergence of live data from training data, which silently degrades model quality over time. |
| Evaluation set (eval) | A curated set of test cases with expected outcomes, used to measure quality and catch regressions. |
| LLM-as-judge | Using a strong LLM to score other models’ outputs against a rubric, for scalable evaluation. |
| Observability / tracing | Logging the full detail of each request — prompts, retrieved context, tool calls, cost, latency — to debug and monitor. |
| Streaming | Sending output tokens as they are generated, cutting perceived latency for interactive use. |
| Term | Definition |
|---|
| Prompt injection | An attack where untrusted text carries instructions the model follows; indirect injection hides them in retrieved or browsed content. |
| Jailbreak | Coaxing a model past its built-in safety training, distinct from injecting your application’s instructions. |
| Guardrails | Independent checks around a model — classifiers or code — that screen input and output for unsafe or off-policy content. |
| Least privilege | Granting tools and agents the minimum permissions needed, so a successful attack has a small blast radius. |
| PII | Personally identifiable information — data that must be handled, minimized, and often redacted before reaching a model. |
| Alignment | Training (e.g. via RLHF) that makes a model helpful, harmless, and honest by human standards. |
| Bias | Skewed or unfair model behavior learned from skewed training data; critical wherever AI decides about people. |
| Red-teaming | Actively attacking your own AI system before launch to find safety and security failures. |
This glossary distills the full guide. To go deeper on any term, start from the
relevant section — AI Fundamentals,
LLM Engineering, RAG, AI Agents,
or AI Safety & Security.