AI & ML terms glossary

A compact English reference for high-intent search vocabulary—LLM, RAG, chunking, embeddings, evals, and product-side guardrails. Content is hand-curated; filter below, or browse the popular long-tail keywords for other tools. Works offline once loaded—no generative call.

Why a static glossary?

AI documentation reuses a small set of moving parts—context budgets, retrieval, and safety at the app boundary. A single page with anchors beats dozens of single-term landing pages for users and for honest indexing.

Search & filter

Short English reference for common LLM and RAG vocabulary. Not legal, medical, or academic advice—verify against your own release and policy requirements.

Filter by category, or type a substring (e.g. rag, lora). Use a row’s link icon to copy a deep link with the term’s hash. A ?q=… URL parameter pre-fills the search, for example when an agent or bookmark opens the page.

Showing 58 entries.

  • LLM

    Core concepts

    Also: large language model

    A model trained to predict the next text token over broad corpora, then adapted for instruction-following. Often used for drafting, classifying, transforming, and reasoning over language when paired with the right context and tools.

  • Context window

    Core concepts

    Also: max context length · sequence length limit

    The maximum number of tokens a model can attend to in a single request (prompt + completion combined). Longer context helps with big documents, but also increases cost and latency.

  • Token

    Core concepts

    A unit the tokenizer produces from text. Token counts drive billing and context limits, but a token is not the same as a word—especially for non-Latin text or long technical identifiers.

  • Tokenizer

    Core concepts

    The component that turns raw text into token ids (often subword, e.g. BPE or SentencePiece). Different services use different tokenizers, so per-token numbers are not automatically comparable across them.
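
    A minimal sketch using the open-source tiktoken library (one tokenizer among many; your provider's encoding may differ):

        import tiktoken  # pip install tiktoken

        # cl100k_base is one common BPE vocabulary; other services ship different ones.
        enc = tiktoken.get_encoding("cl100k_base")
        ids = enc.encode("Context windows are measured in tokens, not words.")
        print(len(ids), ids[:5])  # token count, plus the first few ids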

  • Prompt

    Core concepts

The input you send to a model, often a mix of instructions, data, and examples. Most practical failures stem from unclear goals, missing constraints, or unscoped tasks—not from a lack of clever wording.

  • System message / instruction

    Core concepts

    Also: system prompt

    A high-priority block that steers how the model should behave (role, style, and safety). It is not a substitute for your own access control, logging, and review when outputs matter.

  • Few-shot learning (in prompts)

    Core concepts

    Also: in-context learning · multishot · k-shot

    Supplying a few input/output examples in the prompt so the model can infer the target format. Helpful for structured outputs; for very long or sensitive examples, consider retrieval or tools instead of pasting everything.

  • Zero-shot

    Core concepts

    Asking the model to perform a task with instructions only—no training examples in the prompt. Relies on clear specs and, when needed, schema or validation on your side.

  • Embedding

    Core concepts

    Also: vector embedding

    A dense vector representation of text (or other inputs) in which closer vectors often mean similar meaning. Commonly used for semantic search, clustering, and retrieval—not for storage of secrets.
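
    A minimal cosine-similarity sketch over toy vectors, assuming numpy is available; real embeddings come from an embedding model and have hundreds or thousands of dimensions:

        import numpy as np

        def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
            """Cosine of the angle between two vectors (1.0 = same direction)."""
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

        # Toy 4-dimensional vectors standing in for real embedding output.
        query = np.array([0.1, 0.8, 0.3, 0.0])
        doc = np.array([0.2, 0.7, 0.4, 0.1])
        print(round(cosine_similarity(query, doc), 3))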

  • Hallucination

    Core concepts

    Confident but incorrect or unsupported statements. Mitigations include sourcing (retrieval), citations, tool verification, lower temperature for some tasks, and post-checks that match your domain requirements.

  • Grounding

    Retrieval & context

    Also: grounded response

    Anchoring a model’s answer in evidence you provide (for example, retrieved documents or user-supplied data) rather than only parametric knowledge, to reduce off-topic or invented details.

  • Knowledge cutoff

    Core concepts

    The approximate time boundary of facts the model can reasonably recall from training. After that, facts may be outdated unless you supply fresh data via search, files, or tools.

  • Temperature

    Inference & sampling

    A sampling control that flattens or sharpens the probability over next tokens. Higher values increase variety; lower values make outputs more deterministic. It does not replace validation logic.
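
    A minimal sketch of how temperature reshapes a toy next-token distribution (assuming numpy):

        import numpy as np

        def apply_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
            """Convert logits to probabilities, sharpened or flattened by temperature."""
            scaled = logits / max(temperature, 1e-6)  # guard against division by zero
            exp = np.exp(scaled - scaled.max())       # subtract max for numerical stability
            return exp / exp.sum()

        logits = np.array([2.0, 1.0, 0.5, -1.0])
        print(apply_temperature(logits, 0.2))  # peaked: near-deterministic
        print(apply_temperature(logits, 1.5))  # flatter: more variety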

  • Top-p (nucleus sampling)

    Inference & sampling

    Also: nucleus sampling

    Samples from the smallest set of most likely tokens whose cumulative probability meets a threshold p. Often combined with temperature to tune creativity without entirely ignoring unlikely continuations.
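
    A minimal sketch of one common nucleus-filtering variant over a toy distribution (assuming numpy; providers may treat the cumulative boundary slightly differently):

        import numpy as np

        def top_p_filter(probs: np.ndarray, p: float = 0.9) -> np.ndarray:
            """Keep the smallest set of most likely tokens covering probability p."""
            order = np.argsort(probs)[::-1]      # most likely first
            keep = np.cumsum(probs[order]) <= p
            keep[0] = True                       # always keep the single top token
            mask = np.zeros_like(probs)
            mask[order[keep]] = probs[order[keep]]
            return mask / mask.sum()             # renormalize the survivors

        print(top_p_filter(np.array([0.5, 0.3, 0.15, 0.05]), p=0.8))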

  • Top-k sampling

    Inference & sampling

At each step, only the k most likely next tokens are considered. Useful as a cap on surprise: a small k can make outputs repetitive, while a very large k approaches unrestricted sampling.
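
    The same idea as a minimal top-k filter (assuming numpy):

        import numpy as np

        def top_k_filter(probs: np.ndarray, k: int = 2) -> np.ndarray:
            """Keep only the k most likely tokens, then renormalize."""
            mask = np.zeros_like(probs)
            top = np.argsort(probs)[::-1][:k]
            mask[top] = probs[top]
            return mask / mask.sum()

        print(top_k_filter(np.array([0.5, 0.3, 0.15, 0.05]), k=2))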

  • Max tokens (completion cap)

    Inference & sampling

    Also: output token limit

    A hard cap on how many completion tokens a call may produce. Separate from the context window; setting it avoids runaway generations and unbounded cost.

  • Structured output

    Inference & sampling

    Also: JSON mode (concept) · schema-constrained text

    Constrained or guided generation so the answer follows a pattern (e.g. JSON) suitable for your parser. Still validate with a schema check—models can emit plausible-looking but invalid structure.
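
    A minimal validation sketch using the standard-library json module; the "title" field is a hypothetical stand-in for your own schema:

        import json

        def parse_checked(raw: str) -> dict:
            """Parse model output and verify required fields; never trust shape blindly."""
            data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
            if not isinstance(data, dict) or not isinstance(data.get("title"), str):
                raise ValueError("missing or mistyped 'title' field")
            return data

        print(parse_checked('{"title": "Q3 report", "priority": 2}'))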

  • RAG

    Retrieval & context

    Also: retrieval-augmented generation

    Retrieval-augmented generation: fetch relevant documents (or records), insert them as context, then ask the model to answer using that evidence. Strong default for up-to-date or private data without retraining a base model for every fact.
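
    A toy sketch of the retrieve-then-answer shape; word overlap stands in for embeddings and a vector store, and the final model call is left as a comment:

        def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
            """Toy retriever: rank documents by word overlap with the question."""
            q = set(question.lower().split())
            ranked = sorted(corpus, key=lambda d: -len(q & set(d.lower().split())))
            return ranked[:k]

        corpus = [
            "The context window is the maximum number of tokens per request.",
            "LoRA adds low-rank trainable updates to selected layers.",
            "Chunk overlap repeats tokens at chunk boundaries.",
        ]
        evidence = retrieve("what is a context window", corpus)
        prompt = "Answer using only these sources:\n" + "\n".join(evidence)
        print(prompt)  # this prompt would then go to your chat model of choice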

  • Vector database / vector store

    Retrieval & context

    A store for embeddings with similarity search, sometimes with metadata filters. Not magic—quality depends on chunking, refresh cadence, permissions, and evaluation of recall.

  • Chunking (for retrieval)

    Retrieval & context

    Splitting long text into pieces sized for embedding models and the context window. Overlap and boundaries matter: arbitrary splits can break tables, code, or legal clauses into nonsense spans.
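
    A minimal character-window chunker with overlap (see the next entry); production chunkers should also respect sentence, table, and code boundaries:

        def chunk(text: str, size: int = 400, overlap: int = 50) -> list[str]:
            """Split text into fixed-size character windows with overlapping edges."""
            if overlap >= size:
                raise ValueError("overlap must be smaller than chunk size")
            step = size - overlap
            return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

        pieces = chunk("word " * 300, size=400, overlap=50)
        print(len(pieces), len(pieces[0]))  # 5 chunks, 400 characters each (last one shorter)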

  • Chunk overlap

    Retrieval & context

Repeating a fixed number of characters or tokens at chunk boundaries so sentences at the edges still appear in full in at least one chunk. Trades more storage and compute for better recall of boundary content.

  • Reranking

    Retrieval & context

    Also: re-ranking · re-ranker model

    A second step after coarse retrieval: a model scores how well each candidate matches the question, to improve ordering—especially for ambiguous queries or when recall is good but precision is not.

  • MMR

    Retrieval & context

    Also: maximal marginal relevance

    A selection strategy that balances relevance with diversity, reducing near-duplicate chunks in the same context. Useful in top-k context packing for RAG and search results pages.
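
    A minimal greedy MMR sketch over unit-normalized vectors (assuming numpy); lam trades relevance (1.0) against diversity (0.0):

        import numpy as np

        def mmr(query: np.ndarray, cands: np.ndarray, k: int = 3, lam: float = 0.7) -> list[int]:
            """Greedily pick k candidate indices balancing relevance and novelty."""
            relevance = cands @ query  # cosine similarity when rows are unit norm
            selected: list[int] = []
            remaining = list(range(len(cands)))
            while remaining and len(selected) < k:
                def score(i: int) -> float:
                    redundancy = max((cands[i] @ cands[j] for j in selected), default=0.0)
                    return lam * relevance[i] - (1 - lam) * redundancy
                best = max(remaining, key=score)
                selected.append(best)
                remaining.remove(best)
            return selected

        rng = np.random.default_rng(0)
        vecs = rng.normal(size=(6, 8))
        vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize rows
        print(mmr(vecs[0], vecs[1:], k=3))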

  • BM25

    Retrieval & context

    A classic bag-of-words ranking function. Still competitive for many keyword-heavy queries; commonly paired with vector recall in hybrid systems.
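
    A minimal sketch of one common BM25 variant over pre-tokenized documents (standard library only):

        import math
        from collections import Counter

        def bm25(query: list[str], doc: list[str], corpus: list[list[str]],
                 k1: float = 1.5, b: float = 0.75) -> float:
            """Score one document against a query with smoothed idf."""
            n = len(corpus)
            avg_len = sum(len(d) for d in corpus) / n
            tf = Counter(doc)
            score = 0.0
            for term in query:
                df = sum(1 for d in corpus if term in d)         # document frequency
                idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # smoothed, always positive
                f = tf[term]
                score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avg_len))
            return score

        docs = [d.split() for d in ["sparse keyword search", "dense vector search", "hybrid search systems"]]
        print(bm25("keyword search".split(), docs[0], docs))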

  • Fine-tuning

    Training & adaptation

    Also: supervised fine-tuning · SFT

    Training a pre-trained model further on a domain or task-labeled set to shift behavior, tone, or tool formats. Not always necessary—start with better prompts, retrieval, and evals, then add tuning when the gap is clear.

  • LoRA

    Training & adaptation

    Also: low-rank adaptation · PEFT-style adapter

    Low-rank trainable updates applied to selected layers, reducing the memory footprint versus full weight updates. Often used in parameter-efficient fine-tuning (PEFT) stacks.

  • PEFT

    Training & adaptation

    Also: parameter-efficient fine-tuning

    A family of methods (LoRA, adapters, and others) that update only a small set of extra parameters, keeping a frozen backbone. Useful for tight budgets and faster iteration.

  • RLHF

    Training & adaptation

    Also: reinforcement learning from human feedback

    Aligning a model to human or AI preference labels using a reward model and policy optimization. Improves helpfulness and safety style but does not remove the need for content policies and product-side checks.

  • DPO

    Training & adaptation

    Also: direct preference optimization

    A preference-tuning family that refines a policy from pairwise preferences without a separate reward model. Often used as an alternative or complement in alignment pipelines.

  • Distillation (knowledge distillation)

    Training & adaptation

    Training a smaller model to mimic a larger teacher’s behavior or logits. Cuts cost and latency, but the student inherits teacher blind spots if not retested in your use cases.

  • Quantization (model)

    Training & adaptation

Storing and computing weights in lower precision (e.g. int4/int8) to save memory and speed up inference. Can affect quality—benchmark on your own tasks after quantizing.

  • Benchmark (model)

    Evaluation & quality

    A public or internal suite that scores a model on a fixed set of tasks. Uplift on a leaderboard does not guarantee gains on your data—always add task-specific evals when risk matters.

  • Perplexity (language modeling)

    Evaluation & quality

    A standard intrinsic metric for how surprised a model is by held-out text (lower is better in distribution). It does not directly measure usefulness for downstream products.
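
    A worked example of the definition: perplexity is the exponential of the average negative log-probability per token (the log-probs below are hypothetical):

        import math

        # Per-token natural-log probabilities for a held-out sequence.
        log_probs = [-1.2, -0.4, -2.1, -0.8]
        perplexity = math.exp(-sum(log_probs) / len(log_probs))
        print(round(perplexity, 2))  # lower means the model was less surprised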

  • LLM-as-a-judge

    Evaluation & quality

Using a model to score or compare outputs. Fast and cheap for iteration, but judge models can share the generator’s biases (including length bias) and blind spots; pair with human review on critical changes.

  • Golden set / eval set

    Evaluation & quality

    A curated set of input questions with reference answers or rubrics, used to track regressions when prompts, models, or RAG data change. Version it like product code.

  • Guardrails (product)

    Safety & misuse

    Product-level checks around the model: rate limits, allowlists, blocklists, PII redaction, tool permissions, and human review. Not the same as a single “safety” flag on the API call.

  • Prompt injection

    Safety & misuse

    Tricking the system into following attacker-controlled instructions, often by planting text in data the model is supposed to use. Defend with privilege separation, tool policies, and never trusting model output to bypass auth.

  • Jailbreak (prompting)

    Safety & misuse

    A prompt or template intended to elicit disallowed or unsafe behavior. Treat as an abuse vector in public endpoints; do not document step-by-step recipes in user-facing help.

  • Alignment (AI safety, informal)

    Safety & misuse

    Making model behavior match operator intent and public policy—helpful, honest, and within rules. A research and product discipline, not a single on/off feature.

  • PII

    Safety & misuse

    Also: personally identifiable information

    Identifying data about a person. Before sending text to any remote model, follow your retention, consent, and minimization policy; local redaction and scanning can reduce risk.

  • Agent (LLM-based)

    Agents & orchestration

    A system where a model plans steps, may call tools, and loops until a stop condition. Reliability depends on tool contracts, idempotency, and observability—not only model cleverness.

  • Tool use / function calling (concept)

    Agents & orchestration

    A structured pattern: the model emits a tool name and arguments; your code executes and returns results. The host must validate inputs, enforce auth, and handle failure modes.
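
    A minimal host-side dispatch sketch; get_weather is a hypothetical tool, and the explicit validation is the point:

        import json

        def get_weather(city: str) -> str:
            """Hypothetical tool; a real one would call an API with proper auth."""
            return json.dumps({"city": city, "temp_c": 18})

        TOOLS = {"get_weather": get_weather}

        def dispatch(tool_call: dict) -> str:
            """Validate and execute a model-emitted tool call; never execute it blindly."""
            fn = TOOLS.get(tool_call.get("name", ""))
            if fn is None:
                return json.dumps({"error": "unknown tool"})
            args = tool_call.get("arguments", {})
            if not isinstance(args.get("city"), str):
                return json.dumps({"error": "invalid arguments"})
            return fn(**args)

        # A model might emit a call like this; your code decides whether to run it.
        print(dispatch({"name": "get_weather", "arguments": {"city": "Oslo"}}))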

  • Model Context Protocol (MCP)

Agents & orchestration

    A wire protocol pattern for agents to list and call tools, resources, and prompts from a host. It standardizes how capabilities are exposed—not a replacement for your security review of each tool action.

  • Orchestration (workflows)

    Agents & orchestration

    Composing multiple steps, models, or services into a pipeline with branching, retries, and human handoff. The hard parts are state, idempotency, and logging—not only prompt quality.

  • ReAct-style loop

    Agents & orchestration

    Also: ReAct (reason+act pattern)

    A pattern that alternates short reasoning with actions (e.g. tool calls) in a loop. Good for research-style tasks; still needs max-step limits and safety checks to avoid runaway loops.
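
    A runnable skeleton of the loop with a hard step cap; think() and act() are hypothetical stand-ins for a model call and a tool runner:

        def think(task: str) -> dict:
            """Hypothetical model call: finish once an observation is present."""
            if "Observation:" in task:
                return {"final": True, "answer": "done after one tool call"}
            return {"final": False, "action": "lookup"}

        def act(action: str) -> str:
            """Hypothetical tool runner."""
            return f"result of {action}"

        def react_loop(task: str, max_steps: int = 5) -> str:
            """Alternate reasoning and actions, with a cap to avoid runaway loops."""
            for _ in range(max_steps):
                thought = think(task)
                if thought["final"]:
                    return thought["answer"]
                task += "\nObservation: " + act(thought["action"])
            return "stopped: step limit reached"

        print(react_loop("find the answer"))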

  • Multimodal model

    Core concepts

    A model that conditions on more than one modality (e.g. text and images) in a shared architecture. Usefulness depends on task fit—always verify for your domain, especially charts and code screenshots.

  • Chain-of-thought (CoT)

    Inference & sampling

    Asking the model to show intermediate reasoning before an answer, which can improve complex math or logic. For production, you may elide the trace from the user while still logging for audit.

  • JSON Lines

    Inputs & structure

    Also: ndjson in ML exports

    One JSON object per line—common in dataset exports and some eval pipelines. Easy to stream; validate each line with a JSON parser, not with regex for arbitrary payloads.
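
    A minimal line-by-line reader (standard library only); the q/a fields are hypothetical:

        import io
        import json

        raw = '{"q": "What is RAG?", "a": "retrieval-augmented generation"}\n{"q": "bad json\n'

        # Parse each line independently; one bad record should not sink the file.
        for line_no, line in enumerate(io.StringIO(raw), start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                print(f"line {line_no}: skipped ({err})")
                continue
            print(record["q"])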

  • Prompt engineering

    Core concepts

    The practice of designing, testing, and versioning instructions and few-shot sets for a task. Mature teams treat prompt changes like code: review, measure, and roll back on regressions.

  • Logprobs (log-probabilities)

    Inference & sampling

    Per-token log probabilities from the model. Useful for uncertainty hints, top-token inspection, and some evaluation workflows—if your provider exposes them in your plan.

  • Synthetic data (for training)

    Training & adaptation

    Data generated by models, templates, or simulators, sometimes mixed with real labels. Can bootstrap tasks but risks distribution shift—always validate on real traffic slices.

  • Long-context models

    Core concepts

    Models marketed for very large context windows. Even when supported, you still pay in latency, cost, and “lost in the middle” effects; chunking and retrieval often remain the right design.

  • Attention (transformer, informal)

    Core concepts

    A mechanism in transformer blocks that lets tokens relate to one another at different positions. The user-facing limit you feel is the context window and service quotas—not the number of self-attention heads you configure.

  • Latent / representation (informal)

    Core concepts

    The internal high-dimensional state learned by a model, often glossed as “where meaning lives.” Vague in conversation; for products, focus on task metrics and retrieval quality instead.

  • Deduplication (datasets, retrieval)

    Retrieval & context

    Removing near-duplicate documents or lines so evals and indexes are not overfitted to repeated boilerplate. Critical before embedding or training on scraped corpora.
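
    A minimal exact-dedup sketch over normalized text; near-duplicate detection (e.g. MinHash or embedding similarity) goes further:

        import hashlib

        def dedupe(docs: list[str]) -> list[str]:
            """Drop exact duplicates after lowercasing and collapsing whitespace."""
            seen: set[str] = set()
            unique = []
            for doc in docs:
                key = hashlib.sha256(" ".join(doc.lower().split()).encode()).hexdigest()
                if key not in seen:
                    seen.add(key)
                    unique.append(doc)
            return unique

        print(dedupe(["Same  boilerplate.", "same boilerplate.", "Fresh content."]))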

  • Citation (in RAG answers)

    Retrieval & context

    Pointing each claim to a source span or id from your corpus. Improves trust and debugging, but the model can still misattribute—verify in UI when stakes are high.

  • Latency (inference)

    Inference & sampling

    Time to first token and time to last token, plus your own overhead. A primary UX constraint; architectural wins often beat small prompt cleverness in interactive apps.

  • Batch / offline scoring

    Inference & sampling

Running many independent prompts in bulk—often cheaper and higher throughput, but with slower wall-clock time per item. Common for backfills, labeling, and ETL, not for live chat turn-taking.

Common use cases

  • Skim definitions before reading RAG or agent docs so acronyms like MMR, BM25, or PEFT do not slow you down.
  • Share a deep link to a single term with your team (Copy link) when reviewing architecture or runbooks—no sign-in.
  • Pair with the token estimator, RAG chunk calculator, and prompt checklist when you are building context budgets and retrieval plans.

Common mistakes to avoid

  • Treating this page as a compliance or legal source

    These are short orienting definitions. When policies, contracts, or safety certifications apply, use your org’s official guidance.

  • Assuming all providers use the same tokenizer

    Token counts, pricing, and limits are service-specific. Use each vendor’s own meters when the bill matters.

FAQ

Does this glossary call a generative model?

No. The text is static. Search and filter run locally in your browser.

How do I open a specific term from a link?

Use Copy link on a row, or append a hash: #glossary-rag, #glossary-embedding, etc. The page scrolls to the card when the hash matches.

Will you add more terms?

Yes—this list is maintained with the rest of the AI hub. Suggest related tools via the “More tools” block below and the /tools/keywords index for long-tail phrases across the site.

Common search terms

Phrases people search for that match this tool. See the full long-tail keyword index.

  • llm glossary online
  • what is rag retrieval augmented generation
  • embedding vector meaning llm
  • context window explained
  • ai ml terms reference in browser
  • tokenizer and token definition
  • fine tuning vs lora explained
  • hallucination llm meaning

Related utilities you can open in another tab—mostly client-side.