AI & ML terms glossary
Client
A compact English reference for high-intent search vocabulary—LLM, RAG, chunking, embeddings, evals, and product-side guardrails. Content is hand-curated; filter below, or browse popular long-tail keywords for other tools. Works offline once loaded—no generative calls.
Why a static glossary?
AI documentation reuses a small set of moving parts—context budgets, retrieval, and safety at the app boundary. A single page with anchors beats dozens of single-term landing pages for users and for honest indexing.
Search & filter
Short English reference for common LLM and RAG vocabulary. Not legal, medical, or academic advice—verify for your release and policy.
Filter by category, or type a substring (e.g. rag, lora). Open a row’s link icon to copy a deep link with a hash. A ?q=… URL parameter pre-fills the search (useful for agent prefill).
Showing 60 entries.
LLM
Core concepts
Also: large language model
A model trained to predict the next text token over broad corpora, then adapted for instruction-following. Often used for drafting, classifying, transforming, and reasoning over language when paired with the right context and tools.
Context window
Core concepts
Also: max context length · sequence length limit
The maximum number of tokens a model can attend to in a single request (prompt + completion combined). Longer context helps with big documents, but also increases cost and latency.
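The budgeting arithmetic is simple but easy to get wrong: prompt tokens plus the completion cap must fit inside the window. A minimal sketch (the 8,192-token default is an illustrative assumption, not any particular model's limit):

```python
def fits_context(prompt_tokens: int, max_completion_tokens: int,
                 context_window: int = 8192) -> bool:
    """True if the prompt plus the completion cap fit in the window."""
    return prompt_tokens + max_completion_tokens <= context_window

# A 7,000-token prompt leaves no room for a 2,000-token completion cap.
fits_context(7000, 2000)   # False
fits_context(7000, 1000)   # True
```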
Token
Core concepts
A unit the tokenizer produces from text. Token counts drive billing and context limits, but a token is not the same as a word—especially for non-Latin text or long technical identifiers.
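Exact counts require the service's own tokenizer, but a rough character-based heuristic is often enough for planning. The chars-per-token ratio below is an assumption for illustration, not a vendor number:

```python
import math

def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Crude planning estimate: English prose often averages ~4 chars/token.
    CJK text and dense technical identifiers can be far off—verify with the
    provider's own tokenizer whenever billing or limits matter."""
    return math.ceil(len(text) / chars_per_token)

rough_token_estimate("Hello, tokenizer!")  # 5 (17 chars / 4, rounded up)
```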
Tokenizer
Core concepts
The component that turns raw text into token ids (often subword, e.g. BPE or SentencePiece). Different services use different tokenizers, so per-token numbers are not automatically comparable across them.
Prompt
Core concepts
The input you send to a model, often a mix of instructions, data, and examples. Most practical failures come from unclear goals, missing constraints, or unscoped tasks—not from a lack of clever wording.
System message / instruction
Core concepts
Also: system prompt
A high-priority block that steers how the model should behave (role, style, and safety). It is not a substitute for your own access control, logging, and review when outputs matter.
Few-shot learning (in prompts)
Core concepts
Also: in-context learning · multishot · k-shot
Supplying a few input/output examples in the prompt so the model can infer the target format. Helpful for structured outputs; for very long or sensitive examples, consider retrieval or tools instead of pasting everything.
Zero-shot
Core concepts
Asking the model to perform a task with instructions only—no training examples in the prompt. Relies on clear specs and, when needed, schema or validation on your side.
Embedding
Core concepts
Also: vector embedding
A dense vector representation of text (or other inputs) in which closer vectors often mean similar meaning. Commonly used for semantic search, clustering, and retrieval—not for storage of secrets.
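"Closer vectors" is usually measured with cosine similarity; a minimal sketch over plain Python lists (real systems use an embedding model's vectors, not these toy ones):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction,
    0.0 = orthogonal (unrelated under this measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cosine_similarity([1.0, 0.0], [1.0, 0.0])  # 1.0
cosine_similarity([1.0, 0.0], [0.0, 1.0])  # 0.0
```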
Hallucination
Core concepts
Confident but incorrect or unsupported statements. Mitigations include sourcing (retrieval), citations, tool verification, lower temperature for some tasks, and post-checks that match your domain requirements.
Grounding
Retrieval & context
Also: grounded response
Anchoring a model’s answer in evidence you provide (for example, retrieved documents or user-supplied data) rather than only parametric knowledge, to reduce off-topic or invented details.
Knowledge cutoff
Core concepts
The approximate time boundary of facts the model can reasonably recall from training. After that, facts may be outdated unless you supply fresh data via search, files, or tools.
Temperature
Inference & sampling
A sampling control that flattens or sharpens the probability over next tokens. Higher values increase variety; lower values make outputs more deterministic. It does not replace validation logic.
Top-p (nucleus sampling)
Inference & sampling
Also: nucleus sampling
Samples from the smallest set of most likely tokens whose cumulative probability meets a threshold p. Often combined with temperature to tune creativity without entirely ignoring unlikely continuations.
Top-k sampling
Inference & sampling
At each step, only the k most likely next tokens are considered. Useful as a cap on surprise: with small k, outputs can be repetitive; with very large k, sampling approaches the full distribution.
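Temperature, top-k, and top-p all reshape the same next-token distribution and are often combined. A sketch of how they compose, over toy logits rather than real model output:

```python
import math

def sample_filter(logits: dict[str, float], temperature: float = 1.0,
                  top_k: int = 0, top_p: float = 1.0) -> dict[str, float]:
    """Apply temperature, then top-k, then top-p to a token->logit map and
    return the renormalized distribution that would actually be sampled."""
    # Temperature: divide logits before the softmax.
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())
    exp = {t: math.exp(l - m) for t, l in scaled.items()}
    z = sum(exp.values())
    probs = sorted(((t, e / z) for t, e in exp.items()),
                   key=lambda kv: kv[1], reverse=True)
    # Top-k: keep only the k most likely tokens (0 = no cap).
    if top_k > 0:
        probs = probs[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability >= p.
    kept, cum = [], 0.0
    for t, p in probs:
        kept.append((t, p))
        cum += p
        if cum >= top_p:
            break
    z = sum(p for _, p in kept)
    return {t: p / z for t, p in kept}

dist = sample_filter({"the": 2.0, "a": 1.0, "zebra": -3.0}, top_k=2)
# Only "the" and "a" survive; the unlikely "zebra" is filtered out.
```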
Max tokens (completion cap)
Inference & sampling
Also: output token limit
A hard cap on how many completion tokens a call may produce. Separate from the context window; setting it avoids runaway generations and unbounded cost.
Structured output
Inference & sampling
Also: JSON mode (concept) · schema-constrained text
Constrained or guided generation so the answer follows a pattern (e.g. JSON) suitable for your parser. Still validate with a schema check—models can emit plausible-looking but invalid structure.
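Even with constrained decoding, treat the reply as untrusted input: parse it, then check the shape. A minimal hand-rolled check (the "label"/"confidence" field names are hypothetical):

```python
import json

def parse_reply(raw: str) -> dict:
    """Parse a model reply expected to look like
    {"label": str, "confidence": float in [0, 1]} and fail loudly otherwise."""
    data = json.loads(raw)  # raises ValueError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("label"), str):
        raise ValueError("missing or non-string 'label'")
    conf = data.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        raise ValueError("'confidence' must be a number in [0, 1]")
    return data

parse_reply('{"label": "spam", "confidence": 0.93}')  # parses cleanly
```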
RAG
Retrieval & context
Also: retrieval-augmented generation
Retrieval-augmented generation: fetch relevant documents (or records), insert them as context, then ask the model to answer using that evidence. Strong default for up-to-date or private data without retraining a base model for every fact.
Vector database / vector store
Retrieval & context
A store for embeddings with similarity search, sometimes with metadata filters. Not magic—quality depends on chunking, refresh cadence, permissions, and evaluation of recall.
Chunking (for retrieval)
Retrieval & context
Splitting long text into pieces sized for embedding models and the context window. Overlap and boundaries matter: arbitrary splits can break tables, code, or legal clauses into nonsense spans.
Chunk overlap
Retrieval & context
Replicating a number of characters or tokens at chunk boundaries so sentences at edges still appear in full in at least one chunk. Trades more storage and compute for better recall of boundary content.
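A sliding-window split makes the trade-off concrete (character-based here for simplicity; the same arithmetic works in tokens):

```python
def chunk(text: str, size: int, overlap: int) -> list[str]:
    """Sliding-window chunks: each window starts (size - overlap) after the last,
    so boundary content appears in full in at least one chunk."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk("abcdefghij", size=4, overlap=2)
# ['abcd', 'cdef', 'efgh', 'ghij'] — boundary characters appear in two chunks.
```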
Reranking
Retrieval & context
Also: re-ranking · re-ranker model
A second step after coarse retrieval: a model scores how well each candidate matches the question, to improve ordering—especially for ambiguous queries or when recall is good but precision is not.
MMR
Retrieval & context
Also: maximal marginal relevance
A selection strategy that balances relevance with diversity, reducing near-duplicate chunks in the same context. Useful in top-k context packing for RAG and search results pages.
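The balance is a greedy trade-off: each pick maximizes λ · relevance minus (1 − λ) · similarity to documents already chosen. A sketch over precomputed similarity scores (in practice you would back these with embedding cosine; the inputs below are toy values):

```python
def mmr_select(query_sims: dict[str, float],
               pair_sims: dict[tuple[str, str], float],
               k: int, lam: float = 0.7) -> list[str]:
    """Greedy MMR: repeatedly pick the doc maximizing
    lam * sim(query, d) - (1 - lam) * max(sim(d, s) for selected s)."""
    def pair(a: str, b: str) -> float:
        return pair_sims.get((a, b), pair_sims.get((b, a), 0.0))

    selected: list[str] = []
    candidates = set(query_sims)
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda d: lam * query_sims[d]
                   - (1 - lam) * max((pair(d, s) for s in selected), default=0.0))
        selected.append(best)
        candidates.remove(best)
    return selected

# Two near-duplicate top hits: MMR keeps one and prefers the distinct doc next.
picked = mmr_select({"a1": 0.9, "a2": 0.88, "b": 0.7},
                    {("a1", "a2"): 0.95, ("a1", "b"): 0.1, ("a2", "b"): 0.1}, k=2)
# ['a1', 'b']
```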
Hybrid search
Retrieval & context
Combining keyword methods (e.g. BM25) with vector similarity. Often best when queries contain rare tokens, SKUs, or exact product names that semantic search alone can miss.
BM25
Retrieval & context
A classic bag-of-words ranking function. Still competitive for many keyword-heavy queries; commonly paired with vector recall in hybrid systems.
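The per-term score is worth knowing even if your search engine computes it for you. A minimal single-term sketch using the common k1/b defaults (one widely used IDF variant; exact formulas differ slightly across implementations):

```python
import math

def bm25_term(tf: int, doc_len: int, avg_doc_len: float,
              n_docs: int, docs_with_term: int,
              k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 contribution of one query term for one document: rare terms get
    a higher IDF, and term frequency saturates instead of growing linearly."""
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)

# A rare term (in 10 of 1,000 docs) outscores a common one (in 500 of 1,000).
```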
Fine-tuning
Training & adaptation
Also: supervised fine-tuning · SFT
Training a pre-trained model further on a domain or task-labeled set to shift behavior, tone, or tool formats. Not always necessary—start with better prompts, retrieval, and evals, then add tuning when the gap is clear.
LoRA
Training & adaptation
Also: low-rank adaptation · PEFT-style adapter
Low-rank trainable updates applied to selected layers, reducing the memory footprint versus full weight updates. Often used in parameter-efficient fine-tuning (PEFT) stacks.
PEFT
Training & adaptation
Also: parameter-efficient fine-tuning
A family of methods (LoRA, adapters, and others) that update only a small set of extra parameters, keeping a frozen backbone. Useful for tight budgets and faster iteration.
RLHF
Training & adaptation
Also: reinforcement learning from human feedback
Aligning a model to human or AI preference labels using a reward model and policy optimization. Improves helpfulness and safety style but does not remove the need for content policies and product-side checks.
DPO
Training & adaptation
Also: direct preference optimization
A preference-tuning family that refines a policy from pairwise preferences without a separate reward model. Often used as an alternative or complement in alignment pipelines.
Distillation (knowledge distillation)
Training & adaptation
Training a smaller model to mimic a larger teacher’s behavior or logits. Cuts cost and latency, but the student inherits teacher blind spots if not retested on your use cases.
Quantization (model)
Training & adaptation
Storing and computing weights in lower precision (e.g. int4/int8) to save memory and speed up inference. Can affect quality—benchmark on your tasks after quantizing.
Benchmark (model)
Evaluation & quality
A public or internal suite that scores a model on a fixed set of tasks. Uplift on a leaderboard does not guarantee gains on your data—always add task-specific evals when risk matters.
Perplexity (language modeling)
Evaluation & quality
A standard intrinsic metric for how surprised a model is by held-out text (lower is better, in-distribution). It does not directly measure usefulness for downstream products.
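Perplexity is just the exponentiated average negative log-likelihood per token. A sketch from per-token log probabilities (natural log assumed):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp(-mean log p). Intuition: a model that guesses uniformly over
    V tokens has perplexity V on text drawn from that distribution."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Uniform probability 0.1 per token -> perplexity 10.
perplexity([math.log(0.1)] * 5)  # 10.0 (within float error)
```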
LLM-as-a-judge
Evaluation & quality
Using a model to score or compare outputs. Fast and cheap for iteration, but can share biases, length bias, and blind spots; pair with human review on critical changes.
Golden set / eval set
Evaluation & quality
A curated set of input questions with reference answers or rubrics, used to track regressions when prompts, models, or RAG data change. Version it like product code.
Guardrails (product)
Safety & misuse
Product-level checks around the model: rate limits, allowlists, blocklists, PII redaction, tool permissions, and human review. Not the same as a single “safety” flag on the API call.
Prompt injection
Safety & misuse
Tricking the system into following attacker-controlled instructions, often by planting text in data the model is supposed to use. Defend with privilege separation, tool policies, and never trusting model output to bypass auth.
Jailbreak (prompting)
Safety & misuse
A prompt or template intended to elicit disallowed or unsafe behavior. Treat as an abuse vector in public endpoints; do not document step-by-step recipes in user-facing help.
Alignment (AI safety, informal)
Safety & misuse
Making model behavior match operator intent and public policy—helpful, honest, and within rules. A research and product discipline, not a single on/off feature.
PII
Safety & misuse
Also: personally identifiable information
Identifying data about a person. Before sending text to any remote model, follow your retention, consent, and minimization policy; local redaction and scanning can reduce risk.
Agent (LLM-based)
Agents & orchestration
A system where a model plans steps, may call tools, and loops until a stop condition. Reliability depends on tool contracts, idempotency, and observability—not only model cleverness.
Tool use / function calling (concept)
Agents & orchestration
A structured pattern: the model emits a tool name and arguments; your code executes and returns results. The host must validate inputs, enforce auth, and handle failure modes.
Model Context Protocol (MCP)
Inputs & structure
A wire protocol pattern for agents to list and call tools, resources, and prompts from a host. It standardizes how capabilities are exposed—not a replacement for your security review of each tool action.
Orchestration (workflows)
Agents & orchestration
Composing multiple steps, models, or services into a pipeline with branching, retries, and human handoff. The hard parts are state, idempotency, and logging—not only prompt quality.
ReAct-style loop
Agents & orchestration
Also: ReAct (reason+act pattern)
A pattern that alternates short reasoning with actions (e.g. tool calls) in a loop. Good for research-style tasks; still needs max-step limits and safety checks to avoid runaway loops.
Multimodal model
Core concepts
A model that conditions on more than one modality (e.g. text and images) in a shared architecture. Usefulness depends on task fit—always verify for your domain, especially charts and code screenshots.
Chain-of-thought (CoT)
Inference & sampling
Asking the model to show intermediate reasoning before an answer, which can improve complex math or logic. In production, you may hide the trace from the user while still logging it for audit.
JSON Lines
Inputs & structure
Also: ndjson in ML exports
One JSON object per line—common in dataset exports and some eval pipelines. Easy to stream; validate each line with a JSON parser, not with regex for arbitrary payloads.
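Reading JSONL is one json.loads per non-empty line—no outer brackets, no commas between records:

```python
import json

def read_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines: each non-blank line is an independent JSON object."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

rows = read_jsonl('{"q": "what is rag"}\n{"q": "context window"}\n')
# [{'q': 'what is rag'}, {'q': 'context window'}]
```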
Prompt engineering
Core concepts
The practice of designing, testing, and versioning instructions and few-shot sets for a task. Mature teams treat prompt changes like code: review, measure, and roll back on regressions.
Logprobs (log-probabilities)
Inference & sampling
Per-token log probabilities from the model. Useful for uncertainty hints, top-token inspection, and some evaluation workflows—if your provider exposes them in your plan.
Beam search
Inference & sampling
Decoding that keeps multiple candidate continuations in parallel, common in some seq2seq and translation systems. Many chat products default to single-sample decoding; beam search is less universal in LLM UIs but still appears in research tooling.
Synthetic data (for training)
Training & adaptation
Data generated by models, templates, or simulators, sometimes mixed with real labels. Can bootstrap tasks but risks distribution shift—always validate on real traffic slices.
Long-context models
Core concepts
Models marketed for very large context windows. Even when supported, you still pay in latency, cost, and “lost in the middle” effects; chunking and retrieval often remain the right design.
Attention (transformer, informal)
Core concepts
A mechanism in transformer blocks that lets tokens relate to one another at different positions. The user-facing limit you feel is the context window and service quotas—not the number of self-attention heads you configure.
Latent / representation (informal)
Core concepts
The internal high-dimensional state learned by a model, often glossed as “where meaning lives.” Vague in conversation; for products, focus on task metrics and retrieval quality instead.
Deduplication (datasets, retrieval)
Retrieval & context
Removing near-duplicate documents or lines so evals and indexes are not overfitted to repeated boilerplate. Critical before embedding or training on scraped corpora.
Citation (in RAG answers)
Retrieval & context
Pointing each claim to a source span or id from your corpus. Improves trust and debugging, but the model can still misattribute—verify in UI when stakes are high.
Latency (inference)
Inference & sampling
Time to first token and time to last token, plus your own overhead. A primary UX constraint; architectural wins often beat small prompt cleverness in interactive apps.
Batch / offline scoring
Inference & sampling
Running many independent prompts in bulk—often cheaper and higher throughput, but with slower wall-clock for each item. Common for backfills, labeling, and ETL, not for live chat turn-taking.
Common use cases
- Skim definitions before reading RAG or agent docs so acronyms like MMR, BM25, or PEFT do not slow you down.
- Share a deep link to a single term with your team (Copy link) when reviewing architecture or runbooks—no sign-in.
- Pair with the token estimate, RAG chunk calculator, and prompt checklist when you are building context budgets and retrieval plans.
Common mistakes to avoid
Treating this page as a compliance or legal source
These are short orienting definitions. When policies, contracts, or safety certifications apply, use your org’s official guidance.
Assuming all providers use the same tokenizer
Token counts, pricing, and limits are service-specific. Use each vendor’s own meters when the bill matters.
FAQ
Does this glossary call a generative model?
No. The text is static. Search and filter run locally in your browser.
How do I open a specific term from a link?
Use Copy link on a row, or append a hash: #glossary-rag, #glossary-embedding, etc. The page scrolls to the card when the hash matches.
Will you add more terms?
Yes—this list is maintained with the rest of the AI hub. Suggest related tools via the “More tools” block below and the /tools/keywords index for long-tail phrases across the site.
Common search terms
Phrases people search for that match this tool. See the full long-tail keyword index.
- llm glossary online
- what is rag retrieval augmented generation
- embedding vector meaning llm
- context window explained
- ai ml terms reference in browser
- tokenizer and token definition
- fine tuning vs lora explained
- hallucination llm meaning
More tools
Related utilities you can open in another tab—mostly client-side.
Prompt structure checklist
Client
Heuristic checklist for LLM prompts—role, task, output format, constraints, examples—pattern-based in the browser, no API.
LLM token estimate
Client
Rough character-based token planning for prompts and context—CJK-aware heuristic, browser-only—not tokenizer-exact.
LLM context split & budget
Client
Split pasted prompt blocks by a delimiter and see per-section rough token share and optional budget compare—browser-only, not tokenizer-exact.
RAG chunk calculator
Client
Sliding-window chunk count from document length, chunk size, and overlap—plan embedding batches without sending text to a server.