AI & ML terms glossary

A compact English reference for high-intent search vocabulary—LLM, RAG, chunking, embeddings, evals, and product-side guardrails. Content is hand-curated; filter below, or browse the popular long-tail keywords for other tools. Works offline once loaded—no generative call.

Why a static glossary?

AI documentation reuses a small set of moving parts—context budgets, retrieval, and safety at the app boundary. A single page with anchors beats dozens of single-term landing pages for users and for honest indexing.

Search & filter

Short English reference for common LLM and RAG vocabulary. Not legal, medical, or academic advice—verify against your own release and policy requirements.

Filter by category, or type a substring (e.g. rag, lora). Use a row’s link icon to copy a deep link with the term’s hash. A ?q=… URL parameter pre-fills the search, for example when an agent or bookmark opens the page.

Showing 58 entries.

  • LLM

    Core concepts

    Also: large language model

    A model trained to predict the next text token over broad corpora, then adapted for instruction-following. Often used for drafting, classifying, transforming, and reasoning over language when paired with the right context and tools.

  • Context window

    Core concepts

    Also: max context length · sequence length limit

    The maximum number of tokens a model can attend to in a single request (prompt + completion combined). Longer context helps with big documents, but also increases cost and latency.

  • Token

    Core concepts

    A unit the tokenizer produces from text. Token counts drive billing and context limits, but a token is not the same as a word—especially for non-Latin text or long technical identifiers.

  • Tokenizer

    Core concepts

    The component that turns raw text into token ids (often subword, e.g. BPE or SentencePiece). Different services use different tokenizers, so per-token numbers are not automatically comparable across them.
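
    A minimal sketch using the open-source tiktoken library (one tokenizer among many; your provider's encoding may differ):

        import tiktoken  # pip install tiktoken

        # cl100k_base is one common BPE vocabulary; other services ship different ones.
        enc = tiktoken.get_encoding("cl100k_base")
        ids = enc.encode("Context windows are measured in tokens, not words.")
        print(len(ids), ids[:5])  # token count, plus the first few ids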

  • Prompt

    Core concepts

The input you send to a model, often a mix of instructions, data, and examples. Most practical failures stem from unclear goals, missing constraints, or unscoped tasks—not from a lack of clever wording.

  • System message / instruction

    Core concepts

    Also: system prompt

    A high-priority block that steers how the model should behave (role, style, and safety). It is not a substitute for your own access control, logging, and review when outputs matter.

  • Few-shot learning (in prompts)

    Core concepts

    Also: in-context learning · multishot · k-shot

    Supplying a few input/output examples in the prompt so the model can infer the target format. Helpful for structured outputs; for very long or sensitive examples, consider retrieval or tools instead of pasting everything.

  • Zero-shot

    Core concepts

    Asking the model to perform a task with instructions only—no training examples in the prompt. Relies on clear specs and, when needed, schema or validation on your side.

  • Embedding

    Core concepts

    Also: vector embedding

    A dense vector representation of text (or other inputs) in which closer vectors often mean similar meaning. Commonly used for semantic search, clustering, and retrieval—not for storage of secrets.
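
    A minimal cosine-similarity sketch over toy vectors, assuming numpy is available; real embeddings come from an embedding model and have hundreds or thousands of dimensions:

        import numpy as np

        def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
            """Cosine of the angle between two vectors (1.0 = same direction)."""
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

        # Toy 4-dimensional vectors standing in for real embedding output.
        query = np.array([0.1, 0.8, 0.3, 0.0])
        doc = np.array([0.2, 0.7, 0.4, 0.1])
        print(round(cosine_similarity(query, doc), 3))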

  • Hallucination

    Core concepts

    Confident but incorrect or unsupported statements. Mitigations include sourcing (retrieval), citations, tool verification, lower temperature for some tasks, and post-checks that match your domain requirements.

  • Grounding

    Retrieval & context

    Also: grounded response

    Anchoring a model’s answer in evidence you provide (for example, retrieved documents or user-supplied data) rather than only parametric knowledge, to reduce off-topic or invented details.

  • Knowledge cutoff

    Core concepts

    The approximate time boundary of facts the model can reasonably recall from training. After that, facts may be outdated unless you supply fresh data via search, files, or tools.

  • Temperature

    Inference & sampling

    A sampling control that flattens or sharpens the probability over next tokens. Higher values increase variety; lower values make outputs more deterministic. It does not replace validation logic.
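
    A minimal sketch of how temperature reshapes a toy next-token distribution (assuming numpy):

        import numpy as np

        def apply_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
            """Convert logits to probabilities, sharpened or flattened by temperature."""
            scaled = logits / max(temperature, 1e-6)  # guard against division by zero
            exp = np.exp(scaled - scaled.max())       # subtract max for numerical stability
            return exp / exp.sum()

        logits = np.array([2.0, 1.0, 0.5, -1.0])
        print(apply_temperature(logits, 0.2))  # peaked: near-deterministic
        print(apply_temperature(logits, 1.5))  # flatter: more variety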

  • Top-p (nucleus sampling)

    Inference & sampling

    Also: nucleus sampling

    Samples from the smallest set of most likely tokens whose cumulative probability meets a threshold p. Often combined with temperature to tune creativity without entirely ignoring unlikely continuations.
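
    A minimal sketch of one common nucleus-filtering variant over a toy distribution (assuming numpy; providers may treat the cumulative boundary slightly differently):

        import numpy as np

        def top_p_filter(probs: np.ndarray, p: float = 0.9) -> np.ndarray:
            """Keep the smallest set of most likely tokens covering probability p."""
            order = np.argsort(probs)[::-1]      # most likely first
            keep = np.cumsum(probs[order]) <= p
            keep[0] = True                       # always keep the single top token
            mask = np.zeros_like(probs)
            mask[order[keep]] = probs[order[keep]]
            return mask / mask.sum()             # renormalize the survivors

        print(top_p_filter(np.array([0.5, 0.3, 0.15, 0.05]), p=0.8))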

  • Top-k sampling

    Inference & sampling

At each step, only the k most likely next tokens are considered. Useful as a cap on surprise: a small k can make outputs repetitive, while a very large k approaches unrestricted sampling.
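
    The same idea as a minimal top-k filter (assuming numpy):

        import numpy as np

        def top_k_filter(probs: np.ndarray, k: int = 2) -> np.ndarray:
            """Keep only the k most likely tokens, then renormalize."""
            mask = np.zeros_like(probs)
            top = np.argsort(probs)[::-1][:k]
            mask[top] = probs[top]
            return mask / mask.sum()

        print(top_k_filter(np.array([0.5, 0.3, 0.15, 0.05]), k=2))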

  • Max tokens (completion cap)

    Inference & sampling

    Also: output token limit

    A hard cap on how many completion tokens a call may produce. Separate from the context window; setting it avoids runaway generations and unbounded cost.

  • Structured output

    Inference & sampling

    Also: JSON mode (concept) · schema-constrained text

    Constrained or guided generation so the answer follows a pattern (e.g. JSON) suitable for your parser. Still validate with a schema check—models can emit plausible-looking but invalid structure.
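
    A minimal validation sketch using the standard-library json module; the "title" field is a hypothetical stand-in for your own schema:

        import json

        def parse_checked(raw: str) -> dict:
            """Parse model output and verify required fields; never trust shape blindly."""
            data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
            if not isinstance(data, dict) or not isinstance(data.get("title"), str):
                raise ValueError("missing or mistyped 'title' field")
            return data

        print(parse_checked('{"title": "Q3 report", "priority": 2}'))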

  • RAG

    Retrieval & context

    Also: retrieval-augmented generation

    Retrieval-augmented generation: fetch relevant documents (or records), insert them as context, then ask the model to answer using that evidence. Strong default for up-to-date or private data without retraining a base model for every fact.
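
    A toy sketch of the retrieve-then-answer shape; word overlap stands in for embeddings and a vector store, and the final model call is left as a comment:

        def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
            """Toy retriever: rank documents by word overlap with the question."""
            q = set(question.lower().split())
            ranked = sorted(corpus, key=lambda d: -len(q & set(d.lower().split())))
            return ranked[:k]

        corpus = [
            "The context window is the maximum number of tokens per request.",
            "LoRA adds low-rank trainable updates to selected layers.",
            "Chunk overlap repeats tokens at chunk boundaries.",
        ]
        evidence = retrieve("what is a context window", corpus)
        prompt = "Answer using only these sources:\n" + "\n".join(evidence)
        print(prompt)  # this prompt would then go to your chat model of choice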

  • Vector database / vector store

    Retrieval & context

    A store for embeddings with similarity search, sometimes with metadata filters. Not magic—quality depends on chunking, refresh cadence, permissions, and evaluation of recall.

  • Chunking (for retrieval)

    Retrieval & context

    Splitting long text into pieces sized for embedding models and the context window. Overlap and boundaries matter: arbitrary splits can break tables, code, or legal clauses into nonsense spans.
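
    A minimal character-window chunker with overlap (see the next entry); production chunkers should also respect sentence, table, and code boundaries:

        def chunk(text: str, size: int = 400, overlap: int = 50) -> list[str]:
            """Split text into fixed-size character windows with overlapping edges."""
            if overlap >= size:
                raise ValueError("overlap must be smaller than chunk size")
            step = size - overlap
            return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

        pieces = chunk("word " * 300, size=400, overlap=50)
        print(len(pieces), len(pieces[0]))  # 5 chunks, 400 characters each (last one shorter)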

  • Chunk overlap

    Retrieval & context

Repeating a fixed number of characters or tokens at chunk boundaries so sentences at the edges still appear in full in at least one chunk. Trades more storage and compute for better recall of boundary content.

  • Reranking

    Retrieval & context

    Also: re-ranking · re-ranker model

    A second step after coarse retrieval: a model scores how well each candidate matches the question, to improve ordering—especially for ambiguous queries or when recall is good but precision is not.

  • MMR

    Retrieval & context

    Also: maximal marginal relevance

    A selection strategy that balances relevance with diversity, reducing near-duplicate chunks in the same context. Useful in top-k context packing for RAG and search results pages.
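
    A minimal greedy MMR sketch over unit-normalized vectors (assuming numpy); lam trades relevance (1.0) against diversity (0.0):

        import numpy as np

        def mmr(query: np.ndarray, cands: np.ndarray, k: int = 3, lam: float = 0.7) -> list[int]:
            """Greedily pick k candidate indices balancing relevance and novelty."""
            relevance = cands @ query  # cosine similarity when rows are unit norm
            selected: list[int] = []
            remaining = list(range(len(cands)))
            while remaining and len(selected) < k:
                def score(i: int) -> float:
                    redundancy = max((cands[i] @ cands[j] for j in selected), default=0.0)
                    return lam * relevance[i] - (1 - lam) * redundancy
                best = max(remaining, key=score)
                selected.append(best)
                remaining.remove(best)
            return selected

        rng = np.random.default_rng(0)
        vecs = rng.normal(size=(6, 8))
        vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize rows
        print(mmr(vecs[0], vecs[1:], k=3))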

  • BM25

    Retrieval & context

    A classic bag-of-words ranking function. Still competitive for many keyword-heavy queries; commonly paired with vector recall in hybrid systems.
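
    A minimal sketch of one common BM25 variant over pre-tokenized documents (standard library only):

        import math
        from collections import Counter

        def bm25(query: list[str], doc: list[str], corpus: list[list[str]],
                 k1: float = 1.5, b: float = 0.75) -> float:
            """Score one document against a query with smoothed idf."""
            n = len(corpus)
            avg_len = sum(len(d) for d in corpus) / n
            tf = Counter(doc)
            score = 0.0
            for term in query:
                df = sum(1 for d in corpus if term in d)         # document frequency
                idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # smoothed, always positive
                f = tf[term]
                score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avg_len))
            return score

        docs = [d.split() for d in ["sparse keyword search", "dense vector search", "hybrid search systems"]]
        print(bm25("keyword search".split(), docs[0], docs))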

  • Fine-tuning

    Training & adaptation

    Also: supervised fine-tuning · SFT

    Training a pre-trained model further on a domain or task-labeled set to shift behavior, tone, or tool formats. Not always necessary—start with better prompts, retrieval, and evals, then add tuning when the gap is clear.

  • LoRA

    Training & adaptation

    Also: low-rank adaptation · PEFT-style adapter

    Low-rank trainable updates applied to selected layers, reducing the memory footprint versus full weight updates. Often used in parameter-efficient fine-tuning (PEFT) stacks.

  • PEFT

    Training & adaptation

    Also: parameter-efficient fine-tuning

    A family of methods (LoRA, adapters, and others) that update only a small set of extra parameters, keeping a frozen backbone. Useful for tight budgets and faster iteration.

  • RLHF

    Training & adaptation

    Also: reinforcement learning from human feedback

    Aligning a model to human or AI preference labels using a reward model and policy optimization. Improves helpfulness and safety style but does not remove the need for content policies and product-side checks.

  • DPO

    Training & adaptation

    Also: direct preference optimization

    A preference-tuning family that refines a policy from pairwise preferences without a separate reward model. Often used as an alternative or complement in alignment pipelines.

  • Distillation (knowledge distillation)

    Training & adaptation

    Training a smaller model to mimic a larger teacher’s behavior or logits. Cuts cost and latency, but the student inherits teacher blind spots if not retested in your use cases.

  • Quantization (model)

    Training & adaptation

Storing and computing weights in lower precision (e.g. int4/int8) to save memory and speed up inference. Can affect quality—benchmark on your own tasks after quantizing.

  • Benchmark (model)

    Evaluation & quality

    A public or internal suite that scores a model on a fixed set of tasks. Uplift on a leaderboard does not guarantee gains on your data—always add task-specific evals when risk matters.

  • Perplexity (language modeling)

    Evaluation & quality

    A standard intrinsic metric for how surprised a model is by held-out text (lower is better in distribution). It does not directly measure usefulness for downstream products.
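
    A worked example of the definition: perplexity is the exponential of the average negative log-probability per token (the log-probs below are hypothetical):

        import math

        # Per-token natural-log probabilities for a held-out sequence.
        log_probs = [-1.2, -0.4, -2.1, -0.8]
        perplexity = math.exp(-sum(log_probs) / len(log_probs))
        print(round(perplexity, 2))  # lower means the model was less surprised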

  • LLM-as-a-judge

    Evaluation & quality

Using a model to score or compare outputs. Fast and cheap for iteration, but judge models can share the generator’s biases (including length bias) and blind spots; pair with human review on critical changes.

  • Golden set / eval set

    Evaluation & quality

    A curated set of input questions with reference answers or rubrics, used to track regressions when prompts, models, or RAG data change. Version it like product code.

  • Guardrails (product)

    Safety & misuse

    Product-level checks around the model: rate limits, allowlists, blocklists, PII redaction, tool permissions, and human review. Not the same as a single “safety” flag on the API call.

  • Prompt injection

    Safety & misuse

    Tricking the system into following attacker-controlled instructions, often by planting text in data the model is supposed to use. Defend with privilege separation, tool policies, and never trusting model output to bypass auth.

  • Jailbreak (prompting)

    Safety & misuse

    A prompt or template intended to elicit disallowed or unsafe behavior. Treat as an abuse vector in public endpoints; do not document step-by-step recipes in user-facing help.

  • Alignment (AI safety, informal)

    Safety & misuse

    Making model behavior match operator intent and public policy—helpful, honest, and within rules. A research and product discipline, not a single on/off feature.

  • PII

    Safety & misuse

    Also: personally identifiable information

    Identifying data about a person. Before sending text to any remote model, follow your retention, consent, and minimization policy; local redaction and scanning can reduce risk.

  • Agent (LLM-based)

    Agents & orchestration

    A system where a model plans steps, may call tools, and loops until a stop condition. Reliability depends on tool contracts, idempotency, and observability—not only model cleverness.

  • Tool use / function calling (concept)

    Agents & orchestration

    A structured pattern: the model emits a tool name and arguments; your code executes and returns results. The host must validate inputs, enforce auth, and handle failure modes.
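
    A minimal host-side dispatch sketch; get_weather is a hypothetical tool, and the explicit validation is the point:

        import json

        def get_weather(city: str) -> str:
            """Hypothetical tool; a real one would call an API with proper auth."""
            return json.dumps({"city": city, "temp_c": 18})

        TOOLS = {"get_weather": get_weather}

        def dispatch(tool_call: dict) -> str:
            """Validate and execute a model-emitted tool call; never execute it blindly."""
            fn = TOOLS.get(tool_call.get("name", ""))
            if fn is None:
                return json.dumps({"error": "unknown tool"})
            args = tool_call.get("arguments", {})
            if not isinstance(args.get("city"), str):
                return json.dumps({"error": "invalid arguments"})
            return fn(**args)

        # A model might emit a call like this; your code decides whether to run it.
        print(dispatch({"name": "get_weather", "arguments": {"city": "Oslo"}}))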

  • Model Context Protocol (MCP)

Agents & orchestration

    A wire protocol pattern for agents to list and call tools, resources, and prompts from a host. It standardizes how capabilities are exposed—not a replacement for your security review of each tool action.

  • Orchestration (workflows)

    Agents & orchestration

    Composing multiple steps, models, or services into a pipeline with branching, retries, and human handoff. The hard parts are state, idempotency, and logging—not only prompt quality.

  • ReAct-style loop

    Agents & orchestration

    Also: ReAct (reason+act pattern)

    A pattern that alternates short reasoning with actions (e.g. tool calls) in a loop. Good for research-style tasks; still needs max-step limits and safety checks to avoid runaway loops.
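
    A runnable skeleton of the loop with a hard step cap; think() and act() are hypothetical stand-ins for a model call and a tool runner:

        def think(task: str) -> dict:
            """Hypothetical model call: finish once an observation is present."""
            if "Observation:" in task:
                return {"final": True, "answer": "done after one tool call"}
            return {"final": False, "action": "lookup"}

        def act(action: str) -> str:
            """Hypothetical tool runner."""
            return f"result of {action}"

        def react_loop(task: str, max_steps: int = 5) -> str:
            """Alternate reasoning and actions, with a cap to avoid runaway loops."""
            for _ in range(max_steps):
                thought = think(task)
                if thought["final"]:
                    return thought["answer"]
                task += "\nObservation: " + act(thought["action"])
            return "stopped: step limit reached"

        print(react_loop("find the answer"))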

  • Multimodal model

    Core concepts

    A model that conditions on more than one modality (e.g. text and images) in a shared architecture. Usefulness depends on task fit—always verify for your domain, especially charts and code screenshots.

  • Chain-of-thought (CoT)

    Inference & sampling

    Asking the model to show intermediate reasoning before an answer, which can improve complex math or logic. For production, you may elide the trace from the user while still logging for audit.

  • JSON Lines

    Inputs & structure

    Also: ndjson in ML exports

    One JSON object per line—common in dataset exports and some eval pipelines. Easy to stream; validate each line with a JSON parser, not with regex for arbitrary payloads.
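
    A minimal line-by-line reader (standard library only); the q/a fields are hypothetical:

        import io
        import json

        raw = '{"q": "What is RAG?", "a": "retrieval-augmented generation"}\n{"q": "bad json\n'

        # Parse each line independently; one bad record should not sink the file.
        for line_no, line in enumerate(io.StringIO(raw), start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                print(f"line {line_no}: skipped ({err})")
                continue
            print(record["q"])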

  • Prompt engineering

    Core concepts

    The practice of designing, testing, and versioning instructions and few-shot sets for a task. Mature teams treat prompt changes like code: review, measure, and roll back on regressions.

  • Logprobs (log-probabilities)

    Inference & sampling

    Per-token log probabilities from the model. Useful for uncertainty hints, top-token inspection, and some evaluation workflows—if your provider exposes them in your plan.

  • Synthetic data (for training)

    Training & adaptation

    Data generated by models, templates, or simulators, sometimes mixed with real labels. Can bootstrap tasks but risks distribution shift—always validate on real traffic slices.

  • Long-context models

    Core concepts

    Models marketed for very large context windows. Even when supported, you still pay in latency, cost, and “lost in the middle” effects; chunking and retrieval often remain the right design.

  • Attention (transformer, informal)

    Core concepts

    A mechanism in transformer blocks that lets tokens relate to one another at different positions. The user-facing limit you feel is the context window and service quotas—not the number of self-attention heads you configure.

  • Latent / representation (informal)

    Core concepts

    The internal high-dimensional state learned by a model, often glossed as “where meaning lives.” Vague in conversation; for products, focus on task metrics and retrieval quality instead.

  • Deduplication (datasets, retrieval)

    Retrieval & context

    Removing near-duplicate documents or lines so evals and indexes are not overfitted to repeated boilerplate. Critical before embedding or training on scraped corpora.
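
    A minimal exact-dedup sketch over normalized text; near-duplicate detection (e.g. MinHash or embedding similarity) goes further:

        import hashlib

        def dedupe(docs: list[str]) -> list[str]:
            """Drop exact duplicates after lowercasing and collapsing whitespace."""
            seen: set[str] = set()
            unique = []
            for doc in docs:
                key = hashlib.sha256(" ".join(doc.lower().split()).encode()).hexdigest()
                if key not in seen:
                    seen.add(key)
                    unique.append(doc)
            return unique

        print(dedupe(["Same  boilerplate.", "same boilerplate.", "Fresh content."]))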

  • Citation (in RAG answers)

    Retrieval & context

    Pointing each claim to a source span or id from your corpus. Improves trust and debugging, but the model can still misattribute—verify in UI when stakes are high.

  • Latency (inference)

    Inference & sampling

    Time to first token and time to last token, plus your own overhead. A primary UX constraint; architectural wins often beat small prompt cleverness in interactive apps.

  • Batch / offline scoring

    Inference & sampling

Running many independent prompts in bulk—often cheaper and higher throughput, but with slower wall-clock time per item. Common for backfills, labeling, and ETL, not for live chat turn-taking.

Common use cases

  • Skim definitions before reading RAG or agent docs so acronyms like MMR, BM25, or PEFT do not slow you down.
  • Share a deep link to a single term with your team (Copy link) when reviewing architecture or runbooks—no sign-in.
  • Pair with the token estimator, RAG chunk calculator, and prompt checklist when you are building context budgets and retrieval plans.

Common mistakes to avoid

  • Treating this page as a compliance or legal source

    These are short orienting definitions. When policies, contracts, or safety certifications apply, use your org’s official guidance.

  • Assuming all providers use the same tokenizer

    Token counts, pricing, and limits are service-specific. Use each vendor’s own meters when the bill matters.

FAQ

Does this glossary call a generative model?

No. The text is static. Search and filter run locally in your browser.

How do I open a specific term from a link?

Use Copy link on a row, or append a hash: #glossary-rag, #glossary-embedding, etc. The page scrolls to the card when the hash matches.

Will you add more terms?

Yes—this list is maintained with the rest of the AI hub. Suggest related tools via the “More tools” block below and the /tools/keywords index for long-tail phrases across the site.

Common search terms

Phrases people search for that match this tool. See the full long-tail keyword index.

  • llm glossary online
  • what is rag retrieval augmented generation
  • embedding vector meaning llm
  • context window explained
  • ai ml terms reference in browser
  • tokenizer and token definition
  • fine tuning vs lora explained
  • hallucination llm meaning

Related utilities you can open in another tab—mostly client-side.