LLM token estimate

A rough, offline-friendly token estimate from pasted text. Use it alongside word count, and rely on your provider's official tokenizer when accuracy matters.

Context planning without calling an API

Large language models bill and truncate by tokens, not characters. This tool never calls a model: it estimates tokens by dividing the character count by a configurable or auto-suggested characters-per-token ratio (English-heavy text tends toward ~4 characters per token; dense CJK text toward a lower value). Treat the result as a planning hint, not a guarantee.
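The core heuristic can be sketched in a few lines. The function name and the 4.0 default below are illustrative assumptions, not this page's actual source:

```javascript
// Rough token estimate: UTF-16 length divided by a chars-per-token ratio.
function estimateTokens(text, charsPerToken = 4.0) {
  if (charsPerToken <= 0) {
    throw new RangeError("charsPerToken must be positive");
  }
  // text.length counts UTF-16 code units, matching the page's "JS length".
  return Math.round(text.length / charsPerToken);
}

// A 131-character draft at the default ratio:
estimateTokens("x".repeat(131)); // → 33 (131 / 4 = 32.75, rounded)
```

Rounding to the nearest integer is one reasonable choice; a more conservative planner might use `Math.ceil` so the budget never underestimates.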

Text

Character counts use JavaScript string length (UTF-16 code units). Surrogate pairs (emoji, many non-English scripts) count as two units—close enough for rough planning. Token counts divide characters by chars/token; they are not identical to any specific model tokenizer.

Characters (JS length): 131
Words (whitespace): 21
CJK-like code points: 0%
Suggested chars/token: 4.00
Effective chars/token: 4.00
Rough token estimate: 33

Common use cases

  • Sanity-check whether a draft prompt fits a rough context budget before you open an API or chat UI.
  • Compare English vs mixed CJK prose using the auto chars-per-token blend.
  • Pair with word count when editorial limits are in words but the model bills in tokens.
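The auto blend weights an English-ish ratio against a CJK ratio by the fraction of CJK-like code points. This is a sketch under stated assumptions: the 4.0 and 1.8 constants and the code-point ranges are illustrative, not the page's exact values:

```javascript
// Rough CJK-ish ranges: CJK punctuation through the unified ideographs,
// plus compatibility ideographs. Intentionally approximate.
const CJK_RE = /[\u3000-\u9FFF\uF900-\uFAFF]/u;

function suggestedCharsPerToken(text, english = 4.0, cjk = 1.8) {
  const codePoints = [...text]; // iterate by code point, not code unit
  if (codePoints.length === 0) return english;
  const cjkCount = codePoints.filter((c) => CJK_RE.test(c)).length;
  const frac = cjkCount / codePoints.length;
  // Linear blend: pure English keeps the high ratio, pure CJK the low one.
  return english * (1 - frac) + cjk * frac;
}

suggestedCharsPerToken("hello world"); // → 4.0 (0% CJK)
```

Mixed prose lands between the two endpoints, which is why the "Suggested chars/token" readout drops as the CJK percentage rises.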

Common mistakes to avoid

  • Treating the number as exact tokens for billing

    Providers use their own tokenizers (often BPE). This page divides characters by a heuristic ratio—it is for planning, not invoices.

  • Ignoring UTF-16 length vs visible glyphs

    Emoji and some scripts use surrogate pairs in JavaScript strings. Length is close for budgeting but not linguistically precise.
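The gap between visible glyphs and JavaScript's `length` is easy to see in a console:

```javascript
// One emoji, one glyph, but one surrogate pair in a JS string:
"😀".length;      // → 2 (UTF-16 code units, what this page counts)
[..."😀"].length; // → 1 (Unicode code points)
"a😀b".length;    // → 4 (1 + 2 + 1 code units)
```

For budget planning the code-unit count is close enough; just don't treat it as a count of characters a reader would see.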

FAQ

Is this the same as tiktoken or OpenAI’s counter?

No. Those libraries use the model’s vocabulary. Here you get a fast, offline ballpark using character counts and a simple CJK-aware ratio.

Does my text leave the browser?

No. All statistics are computed locally.
