UTF-8 byte size check
ClientMany APIs and proxies cap UTF-8 octets, not the JavaScript string length you see in editors. Use this before chunking for chat or to sanity-check character budgets.
Text
?
UTF-8 byte lengthuses the browser's TextEncoder—what many HTTP and API quotas actually measure. JavaScript .length counts UTF-16 code units, so emoji and most non-BMP characters use two units but often three or four UTF-8 bytes.
0.3% of 65,536 byte ceiling.
- UTF-8 bytes
- 223
- JS string length (UTF-16 units)
- 211
- Unicode code points
- 210
- Bytes ÷ JS length
- 1.057
- Words (whitespace)
- 34
- Rough tokens (heuristic)
- ~53 (chars/token ≈ 4.00)
Common use cases
- You hit a 32k or 64k byte cap on a request body—check the real UTF-8 size before splitting with the long text chunker.
- Mixed English, CJK, and emoji inflate UTF-8 faster than ASCII-only prose; compare bytes against JS .length before assuming.
- Pair with the character-budget and token-estimate tools: bytes for transport, chars/tokens for model planning.
Common mistakes to avoid
Assuming .length matches what the server counts
Always verify in your target platform: some paths count grapheme clusters, NFC normalization, or exclude certain bytes.
Pasting huge binaries as “text”
This tool is for string paste. Base64 and other encodings change byte size—use the right encode step first.
FAQ
Is my text sent anywhere?
No. TextEncoder runs in your tab; nothing is uploaded for measurement.
Why show Unicode code points?
They split astral characters (emoji, many historic scripts) as one entry each, unlike UTF-16 .length.
Common search terms
Phrases people search for that match this tool. See the full long-tail keyword index.
- utf8 byte length pasted text
- javascript string length vs utf8 bytes
- api json body byte limit check
More tools
Related utilities you can open in another tab—mostly client-side.
LLM character budget from token cap
ClientPlan max paste characters from a target token budget—same CJK-aware heuristic as token estimate, browser-only, not tokenizer-exact.
Long text chunker for chat paste
ClientSplit long pasted prose into sequential under-the-limit blocks—paragraph, line, or fixed character breaks—browser-only; not tokenizer-exact.
Data URL encoder & decoder
ClientBuild data: URLs from UTF-8 text or small files; decode and preview images or text locally.
LLM token estimate
ClientRough character-based token planning for prompts and context—CJK-aware heuristic, browser-only—not tokenizer-exact.