UTF-8 byte size check

Client

Many APIs and proxies limit request size in UTF-8 bytes, not in the JavaScript string length that editors report. Use this tool before chunking text for chat or when sanity-checking character budgets.

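UTF-8 byte length uses the browser's TextEncoder, which matches what many HTTP and API quotas actually measure. JavaScript .length counts UTF-16 code units instead. Characters outside the Basic Multilingual Plane (most emoji, for example) take two UTF-16 units and four UTF-8 bytes, while BMP characters such as CJK ideographs take one unit but three UTF-8 bytes.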
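The difference between the three counts is easy to reproduce. The following is a minimal sketch, not this tool's actual source; `measureText` is a hypothetical helper name, and the only platform API it relies on is the standard `TextEncoder`.

```javascript
// Hypothetical helper: measure one string three ways.
function measureText(text) {
  const utf8Bytes = new TextEncoder().encode(text).length; // UTF-8 octets
  const utf16Units = text.length;                          // what .length reports
  const codePoints = Array.from(text).length;              // the string iterator walks code points
  return { utf8Bytes, utf16Units, codePoints };
}

// "h", U+00E9 (e with acute), "llo", a space, then U+1F44B (waving hand emoji).
console.log(measureText("h\u00e9llo \u{1F44B}"));
// { utf8Bytes: 11, utf16Units: 8, codePoints: 7 }

// Three CJK ideographs: one UTF-16 unit each, three UTF-8 bytes each.
console.log(measureText("\u65e5\u672c\u8a9e"));
// { utf8Bytes: 9, utf16Units: 3, codePoints: 3 }
```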

0.3% of the 65,536-byte ceiling.

UTF-8 bytes: 223
JS string length (UTF-16 units): 211
Unicode code points: 210
Bytes ÷ JS length: 1.057
Words (whitespace): 34
Rough tokens (heuristic): ~53 (chars/token ≈ 4.00)
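The derived figures in this readout follow from simple arithmetic on the raw counts. The sketch below shows one plausible way to compute them. The function names are hypothetical, the rounding choices are assumptions rather than this tool's confirmed behavior, and the 4 chars/token divisor is only the rough heuristic quoted above, not a real tokenizer.

```javascript
// Hypothetical helper: derive the secondary stats from the raw counts.
// Assumption: tokens are estimated from UTF-16 length and rounded to nearest.
function deriveStats(utf8Bytes, utf16Units, ceilingBytes = 65536, charsPerToken = 4) {
  return {
    percentOfCeiling: (utf8Bytes / ceilingBytes) * 100,
    bytesPerUnit: utf8Bytes / utf16Units,
    roughTokens: Math.round(utf16Units / charsPerToken),
  };
}

// Hypothetical helper: count whitespace-separated words; empty input gives 0.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

const stats = deriveStats(223, 211);
console.log(stats.percentOfCeiling.toFixed(1)); // "0.3"
console.log(stats.bytesPerUnit.toFixed(3));     // "1.057"
console.log(stats.roughTokens);                 // 53
console.log(countWords("  bytes for transport  ")); // 3
```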

Common use cases

  • You hit a 32k or 64k byte cap on a request body—check the real UTF-8 size before splitting with the long text chunker.
  • Mixed English, CJK, and emoji inflate UTF-8 size faster than ASCII-only prose; compare the byte count against JS .length before assuming the two match.
  • Pair with the character-budget and token-estimate tools: bytes for transport, chars/tokens for model planning.
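When a body does exceed a byte cap, the split has to be made on byte counts rather than on .length. The following is a minimal sketch of one way to do that, not the long text chunker's actual algorithm; `chunkByUtf8Bytes` is a hypothetical name. It never cuts a surrogate pair in half, but it can still separate the parts of a multi-code-point grapheme, such as an emoji joined with zero-width joiners.

```javascript
// Hypothetical helper: split text into pieces of at most maxBytes UTF-8 bytes each.
function chunkByUtf8Bytes(text, maxBytes) {
  const encoder = new TextEncoder();
  const chunks = [];
  let current = "";
  let currentBytes = 0;

  for (const ch of text) { // for...of iterates by code point, so surrogate pairs stay together
    const size = encoder.encode(ch).length; // 1 to 4 bytes per code point
    if (size > maxBytes) {
      throw new RangeError("maxBytes is smaller than a single character");
    }
    if (currentBytes + size > maxBytes) {
      // Adding this character would overflow the budget: close the current chunk.
      chunks.push(current);
      current = "";
      currentBytes = 0;
    }
    current += ch;
    currentBytes += size;
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

// "a" is 1 byte and U+1F44B is 4 bytes, so each 5-byte chunk holds one of each.
console.log(chunkByUtf8Bytes("a\u{1F44B}b\u{1F44B}", 5).length); // 2
```

Joining the returned chunks reproduces the original string, so the pieces can be sent in order and reassembled on the other side.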

Common mistakes to avoid

  • Assuming .length matches what the server counts

    Always verify against your target platform: some paths count grapheme clusters, apply NFC normalization before counting, or exclude certain bytes.

  • Pasting huge binaries as “text”

    This tool measures pasted strings. Base64 and other encodings change the byte size, so apply the right encoding step first and measure the result.
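Both mistakes above come down to the measured bytes depending on a transformation applied before counting. The sketch below illustrates two such transformations under stated assumptions: `byteLength` and `base64Length` are hypothetical helper names, and the Base64 formula assumes the standard padded alphabet with no line breaks.

```javascript
const encoder = new TextEncoder();
const byteLength = (s) => encoder.encode(s).length;

// The same visible character in two Unicode forms.
const decomposed = "e\u0301";                 // "e" plus a combining acute accent (NFD form)
const composed = decomposed.normalize("NFC"); // single code point U+00E9

console.log(byteLength(decomposed)); // 3
console.log(byteLength(composed));   // 2

// Padded Base64 emits 4 output characters for every 3 input bytes, rounded up.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

console.log(base64Length(223)); // 300
```

A server that normalizes to NFC before counting would therefore report fewer bytes than a client that measured the decomposed form, and a payload that is Base64-encoded in transit grows by roughly a third.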

FAQ

Is my text sent anywhere?

No. TextEncoder runs in your tab; nothing is uploaded for measurement.

Why show Unicode code points?

They count each astral character (emoji, many historic scripts) as a single entry, whereas UTF-16 .length counts it as two code units.
