Question 1

What is the difference between code points, UTF-16 units, and UTF-8 bytes?

Accepted Answer

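A code point is a single Unicode value, written like U+00E9. It is usually what a reader sees as one character, but not always: a letter followed by a combining accent is two code points. UTF-16 units are how JavaScript stores strings; code points above U+FFFF, which includes most emoji, take two units (a surrogate pair). UTF-8 bytes are how text is usually stored and sent: ASCII takes one byte, accented Latin letters two, most CJK characters three, and most emoji four. The summary shows all three counts for your input.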
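As a rough illustration of the three counts, here is a minimal sketch using only standard browser and Node.js APIs. The helper name countUnits is made up for this example and is not taken from the tool's source.

```javascript
// Hypothetical helper: counts one string three different ways.
function countUnits(text) {
  return {
    // The string iterator walks by code point, not by UTF-16 unit.
    codePoints: Array.from(text).length,
    // .length counts UTF-16 code units.
    utf16Units: text.length,
    // TextEncoder always encodes to UTF-8.
    utf8Bytes: new TextEncoder().encode(text).length,
  };
}

// "é" (U+00E9) followed by the taco emoji (U+1F32E)
console.log(countUnits("\u00E9\u{1F32E}"));
// { codePoints: 2, utf16Units: 3, utf8Bytes: 6 }
```

The é contributes 1 unit and 2 bytes, the emoji 2 units and 4 bytes, which is where the totals come from.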

Question 2

Why does one emoji count as two UTF-16 units?

Accepted Answer

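JavaScript strings are sequences of UTF-16 code units. Code points beyond the Basic Multilingual Plane (U+10000 and up) are encoded as a surrogate pair, so string.length reports 2 for a single emoji in that range. This tool counts by code point, so such an emoji gets one row rather than two. An emoji built from several code points, such as a flag, still appears as several rows.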
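The surrogate-pair arithmetic can be checked by hand. The sketch below assumes a hypothetical helper named toSurrogatePair, which is only meant for code points at U+10000 or above, and compares its result with what charCodeAt reports.

```javascript
// Hypothetical helper: splits a code point above U+FFFF into its surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;    // leaves a 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const taco = "\u{1F32E}"; // the taco emoji

console.log(taco.length);             // 2 (UTF-16 units)
console.log(Array.from(taco).length); // 1 (code points)
console.log(toSurrogatePair(0x1f32e).map((u) => u.toString(16)));
// [ 'd83c', 'df2e' ]
console.log(taco.charCodeAt(0) === 0xd83c, taco.charCodeAt(1) === 0xdf2e);
// true true
```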

Question 3

What do the HTML and JS escape columns give me?

Accepted Answer

The HTML column is a numeric character reference (&#xHEX;) that you can paste into markup. The JS escape column is the string-literal form, \uXXXX for code points in the Basic Multilingual Plane or \u{...} for higher ones, which lets you embed a character in source code safely. The \u{...} form requires ES2015 or later.
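Both escape forms can be derived from the code point alone. This is a minimal sketch, not the tool's actual code: the function names htmlEscape and jsEscape are invented here, and the unpadded uppercase hex in the HTML reference is an assumption about formatting.

```javascript
// Hypothetical helper: numeric character reference for the first code point.
// Assumes uppercase hex without zero padding.
function htmlEscape(char) {
  const hex = char.codePointAt(0).toString(16).toUpperCase();
  return `&#x${hex};`;
}

// Hypothetical helper: JavaScript string-literal escape for the first code point.
function jsEscape(char) {
  const codePoint = char.codePointAt(0);
  const hex = codePoint.toString(16).toUpperCase();
  return codePoint <= 0xffff
    ? `\\u${hex.padStart(4, "0")}` // BMP: exactly four hex digits
    : `\\u{${hex}}`;               // above the BMP: braces form
}

console.log(htmlEscape("\u00E9"), jsEscape("\u00E9"));       // &#xE9; \u00E9
console.log(htmlEscape("\u{1F32E}"), jsEscape("\u{1F32E}")); // &#x1F32E; \u{1F32E}
```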

Question 4

Why are some characters shown as a box?

Accepted Answer

Control characters (like tab, newline, null), the byte-order mark, and zero-width characters have no visible glyph, so they render as ▢. Their code point, UTF-8 bytes, and escapes are still shown — which is exactly how you catch an invisible character that is breaking your data.
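To hunt for such characters in your own data, one possible approach is a Unicode property regex. The sketch below treats the general categories Cc (control) and Cf (format, which covers the byte-order mark and the zero-width characters) as invisible. That rule and the name findInvisible are assumptions for illustration; the tool may use a different list.

```javascript
// Assumption: "invisible" means general category Cc (control) or Cf (format).
const INVISIBLE = /^[\p{Cc}\p{Cf}]$/u;

// Hypothetical helper: reports each invisible code point and its UTF-16 offset.
function findInvisible(text) {
  const hits = [];
  let index = 0; // offset in UTF-16 units, usable with slice()
  for (const char of text) {
    if (INVISIBLE.test(char)) {
      const hex = char.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codePoint: `U+${hex}` });
    }
    index += char.length; // 1 for BMP characters, 2 for surrogate pairs
  }
  return hits;
}

// A zero-width space hiding after "id" and a byte-order mark at the end
console.log(findInvisible("id\u200B=42\uFEFF"));
// [ { index: 2, codePoint: 'U+200B' }, { index: 6, codePoint: 'U+FEFF' } ]
```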

Question 5

Is my text sent anywhere?

Accepted Answer

No. Every breakdown is computed in your browser with the standard TextEncoder and string APIs. Nothing you paste leaves your device.

Char	Code point	Dec	UTF-8	HTML	JS escape
H	U+0048	72	48	&#x48;	\u0048
é	U+00E9	233	C3 A9	&#xE9;	\u00E9
́	U+0301	769	CC 81	&#x301;	\u0301
l	U+006C	108	6C	&#x6C;	\u006C
l	U+006C	108	6C	&#x6C;	\u006C
o	U+006F	111	6F	&#x6F;	\u006F
 	U+0020	32	20	&#x20;	\u0020
🌮	U+1F32E	127790	F0 9F 8C AE	&#x1F32E;	\u{1F32E}
 	U+0020	32	20	&#x20;	\u0020
—	U+2014	8212	E2 80 94	&#x2014;	\u2014
 	U+0020	32	20	&#x20;	\u0020
c	U+0063	99	63	&#x63;	\u0063
a	U+0061	97	61	&#x61;	\u0061
f	U+0066	102	66	&#x66;	\u0066
é	U+00E9	233	C3 A9	&#xE9;	\u00E9
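The Code point, Dec, and UTF-8 columns of a table like the one above can be reproduced with a few lines of standard JavaScript. This is only a sketch under the assumption that each row corresponds to one code point; inspectRows is a hypothetical name, not the tool's own function.

```javascript
// Hypothetical helper: one row per code point, with its UTF-8 bytes as hex.
function inspectRows(text) {
  const encoder = new TextEncoder();
  // Array.from on a string iterates by code point, so surrogate pairs stay together.
  return Array.from(text, (char) => {
    const codePoint = char.codePointAt(0);
    return {
      char,
      codePoint: "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0"),
      dec: codePoint,
      utf8: Array.from(encoder.encode(char), (byte) =>
        byte.toString(16).toUpperCase().padStart(2, "0")
      ).join(" "),
    };
  });
}

for (const row of inspectRows("\u{1F32E} \u2014")) {
  console.log(row.codePoint, row.dec, row.utf8);
}
// U+1F32E 127790 F0 9F 8C AE
// U+0020 32 20
// U+2014 8212 E2 80 94
```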

Unicode Inspector

About this tool

Frequently asked questions
