Unicode Inspector
Break text into its Unicode code points — U+ hex, decimal, UTF-8 bytes, HTML entity, and JS escape per character, plus code-point / UTF-16 / UTF-8 length totals. Client-side.
| Char | Code point | Dec | UTF-8 | HTML | JS escape |
|---|---|---|---|---|---|
| H | U+0048 | 72 | 48 | &#x48; | \u0048 |
| é | U+00E9 | 233 | C3 A9 | &#xE9; | \u00E9 |
| ́ | U+0301 | 769 | CC 81 | &#x301; | \u0301 |
| l | U+006C | 108 | 6C | &#x6C; | \u006C |
| l | U+006C | 108 | 6C | &#x6C; | \u006C |
| o | U+006F | 111 | 6F | &#x6F; | \u006F |
|   | U+0020 | 32 | 20 | &#x20; | \u0020 |
| 🌮 | U+1F32E | 127790 | F0 9F 8C AE | &#x1F32E; | \u{1F32E} |
|   | U+0020 | 32 | 20 | &#x20; | \u0020 |
| — | U+2014 | 8212 | E2 80 94 | &#x2014; | \u2014 |
|   | U+0020 | 32 | 20 | &#x20; | \u0020 |
| c | U+0063 | 99 | 63 | &#x63; | \u0063 |
| a | U+0061 | 97 | 61 | &#x61; | \u0061 |
| f | U+0066 | 102 | 66 | &#x66; | \u0066 |
| é | U+00E9 | 233 | C3 A9 | &#xE9; | \u00E9 |
About this tool
Paste any text and see it decomposed character by character. For each code
point the tool shows the U+ hexadecimal value, its decimal
number, the exact UTF-8 bytes it encodes to, an HTML numeric entity, and
the JavaScript escape sequence. Above the table, a summary reconciles the
three ways length is counted: code points, UTF-16 units, and UTF-8 bytes.
The divergence of those three numbers is the source of a surprising number of bugs. A simple emoji such as 🌮 is one code point but two UTF-16 units and four UTF-8 bytes; an accented letter may be one composed code point or two (base plus a combining mark) depending on normalization. When a database column “fits 20 characters” but rejects your input, or a substring lands in the middle of a surrogate pair, this breakdown shows you exactly what is going on.
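As a rough illustration of how the three totals can disagree, the sketch below computes them with standard string APIs. The helper name `lengthTotals` is an assumption made for this example, not part of the tool.

```javascript
// Hypothetical helper: the three length totals for one string.
function lengthTotals(text) {
  return {
    codePoints: Array.from(text).length, // Array.from splits a string by code point
    utf16Units: text.length,             // what String.prototype.length reports
    utf8Bytes: new TextEncoder().encode(text).length, // TextEncoder always emits UTF-8
  };
}

// U+1F32E (taco emoji): 1 code point, 2 UTF-16 units, 4 UTF-8 bytes.
const tacoTotals = lengthTotals("\u{1F32E}");

// The same visible "é" in composed (NFC) and decomposed (NFD) form.
const composedTotals = lengthTotals("\u00E9".normalize("NFC"));   // 1 code point, 2 bytes
const decomposedTotals = lengthTotals("\u00E9".normalize("NFD")); // 2 code points, 3 bytes
```

If a storage limit is expressed in bytes or UTF-16 units, comparing against `codePoints` alone will under-count, which is one way the "fits 20 characters" surprise can arise.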
It is also the fastest way to hunt down invisible characters: a zero-width space pasted from a web page, a non-breaking space masquerading as a regular one, a stray BOM at the start of a file, or a control character in a CSV. Characters with no visible glyph render as a box but still report their code point and bytes, so you can find and remove what your eyes cannot see. Everything runs in the browser using the standard text APIs.
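A per-code-point breakdown like the one described in this section can be approximated with nothing more than string iteration and TextEncoder. The following is a minimal sketch; the function name `inspectText` and the shape of each row are assumptions for illustration, not the tool's actual code.

```javascript
// Hypothetical helper: one row per code point, mirroring the first table columns.
function inspectText(text) {
  const encoder = new TextEncoder(); // always encodes to UTF-8
  const rows = [];
  // for...of iterates a string by code point, so surrogate pairs stay together.
  for (const char of text) {
    const cp = char.codePointAt(0);
    const hex = cp.toString(16).toUpperCase().padStart(4, "0");
    // Format each UTF-8 byte as two uppercase hex digits.
    const utf8 = Array.from(encoder.encode(char), (b) =>
      b.toString(16).toUpperCase().padStart(2, "0")
    ).join(" ");
    rows.push({ char, codePoint: `U+${hex}`, dec: cp, utf8 });
  }
  return rows;
}
```

Calling `inspectText("\u00E9\u{1F32E}")` yields two rows, one for U+00E9 with bytes `C3 A9` and one for U+1F32E with bytes `F0 9F 8C AE`.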
Frequently asked questions
What is the difference between code points, UTF-16 units, and UTF-8 bytes?
A code point is one entry in the Unicode code space, which is roughly what a human counts as a character, although a base letter plus a combining mark is two code points. UTF-16 units are how JavaScript stores strings: characters above U+FFFF, like most emoji, take two units (a surrogate pair). UTF-8 bytes are how text is usually stored and sent: ASCII is one byte, accented Latin two, most CJK three, most emoji four. The summary shows all three for your input.
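The byte counts above follow directly from the code point ranges that UTF-8 defines. As a small sketch (the function name `utf8ByteLength` is made up for this example):

```javascript
// Bytes needed to encode one code point in UTF-8, by range.
// Assumes the input is a valid scalar value (not a lone surrogate).
function utf8ByteLength(codePoint) {
  if (codePoint <= 0x7f) return 1;   // ASCII
  if (codePoint <= 0x7ff) return 2;  // e.g. accented Latin, Greek, Cyrillic
  if (codePoint <= 0xffff) return 3; // rest of the BMP, including most CJK
  return 4;                          // supplementary planes, e.g. U+1F32E
}
```

For the sample rows in the table, this gives 1 for U+0048, 2 for U+00E9, 3 for U+2014, and 4 for U+1F32E, matching the number of bytes listed in the UTF-8 column.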
Why does one emoji count as two UTF-16 units?
JavaScript strings are UTF-16. Characters beyond the Basic Multilingual Plane (U+10000 and up) are encoded as a surrogate pair, so string.length reports 2 for a single such emoji. This tool counts by code point, so it shows one row per code point rather than one per UTF-16 unit.
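To make the surrogate pair concrete, the sketch below looks at the two UTF-16 units behind U+1F32E and shows one way to truncate without cutting a pair in half. The helper `truncateByCodePoints` is a hypothetical name used only for this illustration.

```javascript
const taco = "\u{1F32E}"; // the taco emoji from the sample table

const unitCount = taco.length;                      // 2 UTF-16 units
const highUnit = taco.charCodeAt(0).toString(16);   // "d83c", the high surrogate
const lowUnit = taco.charCodeAt(1).toString(16);    // "df2e", the low surrogate

// slice() works on UTF-16 units, so it can stop between the two halves.
const broken = "a\u{1F32E}b".slice(0, 2); // "a" followed by a lone high surrogate

// Hypothetical helper: truncate by code points instead of UTF-16 units.
function truncateByCodePoints(text, maxCodePoints) {
  return Array.from(text).slice(0, maxCodePoints).join("");
}

const intact = truncateByCodePoints("a\u{1F32E}b", 2); // "a" plus the whole emoji
```

Truncating by code points still splits a base letter from a following combining mark, so it addresses surrogate pairs only.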
What do the HTML and JS escape columns give me?
The HTML column is a numeric character reference (&#xHEX;) you can paste into markup. The JS escape is the string-literal form — \uXXXX for the Basic Multilingual Plane, or \u{...} for higher code points — to embed a character in source safely.
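Both escape forms can be derived from the code point number alone. The sketch below is one possible way to build them; the function names are hypothetical, and the choice of uppercase hex with no zero padding in the HTML reference is an assumption, since the exact formatting the tool uses is not stated.

```javascript
// Hypothetical helper: hexadecimal numeric character reference, e.g. &#x2014;
function htmlEntity(codePoint) {
  return `&#x${codePoint.toString(16).toUpperCase()};`;
}

// Hypothetical helper: JavaScript string-literal escape for one code point.
function jsEscape(codePoint) {
  const hex = codePoint.toString(16).toUpperCase();
  return codePoint <= 0xffff
    ? `\\u${hex.padStart(4, "0")}` // BMP: exactly four hex digits
    : `\\u{${hex}}`;               // above U+FFFF: braces form
}
```

For example, `jsEscape(0xe9)` produces the six characters `\u00E9`, and `jsEscape(0x1f32e)` produces `\u{1F32E}`.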
Why are some characters shown as a box?
Control characters (like tab, newline, null), the byte-order mark, and zero-width characters have no visible glyph, so they render as ▢. Their code point, UTF-8 bytes, and escapes are still shown — which is exactly how you catch an invisible character that is breaking your data.
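A scan for such characters might look like the sketch below. The list of suspects is a deliberately small, non-exhaustive assumption, and `findInvisible` is a hypothetical name; note that it also flags ordinary tabs and newlines, since those are control characters.

```javascript
// Assumption: a short, non-exhaustive list of commonly pasted invisible characters.
const SUSPECTS = new Map([
  [0x00a0, "NO-BREAK SPACE"],
  [0x200b, "ZERO WIDTH SPACE"],
  [0x200c, "ZERO WIDTH NON-JOINER"],
  [0x200d, "ZERO WIDTH JOINER"],
  [0xfeff, "ZERO WIDTH NO-BREAK SPACE (BOM)"],
]);

// Hypothetical helper: report the position and code point of each suspect.
function findInvisible(text) {
  const hits = [];
  let index = 0; // position counted in code points, not UTF-16 units
  for (const char of text) {
    const cp = char.codePointAt(0);
    // C0 controls, DEL, and C1 controls.
    const isControl = cp < 0x20 || (cp >= 0x7f && cp <= 0x9f);
    if (isControl || SUSPECTS.has(cp)) {
      hits.push({
        index,
        codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
        name: SUSPECTS.get(cp) ?? "CONTROL",
      });
    }
    index++;
  }
  return hits;
}
```

Running it on `"\uFEFFa\u200Bb\u00A0c"` reports three hits, at code point positions 0, 2, and 4.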
Is my text sent anywhere?
No. Every breakdown is computed in your browser with the standard TextEncoder and string APIs. Nothing you paste leaves your device.