surrogate pair : Java Glossary


surrogate pair

Internally, Java uses 16-bit characters. Unicode has been extended to include characters that do not fit in 16 bits. They are often called 32-bit characters, though code points stop at U+10FFFF, so only 21 bits are actually needed. Instead of flipping to RAM-gobbling 32-bit characters, Sun decided to handle the new characters with a pair of 16-bit characters. They added support for them in a half-hearted way.
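As a rough illustration of what the pair-of-chars design means in practice, the sketch below builds a string from the single code point U+1D504 (the example used on this page) and shows that length() counts 16-bit chars rather than characters. The class name SurrogateLength and the helper fromCodePoint are made up for this example; everything they call is a standard java.lang method.

```java
class SurrogateLength {
    // Build a String holding one code point; supplementary code points need two chars.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x1D504);

        // length() counts 16-bit chars, so the single glyph reports 2.
        System.out.println(s.length());

        // codePointCount() counts whole code points, so it reports 1.
        System.out.println(s.codePointCount(0, s.length()));

        // The two chars are the high and low halves of the pair.
        System.out.println(Character.isHighSurrogate(s.charAt(0)));
        System.out.println(Character.isLowSurrogate(s.charAt(1)));

        // codePointAt() reassembles the pair into the original code point.
        System.out.printf("U+%X%n", s.codePointAt(0));
    }
}
```

Code that walks a String with charAt() sees the two halves separately; codePointAt() and codePointCount() are the code-point-aware alternatives.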

Java does not even have 32-bit String literals like C-style code points, e.g. \U0001d504. Note the capital U and eight hex digits, versus the usual four-digit \ud504. I wrote the SurrogatePair applet to convert C-style code points to arcane surrogate pairs to let you use 32-bit Unicode glyphs in your programs.
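The conversion described above can be sketched in a few lines. This is not the SurrogatePair applet's code, only a guess at the same idea: take a code point and print the escape text you would paste into Java source. The names SurrogateEscape and toJavaEscape are hypothetical.

```java
class SurrogateEscape {
    // Return the Java source escape text for a code point:
    // one escape for a 16-bit character, two for a supplementary one.
    static String toJavaEscape(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            // Cast to int because the x conversion does not accept a char argument.
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print the escape text for the code point used as the example on this page.
        System.out.println(toJavaEscape(0x1D504));

        // The same pair written directly as a String literal.
        String fraktur = "\ud835\udd04";
        System.out.println(fraktur.codePointAt(0) == 0x1D504);
    }
}
```

For U+1D504 the escape text is the pair d835 followed by dd04, which is what the literal in main uses.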

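To pull this off, Unicode reserves two bands of 16-bit characters for use in encoding the high characters: the high surrogates, U+D800 to U+DBFF, and the low surrogates, U+DC00 to U+DFFF. Each band carries 10 bits of the code point, after 0x10000 has been subtracted from it.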
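The arithmetic behind the two bands is short enough to write out by hand. The sketch below uses the standard UTF-16 band boundaries (high surrogates from 0xD800, low surrogates from 0xDC00) and checks its results against the library methods Character.highSurrogate, Character.lowSurrogate and Character.toCodePoint. The class SurrogateMath and its method names are invented for illustration, and the code does no range checking.

```java
class SurrogateMath {
    // High surrogate: top 10 bits of (codePoint - 0x10000), offset into the 0xD800 band.
    static char high(int codePoint) {
        return (char) (0xD800 + ((codePoint - 0x10000) >> 10));
    }

    // Low surrogate: bottom 10 bits of (codePoint - 0x10000), offset into the 0xDC00 band.
    static char low(int codePoint) {
        return (char) (0xDC00 + ((codePoint - 0x10000) & 0x3FF));
    }

    // Reverse the process: strip the band offsets and glue the two 10-bit halves together.
    static int combine(char hi, char lo) {
        return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    }

    public static void main(String[] args) {
        int cp = 0x1D504;
        char hi = high(cp);
        char lo = low(cp);

        System.out.printf("%04X %04X%n", (int) hi, (int) lo); // prints D835 DD04
        System.out.println(combine(hi, lo) == cp);            // prints true

        // Cross-check the hand-rolled arithmetic against the JDK.
        System.out.println(hi == Character.highSurrogate(cp)
                && lo == Character.lowSurrogate(cp)
                && Character.toCodePoint(hi, lo) == cp);      // prints true
    }
}
```

In real code the Character methods are the safer choice; the manual version is only there to show where the numbers come from.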

Canadian Mind Products
Please read the feedback from other visitors, or send your own feedback about the site.
Contact Roedy. Please feel free to link to this page without explicit permission.
