surrogate pair : Java Glossary

surrogate pair

Internally, Java uses 16-bit characters. Unicode has been extended to include characters beyond the 16-bit range (nominally 32-bit, though code points stop at U+10FFFF, so 21 bits suffice). Instead of flipping to RAM-gobbling 32-bit characters, Sun decided to handle the new characters with a pair of 16-bit characters. They added support for them in a half-hearted way: String.length() and charAt() still work in 16-bit chars, so a single high character counts as two.
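
A minimal sketch of what that means in practice, assuming a reasonably modern JDK. The class and method names (SurrogateLengthDemo, codeUnits, codePoints) are made up for illustration; the String and Character calls are standard library methods.

```java
// Sketch: one supplementary character occupies two chars in a Java String.
class SurrogateLengthDemo {
    // Counts 16-bit chars, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Counts Unicode code points; each valid surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D504 built from its numeric code point, so no escapes are needed.
        String fraktur = new String(Character.toChars(0x1D504));
        System.out.println(codeUnits(fraktur));   // 2
        System.out.println(codePoints(fraktur));  // 1
        // codePointAt reassembles the pair into the original code point.
        System.out.println(Integer.toHexString(fraktur.codePointAt(0))); // 1d504
    }
}
```

If you need to walk a String character by character, the code-point methods are the safer choice; charAt will hand you the two halves separately.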

Java does not even have 32-bit String literals like the C-style code point escapes, e.g. \U0001d504. Note the capital U, as opposed to the usual 16-bit escape such as \ud504. I wrote the SurrogatePair applet to convert C-style code points to arcane surrogate pairs, to let you use 32-bit Unicode glyphs in your programs.
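
As a rough illustration, here are two ways to get U+1D504 into a Java String without a 32-bit escape. The surrogate pair for that code point works out to D835 DD04. The class name and the fromCodePoint helper are hypothetical; StringBuilder.appendCodePoint is a standard method.

```java
// Sketch: writing a supplementary character in Java source.
class SurrogateLiteralDemo {
    // U+1D504 spelled as two 16-bit escapes: high surrogate, then low surrogate.
    static final String FROM_ESCAPES = "\uD835\uDD04";

    // The same character built at run time from its numeric code point.
    static String fromCodePoint(int codePoint) {
        return new StringBuilder().appendCodePoint(codePoint).toString();
    }

    public static void main(String[] args) {
        // Both routes yield the same two-char String.
        System.out.println(FROM_ESCAPES.equals(fromCodePoint(0x1D504))); // true
    }
}
```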

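To pull this off, Unicode reserves two bands of 16-bit values for use in encoding the high characters: the high surrogates, U+D800 through U+DBFF, and the low surrogates, U+DC00 through U+DFFF. A pair is always one high surrogate followed by one low surrogate.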
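
The arithmetic behind the two bands can be sketched as follows. This is an illustrative hand-rolled version, not the code of the SurrogatePair applet; the class, constant and method names are invented for the example. In real programs Character.toChars and Character.toCodePoint do the same job.

```java
// Sketch: converting between a code point and its surrogate pair by hand.
class SurrogateMathDemo {
    static final int HIGH_START = 0xD800;           // high band: U+D800..U+DBFF
    static final int LOW_START = 0xDC00;            // low band: U+DC00..U+DFFF
    static final int SUPPLEMENTARY_START = 0x10000; // first code point needing a pair

    // Splits a supplementary code point into { high, low }.
    static char[] encode(int codePoint) {
        if (codePoint < SUPPLEMENTARY_START || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - SUPPLEMENTARY_START;     // fits in 20 bits
        char high = (char) (HIGH_START + (offset >>> 10)); // top 10 bits
        char low = (char) (LOW_START + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    // Reassembles the code point; assumes a valid high/low pair.
    static int decode(char high, char low) {
        return SUPPLEMENTARY_START + ((high - HIGH_START) << 10) + (low - LOW_START);
    }

    public static void main(String[] args) {
        char[] pair = encode(0x1D504);
        System.out.println(String.format("%04x %04x", (int) pair[0], (int) pair[1])); // d835 dd04
        System.out.println(Integer.toHexString(decode(pair[0], pair[1])));            // 1d504
    }
}
```

Each half carries 10 bits, which is why a pair covers exactly the 20-bit range from U+10000 to U+10FFFF.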

