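Internally, Java uses 16-bit characters. Unicode has been extended to include code points that do not fit in 16 bits. They are loosely called 32-bit characters, though the highest code point is 0x10ffff, which needs only 21 bits, and a surrogate pair carries 20 bits of payload. Instead of flipping to RAM-gobbling 32-bit characters, Sun decided to handle the new characters with a pair of 16-bit characters, called a surrogate pair. They added support for them in a half-hearted way.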
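A small sketch of what this means in practice, assuming Java 5 or later: a String holding one supplementary character reports a length of two chars, and you need the code point methods to see it as a single character. The class name SupplementaryLengthDemo is made up for illustration.

```java
final class SupplementaryLengthDemo {
    // U+1D504 (a Fraktur capital A) spelled as its surrogate pair.
    static final String FRAKTUR_A = "\ud835\udd04";

    /** Number of 16-bit chars, which is what String.length() counts. */
    static int charCount(String s) {
        return s.length();
    }

    /** Number of Unicode code points, counting each surrogate pair once. */
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(charCount(FRAKTUR_A));                            // 2
        System.out.println(codePointCount(FRAKTUR_A));                       // 1
        // charAt sees only the first half of the pair.
        System.out.println(Integer.toHexString(FRAKTUR_A.charAt(0)));        // d835
        // codePointAt reassembles the pair into the full code point.
        System.out.println(Integer.toHexString(FRAKTUR_A.codePointAt(0)));   // 1d504
    }
}
```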
Java does not even have 32-bit String literals, like the C-style code point escapes, e.g. \U0001d504. Note the capital U and eight hex digits, vs the usual four-digit \ud504. I wrote the SurrogatePair applet to convert C-style code points to arcane surrogate pairs to let you use 32-bit Unicode glyphs in your programs.
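The same sort of conversion can be sketched with the standard library, assuming Java 5 or later. This is not the applet's source; the class and method names (SurrogateEscapes, toJavaEscapes, fromCStyle) are hypothetical. Character.toChars does the real work, returning one char for a 16-bit code point and a surrogate pair for a higher one.

```java
final class SurrogateEscapes {
    /** Renders a code point as the escapes you would paste into Java source. */
    static String toJavaEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        // toChars throws IllegalArgumentException for an invalid code point.
        for (char c : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    /** Converts a C-style escape (backslash, capital U, eight hex digits). */
    static String fromCStyle(String cEscape) {
        if (cEscape.length() != 10 || !cEscape.startsWith("\\U")) {
            throw new IllegalArgumentException("expected \\U plus 8 hex digits: " + cEscape);
        }
        return toJavaEscapes(Integer.parseInt(cEscape.substring(2), 16));
    }

    public static void main(String[] args) {
        // Prints the two-escape surrogate pair form of U+1D504.
        System.out.println(fromCStyle("\\U0001d504"));
    }
}
```

You could then paste the printed escapes into a String literal in your own source.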
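To pull this off, Unicode reserves two bands of 16-bit characters for use in encoding the high characters: the high surrogates, 0xd800 to 0xdbff, and the low surrogates, 0xdc00 to 0xdfff. A pair is always a high surrogate followed by a low surrogate, each carrying 10 bits of the code point.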
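The arithmetic behind the two bands can be sketched by hand. This follows the standard UTF-16 rule: subtract 0x10000, put the top 10 bits in the high surrogate and the bottom 10 bits in the low surrogate. The class name SurrogateMath and its method names are made up for illustration; in real code Character.toChars and Character.toCodePoint do the same job.

```java
final class SurrogateMath {
    static final int HIGH_START = 0xD800;          // first high surrogate
    static final int LOW_START = 0xDC00;           // first low surrogate
    static final int SUPPLEMENTARY_START = 0x10000; // first code point needing a pair

    /** Splits a supplementary code point into { high, low } surrogates. */
    static char[] encode(int codePoint) {
        if (codePoint < SUPPLEMENTARY_START || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - SUPPLEMENTARY_START;     // a 20-bit value
        char high = (char) (HIGH_START + (offset >> 10)); // top 10 bits
        char low = (char) (LOW_START + (offset & 0x3FF)); // bottom 10 bits
        return new char[] { high, low };
    }

    /** Recombines a surrogate pair into the original code point. */
    static int decode(char high, char low) {
        return SUPPLEMENTARY_START + ((high - HIGH_START) << 10) + (low - LOW_START);
    }

    public static void main(String[] args) {
        char[] pair = encode(0x1D504);
        System.out.println(Integer.toHexString(pair[0]));                  // d835
        System.out.println(Integer.toHexString(pair[1]));                  // dd04
        System.out.println(Integer.toHexString(decode(pair[0], pair[1]))); // 1d504
    }
}
```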
You can get the freshest copy of this page from http://mindprod.com/jgloss/surrogatepair.html or possibly from your local J: drive (Java virtual drive/mindprod.com website mirror) at J:\mindprod\jgloss\surrogatepair.html
Please email your feedback for publication, letters to the editor, errors, omissions, typos, formatting errors, ambiguities, unclear wording, broken/redirected link reports, suggestions to improve this page or comments to Roedy Green, Canadian Mind Products.