Unicode™ : Java Glossary


Unicode Glyph Ranges
What Is Unicode?
Symbols
Arrows
Hyphens
Viewing Glyphs
Creating Unicode Documents
Unicode Literals in Java
BOMs : Byte Order Marks
What’s Missing From Unicode?
Unicode Editors
Viewer Applet
Notepad Unicode
Books
Links

Unicode 9.0 Glyph Ranges

Unicode 16 and Unicode 32 Glyphs in Downloadable Acrobat PDF (Portable Document Format) Format

hex code  size    sample  range
0000      383k    A       Basic Latin
0080      412k            Latin-1 Supplement: accented letters, basic symbols
0100      191k            Latin Extended-A: Esperanto accented letters
0180      362k    Ɖ       Latin Extended-B: African
0250      246k            IPA Extensions: International Phonetic Alphabet
02B0      195k    ˤ       Spacing Modifier Letters
0300      214k            Combining Diacritical Marks
0500      115k    Ԏ       Cyrillic Supplement
0780      74k     ޘ       Thaana: Maldives
0840      69k             Mandaic: Iraq and Iran
0900      110k            Devanagari: Hindi
0A00      98k             Gurmukhi: Punjabi
0A80      96k             Gujarati: Gujarat
0B00      105k            Oriya: Odiya, Orissa
0B80      136k            Tamil: India and Sri Lanka
0C00      137k            Telugu: Andhra Pradesh
0C80      122k            Kannada: Karnataka
0D00      123k            Malayalam: Kerala
0D80      104k            Sinhala: Sri Lanka
1100      131k            Hangul Jamo: Korean
1400      183k            Unified Canadian Aboriginal Syllabics
1680      106k            Ogham: Old Irish
16A0      122k            Runic
1700      73k             Tagalog: Filipino
1720      76k             Hanunoo: Mindoro in the Philippines
1740      68k             Buhid: Mindoro in the Philippines, used to write Tagalog
1760      73k             Tagbanwa: Philippines
1780      128k            Khmer: Cambodian
1900      83k             Limbu: Tibeto-Burman language of Nepal and Sikkim
1950      72k             Tai Le: China
19E0      75k             Khmer Symbols: Cambodian
1BC0      69k             Batak: Sumatra, Indonesia
1D00      250k            Phonetic Extensions
1E00      247k            Latin Extended Additional: dotted letters, letters with two accents.
1F00      175k            Greek Extended
2000      283k            General Punctuation
2070      108k            Superscripts and Subscripts
20A0      238k            Currency Symbols: including new 20b9 Rupee
20D0      145k            Combining Marks for Symbols
2100      276k            Letterlike Symbols
2150      184k            Number Forms ⅐ ⅑ ⅒
2190      109k            Arrows
2200      309k            Mathematical Operators: ∇ del, ∈ element, ∃ there exists, ∀ for all, ∪ union, ∩ intersection, ∋ contains member, ⋅ dot product, ∴ therefore, √ square root, ∧ logical and, ∨ logical or, ∑ summation, ∏ product, ≠ not equal, ≤ less or equal
2300      263k            Miscellaneous Technical: APL operators.
2400      88k             Control Pictures: for displaying unprintable ASCII control characters.
2440      73k             Optical Character Recognition
2460      140k            Enclosed Alphanumerics: see Dingbats 2700 for more circled digits.
2500      121k            Box Drawing: single/double lines also triangles
2580      78k             Block Elements
25A0      182k            Geometric Shapes
2600      337k            Miscellaneous Symbols: chess, astrology, I-ching, telephones, hazards, religious symbols, hammer and sickle.
2700      215k            Dingbats: asterisks, ornaments, hands, right-pointing arrows, pencils, scissors, pens. See 2460 for more circled digits.
27C0      150k            Miscellaneous Mathematical Symbols-A: including SQL left, right and full joins.
27F0      95k             Supplemental Arrows-A
2800      95k             Braille Patterns
2900      134k            Supplemental Arrows-B
2980      196k            Miscellaneous Mathematical Symbols-B
2A00      164k            Supplemental Mathematical Operators: including variants of + - × ÷
2B00      158k            Miscellaneous Symbols and Arrows
2C00      128k            Glagolitic: pre-Cyrillic Bulgarian
2E80      184k            CJK Radicals Supplement: Chinese Japanese Korean
2F00      184k            Kangxi Radicals: fragments combined to write Chinese
2FF0      67k             Ideographic Description Characters
3000      206k            CJK Symbols and Punctuation: Chinese Japanese Korean
3040      142k            Hiragana: (Japanese) Used when no Kanji character exists.
30A0      148k            Katakana: (Japanese) mainly for foreign names
3100      125k            Bopomofo: phonetic script for Mandarin
3130      124k            Hangul Compatibility Jamo: Korean
3190      124k            Kanbun: used by Japanese to annotate classic Chinese
31A0      102k            Bopomofo Extended: phonetic script for Mandarin
31F0      84k             Katakana Phonetic Extensions: Japanese
3200      250k            Enclosed CJK Letters and Months: Chinese Japanese Korean
3300      261k            CJK Compatibility: Chinese Japanese Korean
3400      5781k           CJK Unified Ideographs Extension A: Chinese Japanese Korean
4DC0      75k             Yijing Hexagram Symbols: I Ching symbols
4E00      25871k          CJK Unified Ideographs: Chinese Japanese Korean including Kanji digits 零 一 二 三 四 五 六 七 八 九
A000      424k            Yi Syllables: classical Yi language of China
A490      83k             Yi Radicals: classical Yi language of China
AB00      79k             Ethiopic Extended-A
AC00      701k            Hangul Syllables: Korean
D800      23k             High Surrogates
DC00      23k             Low Surrogates
E000      23k             Private Use Area
F900      590k            CJK Compatibility Ideographs: Chinese Japanese Korean
FB00      116k            Alphabetic Presentation Forms: ligatures including Hebrew
FB50      236k            Arabic Presentation Forms-A
FE00      69k             Variation Selectors: non-printing control characters
FE20      82k             Combining Half Marks
FE30      129k            CJK Compatibility Forms: Chinese, Japanese, Korean vertical brackets
FE50      148k            Small Form Variants: small punctuation
FE70      117k            Arabic Presentation Forms-B
FF00      274k            Halfwidth and Fullwidth Forms: wide and narrow letters, digits and punctuation
FFF0      72k             Specials: byte order marks.
00010000  93k             Linear B Syllabary: ancient Cretan
00010080  123k            Linear B Ideograms
00010100  84k             Aegean Numbers
00010300  102k            Old Italic
00010380  100k            Ugaritic: Cuneiform
00010400  108k    𐐁       Deseret: Mormon
00010450  112k    𐑻       Shavian: George Bernard Shaw’s alphabet
00010480  102k    𐒁       Osmanya: Somali
00010800  106k            Cypriot Syllabary
00011000  81k             Brahmi: ancient Indian scripts
00016800  322k            Bamum Supplement: Cameroon
0001B000  95k             Kana Supplement: Japanese
0001D000  230k            Byzantine Musical Symbols
0001D100  172k            Musical Symbols
0001D300  125k    𝍎       Tai Xuan Jing Symbols: Look like I-Ching hexagrams truncated to four lines.
0001D400  418k            Mathematical Alphanumeric Symbols
0001F0A0  106k            Playing Cards
0001F300  625k            Miscellaneous Symbols and Pictographs: including pile of poo.
0001F600  119k            Emoticons
0001F680  130k            Transport and Map Symbols
0001F700  193k            Alchemical Symbols
00020000  28317k          CJK Unified Ideographs Extension B: Chinese Japanese Korean
0002B740  212k            CJK Unified Ideographs Extension D: Chinese Japanese Korean
0002F800  548k            CJK Compatibility Ideographs Supp.: Chinese Japanese Korean
000E0000  136k            Tags: control characters.
000E0100  84k             Variation Selectors Supp.: non-printing control characters
000F0000  23k             Supplementary Private Use Area-A
00100000  23k             Supplementary Private Use Area-B

What Is Unicode?

Informally, Unicode is a 16-bit character encoding, with surrogate pairs to handle the characters that do not fit in 16 bits, used internally in programs written in Java. More precisely, Unicode is not a character encoding but a character set, whose code points run from 0 to 0x10FFFF and are usually held in 32-bit ints. UTF-8, UTF-16 and UTF-32 are character encodings in which the Unicode character set can be encoded.
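
To make the difference between the character set and its encodings concrete, here is a minimal sketch that measures how many bytes the same three characters take in each encoding. The class and method names are my own invention for illustration. The UTF-32 figure is computed as four bytes per code point rather than produced by a real encoder, since Java does not guarantee a UTF-32 charset on every platform.

```java
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    /** Number of bytes the text occupies when encoded as UTF-8. */
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes as big-endian UTF-16, without a byte order mark. */
    static int utf16Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    /** Number of bytes as UTF-32, computed as four bytes per code point. */
    static int utf32Bytes(String text) {
        return 4 * text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String text = "A\u00f7\u2713"; // A, division sign, check mark
        System.out.println("UTF-8:  " + utf8Bytes(text));  // 6 (1 + 2 + 3)
        System.out.println("UTF-16: " + utf16Bytes(text)); // 6 (2 + 2 + 2)
        System.out.println("UTF-32: " + utf32Bytes(text)); // 12 (4 + 4 + 4)
    }
}
```

The character numbers never change; only the byte layout does.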

See the example glyphs, in PDF format. Requires Adobe Acrobat to view. Also available as an ASCII text file describing the glyphs with cross references to similar glyphs. Unicode does not standardise the precise shapes of the letters, i.e. the glyphs. It does, however, provide example glyphs. This distinction is most important for the unified Han characters, which are shared by Chinese, Japanese and Korean. The three languages use the same Unicode code points, but quite different looking renderings of the characters. These differences are handled by the font designer, who uses Chinese, Japanese or Korean style.

Unicode is sometimes called UCS (Universal Character Set) or ISO (International Organization for Standardization) 10646. Unicode allows Java to handle international characters for most of the world’s living languages, including Arabic, Armenian, Bengali, Bopomofo, Chinese (via unified Han), Cyrillic, English, Georgian, Greek, Gujarati, Gurmukhi, Hebrew, Hindi (Devanagari), Japanese (Hiragana, Katakana, and Kanji via unified Han), Kannada, Korean (Hangul, and Hanja via unified Han), Lao, Malayalam, Oriya, Tamil, Telugu, Thai, Tibetan… Unicode will make it much easier for non-English speaking programmers to write programs for English speaking users and vice versa.

To get musical symbols you need support for the supplementary characters above U+FFFF (32-bit Unicode), which Java represents as surrogate pairs.
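
A small sketch of what that means in practice, using the G clef at U+1D11E from the Musical Symbols block. The class name ClefDemo and its helper are hypothetical; the Character and String calls are standard Java.

```java
class ClefDemo {
    /** Number of Unicode code points in the text, as opposed to 16-bit chars. */
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF, built from its code point.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(clef.length());                            // 2 chars (a surrogate pair)
        System.out.println(codePoints(clef));                         // 1 code point
        System.out.println(Integer.toHexString(clef.codePointAt(0))); // 1d11e
    }
}
```

Code that walks a String one char at a time will see the two halves separately, so use the code point methods when supplementary characters may turn up.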

Emoji are scattered all over the map, not collected together in a block the way everything else is.

In Java, you get at the exotic characters by encoding them in hex in your strings like this: \u00f7\u2713 to produce ÷ ✓. See String literals for more details.
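
Here is that escape notation in a complete program, as a minimal sketch. The class name EscapeDemo is made up. The compiler translates each escape into a single char before it does anything else, so the string below holds three chars, not thirteen.

```java
class EscapeDemo {
    /** Division sign, a space, and a check mark, written with Unicode escapes. */
    static final String SYMBOLS = "\u00f7 \u2713";

    public static void main(String[] args) {
        // Whether the glyphs show up depends on the console encoding and font.
        System.out.println(SYMBOLS);
        System.out.println(SYMBOLS.length());                            // 3
        System.out.println(Integer.toHexString(SYMBOLS.charAt(0)));      // f7
        System.out.println(Integer.toHexString(SYMBOLS.charAt(2)));      // 2713
    }
}
```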

In HTML (Hypertext Markup Language), you get at the exotic characters by encoding them as entities such as &divide;&#x2713; to produce ÷ ✓.
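
If you generate HTML from Java, you can produce numeric entities mechanically. The following is a sketch under my own assumptions: the class and method names are hypothetical, everything outside 7-bit ASCII is turned into a hexadecimal numeric reference, and the markup characters &, < and > are deliberately left alone.

```java
class HtmlEntities {
    /** Replaces every non-ASCII character with a hexadecimal numeric entity. */
    static String toNumericEntities(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
            // Step over both halves of a surrogate pair at once.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericEntities("\u00f7 \u2713")); // &#xf7; &#x2713;
    }
}
```

Working in code points rather than chars means a supplementary character comes out as one entity instead of two broken halves.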

Unicode Symbols

There are even codes for:
apple '\uf000' unofficial, private use area
British pound sign £ '\u00a3'
checkmark ✓ '\u2713'
copyright © '\u00a9'
degree ° '\u00b0'
dharma wheel ☸ '\u2638'
division ÷ '\u00f7'
bullet • '\u2022'
euro € '\u20ac'
female ♀ '\u2640'
funeral urn ⚱ '\u26b1'
heart ♥ '\u2665'
bullet (as mathematical operator) ∙ '\u2219'
infinity ∞ '\u221e'
integral ∫ '\u222b'
male ♂ '\u2642'
pi π '\u03c0'
PI Π '\u03a0'
registered trade mark ® '\u00ae'
sun ☀ '\u2600'
telephone ☎ '\u260e'
trademark ™ '\u2122'
This does not mean your fonts will support all these wonders, of course.
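
Even when a font cannot draw a character, Java can still tell you what a given code is supposed to be. A minimal sketch, assuming Java 7 or later for Character.getName; the class name SymbolNames and its helper are made up for this illustration.

```java
class SymbolNames {
    /** Formats a code point as U+XXXX followed by its official Unicode name. */
    static String describe(int codePoint) {
        return String.format("U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        int[] samples = {0x00a3, 0x2713, 0x20ac, 0x221e};
        for (int cp : samples) {
            System.out.println(describe(cp));
        }
        // Prints:
        // U+00A3 POUND SIGN
        // U+2713 CHECK MARK
        // U+20AC EURO SIGN
        // U+221E INFINITY
    }
}
```

This is a handy way to double check a hex code before you paste it into a program.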

In addition there are all kinds of interesting special characters such as: Alphabetic Presentation Forms, APL (A Programming Language), Arrows, Bengali, Block Elements, Box Drawing, Braille Patterns, Byzantine Musical Symbols, Combining Diacritical Marks, Combining Half Marks, Combining Marks for Symbols, Control Pictures (icons for control chars), Currency Symbols, Dingbats, Enclosed Alphanumerics, General Punctuation, Geometric Shapes, Halfwidth and Fullwidth Forms, High Surrogates, Ideographic Description Characters, IPA (International Phonetic Alphabet) Extensions, Letterlike Symbols, Low Surrogates, Mathematical Alphanumeric Symbols (32-bit Unicode), Mathematical Operators, Mathematical Symbols, Miscellaneous Symbols (astrology, chess, playing cards), Miscellaneous Technical (del, grad, integral), Musical Symbols, Number Forms (e.g. Roman numerals), OCR (Optical Character Recognition), i.e. the OCR-A (Optical Character Recognition font-A) and MICR (Magnetic Ink Character Recognition) characters used in magnetic ink cheque encoding, Old Italic, Runic, Small Form Variants, Spacing Modifier Letters, Specials, Superscripts and Subscripts, Tags (invisible characters for language tagging), Unified Canadian Aboriginal Syllabics and Variation Selectors.
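
Java knows about these blocks too. As a rough sketch, Character.UnicodeBlock.of reports which block a code point belongs to. The class name BlockLookup and the "no block" fallback text are my own; which blocks are recognised depends on the Unicode version your JDK supports.

```java
class BlockLookup {
    /** Returns the name of the block containing the code point, or "no block". */
    static String blockOf(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == null ? "no block" : block.toString();
    }

    public static void main(String[] args) {
        System.out.println(blockOf('A'));      // BASIC_LATIN
        System.out.println(blockOf(0x2190));   // ARROWS
        System.out.println(blockOf(0x2500));   // BOX_DRAWING
        System.out.println(blockOf(0x1D11E));  // MUSICAL_SYMBOLS
    }
}
```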

Unicode Arrows

There are also arrows. Unicode defines even more of them in the range 2190-21ff. To use these characters in HTML, you need to code them as &… entities.


Unicode Hyphens

There are also a variety of hyphen characters:
- \u002d hyphen-minus
­ \u00ad soft hyphen
\u2010 hyphen
\u2011 non-breaking hyphen
\u2012 figure dash
\u2013 en dash
\u2014 em dash
\u2212 minus sign
𐆑 0x10191 (\ud800\udd91) roman uncia sign

Viewing Unicode Glyphs

Nic Fulton of Reuters has written a Java Test Applet that can display all 64 thousand 16-bit Unicode characters, including the Chinese/Korean Han. How many of them actually display on your screen depends on the font handling ability of your browser and operating system and which fonts you have installed. In Java programs, Unicode characters that are awkward to type are represented in the form '\uffff', with four hex digits. Ordinary characters like 'A' are actually 16-bit Unicode too.
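
Going the other way, you can turn awkward characters into that four-hex-digit form mechanically. This is only a sketch with names I made up (JavaEscaper, escape); it leaves printable ASCII alone and escapes every other char, and it does not try to handle backslashes already present in the text.

```java
class JavaEscaper {
    /** Rewrites every char outside printable ASCII as a four-hex-digit Java escape. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c >= 0x20 && c < 0x7f) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A Java char is always 16 bits wide, even for plain ASCII letters.
        System.out.println(Character.SIZE);   // 16
        System.out.println((int) 'A');        // 65
        System.out.println(escape("10 \u00f7 2"));
    }
}
```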

Creating Unicode Documents

How do you create and edit the various flavours of Unicode documents? You can create them in some specific encoding, then convert them. To write a little utility to do that, read up on encoding and ask the File I/O Amanuensis for sample code. You can use lowly Notepad in Windows NT/W2K/XP, but not in earlier Windows versions, to edit existing documents. It is even clever enough to deal with byte order (endian) marks. To get started with new documents, you would have to acquire an almost empty Unicode document. Recent versions of MS Word in Windows NT/W2K/XP/W2K3 also work.
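
Such a conversion utility can be quite small. The following is a minimal sketch, not the Amanuensis output: the class name Transcoder, its methods, and the choice of ISO-8859-1 to UTF-8 in main are all assumptions for illustration. It reads the whole file into memory, and malformed input is silently replaced rather than reported.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class Transcoder {
    /** Decodes the bytes with one charset and re-encodes them with another. */
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    /** Converts a whole file; suitable only for files that fit in memory. */
    static void convertFile(Path source, Charset from, Path target, Charset to)
            throws IOException {
        Files.write(target, transcode(Files.readAllBytes(source), from, to));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Transcoder <latin1-input> <utf8-output>");
            return;
        }
        convertFile(Paths.get(args[0]), StandardCharsets.ISO_8859_1,
                    Paths.get(args[1]), StandardCharsets.UTF_8);
    }
}
```

For example, the single Latin-1 byte 0xF7 (the division sign) comes out as the two UTF-8 bytes 0xC3 0xB7.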


Unicode Literals in Java

See the literals section for a full explanation of how to code 16-bit Unicode characters in Java programs.

Java does not have 32-bit String literals like C-style code points, e.g. \U0001d504. Note the capital U and eight hex digits, versus the usual \ud504 with four. I wrote the SurrogatePair applet to convert C-style code points to arcane surrogate pairs, to let you use 32-bit Unicode glyphs in your programs.
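
The conversion itself needs only a few lines. This is not the SurrogatePair applet, just a sketch of the same idea using the standard Character.toChars method; the class and method names are hypothetical.

```java
class SurrogatePairs {
    /** Returns the Java escape or escapes needed to write the code point in source. */
    static String toJavaEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        // toChars yields one char for the 16-bit range, two for anything above it.
        for (char c : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0x1d504 is the code point used as the C-style example in the text.
        System.out.println(toJavaEscapes(0x1d504));
        // 0x10191 is the roman uncia sign.
        System.out.println(toJavaEscapes(0x10191));
    }
}
```

Paste the printed escapes into a String literal and Java treats the pair as one character for code point purposes. Character.toChars throws IllegalArgumentException if you hand it a number that is not a valid code point.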

Byte Order Marks

There are two different layers: Unicode, which assigns numbers to characters, and UTF (Unicode Transformation Format), which describes how you encode these numbers in a file. Byte order marks belong to the UTF encodings, not to the Unicode character set. See more on BOMs (Byte Order Marks).
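
To show how a byte order mark identifies the encoding, here is a sketch that inspects the first few bytes of a file. The class name BomSniffer and the returned labels are my own choices. The byte patterns are the standard marks, but note that the UTF-32LE mark begins with the same two bytes as the UTF-16LE one, so the longer patterns must be tested first, and a UTF-16LE file that starts with a NUL character would be misread.

```java
class BomSniffer {
    /** Returns the encoding implied by a leading byte order mark, or null if none. */
    static String detect(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned, since Java bytes are signed.
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x41};
        System.out.println(detect(sample)); // UTF-8
    }
}
```

A file with no mark at all returns null, and you then have to know or guess the encoding some other way.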

What’s Missing From Unicode?

There are no Unicode glyphs for the following: Unicode is not concerned with typesetting, just with raw text. In other words, it is about characters (logical letters), not glyphs (how letters are precisely shaped). Unicode has various flavours of digits that look much the same, but they are intended to be used in different contexts.
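
Java treats those flavours of digits as equivalent when it comes to their numeric value. A minimal sketch, with a made-up class name and method, that parses decimal digits from any script using Character.digit. It does no overflow checking.

```java
class AnyDigits {
    /** Parses a string of decimal digits from any script into an int. */
    static int parse(String digits) {
        int value = 0;
        for (int i = 0; i < digits.length(); ) {
            int cp = digits.codePointAt(i);
            int d = Character.digit(cp, 10);
            if (d < 0) {
                throw new NumberFormatException(
                        "not a decimal digit: U+" + Integer.toHexString(cp));
            }
            value = value * 10 + d;
            i += Character.charCount(cp);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parse("123"));                  // 123, ordinary ASCII digits
        System.out.println(parse("\u0661\u0662\u0663"));   // 123, Arabic-Indic digits
        System.out.println(parse("\uff14\uff12"));         // 42, fullwidth digits
    }
}
```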

To typeset, you need separate fonts to handle such variants, with the letters encoded with the same Unicode character. The word processor automatically selects the appropriate variant. I don’t know the mechanism by which a word processor can tell which fonts are related and which styles and font-weights each supports. Presumably it is encoded somehow in the font files.

To a large extent ligatures are handled outside Unicode by automatically combining Unicode characters, though there are a few ligatures that rate a special Unicode character.
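
For the few ligatures that do have their own Unicode character, you can fold them back into ordinary letters with compatibility normalisation. This is a sketch using the standard java.text.Normalizer class; the class name LigatureDemo and its method are hypothetical. Be aware that NFKC also rewrites other compatibility characters, such as fullwidth letters and superscripts, so it is not a ligature-only operation.

```java
import java.text.Normalizer;

class LigatureDemo {
    /** Applies NFKC normalisation, which expands ligature characters among other things. */
    static String expandLigatures(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // The first char is the single-character fi ligature.
        String word = "\ufb01sh";
        System.out.println(word.length());                   // 3
        System.out.println(expandLigatures(word));           // fish
        System.out.println(expandLigatures(word).length());  // 4
    }
}
```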

Unicode Editors

Where do Unicode files come from? You can edit or create UTF-8 or UTF-16 files with Windows Notepad.

Unicode 8.0.0

Unicode 8.0.0 is the latest version of the Unicode Standard. JDK (Java Development Kit) 1.8.0_131 supports version 6.2.0 though I doubt Java will need to change at all to support 8.0.0. All later versions of Unicode do is add more potential characters to fonts.


Books

The Unicode 5.0 Standard
by The Unicode Consortium (founded 1991)
ISBN 978-0-321-48091-0, hardcover
publisher: Addison-Wesley
published: 2006-11-19
Unicode 5.0 adds the following:
  • Security mechanisms.
  • A standard collation algorithm for various national orderings.
  • A common locale data repository.
  • Improvements to the encoding model for UTF-8.
  • Rigorous stability of case folding.
  • A systematic framework covering combining characters, Unicode strings, line breaking and segmentation.
The current version is 8.0. There have been no comprehensive books published since this one.

Canadian Mind Products
Contact Roedy. Please feel free to link to this page without explicit permission.
