Unicode 16 and Unicode 32 Glyphs, in Downloadable Acrobat PDF (Portable Document Format) Files
Hex code (⁶ = new in Unicode 6) | PDF size | Sample glyph | Description |
---|---|---|---|
0000 | 383k | A | Basic Latin |
0080 | 412k | | Latin-1 Supplement: accented letters, basic symbols |
0100 | 191k | | Latin Extended-A: Esperanto accented letters |
0180 | 362k | Ɖ | Latin Extended-B: African |
0250 | 246k | | IPA Extensions: International Phonetic Alphabet |
02B0 | 195k | ˤ | Spacing Modifier Letters |
0300 | 214k | | Combining Diacritical Marks |
0370 | 281k | Ω | Greek |
0400 | 242k | Д | Cyrillic |
0500 | 115k | Ԏ | Cyrillic Supplement |
0530 | 106k | Մ | Armenian |
0590 | 109k | א | Hebrew |
0600 | 172k | ص | Arabic |
0700 | 91k | ܛ | Syriac |
0780 | 74k | ޘ | Thaana: Maldives |
⁶0840 | 69k | | Mandaic: Iraq and Iran |
0900 | 110k | | Devanagari: Hindi |
0980 | 103k | ত | Bengali |
0A00 | 98k | ਣ | Gurmukhi: Punjabi |
0A80 | 96k | ઇ | Gujarati: Gujarat |
0B00 | 105k | ଚ | Oriya: Odia, Odisha (Orissa) |
0B80 | 136k | | Tamil: India and Sri Lanka |
0C00 | 137k | మ | Telugu: Andhra Pradesh |
0C80 | 122k | | Kannada: Karnataka |
0D00 | 123k | | Malayalam: Kerala |
0D80 | 104k | | Sinhala: Sri Lanka |
0E00 | 100k | ฏ | Thai |
0E80 | 100k | ຟ | Lao |
0F00 | 219k | ཌ | Tibetan |
1000 | 116k | | Myanmar |
10A0 | 100k | Ⴇ | Georgian |
1100 | 131k | ᄘ | Hangul Jamo: Korean |
1200 | 179k | ጜ | Ethiopic |
13A0 | 85k | Ꮡ | Cherokee |
1400 | 183k | | Unified Canadian Aboriginal Syllabics |
1680 | 106k | ᚔ | Ogham: Old Irish |
16A0 | 122k | | Runic |
1700 | 73k | | Tagalog: Philippines |
1720 | 76k | | Hanunoo: Mindoro in the Philippines |
1740 | 68k | | Buhid: Mindoro in the Philippines, used to write the Buhid language |
1760 | 73k | | Tagbanwa: Philippines |
1780 | 128k | ផ | Khmer: Cambodian |
1800 | 146k | ᠠ | Mongolian |
1900 | 83k | | Limbu: Nepal and Sikkim |
1950 | 72k | ᥠ | Tai Le: China |
19E0 | 75k | ᧤ | Khmer Symbols: Cambodian |
⁶1BC0 | 69k | | Batak: Sumatra, Indonesia |
1D00 | 250k | ᴂ | Phonetic Extensions |
1E00 | 247k | Ḍ | Latin Extended Additional: dotted letters, letters with two accents |
1F00 | 175k | ἁ | Greek Extended |
2000 | 283k | ’ | General Punctuation |
2070 | 108k | ₅ | Superscripts and Subscripts |
20A0 | 238k | | Currency Symbols: including the new 20B9 Indian rupee sign |
20D0 | 145k | | Combining Marks for Symbols |
2100 | 276k | ™ | Letterlike Symbols |
2150 | 184k | | Number Forms: ⅐ ⅑ ⅒ |
2190 | 109k | | Arrows |
2200 | 309k | | Mathematical Operators: ∇ del, ∈ element, ∃ there exists, ∀ for all, ∪ union, ∩ intersection, ∋ contains member, ⋅ dot product, ∴ therefore, √ square root, ∧ logical and, ∨ logical or, ∑ summation, ∏ product, ≠ not equal, ≤ less or equal |
2300 | 263k | | Miscellaneous Technical: APL operators |
2400 | 88k | | Control Pictures: for displaying unprintable ASCII control characters |
2440 | 73k | | Optical Character Recognition |
2460 | 140k | | Enclosed Alphanumerics: see Dingbats 2700 for more circled digits |
2500 | 121k | | Box Drawing: single and double lines, also triangles |
2580 | 78k | | Block Elements |
25A0 | 182k | | Geometric Shapes |
2600 | 337k | | Miscellaneous Symbols: chess, astrology, I Ching, telephones, hazards, religious symbols, hammer and sickle |
2700 | 215k | | Dingbats: asterisks, ornaments, hands, right-pointing arrows, pencils, scissors, pens. See 2460 for more circled digits |
27C0 | 150k | | Miscellaneous Mathematical Symbols-A: including SQL left, right and full joins |
27F0 | 95k | ⟰ | Supplemental Arrows-A |
2800 | 95k | | Braille Patterns |
2900 | 134k | ⤱ | Supplemental Arrows-B |
2980 | 196k | | Miscellaneous Mathematical Symbols-B |
2A00 | 164k | | Supplemental Mathematical Operators: including variants of + - × ÷ |
2B00 | 158k | | Miscellaneous Symbols and Arrows |
2C00 | 128k | | Glagolitic: pre-Cyrillic Bulgarian |
2E80 | 184k | ⺮ | CJK Radicals Supplement: Chinese Japanese Korean |
2F00 | 184k | ⼮ | Kangxi Radicals: fragments combined to write Chinese |
2FF0 | 67k | ⿱ | Ideographic Description Characters |
3000 | 206k | 〖 | CJK Symbols and Punctuation: Chinese Japanese Korean |
3040 | 142k | | Hiragana: (Japanese) used when no Kanji character exists |
30A0 | 148k | | Katakana: (Japanese) mainly for foreign names |
3100 | 125k | | Bopomofo: phonetic script for Mandarin |
3130 | 124k | ㄱ | Hangul Compatibility Jamo: Korean |
3190 | 124k | ㆙ | Kanbun: used by Japanese to annotate classical Chinese |
31A0 | 102k | ㆥ | Bopomofo Extended: phonetic script for Mandarin |
31F0 | 84k | ㇻ | Katakana Phonetic Extensions: Japanese |
3200 | 250k | ㈄ | Enclosed CJK Letters and Months: Chinese Japanese Korean |
3300 | 261k | ㌗ | CJK Compatibility: Chinese Japanese Korean |
3400 | 5781k | 㖣 | CJK Unified Ideographs Extension A: Chinese Japanese Korean |
4DC0 | 75k | ䷱ | Yijing Hexagram Symbols: I Ching symbols |
4E00 | 25871k | | CJK Unified Ideographs: Chinese Japanese Korean, including Kanji digits 零 一 二 三 四 五 六 七 八 九 |
A000 | 424k | ꅖ | Yi Syllables: classical Yi language of China |
A490 | 83k | ꒶ | Yi Radicals: classical Yi language of China |
AB00 | 79k | | Ethiopic Extended-A |
AC00 | 701k | 귖 | Hangul Syllables: Korean |
D800 | 23k | | High Surrogates |
DC00 | 23k | | Low Surrogates |
E000 | 23k | | Private Use Area |
F900 | 590k | 麟 | CJK Compatibility Ideographs: Chinese Japanese Korean |
FB00 | 116k | | Alphabetic Presentation Forms: ligatures, including Hebrew |
FB50 | 236k | ﱺ | Arabic Presentation Forms-A |
FE00 | 69k | | Variation Selectors: non-printing control characters |
FE20 | 82k | | Combining Half Marks |
FE30 | 129k | ︾ | CJK Compatibility Forms: Chinese, Japanese, Korean vertical brackets |
FE50 | 148k | ﹟ | Small Form Variants: small punctuation |
FE70 | 117k | ﺚ | Arabic Presentation Forms-B |
FF00 | 274k | H | Halfwidth and Fullwidth Forms: wide and narrow letters, digits and punctuation |
FFF0 | 72k | | Specials: the replacement character FFFD and the noncharacters FFFE and FFFF |
00010000 | 93k | | Linear B Syllabary: ancient Cretan |
00010080 | 123k | | Linear B Ideograms |
00010100 | 84k | | Aegean Numbers |
00010300 | 102k | | Old Italic |
00010330 | 97k | | Gothic |
00010380 | 100k | | Ugaritic: Cuneiform |
00010400 | 108k | 𐐁 | Deseret: Mormon |
00010450 | 112k | 𐑻 | Shavian: George Bernard Shaw’s alphabet |
00010480 | 102k | 𐒁 | Osmanya: Somali |
00010800 | 106k | | Cypriot Syllabary |
⁶00011000 | 81k | | Brahmi: ancient Indian script |
⁶00016800 | 322k | | Bamum Supplement: Cameroon |
⁶0001B000 | 95k | | Kana Supplement: Japanese |
0001D000 | 230k | | Byzantine Musical Symbols |
0001D100 | 172k | | Musical Symbols |
0001D300 | 125k | 𝍎 | Tai Xuan Jing Symbols: look like I Ching hexagrams truncated to four lines |
0001D400 | 418k | | Mathematical Alphanumeric Symbols |
⁶0001F0A0 | 106k | | Playing Cards |
⁶0001F300 | 625k | | Miscellaneous Symbols and Pictographs: including pile of poo |
⁶0001F600 | 119k | | Emoticons |
⁶0001F680 | 130k | | Transport and Map Symbols |
⁶0001F700 | 193k | | Alchemical Symbols |
00020000 | 28317k | | CJK Unified Ideographs Extension B: Chinese Japanese Korean |
⁶0002B740 | 212k | | CJK Unified Ideographs Extension D: Chinese Japanese Korean |
0002F800 | 548k | | CJK Compatibility Ideographs Supplement: Chinese Japanese Korean |
000E0000 | 136k | | Tags: control characters |
000E0100 | 84k | | Variation Selectors Supplement: non-printing control characters |
000F0000 | 23k | | Supplementary Private Use Area-A |
00100000 | 23k | | Supplementary Private Use Area-B |
Informally, Unicode is thought of as a 16-bit character encoding, with surrogate pairs to handle characters beyond 16 bits; this is the form Java uses internally for char and String. More precisely, Unicode is not a character encoding but a character set, whose code points run from 0 to 10FFFF hex (21 bits, so each fits in a 32-bit int). UTF-8, UTF-16 and UTF-32 are character encodings in which the Unicode character set can be encoded.
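To see the difference between the character set and its encodings, here is a minimal sketch that encodes the same three characters in UTF-8, UTF-16 and UTF-32 and counts the bytes. The class EncodingSizes and its byteCount helper are illustrative names, not part of any library. It assumes your JDK provides the UTF-32BE charset, as OpenJDK does; the big-endian variants are used so that no byte order mark is added to the counts.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    /** Number of bytes the string occupies when encoded with the given charset. */
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // 'A' (U+0041), division sign (U+00F7) and MATHEMATICAL FRAKTUR CAPITAL A (U+1D504)
        String s = "A\u00f7" + new String(Character.toChars(0x1D504));
        System.out.println("UTF-16 chars: " + s.length());                              // 4
        System.out.println("code points:  " + s.codePointCount(0, s.length()));         // 3
        System.out.println("UTF-8 bytes:  " + byteCount(s, StandardCharsets.UTF_8));      // 7
        System.out.println("UTF-16 bytes: " + byteCount(s, StandardCharsets.UTF_16BE));   // 8
        System.out.println("UTF-32 bytes: " + byteCount(s, Charset.forName("UTF-32BE"))); // 12
    }
}
```

The String reports 4 chars but only 3 code points, because U+1D504 lies above FFFF and Java stores it as a surrogate pair.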
See the example glyphs in PDF format; you need a PDF viewer such as Adobe Acrobat to view them. They are also available as an ASCII text file describing the glyphs, with cross references to similar glyphs. Unicode does not standardise the precise shapes of the letters, i.e. the glyphs. It does, however, provide example glyphs. This distinction matters most for the unified Han ideographs, which Chinese, Japanese and Korean share. They use the same Unicode encodings, but quite different-looking renderings of the characters. These differences are handled by the font designer, who draws in a Chinese, Japanese or Korean style.
Unicode is sometimes called UCS (Universal Character Set) or ISO 10646, after the ISO (International Organization for Standardization) standard it is kept in step with. Unicode allows Java to handle international characters for most of the world’s living languages, including Arabic, Armenian, Bengali, Bopomofo, Chinese (via unified Han), Cyrillic, English, Georgian, Greek, Gujarati, Gurmukhi, Hebrew, Hindi (Devanagari), Japanese (Kanji via unified Han, plus Hiragana and Katakana), Kannada, Korean (Hangul, plus Hanja via unified Han), Lao, Malayalam, Oriya, Thai, Tamil, Telugu, Tibetan… Unicode makes it much easier for non-English-speaking programmers to write programs for English-speaking users and vice versa.
The Musical Symbols block starts at 0001D100, beyond FFFF, so to get musical symbols you need 32-bit Unicode support, which in Java means surrogate pairs.
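A minimal sketch of what that support looks like in Java, assuming Java 5 or later for the code point methods: the G clef, U+1D11E, is built from its code point and turns out to occupy two chars. MusicalSymbols and fromCodePoint are illustrative names.

```java
class MusicalSymbols {
    /** Builds a String holding one code point, which may need two chars (a surrogate pair). */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int gClef = 0x1D11E; // MUSICAL SYMBOL G CLEF, in the Musical Symbols block
        String s = fromCodePoint(gClef);
        System.out.println(Character.isSupplementaryCodePoint(gClef)); // true: above FFFF
        System.out.println(s.length());                               // 2 chars
        System.out.println(s.codePointCount(0, s.length()));          // 1 code point
        System.out.println(s.codePointAt(0) == gClef);                // true
        // Indexing by char only sees the first half of the pair.
        System.out.println(Character.isHighSurrogate(s.charAt(0)));   // true
    }
}
```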
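Emoji are scattered all over the map, in blocks such as Miscellaneous Symbols, Dingbats, Miscellaneous Symbols and Pictographs, Emoticons and Transport and Map Symbols, rather than collected together in one block the way most scripts are.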
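One way to see the scatter is to ask Java which block each emoji belongs to. This sketch assumes Java 7 or later, whose Character.UnicodeBlock knows the Unicode 6 blocks such as EMOTICONS; EmojiBlocks and blockOf are illustrative names. For the sample below it should report MISCELLANEOUS_SYMBOLS, DINGBATS, MISCELLANEOUS_SYMBOLS_AND_PICTOGRAPHS, EMOTICONS and TRANSPORT_AND_MAP_SYMBOLS.

```java
class EmojiBlocks {
    /** Name of the block Java assigns to a code point, or "none" if it lies in no block. */
    static String blockOf(int codePoint) {
        // UnicodeBlock.of throws IllegalArgumentException for invalid code points.
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == null ? "none" : block.toString();
    }

    public static void main(String[] args) {
        // sun, heavy check mark, pile of poo, grinning face, rocket
        int[] emoji = {0x2600, 0x2714, 0x1F4A9, 0x1F600, 0x1F680};
        for (int cp : emoji) {
            System.out.printf("U+%04X %s%n", cp, blockOf(cp));
        }
    }
}
```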
In Java, you get at the exotic characters by encoding them in hex in your strings like this: \u00f7\u2713 to produce ÷ ✓. See String literals for more details.
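The same escapes work in char literals, so you can give the exotic characters readable names. A small sketch, with illustrative names, using the division sign, checkmark and degree sign. Each escape has exactly four hex digits, and javac translates escapes before compiling, so these are ordinary constants.

```java
class UnicodeEscapes {
    // Each backslash-u escape below has exactly four hex digits.
    static final char DIVISION = '\u00f7';
    static final char CHECKMARK = '\u2713';
    static final char DEGREE = '\u00b0';

    /** Formats a Celsius temperature with a degree sign, e.g. for a label. */
    static String temperature(double celsius) {
        return celsius + "" + DEGREE + "C";
    }

    public static void main(String[] args) {
        String s = "\u00f7\u2713";
        // Shows the division sign and checkmark if the console encoding can display them.
        System.out.println(s);
        System.out.println(s.length());                          // 2
        System.out.println(s.equals("" + DIVISION + CHECKMARK)); // true
        System.out.println(temperature(21.5));
    }
}
```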
In HTML (Hypertext Markup Language), you get at the exotic characters by encoding them as character references such as &#xf7;&#x2713; to produce ÷ ✓.
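If you generate HTML from Java, a sketch like the following turns every non-ASCII character into a hexadecimal numeric character reference and escapes &, < and >. HtmlEntities.toHtml is a hypothetical helper, not a library method, and it assumes Java 8 or later for String.codePoints(). Walking code points rather than chars means a character beyond FFFF becomes one reference instead of two broken surrogate halves.

```java
class HtmlEntities {
    /** Escapes markup characters and turns every non-ASCII character into a hex character reference. */
    static String toHtml(String text) {
        StringBuilder sb = new StringBuilder();
        // Walk code points, not chars, so a character above FFFF becomes a single reference.
        text.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:
                    if (cp < 0x80) {
                        sb.append((char) cp);
                    } else {
                        sb.append("&#x").append(Integer.toHexString(cp)).append(';');
                    }
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtml("\u00f7\u2713")); // &#xf7;&#x2713;
        System.out.println(toHtml("x < y"));        // x &lt; y
    }
}
```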
Character | Glyph | Java char literal |
---|---|---|
apple | | '\uf8ff' unofficial, Private Use Area |
British pound sign | £ | '\u00a3' |
checkmark | ✓ | '\u2713' |
copyright | © | '\u00a9' |
degree | ° | '\u00b0' |
dharma wheel | ☸ | '\u2638' |
division | ÷ | '\u00f7' |
bullet | • | '\u2022' |
euro | € | '\u20ac' |
female | ♀ | '\u2640' |
funeral urn | ⚱ | '\u26b1' |
heart | ♥ | '\u2665' |
bullet (as mathematical operator) | ∙ | '\u2219' |
infinity | ∞ | '\u221e' |
integral | ∫ | '\u222b' |
male | ♂ | '\u2642' |
pi | π | '\u03c0' |
PI | Π | '\u03a0' |
registered trade mark | ® | '\u00ae' |
sun | ☀ | '\u2600' |
telephone | ☎ | '\u260e' |
trademark | ™ | '\u2122' |
In addition there are all kinds of interesting special characters, such as: Alphabetic Presentation Forms, APL (A Programming Language), Arrows, Bengali, Block Elements, Box Drawing, Braille Patterns, Byzantine Musical Symbols, Combining Diacritical Marks, Combining Half Marks, Combining Marks for Symbols, Control Pictures (icons for control chars), Currency Symbols, Dingbats, Enclosed Alphanumerics, General Punctuation, Geometric Shapes, Halfwidth and Fullwidth Forms, High Surrogates, Ideographic Description Characters, IPA (International Phonetic Alphabet) Extensions, Letterlike Symbols, Low Surrogates, Mathematical Alphanumeric Symbols (32-bit Unicode), Mathematical Operators, Mathematical Symbols, Miscellaneous Symbols (astrology, chess, playing cards), Miscellaneous Technical (del, grad, integral), Musical Symbols, Number Forms (e.g. Roman numerals), OCR (Optical Character Recognition: the OCR-A font characters plus the MICR, or Magnetic Ink Character Recognition, characters used to encode cheques in magnetic ink), Old Italic, Runic, Small Form Variants, Spacing Modifier Letters, Specials, Superscripts and Subscripts, Tags (invisible tag characters), Unified Canadian Aboriginal Syllabics and Variation Selectors.
Arrow | Java escape |
---|---|
← | \u2190 |
↑ | \u2191 |
→ | \u2192 |
↓ | \u2193 |
↔ | \u2194 |
↕ | \u2195 |
↢ | \u21a2 |
↬ | \u21ac |
↭ | \u21ad |
↰ | \u21b0 |
↶ | \u21b6 |
⇅ | \u21c5 |
⇎ | \u21ce |
⇐ | \u21d0 |
⇑ | \u21d1 |
⇒ | \u21d2 |
⇓ | \u21d3 |
⇔ | \u21d4 |
⇕ | \u21d5 |
⇜ | \u21dc |
Glyph | Java escape | Name |
---|---|---|
- | \u002d | hyphen-minus |
(invisible) | \u00ad | soft hyphen |
‐ | \u2010 | hyphen |
‑ | \u2011 | non-breaking hyphen |
‒ | \u2012 | figure dash |
– | \u2013 | en dash |
— | \u2014 | em dash |
− | \u2212 | minus sign |
𐆑 | U+10191 (\ud800\udd91) | Roman uncia sign |
See the literals section for a full explanation of how to code 16-bit Unicode characters in Java programs.
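Java does not have 32-bit escapes in String literals the way C and C++ do, where you can write a code point as \U0001d504. Note the capital U and eight hex digits, versus Java's lower-case \u, which always takes exactly four, so \ud504 would mean the unrelated character U+D504. In Java you must write U+1D504 as the surrogate pair \ud835\udd04. I wrote the SurrogatePair applet to convert C-style code points to these arcane surrogate pairs, to let you use 32-bit Unicode glyphs in your programs.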
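The arithmetic behind such a conversion is short. This sketch is not the SurrogatePair applet's code; SurrogateEscapes and toJavaEscape are illustrative names. It subtracts 0x10000 from the code point, leaving 20 bits, and adds the high and low 10-bit halves to D800 and DC00 respectively. In real code, Character.toChars(int) does the same job, and the main method cross-checks against it.

```java
class SurrogateEscapes {
    /** Returns the backslash-u escape(s) you would type in Java source for a code point. */
    static String toJavaEscape(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a code point: " + Integer.toHexString(codePoint));
        }
        if (codePoint < 0x10000) {
            // BMP character: a single escape with four hex digits.
            return String.format("\\u%04x", codePoint);
        }
        // Supplementary character: subtract 0x10000, leaving 20 bits,
        // then split them into a high and a low 10-bit half.
        int v = codePoint - 0x10000;
        int high = 0xD800 + (v >> 10);
        int low = 0xDC00 + (v & 0x3FF);
        return String.format("\\u%04x\\u%04x", high, low);
    }

    public static void main(String[] args) {
        int[] samples = {0x00F7, 0x10191, 0x1D504};
        for (int cp : samples) {
            // Build the same escapes from the JDK's own conversion, for comparison.
            StringBuilder jdk = new StringBuilder();
            for (char c : Character.toChars(cp)) {
                jdk.append(String.format("\\u%04x", (int) c));
            }
            String mine = toJavaEscape(cp);
            System.out.printf("U+%X -> %s (JDK agrees: %b)%n", cp, mine, mine.equals(jdk.toString()));
        }
    }
}
```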
To typeset such variants, you need separate fonts, because the variants are all encoded as the same Unicode character. The word processor automatically selects the appropriate variant. I don’t know the mechanism by which a word processor can tell which fonts are related and which styles and font weights each supports. Presumably it is encoded somehow in the font files.
To a large extent, ligatures are handled outside Unicode by automatically combining Unicode characters, though a few ligatures, such as ﬁ (FB01) in the Alphabetic Presentation Forms block, rate a special Unicode character of their own.
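Those special ligature characters are compatibility characters, so compatibility normalization (NFKC) replaces them with the ordinary letters they stand for, while canonical normalization (NFC) leaves them alone. A minimal sketch using java.text.Normalizer; the Ligatures class and nfkc helper are illustrative names.

```java
import java.text.Normalizer;

class Ligatures {
    /** Applies NFKC normalization, which replaces compatibility characters such as ligatures. */
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String word = "\ufb01nd"; // starts with LATIN SMALL LIGATURE FI (U+FB01)
        System.out.println(word.length());       // 3
        System.out.println(nfkc(word));          // find
        System.out.println(nfkc(word).length()); // 4
        // NFC does not touch compatibility characters, so the ligature survives.
        System.out.println(Normalizer.normalize(word, Normalizer.Form.NFC).equals(word)); // true
    }
}
```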
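At the time of writing, Unicode 8.0.0 was the latest version of the Unicode Standard. JDK (Java Development Kit) 1.8.0_131 supports version 6.2.0, though I doubt Java will need to change much to support 8.0.0: later versions of Unicode mostly just add more characters. Fonts need glyphs for them, and the JDK’s character tables (used by methods such as Character.isLetter and Character.UnicodeBlock.of) must be updated to recognise them; JDK 9 moved to Unicode 8.0.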
recommend book⇒The Unicode 5.0 Standard
by: The Unicode Consortium (founded 1991)
ISBN: 978-0-321-48091-0 (hardcover)
publisher: Addison-Wesley
published: 2006-11-19
Unicode 5.0 adds the following:
This page is posted at http://mindprod.com/jgloss/unicode.html
Feedback about the site is welcome: contact Roedy. Please feel free to link to this page without explicit permission.
Canadian Mind Products