In a binary file, there are no separators between fields. The data are stored in binary, not as readable ASCII (American Standard Code for Information Interchange) text.
What do you do if you want to read data that are not in Java's standard big-endian format, usually prepared by some non-Java program?
You have five options:
Most people new to Java coming from C think that they need to code differently depending on whether the machine they are using internally represents integers as big or little endian. In Java it does not matter. Further, without resorting to native classes, there is no way you can even tell how they are stored. The JVM (Java Virtual Machine) may store them either way internally but Java is cleverly constructed so that it never matters. Java has no struct I/O and no unions or any of the other endian-sensitive language constructs.
The only time endianness becomes a concern is in communicating with legacy little-endian C/C++ applications.
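When the other side is a little-endian C program, one quick remedy is to read with the ordinary big-endian java.io.DataInputStream and then swap the bytes with Integer.reverseBytes (Short.reverseBytes and Long.reverseBytes also exist, since Java 1.5). A minimal sketch, assuming the four bytes below are what an Intel C program wrote for the int 0x12345678; the class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ReverseBytesDemo {
    // Read an int the usual big-endian way, then reverse its 4 bytes
    // to recover the value a little-endian C program meant.
    static int readIntFromC(DataInputStream in) throws IOException {
        int bigEndianView = in.readInt();            // DataInputStream always assumes big-endian
        return Integer.reverseBytes(bigEndianView);  // swap byte order
    }

    public static void main(String[] args) throws IOException {
        // 0x12345678 as a little-endian C program would write it: low byte first.
        byte[] fromC = { 0x78, 0x56, 0x34, 0x12 };
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(fromC));
        System.out.printf("%08x%n", readIntFromC(in)); // prints 12345678
    }
}
```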
The following code will produce the same result on either a big or little endian machine:
```java
// Take a 16-bit short apart into two 8-bit bytes.
short x = (short) 0xabcd;     // cast needed: 0xabcd (43981) is outside the short range
byte high = (byte) (x >>> 8);
byte low = (byte) x;          /* cast implies & 0xff */
System.out.println("x=" + x + " high=" + high + " low=" + low);
```
I had to implement routines parallel to those in java.io.DataInputStream, which reads raw big-endian binary, in my LEDataInputStream and LEDataOutputStream classes. Don't confuse this with the java.io.DataInput human-readable character-based file-interchange format.
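The java.nio package (Java 1.4 and later) offers another route: wrap the raw bytes in a java.nio.ByteBuffer and set its order to ByteOrder.LITTLE_ENDIAN, after which getShort, getInt and getLong decode little-endian data. A minimal sketch, assuming a hypothetical record laid out as a 4-byte int followed by a 2-byte short; the layout and the names are illustrative, not part of LEDataInputStream.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class NioLittleEndian {
    // Hypothetical record layout: little-endian int at offset 0, little-endian short at offset 4.
    static int readRecordInt(byte[] raw) {
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
    }

    static short readRecordShort(byte[] raw) {
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getShort(4);
    }

    public static void main(String[] args) {
        byte[] raw = { 0x78, 0x56, 0x34, 0x12, (byte) 0xcd, (byte) 0xab };
        // Mask the short with 0xffff so it prints as 4 unsigned hex digits.
        System.out.printf("int=%08x short=%04x%n",
                readRecordInt(raw), readRecordShort(raw) & 0xffff); // int=12345678 short=abcd
    }
}
```

A freshly created ByteBuffer is always big-endian, matching DataInputStream, so forgetting the order(...) call silently gives byte-swapped values.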
If you want to do it yourself, without the overhead of the full LEDataInputStream and LEDataOutputStream classes, here is the basic technique. If you are not familiar with how to fudge unsigned data in Java by masking off the high-order bits, you might want to read the unsigned and masking entries first.
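A minimal sketch of the basic technique for a 16-bit short, assuming the bytes come from an ordinary java.io.DataInputStream. Each byte is masked with 0xff to undo sign extension before the two are combined, low-order byte first. The class and method names are hypothetical, not the LEDataInputStream API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class LittleEndianShort {
    // Read a 16-bit little-endian signed short: low-order byte first.
    static short readShortLittleEndian(DataInputStream in) throws IOException {
        int low = in.readByte() & 0xff;      // mask off sign extension
        int high = in.readByte() & 0xff;
        return (short) ((high << 8) | low);  // the cast restores the sign
    }

    // Same layout read as unsigned 0..65535; Java has no unsigned short, so return an int.
    static int readUnsignedShortLittleEndian(DataInputStream in) throws IOException {
        int low = in.readByte() & 0xff;
        int high = in.readByte() & 0xff;
        return (high << 8) | low;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = { (byte) 0xcd, (byte) 0xab, (byte) 0xcd, (byte) 0xab };
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(readShortLittleEndian(in));         // -21555
        System.out.println(readUnsignedShortLittleEndian(in)); // 43981
    }
}
```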
In a similar way to short, we handle int.
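A minimal sketch of the same masking technique extended to int and long, again assuming a plain java.io.DataInputStream; the names are illustrative. The long is assembled from two little-endian ints, and the low half is masked with 0xffffffffL so its sign bit does not leak into the high half.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class LittleEndianIntLong {
    // Read a 32-bit little-endian int: bytes arrive least significant first.
    static int readIntLittleEndian(DataInputStream in) throws IOException {
        int b0 = in.readByte() & 0xff; // least significant
        int b1 = in.readByte() & 0xff;
        int b2 = in.readByte() & 0xff;
        int b3 = in.readByte() & 0xff; // most significant
        return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }

    // Read a 64-bit little-endian long as two little-endian ints, low half first.
    static long readLongLittleEndian(DataInputStream in) throws IOException {
        long low = readIntLittleEndian(in) & 0xffffffffL; // keep the low half unsigned
        long high = readIntLittleEndian(in);
        return (high << 32) | low;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = { 0x78, 0x56, 0x34, 0x12, 1, 2, 3, 4, 5, 6, 7, 8 };
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.printf("%08x%n", readIntLittleEndian(in));   // 12345678
        System.out.printf("%016x%n", readLongLittleEndian(in)); // 0807060504030201
    }
}
```

Equivalently, Integer.reverseBytes(in.readInt()) and Long.reverseBytes(in.readLong()) give the same results with less code.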
```java
// 1 byte, signed -128 .. 127. Nothing special is needed.
// Assumes an enclosing class that provides readByte(), as DataInput does.
byte readByteLittleEndian() throws IOException {
    return readByte();
}
```
Big and little endian byte data are identical. There is nothing to rearrange. If you wanted to reverse the
Those in the big-endian camp (most significant byte stored first) include the Java virtual machine, the Java binary file format, the IBM 360 and follow-on mainframes such as the 390, the Motorola 68K, and most other mainframes. The Power PC is endian-agnostic.
Blefuscudians (big-endians) assert this is the way God intended integers to be stored, most important part first. At the assembler level, fields of mixed positive integers and text can be sorted as if they were one big text key. Real programmers read hex dumps, and big-endian is a lot easier to comprehend.
In the little-endian camp (least significant byte first) are the Intel 8080, 8086, 80286, Pentium and follow-ons, and the MOS 6502 popularised by the Apple ][.
Lilliputians (little-endians) assert that putting the low-order part first is more natural, because when you do arithmetic manually, you start at the least significant part and work toward the most significant part. This ordering makes writing multi-precision arithmetic easier, since you work up, not down. It made implementing 8-bit microprocessors easier. At the assembler level (not in Java), it also lets you cheat and pass the address of a 32-bit positive int to a routine expecting only a 16-bit parameter and still have it work, as long as the value fits in 16 bits. Real programmers read hex dumps, and little-endian is more of a stimulating challenge.
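To see exactly what the two camps disagree about, a minimal sketch can lay out the int 0x12345678 in both byte orders with java.nio.ByteBuffer. The class and method names are illustrative; Blefuscudians store the 12 first, Lilliputians the 78.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteLayout {
    // The four bytes of value, in the order they would appear in memory or in a file.
    static byte[] layout(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Format bytes as space-separated two-digit hex, e.g. "12 34 56 78".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff)); // mask so negative bytes print as 2 digits
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int value = 0x12345678;
        System.out.println("big-endian:    " + hex(layout(value, ByteOrder.BIG_ENDIAN)));    // 12 34 56 78
        System.out.println("little-endian: " + hex(layout(value, ByteOrder.LITTLE_ENDIAN))); // 78 56 34 12
    }
}
```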
If a machine is word addressable, with no finer addressing supported, the concept of endianness means nothing since words are fetched from RAM (Random Access Memory) in parallel, both ends first.
Byte Sex: Endianness of CPUs
CPU | Endianness | Notes |
---|---|---|
AMD (Advanced Micro Devices) Opteron | little | 64-bit |
AMD Sempron, Athlon, Phenom | little | 64-bit |
AMD Sempron, Thunderbird, Duron, Athlon | little | 32-bit W95, W98, Me, NT, W2K, XP, W2003, Vista, W2008 and W7-32 |
Apple ][ 6502 | little | |
Apple Mac 68000 | big | Uses Motorola 68000 |
Apple Power PC | big | CPU is bisexual but stays big in the Mac OS (Operating System). |
ARM | both | chips used in handhelds and cellphones. Endianness is controlled by a programmable mode bit. |
Burroughs 1700, 1800, 1900 | ? | bit addressable. Used different interpreter firmware instruction sets for each language. |
Burroughs B5000 | word addressable | 48-bit words, Algol stack machine, one of the first commercial machines with virtual memory. |
Burroughs 7800 | word addressable | 48-bit words, Algol stack machine |
Librascope LGP-30 | word-addressable only, hence no endianness | 31½-bit words. Low-order bit must be 0 on the drum, but can be 1 in the accumulator. |
CDC (Control Data Corporation) 3300, 6600, Cyber | word-addressable, so no endianness | 60-bit words on the 6600 and Cyber; the 3300 used 24-bit words. |
Compaq (née DEC, Digital Equipment Corporation) Alpha servers | little | |
Cray X1 | big endian | 64-bit |
DEC PDP-11 | little | 16-bit. However, when it stored 32-bit ints, it would store them most significant 16-bit chunk first. |
DEC Vax | little | 32-bit |
IBM 360, 370, 390, eSeries, zSeries | big | 32-bit; 64-bit from the zSeries on |
IBM 7044, 7090 | word addressable | 36-bit |
IBM AS/400 | big | 64-bit |
Power PC | either | The endian-agnostic PowerPCs have a foot in both camps. They are bisexual, but the OS usually imposes one convention or the other, e.g. Mac PowerPCs are big-endian. |
IBM Power PC G5 | big | The G5 dropped the pseudo-little-endian mode of earlier PowerPCs. This gave Microsoft Virtual PC a major headache when emulating the Pentium on a Mac G5. |
Intel 8080, 8088, 8086, 80286 | little | 8-bit (8080) and 16-bit chips; the 8088 and later were used in PCs |
Intel 80386, 80486, Pentium I, II, III, IV | little | 32-bit chips used in PCs |
Intel 8051 | big | |
Intel Xeon | little | 32-bit, used in Unisys Clearpath servers, like a Pentium designed to be used in groups, with 144 extra SIMD (Single Instruction Multiple Data) instructions for web servers. |
Intel Itanium | either | 64-bit |
MIPS R4000, R5000, R10000 | big | Used in Silicon Graphics IRIX machines. |
MOS 6502 | little | MOS 6502 was used in the Apple ][ |
Motorola 68000, 6800, 6809, 680x0, 68HC11 | big | Early Macs and the Amiga used the 68000. |
NCR (National Cash Register) 8500 | big | |
NCR Century | big | |
Palm | big | Motorola 68K or ARM |
SGI (Silicon Graphics Inc) MIPS | both | machines with Cray ancestry are big, with SGI ancestry are little. |
Sun Sparc and UltraSparc | big | Oracle’s Solaris. Normally used as big-endian, but also supports operating in little-endian mode, including being able to switch endianness under program control for particular loads and stores. |
Univac 1100 | word-addressable | 36-bit words. |
Univac 90/30 | big | IBM 370 clone |
Zilog Z80 | little | Used in CP/M machines. |
The Four Possible Byte Sexes for CPUs
Which Byte Is Stored in the Lower-Numbered Address? | Which Byte Is Addressed? | Used In |
---|---|---|
LSB (least significant byte) | LSB | Intel, AMD, Power PC, DEC. |
LSB | MSB (most significant byte) | none that I know of. |
MSB | LSB | Perhaps one of the old word mark architecture machines. |
MSB | MSB | Mac, IBM 390, Power PC |
This page is posted at http://mindprod.com/jgloss/endian.html
Contact Roedy. Please feel free to link to this page without explicit permission.
Canadian Mind Products