Serialized File Recovery
This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in
Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything
else useful to implementing this project. Everything I have prepared to help you is right here.
This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned and the answer is
fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to fully
define the end point, or a series of ever more difficult versions of this project, and to research the information yourself to solve them.
Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or give you any additional materials. I have too many
other projects of my own.
Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.
You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.
Please do not email me about this project without reading the disclaimer above.
Despite the best intentions, sometimes you can find yourself with
serialized data but no class files, and no source code for them, with which to reconstitute it.
This can happen because:
- You were naïve and did not realise the need to keep the corresponding
Java source and class files for all your serialised files.
- Backup files were lost in a fire or were stolen.
- You were careless about accounting for all data files and keeping them all up to
date, with the latest class definition. You have all the code in
CVS (Concurrent Versions System), but you have no idea which of it to use.
- You were sloppy and simply did not bother to bring all your old serialised files up
to date every time you changed the format. Before you knew it, you had lost track of
which files used which format.
- You added serialVersionUIDs, but other than that, you did not change the object
structure. Now you can’t read your old files. You don’t remember what the
old default serialVersionUIDs were. If you have made absolutely no other changes, you
might luck out if you create matching class files by simply removing the
serialVersionUIDs from your new class sources. If that does not work, now what?
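To see what default serialVersionUID the JVM (Java Virtual Machine) computes for a class that does not declare one, you can ask ObjectStreamClass directly. Here is a minimal sketch. The classes Point and Tagged are hypothetical, invented only for illustration. The computed default depends on the class name, fields and methods, and possibly on compiler details, so a value computed from today's source will match the old files only if nothing relevant has changed.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class DefaultUid {
    /** Hypothetical class with no declared serialVersionUID; the JVM computes a default. */
    static class Point implements Serializable {
        int x;
        int y;
    }

    /** Hypothetical class with an explicitly declared serialVersionUID. */
    static class Tagged implements Serializable {
        private static final long serialVersionUID = 1L;
        int x;
    }

    /** Returns the serialVersionUID the serialization runtime will use for the class. */
    static long uidOf(Class<?> c) {
        // lookup() returns null for classes that are not Serializable.
        ObjectStreamClass desc = ObjectStreamClass.lookup(c);
        if (desc == null) {
            throw new IllegalArgumentException(c.getName() + " is not Serializable");
        }
        return desc.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Point  (computed default): " + uidOf(Point.class));
        System.out.println("Tagged (declared):         " + uidOf(Tagged.class));
    }
}
```

The serialver tool that ships with the JDK reports the same value from the command line.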
No matter what the reason you can’t recover the file, what can
you do now? I strongly suspect there are sufficient metadata embedded in a serialized
file to reconstruct some class files to recover the data. So this project has several
How would you go about this? You start by using ObjectOutputStream to write simple primitives and objects and studying
the resulting file with a hex viewer and reading the spec to learn the format. Then
later you test your code on various unknown serial files given you by others for
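A first experiment along those lines might look like the sketch below. It writes one int and one String with ObjectOutputStream into memory and prints the bytes in hex, so you can compare them against the spec. The class name SerialHexDump and the sample values are my own choices for illustration, nothing more. Every stream begins with the two-byte magic number ac ed followed by the two-byte version 00 05.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class SerialHexDump {
    /** Serializes one int and one String into memory and returns the raw bytes. */
    static byte[] sample() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(42);
            out.writeObject("hello");
        }
        return bytes.toByteArray();
    }

    /** Formats bytes as lowercase hex, 16 bytes per line, separated by spaces. */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(i % 16 == 0 ? '\n' : ' ');
            }
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hex(sample()));
    }
}
```

Change one thing at a time (a different primitive, a second object, a class of your own) and watch which bytes move.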
- Study the spec for the serialised data format that Java ObjectOutputStream uses. To fully understand it, you will have to do some
Oracle’s JDK Platform Guide to serialization protocol
- Write a tool to create a set of Java source files for the classes used in an
ObjectStream, based only on the clues it finds embedded in the stream. These classes
would be bare bones: no methods, no transients, no initialisation code, no custom
read/write object code. However, they might be enough of a skeleton to recreate the
original classes, or at least enough to use to get the raw data back out.
- Write a tool to tell you summary facts about an unknown ObjectStream file such as
the classes used and the serialVersionUIDs.
- Write a tool to convert an ObjectStream to something human-readable for analysis,
such as XML (extensible Markup Language).
- Write a scavenger tool that contains a class that behaves exactly like ObjectStreamClass except it does not complain if the serialVersionUIDs
don’t match. You can then read ObjectStreams, in a rough and ready way,
scavenging as much as possible from them.
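As a starting point for the summary tool in the list above, here is a minimal sketch that subclasses ObjectInputStream and overrides readClassDescriptor to record each class name, serialVersionUID and field list as the descriptors go by. The names StreamSummary, RecordingStream and Account are hypothetical, and Account exists only to produce a test stream. It is a sketch under stated assumptions, not a finished tool: it reads only the first top-level object, and I am assuming the stock reader gets far enough through the stream to see every descriptor before it gives up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class StreamSummary {
    /** Hypothetical sample class, used only to produce a test stream. */
    static class Account implements Serializable {
        private static final long serialVersionUID = 7L;
        int id = 1;
        String owner = "pat";
    }

    /** An ObjectInputStream that records every class descriptor it encounters. */
    static class RecordingStream extends ObjectInputStream {
        final List<String> report = new ArrayList<>();

        RecordingStream(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected ObjectStreamClass readClassDescriptor()
                throws IOException, ClassNotFoundException {
            // The superclass builds the descriptor from the bytes in the stream.
            ObjectStreamClass desc = super.readClassDescriptor();
            StringBuilder line = new StringBuilder(desc.getName())
                    .append(" serialVersionUID=").append(desc.getSerialVersionUID());
            for (ObjectStreamField f : desc.getFields()) {
                // getTypeString() is null for primitives; fall back to the type code.
                String type = f.getTypeString() != null
                        ? f.getTypeString() : String.valueOf(f.getTypeCode());
                line.append(" [").append(type).append(' ').append(f.getName()).append(']');
            }
            report.add(line.toString());
            return desc;
        }
    }

    /** Reads the first object in the stream and returns one line per class descriptor seen. */
    static List<String> summarize(byte[] data) throws IOException {
        try (RecordingStream in = new RecordingStream(new ByteArrayInputStream(data))) {
            try {
                in.readObject();
            } catch (ClassNotFoundException | InvalidClassException e) {
                // Expected when the classes are missing or their serialVersionUIDs differ;
                // the descriptors recorded so far are still useful.
            }
            return in.report;
        }
    }

    /** Serializes one Account into memory to act as the "unknown" file. */
    static byte[] sample() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Account());
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        summarize(sample()).forEach(System.out::println);
    }
}
```

The field types come out as JVM signatures such as I or Ljava/lang/String;. Translating those into Java source declarations would be one possible route to the skeleton-class generator. The scavenger tool needs more than this, since the stock ObjectInputStream still enforces the serialVersionUID check when the class is present.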
You might not even release the code, just go into the serial file recovery