Serialized File Recovery
This essay does not describe an
existing computer program, just one that should exist. It is
a suggested student project in
Java programming and gives a
rough overview of how it might work. I have no source, object,
specifications, file layouts or anything else useful to implementing this project.
This project outline is not like the artificial, tidy little problems you
are spoon-fed in school, when all the facts you need are included, nothing
extraneous is mentioned, the answer is fully specified, along with hints
to nudge you toward a single expected canonical solution. This project is
much more like the real world of messy problems where it is up to you to
fully define the end point, or a series of ever more difficult versions
of this project, and research the information yourself to solve them.
Everything I have to say to help you with this project is written below.
I am not prepared to help you implement it or give you any additional
materials. I have too many other projects of my own.
Though I am a programmer, I don’t do people’s homework for
them. That just robs them of an education.
You have my full permission to implement this project in any way you please
and to keep all the profits from your endeavour.
Please do not email me about this project without reading the disclaimer above.
Despite the best intentions, sometimes you can find yourself with serialized data but without the class files, or
the source code to rebuild them, needed to reconstitute it. This can happen because:
- You were naïve and did not realise the need to keep the corresponding Java source and class files for all your
- Backup files were lost in a fire or were stolen.
- You were careless about accounting for all data files and keeping them all up to date, with the latest class
definition. You have all the code in CVS (Concurrent Versions System), but you have no idea which of it to use.
- You were sloppy and simply did not bother to bring all your old serialised files up to date every time you
changed the format. Before you knew it, you had lost track of which files used which format.
- You added serialVersionUIDs, but other than that, you did not change the object structure. Now you can’t
read your old files. You don’t remember what the old default serialVersionUIDs were. If you have made
absolutely no other changes, you might luck out if you create matching class files by simply removing the
serialVersionUIDs from your new class sources. If that does not work, now what?
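For the last case in the list above, it may help to know that the stream itself records the serialVersionUID each class was written with, right after the class name in the class descriptor. The following is a minimal sketch, not a finished tool: it assumes the stream begins with an ordinary object (not a proxy, array or block data), and the names StreamUidPeek, Point, serialize and peek are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;

public class StreamUidPeek {
    /** Hypothetical sample class, used only to produce a stream to inspect. */
    static class Point implements Serializable {
        int x = 3;
        int y = 4;
    }

    /** Serializes one object into a byte array. */
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    /**
     * Returns "className serialVersionUID" for the first object in the stream.
     * Assumes the layout: magic, version, TC_OBJECT, TC_CLASSDESC, name, UID.
     */
    static String peek(byte[] stream) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));
        if (in.readShort() != ObjectStreamConstants.STREAM_MAGIC) {
            throw new IOException("not a serialization stream");
        }
        in.readShort(); // stream version, normally 5
        if (in.readByte() != ObjectStreamConstants.TC_OBJECT
                || in.readByte() != ObjectStreamConstants.TC_CLASSDESC) {
            throw new IOException("stream does not start with a plain object");
        }
        String className = in.readUTF(); // class name in modified UTF-8
        long uid = in.readLong();        // the UID the writer used
        return className + " " + uid;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(peek(serialize(new Point())));
    }
}
```

If the only change you made was adding a serialVersionUID, declaring the value found this way in your current source is one thing to try before anything more drastic.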
Whatever the reason you can’t read the file, what can you do now? I strongly suspect there is
sufficient metadata embedded in a serialized file to reconstruct class files good enough to recover the data. So this
project has several parts:
How would you go about this? You start by using ObjectOutputStream to write simple
primitives and objects, and studying the resulting file with a hex viewer, and reading the spec to learn the format.
Then later you test your code on various unknown serial files given to you by others for recovery.
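A first experiment of the kind described above might look like the following sketch. It writes one int and one short String with ObjectOutputStream and dumps the bytes in hex, so you can compare them with the spec. The class and method names (HexDumpDemo, sample, hex) are made up for illustration, and a real hex viewer would show offsets and an ASCII column as well.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class HexDumpDemo {
    /** Writes a tiny stream: one primitive int followed by one String object. */
    static byte[] sample() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeInt(42);
            out.writeObject("hi");
        }
        return buf.toByteArray();
    }

    /** Formats bytes as lowercase hex, 16 per line, separated by spaces. */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(i % 16 == 0 ? '\n' : ' ');
            }
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hex(sample()));
        // prints: ac ed 00 05 77 04 00 00 00 2a 74 00 02 68 69
    }
}
```

Reading the dump against the spec: ac ed is the magic number, 00 05 the stream version, 77 04 introduces a four-byte block of primitive data (the int 42), and 74 00 02 introduces a two-byte string, "hi".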
- Study the spec for the serialised data format that Java ObjectOutputStream uses. To
fully understand it, you will have to do some experiments.
Oracle’s JDK Platform Guide to serialization protocol
- Write a tool to create a set of Java source files for the classes used in an ObjectStream, based only on
the clues it finds embedded in the stream. These classes would be bare bones: no methods, no transients, no
initialisation code, no custom read/write object code. However, they might be enough of a skeleton to recreate the
original classes, or at least enough to get the raw data back out.
- Write a tool to tell you summary facts about an unknown ObjectStream file such as the classes used and the
- Write a tool to convert an ObjectStream to something human-readable for analysis, such as XML (Extensible Markup Language).
- Write a scavenger tool that contains a class that behaves exactly like ObjectStreamClass except it does not
complain if the serialVersionUIDs don’t match. You can
then read ObjectStreams, in a rough and ready way, scavenging as much as possible from them.
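One way to approximate the scavenger idea in the last item above, without rewriting ObjectStreamClass, is to subclass ObjectInputStream and override readClassDescriptor so that the local class descriptor is substituted whenever only the serialVersionUID differs. This is a sketch under assumptions, not the only approach: it can only work when the serialized fields of the local class still match what is in the stream, and the names LenientObjectInputStream, Note and streamWithForeignUid are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class LenientObjectInputStream extends ObjectInputStream {
    public LenientObjectInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected ObjectStreamClass readClassDescriptor()
            throws IOException, ClassNotFoundException {
        ObjectStreamClass fromStream = super.readClassDescriptor();
        Class<?> local;
        try {
            local = Class.forName(fromStream.getName(), false,
                    LenientObjectInputStream.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            return fromStream; // no local class: let normal handling report it
        }
        ObjectStreamClass localDesc = ObjectStreamClass.lookup(local);
        if (localDesc != null
                && localDesc.getSerialVersionUID() != fromStream.getSerialVersionUID()) {
            return localDesc; // pretend the stream carried the local UID
        }
        return fromStream;
    }

    /** Hypothetical sample class for the demonstration below. */
    static class Note implements Serializable {
        private static final long serialVersionUID = 2L;
        String text = "recovered";
    }

    /** Serializes a Note, then patches the UID in the bytes from 2 to 3. */
    static byte[] streamWithForeignUid() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(new Note());
        }
        byte[] bytes = buf.toByteArray();
        // header (4) + TC_OBJECT, TC_CLASSDESC (2) + name length (2) + ASCII name
        int uidOffset = 4 + 2 + 2 + Note.class.getName().length();
        bytes[uidOffset + 7] ^= 1; // flip the lowest bit of the 8-byte UID
        return bytes;
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = streamWithForeignUid();
        try (ObjectInputStream in =
                new LenientObjectInputStream(new java.io.ByteArrayInputStream(bytes))) {
            System.out.println(((Note) in.readObject()).text);
        }
    }
}
```

A plain ObjectInputStream throws InvalidClassException on the same bytes. If the field layout has changed as well, this trick will not be enough and you are back to decoding the stream yourself.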
You might not even release the code, just go into the serial file recovery business.