
Serialized File Recovery


Disclaimer

This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned, and the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to fully define the end point (or a series of ever more difficult versions of this project) and to research the information yourself.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or to give you any additional materials. I have too many other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

Please do not email me about this project without reading the disclaimer above.

The Problem

Despite the best intentions, you can sometimes find yourself with serialized data but no class files, nor source code for them, with which to reconstitute it. This can happen because:
  1. You were naïve and did not realise the need to keep the corresponding Java source and class files for all your serialised files.
  2. Backup files were lost in a fire or were stolen.
  3. You were careless about accounting for all your data files and keeping them up to date with the latest class definition. You have all the code in CVS (Concurrent Versions System), but you have no idea which version of it to use.
  4. You were sloppy and simply did not bother to bring all your old serialised files up to date every time you changed the format. Before you knew it, you had lost track of which files used which format.
  5. You added serialVersionUIDs, but other than that, you did not change the object structure. Now you can’t read your old files. You don’t remember what the old default serialVersionUIDs were. If you have made absolutely no other changes, you might luck out if you create matching class files by simply removing the serialVersionUIDs from your new class sources. If that does not work, now what?
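
For case 5, you may not need to remember the old default serialVersionUID at all. If you can rebuild a class whose structure matches the old one, the JVM will compute the default for you, the same value the JDK's serialver tool prints. Here is a minimal sketch. OldRecord is a hypothetical stand-in for one of your own classes that declares no serialVersionUID.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class DefaultUid {
    // Hypothetical stand-in for an old class. It declares no serialVersionUID,
    // so the JVM computes a default one from the class's structure.
    static class OldRecord implements Serializable {
        int count;
        String name;
    }

    /** Returns the serialVersionUID the JVM uses for cls, whether explicit or computed. */
    static long uidOf(Class<?> cls) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(cls); // null if not Serializable
        if (desc == null) {
            throw new IllegalArgumentException(cls.getName() + " is not Serializable");
        }
        return desc.getSerialVersionUID();
    }

    public static void main(String[] args) {
        // Print a line you can paste into the new source to pin the old default.
        System.out.println("private static final long serialVersionUID = "
                + uidOf(OldRecord.class) + "L;");
    }
}
```

The computed default depends on details such as method signatures and compiler-generated members, which is why it is so fragile. The value you actually need is also recorded in each class descriptor in the stream, and the InvalidClassException you get on a mismatch reports it ("stream classdesc serialVersionUID = ...").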

The Tools

No matter why you can’t recover the file, what can you do now? I strongly suspect there is enough metadata embedded in a serialized file to reconstruct some class files and recover the data. So this project has several parts:
  1. Study the spec for the serialised data format that Java ObjectOutputStream uses. To understand it fully, you will have to do some experiments.
  2. Write a tool to create a set of Java source files for the classes used in an ObjectStream, based only on the clues it finds embedded in the stream. These classes would be bare bones: no methods, no transients, no initialisation code, no custom readObject/writeObject code. However, they might be enough of a skeleton to recreate the original classes, or at least enough to get the raw data back out.
  3. Write a tool to tell you summary facts about an unknown ObjectStream file such as the classes used and the serialVersionUIDs.
  4. Write a tool to convert an ObjectStream to something human-readable for analysis, such as XML (Extensible Markup Language).
  5. Write a scavenger tool that contains a class that behaves exactly like ObjectStreamClass, except that it does not complain if the serialVersionUIDs don’t match. You can then read ObjectStreams in a rough-and-ready way, scavenging as much as possible from them.
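
For tool 5, note that ObjectStreamClass has no public constructor, so you cannot simply subclass it. A common way to get the same effect, swapped in here, is to subclass ObjectInputStream and override its protected readClassDescriptor() hook, handing back the local class’s descriptor whenever the serialVersionUIDs disagree. This is a minimal sketch, assuming the local class has the same fields as the class that wrote the stream. LenientObjectInputStream is a hypothetical name.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

/** Reads serialized objects, ignoring serialVersionUID mismatches (assumes unchanged fields). */
public class LenientObjectInputStream extends ObjectInputStream {

    public LenientObjectInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFoundException {
        ObjectStreamClass streamDesc = super.readClassDescriptor();
        Class<?> localClass;
        try {
            localClass = Class.forName(streamDesc.getName());
        } catch (ClassNotFoundException e) {
            return streamDesc; // no local class; let resolveClass() report it later
        }
        ObjectStreamClass localDesc = ObjectStreamClass.lookup(localClass);
        if (localDesc != null && localDesc.getSerialVersionUID() != streamDesc.getSerialVersionUID()) {
            System.err.println("Ignoring serialVersionUID mismatch for " + streamDesc.getName());
            // Interpret the data with the local class's field layout. If the fields
            // really differ, the values read will be garbage or reading will fail.
            return localDesc;
        }
        return streamDesc;
    }

    /** Usage: java LenientObjectInputStream old.ser */
    public static void main(String[] args) throws Exception {
        try (ObjectInputStream in = new LenientObjectInputStream(new FileInputStream(args[0]))) {
            System.out.println(in.readObject());
        }
    }
}
```

Because it reuses the local field layout, this only rescues files whose structure still matches, such as case 5 above. For anything else you need the descriptor-level tools.
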

How would you go about this? Start by using ObjectOutputStream to write simple primitives and objects, then study the resulting files with a hex viewer while reading the spec to learn the format. Later, test your code on various unknown serialized files that others give you for recovery.
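
A first experiment along those lines might look like the sketch below. It serializes a small object, prints a hex dump, then decodes the class descriptors by hand using the tag values from the serialization spec (TC_OBJECT 0x73, TC_CLASSDESC 0x72 and so on). It is deliberately partial. It assumes the stream starts with a plain object and that classes carry no annotations, which holds for a stock ObjectOutputStream. It also ignores proxy classes, long strings and the object’s field values. StreamPeek and Sample are hypothetical names.

```java
import java.io.*;
import java.util.*;

/** Experimental decoder for the class descriptors at the start of a serialization stream. */
public class StreamPeek {
    // Tag values from the Java Object Serialization Specification (stream protocol).
    static final short STREAM_MAGIC = (short) 0xACED, STREAM_VERSION = 5;
    static final byte TC_NULL = 0x70, TC_REFERENCE = 0x71, TC_CLASSDESC = 0x72,
            TC_OBJECT = 0x73, TC_STRING = 0x74, TC_ENDBLOCKDATA = 0x78;
    static final int BASE_WIRE_HANDLE = 0x7E0000;
    private static final String CODES = "BCDFIJSZ";
    private static final String[] NAMES = {"byte", "char", "double", "float", "int", "long", "short", "boolean"};

    // Hypothetical class to experiment with.
    static class Sample implements Serializable {
        private static final long serialVersionUID = 42L;
        int x = 1;
        long y = 2;
        String label = "hi";
    }

    private final DataInputStream in;
    private final Map<Integer, String> strings = new HashMap<>(); // handle -> string
    private int nextHandle = BASE_WIRE_HANDLE;

    private StreamPeek(byte[] data) { in = new DataInputStream(new ByteArrayInputStream(data)); }

    /** Hex dump, 16 bytes per line, for eyeballing the layout. */
    static String hex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < b.length; i++) {
            sb.append(String.format("%02x", b[i])).append(i % 16 == 15 ? '\n' : ' ');
        }
        return sb.toString();
    }

    /** One line per class in the first object's descriptor chain: name, UID and fields. */
    static List<String> describeFirstObject(byte[] data) throws IOException {
        return new StreamPeek(data).describe();
    }

    private List<String> describe() throws IOException {
        if (in.readShort() != STREAM_MAGIC || in.readShort() != STREAM_VERSION) {
            throw new IOException("not a Java serialization stream");
        }
        if (in.readByte() != TC_OBJECT) throw new IOException("first item is not an object");
        List<String> classes = new ArrayList<>();
        byte tc = in.readByte();
        while (tc == TC_CLASSDESC) {           // most-derived class first, then superclasses
            nextHandle++;                       // the descriptor itself is assigned a handle
            String name = in.readUTF();
            long suid = in.readLong();
            in.readByte();                      // classDescFlags, ignored in this sketch
            int count = in.readShort();
            List<String> fields = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                char type = (char) in.readByte();
                String fieldName = in.readUTF();
                String typeName = (type == 'L' || type == '[') ? readTypeString() : primitive(type);
                fields.add(typeName + " " + fieldName);
            }
            if (in.readByte() != TC_ENDBLOCKDATA) throw new IOException("class annotations not handled");
            classes.add(name + " suid=" + suid + ": " + String.join(", ", fields));
            tc = in.readByte();                 // superclass descriptor, or TC_NULL
        }
        if (tc != TC_NULL) throw new IOException("unhandled tag 0x" + Integer.toHexString(tc & 0xFF));
        return classes;
    }

    /** Reads an object/array field's JVM signature, written as a new String or a back-reference. */
    private String readTypeString() throws IOException {
        byte tc = in.readByte();
        String sig;
        if (tc == TC_STRING) {
            sig = in.readUTF();
            strings.put(nextHandle++, sig);
        } else if (tc == TC_REFERENCE) {
            sig = strings.get(in.readInt());
            if (sig == null) throw new IOException("back-reference to a non-string handle");
        } else {
            throw new IOException("unexpected tag in field type");
        }
        // "Ljava/lang/String;" -> "java.lang.String"; array signatures such as "[I" stay as-is
        return sig.startsWith("L") ? sig.substring(1, sig.length() - 1).replace('/', '.') : sig;
    }

    private static String primitive(char t) {
        int i = CODES.indexOf(t);
        if (i < 0) throw new IllegalArgumentException("unknown type code " + t);
        return NAMES[i];
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Sample());
        }
        byte[] data = bytes.toByteArray();
        System.out.println(hex(data));
        for (String line : describeFirstObject(data)) System.out.println(line);
    }
}
```

Run on Sample, the decoded line lists the class name (StreamPeek$Sample), its UID 42 and its three fields with their types. That is the raw material a summary tool (tool 3) or skeleton-source generator (tool 2) would start from.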

You might not even release the code; you could just go into the serialized-file recovery business.


This page is posted on the web at: http://mindprod.com/project/serializedrecovery.html

Canadian Mind Products
Contact Roedy. Please feel free to link to this page without explicit permission.
