Delta Creator Delta Creator
home Student Projects no local find frame, full screen Google search web for topic jump to footer translate with Babelfish by Roedy Green ©1996-2008 Canadian Mind Products
This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. It does not describe an actual complete program. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have to say to help you with this project is written below. I am not prepared to help you implement it; I have too many other projects of my own.

I do contract work for a living, which could include writing a program such as this. However, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project any way you please.

Let us say you send somebody a file, then you change the original file. Instead of sending the whole new file, you can send just the changes. How do you detect the changes? You could try Funduc Patch. The catch is, often the delta files are almost as big as the original, even if all you did was rearrange the order of the paragraphs of a word processing file, or change a few lines in a compiled exe file.

I suggest inventing a generic way to do this that works with files other than just text files. Current schemes are not very bright and end up creating delta files almost as big as the new file. They are not good at noticing that text has simply been reordered.

Here’s how you might tackle it.

First you have to determine if the file really changed, or just suffered an Oscar Wilde change, e.g. he put in a comma one day into one of his poems, then the next took it out again. So here are the basic tools you have to work with to tell:

  1. Examine the last change date of the file the OS maintains. In Java you can access this timestamp with File.lastModified() . See last modified date.
  2. Check the file size. Most changes will trigger a file size change.
  3. Compute a digest, e.g. a 32-bit Adlerian checksum. If it is the same for both the old and new files chances are nothing really changed. The odds are quite astronomically small you could have a change and not detect it, but it is theoretically possible. You could also use cryptographic digests like MD-5 and SHA-1 but they are much slower to compute even though they make it all but impossible to miss a change. The advantage of this technique is you don’t need to keep the old files around, just compact digests. See digest.
  4. Compare the old and new files with equals() after reading them into RAM, possibly doing the I/O in large chunks. The FIleTransfer class for example chunk reading code.
I talk a little more about these issues in the automatic file updater student project.

One you have decide the file actually did change, you can send the changes.

For each file type you write a chunker. It divides the file into chunks. For a text file, it may divide at each newline character. For an MSWord file it may divide at each paragraph. For a FORTH BLK file, it might divide at each 1024 bytes. For a Btree file it chunks out each record…

Then you compute a hash on each chunk of the original file and index it in a Hashtable along with a reference offset and length into the old file. You don’t put the text itself in the Hashtable.

To compose the delta file, you look at each chunk in the new file. You compute its hash, and look it up in your Hashtable. If you find a match, you make a note that the chunk can be had from offset/length of the old file. If you don’t find a match, you have to insert the chunk from the new file into your delta file. Very often you will get a sequential run of matching chunks from the old file. In your notes, you can collapse these entries into one big offset/length reference.

Creating delta files of *.exe and *.com files requires some special handling. Experiment by adding a single instruction to a program and notice how that tiny change ripples and cause changes throughout the rest of the file. The idea is to send only the fact that one extra instruction was added, and somehow generate all the rippling side effects instead of transmitting them.

Once you have these low level tools in place, set up a scheme so that customers are automatically kept up to date from a master copy, using the Internet. The scheme should recover even if customer should accidentally delete files, corrupt files, or restore out of date files.

This project is a subset of a more general problem of distributing file and program updates. See the Automatic Updater project.


CMP_homejump to top
CMP logo
feedback Please email your feedback for publication, errors, omissions, broken/redirected link reports
and suggestions to improve this page to Roedy Green : feedback email
made with CSS
HTML Checked!
ICRA ratings logo
mindprod.com IP:[65.110.21.43]
Your face IP:[38.103.63.18] The information on this page is for non-military use only.
You are visitor number 5,942. Military use includes use by defence contractors.
You can get a fresh copy of this page from: or possibly from your local J: drive (Java virtual drive/Mindprod website mirror)
http://mindprod.com/project/deltacreator.html J:\mindprod\project\deltacreator.html