Automatic File Updates
©1996-2012 Roedy Green, Canadian Mind Products
This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project.
This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned, and the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to fully define the end point, or a series of ever more difficult versions of this project, and to research the information yourself to solve them.
Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or to give you any additional materials. I have too many other projects of my own.
Though I am a programmer, I don’t do people’s homework for
them. That just robs them of an education.
You have my full permission to implement this project in any way you please
and to keep all the profits from your endeavour.
Please do not email me about this project without reading the disclaimer above.
The intent of this project is to keep any files (including *.jar and *.zip files) up-to-date on the client site, automatically. It is designed to dovetail with the Delta File Creator that makes the process even more efficient by sending just the parts of
files that have changed. This project sends entire files, or entire zip members, if so much as a comma changes in
them.
There are already some push tools that update files on the client sites. These include Marimba, Java Web Start, DMP, Funduc Patch, and Symantec LiveUpdate. However, they are not suitable for two simple applications I have:
- Keeping people’s expanded copies of cmp*.zip up-to-date. These are giant files
containing all the *.html, *.gif, *.jpg
and *.mp3 files on my website in downloadable zipped format.
- Keeping a JavaHelp.jar file up-to-date on a client site.
What are the problems?
- Every time even so much as a comma changes in a file, the client must download the whole thing.
- Some tools require running custom software on the server. ISPs (Internet Service Providers) don’t like you doing that, at least not without paying them a fat fee.
- The tools may be too expensive for the hobbyist.
- You can’t FTP (File Transfer Protocol) upload the master copy of the files on the server because people are always busy
downloading them.
How are Automatic Update Files Stored on the Server?
Instead of storing entire jars or zips on the server, the members of the jars are stored as separate *.upd files on the server. The files are given sequentially numbered names, e.g. 00000042.upd. The server is a perfectly standard HTTP (Hypertext Transfer Protocol) server. The only thing you need to configure on your server is a new MIME type for the *.upd extension. When a component of a jar is updated, the new contents are assigned a new sequence number. That way there is no problem uploading a file to the server that others are downloading: any change gets a new number, and the numbers of superseded files are retired. All the *.upd files are stored in GZIP compressed format. Members of jars and zips are compressed only once. *.upd files representing stand-alone files are compressed too, so a stand-alone file that is itself already compressed is effectively compressed twice. Fanatics might want to invent a way to avoid that tiny extra overhead.
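The naming and compression scheme can be sketched in Java. This is a minimal sketch, assuming an eight-digit, zero-padded decimal sequence number (inferred from the 00000042.upd example) and the standard java.util.zip GZIP streams; the class and method names (UpdFiles, nameFor, toUpd, fromUpd) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helpers for naming and packing *.upd files on the server. */
final class UpdFiles {
    private UpdFiles() {}

    /** Server file name for a sequence number, e.g. 42 becomes "00000042.upd" (assumed 8 digits). */
    static String nameFor(long sequenceNumber) {
        if (sequenceNumber < 0 || sequenceNumber > 99_999_999L) {
            throw new IllegalArgumentException("sequence number out of 8-digit range: " + sequenceNumber);
        }
        return String.format("%08d.upd", sequenceNumber);
    }

    /** GZIP-compresses the raw contents of a member or stand-alone file. */
    static byte[] toUpd(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected with in-memory streams
        }
        return bytes.toByteArray();             // gzip is closed, so the trailer is written
    }

    /** Restores the original contents from a downloaded *.upd file. */
    static byte[] fromUpd(byte[] upd) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(upd))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // corrupt or truncated download
        }
    }
}
```

Because every change is written under a fresh name from nameFor, the uploader never overwrites a file that someone may be in the middle of downloading.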
How are Automatic Update Files Stored on the Client?
The files are conventional jar, zip, program or stand-alone data files. They can have any name or extension. They can
live anywhere on disk. They may be compressed or uncompressed in any conceivable format. Jars can even be digitally
signed. They are not permanently stored as *.upd files.
The master copies of the files created by the programmers are also maintained in this conventional way. No one is
aware of the *.upd files. They are a transport mechanism only. The conversion to and from the *.upd files is fully automatic.
How Does Automatic Update Work?
There is a tiny root file on the server. It contains the sequence number of the current state-of-the-union file. The state-of-the-union file is also stored on the server in GZIP compressed form. It has an entry containing the following data for each file or member managed by the file updating system.
State-Of-The-Union File Fields

| Field | Purpose |
|---|---|
| Status | A=Active, R=Replaced, D=Deleted. Active means this file is necessary for the client. Replaced means this file has been replaced by some other version. Deleted means this file is no longer used. To start a client from scratch, all you need do is examine the Active entries. Usually all the Replaced entries will be filtered out; they are there mainly for debugging. Similarly, very old Deleted entries might be filtered out. |
| Sequence Number | Appending .upd gives you the name of the corresponding file on the server. All the files for a project are stored in the same directory on the server, even if they are stored in many different directories on the client. |
| Install Root Code | Usually this will be 1, meaning files are installed relative to the client’s installation directory. For a complex project, you may have multiple installation directories. 0 means file names are absolute. |
| File Name | The fully qualified filename of where this content eventually ends up on the client. |
| Member Name | The fully qualified name of the member within the jar or zip. If this is blank, this entry represents a stand-alone file. |
| Date/Time Updated | Milliseconds since 1970-01-01 GMT (Greenwich Mean Time), i.e. a Unix or Java timestamp. This is used to set or check the file’s system date. |
| Checksum | 32-bit Adler-32 checksum of the data, computed on the uncompressed form of the file. Adler-32 checksums are faster to compute and verify than CRC-32. |
When the client wants to refresh its files, it first downloads the tiny root file. From there it can download and decompress the current state-of-the-union file. It knows it already has current files up to and including sequence number N. It knows this even if it had to restore its data files from backup. It then looks in the state-of-the-union file and processes the entries numbered above N. If it sees a Deleted entry, it deletes the corresponding file or member; if that leaves a zip or jar with no members, it deletes the file itself. If it sees a Replaced entry, it ignores it. If it sees an Active entry, it inserts or replaces that file or member. It may optionally verify the checksums of the newly updated members, or of all active ones. If there are failures, it can automatically redownload any failed entries, and even optionally recreate any jar or zip files from scratch. This makes your applications and files self-healing.
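The refresh pass can be sketched as a pure planning step that decides what to do before touching any files. This is a minimal sketch, assuming the client remembers its last installed sequence number N and that, as with any other change, a deletion is recorded under a fresh sequence number. The names RefreshPlanner, Entry, Action and plan are hypothetical, and actually downloading and writing jar members is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical planner for one client refresh cycle. */
final class RefreshPlanner {
    enum Status { ACTIVE, REPLACED, DELETED }
    enum Kind { INSTALL, DELETE }

    /** Minimal state-of-the-union entry: status, sequence number, target file and member. */
    record Entry(Status status, long seq, String fileName, String memberName) {}

    /** One step for the client to carry out: download-and-install, or delete. */
    record Action(Kind kind, Entry entry) {}

    private RefreshPlanner() {}

    /**
     * Returns the actions needed to update a client that already holds everything
     * up to and including sequence number lastSeq (the N of the text).
     */
    static List<Action> plan(List<Entry> stateOfTheUnion, long lastSeq) {
        List<Action> actions = new ArrayList<>();
        stateOfTheUnion.stream()
                .filter(e -> e.seq() > lastSeq)                // older entries are already applied
                .sorted(Comparator.comparingLong(Entry::seq))  // apply in sequence order
                .forEach(e -> {
                    switch (e.status()) {
                        case ACTIVE -> actions.add(new Action(Kind.INSTALL, e));
                        case DELETED -> actions.add(new Action(Kind.DELETE, e));
                        case REPLACED -> { }                   // superseded version: ignore
                    }
                });
        return actions;
    }
}
```

Separating planning from execution makes it easy to show the user what will change, and to rerun the plan after a failed download.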
You need a tool to help you prepare your *.upd files for uploading to the server. It starts with a list of directories and files to process. It detects file and member changes via file and member dates, or possibly with the checksums, or even by comparison with your current set of *.upd files. It is probably easiest to use file dates exclusively in determining which *.upd files to create, and to have a checksum verify routine you run periodically. If you get a failure, you manually redate the affected files with a touch utility to force a correction.
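The date-based change detection and the periodic checksum verify can be sketched with java.nio.file. This is a minimal sketch, assuming the tool records the time of its last upload batch as a millisecond cutoff; UpdPrep, changedSince and adler32Of are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.Adler32;

/** Hypothetical helpers for the tool that prepares *.upd files for upload. */
final class UpdPrep {
    private UpdPrep() {}

    /** Regular files under root whose last-modified date is later than cutoffMillis. */
    static List<Path> changedSince(Path root, long cutoffMillis) {
        try (Stream<Path> paths = Files.walk(root)) {  // the stream must be closed
            return paths.filter(Files::isRegularFile)
                    .filter(p -> lastModifiedMillis(p) > cutoffMillis)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The file's system date in milliseconds since 1970. */
    static long lastModifiedMillis(Path p) {
        try {
            return Files.getLastModifiedTime(p).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Adler-32 of a file's uncompressed contents, for the periodic verify routine. */
    static long adler32Of(Path file) {
        try {
            Adler32 adler = new Adler32();
            adler.update(Files.readAllBytes(file));
            return adler.getValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The verify routine would compare adler32Of against the checksum recorded for each Active entry and report mismatches for you to touch.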
How do you handle a file whose date has changed, but whose contents have not? You could:
- Avoid changing file dates on your master files unless you really change them.
- Propagate the redated file as if it were a truly updated file.
- Fix the original file date back to the previous date.
- Ignore the problem. Leave the file with the old date on the client site. Don’t worry that its date does
not exactly match.
- Treat this as a special case of delta compression. See below.
- Invent a special propagation mechanism that changes file dates, without actually transmitting the corresponding
files.
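Before choosing among those options, the preparation tool has to notice the redated-but-unchanged case at all. One way is to compare both the date and the checksum against what was last published. This is a minimal sketch with the hypothetical names ChangeDetector and Change; note that equal 32-bit Adler-32 checksums are strong evidence, not proof, that the contents are unchanged.

```java
/** Hypothetical classification of a master file against its last published entry. */
final class ChangeDetector {
    enum Change { UNCHANGED, REDATED_ONLY, CONTENT_CHANGED }

    private ChangeDetector() {}

    /**
     * Compares the date and Adler-32 checksum recorded in the state-of-the-union file
     * with the master file's current date and checksum.
     */
    static Change classify(long publishedMillis, long publishedChecksum,
                           long currentMillis, long currentChecksum) {
        if (currentChecksum != publishedChecksum) {
            return Change.CONTENT_CHANGED;   // propagate as a new *.upd
        }
        if (currentMillis != publishedMillis) {
            return Change.REDATED_ONLY;      // pick one of the options listed above
        }
        return Change.UNCHANGED;
    }
}
```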
What if the client refreshes so infrequently that the necessary Delete entries are no
longer present?
- You might not sweat it, and just leave the ancient deleted files lying around on the client site.
- You might tell the client to start from scratch.
- It is probably easiest to just keep all Delete entries present forever. Deletes are
fairly rare in comparison with Replaces.
- Treat this with multiple state-of-the-union files. See below.
Extending Automatic Update
There are eight directions you could take this project once you get these basics handled:
- You get some eager beavers who check every ten minutes if there have been updates. The way the scheme works now, they would download the entire rather fat state-of-the-union *.upd file just to discover nothing had changed. You get around this by maintaining several state-of-the-union files. You might have a yearly, monthly, weekly, daily, and hourly version. The hourly version just has changes made in the last hour. The root file points to all of them. In addition, the root file tells you the low and high sequence numbers each state-of-the-union file covers. The yearly version may be completely up-to-date, or it may not. By looking at the ranges, the client can figure out which of these files it needs to download and process, if any. It may need to process more than one or none. You could have as many or as few of them as you wanted, spanning any range of sequence numbers.
- Getting an install started from scratch is rather inefficient since the client downloads a zillion tiny files.
It is also rather inefficient to do a massive update, since a large number of individual files/members would have
to be downloaded. Therefore each upd file might live also in one or more lump files, where a number of upd files
are consolidated. The client downloader can then decide the most efficient way to get the individual files it
needs. The lump files can be retired just like upd files. They can be updated, to agglutinate groups of upd files
in different ways, or to drop retired/replaced upd files. The restriction is, the client must download the entire
lump file if it wants even one upd file in it. As a last resort, all upd files are always available individually.
Ironically, each lump file is also an upd file, using the same sequence number naming scheme. The client would look
at the lump files available, and decide which ones have the most stuff they need and the least stuff they
don’t. If there is too much unwanted stuff, then it would pay to download files individually. In practice you
might have a lump for updates up to the first of this year, one for updates from first of the year to the first of
this month, one for the first of the month until yesterday, and one for today’s updates. Note that ideally
you rebuild all the lumps each day to prune them of deadwood, and add any new files to them. A client coming in
cold would need to download all four lump files. A client who updated daily would need to download only one. A
client who updated hourly would download individually. The server is free to update the lump files at any time even
when clients are in the middle of downloads, because of the way updates are done by always creating new upd and
lump files.
- You need to do your updates to the server’s copy of the *.upd files in batches. You don’t update
the root file until all the upd files in the batch are complete. The client does not start using his system again
until all the upd files mentioned in the root file are downloaded and installed. You don’t want the client
using his system when only a few of the files of the update have been installed.
- Some files can’t just simply be plopped on the client site. They need to be installed, e.g. inserted into
the registry or specially processed, e.g. to set special attribute bits, reboot to replace a DLL (Dynamic Link Library), etc. the way
Java Web Start does. You need a way to specify custom
installers. See Installer and the Installer Project.
- This scheme still redownloads large non-jar files even if so much as a comma in them changes. See the Delta Creator project for how to tackle that problem. The same problem applies to large
members where only a tiny part of them has actually changed.
- Automatically notify clients when there are changes. This could be done by email, or by a tiny UDP (User Datagram Protocol) or TCP/IP (Transmission Control Protocol/Internet Protocol)
probe to the running application. This just probes them to consider doing a refresh cycle. It does not actually
send them any update data. If there are very frequent updates to the master files, you have to avoid pestering your
clients more frequently than they want to be pestered. You also have to consider they may have many independent
applications using this scheme. The probe had better identify the application and how to unregister email probes.
You eventually have to give up on notifying clients that never bother to update or respond to probes. The prober is
not necessarily the server where the client gets updates.
- If you have a great many clients, you need a way to clone your server files and have clients use all the
mirrored servers TuCows-style, picking one close, functioning, up-to-date and not too busy. Ideally you want mirror
site selection automatic. Further you want propagation to the various mirror sites automatic. You want seriously
out of date clients to first use the less-up-to-date servers to avoid overloading the up-to-date ones.
- Other projects that could be based on such automatic update include Bulk
file distributor, HTML Glossary Presenter, On-Line Books, Sanity Checker, Infinite Disk, Prebranded Software rental with auto
updates.
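The range bookkeeping for multiple state-of-the-union files (the first extension above) can be sketched as follows. This is a minimal sketch, assuming the root file yields a list of (name, low, high) triples; SotuSelector, SotuFile and choose are hypothetical names. The greedy rule (cover the next needed sequence number with the file that reaches furthest, preferring the narrower file on ties) minimises the number of files fetched, not necessarily the bytes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical selection of state-of-the-union files listed in the root file. */
final class SotuSelector {
    /** One state-of-the-union file and the inclusive range of sequence numbers it covers. */
    record SotuFile(String name, long low, long high) {}

    private SotuSelector() {}

    /**
     * Picks files that together cover every sequence number after lastSeq, up to the
     * newest one any file covers. An empty list means nothing is new. An empty Optional
     * means no file covers some needed number, so the client should start from scratch.
     */
    static Optional<List<SotuFile>> choose(List<SotuFile> available, long lastSeq) {
        long newest = available.stream().mapToLong(SotuFile::high).max().orElse(lastSeq);
        List<SotuFile> chosen = new ArrayList<>();
        long covered = lastSeq;
        while (covered < newest) {
            final long next = covered + 1;
            // Among files containing the next needed number, take the one reaching furthest;
            // on a tie, the one with the higher low bound (narrower, usually smaller).
            Optional<SotuFile> best = available.stream()
                    .filter(f -> f.low() <= next && f.high() >= next)
                    .max(Comparator.comparingLong(SotuFile::high)
                            .thenComparingLong(SotuFile::low));
            if (best.isEmpty()) {
                return Optional.empty();
            }
            chosen.add(best.get());
            covered = best.get().high();  // strictly increases, so the loop terminates
        }
        return Optional.of(chosen);
    }
}
```

For instance, with yearly 1..900, monthly 800..950, weekly 930..990, daily 985..1000 and hourly 998..1003, a client at 960 fetches weekly, daily and hourly, while a client at 999 fetches only hourly.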
You can use the File Transfer classes to transfer the files around locally and remotely. You can use the File I/O Amanuensis to teach you how to compress and decompress files. You will need to study the ZipFile, ZipEntry, ZipInputStream, ZipOutputStream, GZIPInputStream and GZIPOutputStream classes for taking apart and rebuilding jar files and for compressing/decompressing.
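As an illustration of those classes, here is a sketch of the server-side split: it walks a jar or zip with ZipInputStream and GZIP-compresses each member into its own *.upd payload. The names JarSplitter and split are hypothetical, members are held in memory for brevity, and assigning sequence numbers and writing state-of-the-union entries are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical server-side step: one GZIP payload per jar/zip member. */
final class JarSplitter {
    private JarSplitter() {}

    /** Maps each member name to the GZIP-compressed bytes of its uncompressed contents. */
    static Map<String, byte[]> split(byte[] zipBytes) {
        Map<String, byte[]> payloads = new LinkedHashMap<>();  // keeps jar order
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;                     // directories carry no content
                }
                byte[] raw = zin.readAllBytes();  // reads only the current member
                payloads.put(entry.getName(), gzip(raw));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return payloads;
    }

    private static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }
}
```

The client side would run the reverse: GZIPInputStream to recover each member, then ZipOutputStream (or JarOutputStream) to rebuild the jar.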
Baby Steps Toward Automatic Update Nirvana
The evolution of automatic update goes like this:
- Download entire applications as a lump, e.g. with an installer, or with a giant zip file.
- Download just the files that have changed, using Java Web
Start.
- Download just the members of jars and zips that have changed using the Automatic File
Updater described here.
- Use lumping to avoid downloading separate upd files most of the time.
- Download just the chunks of files or members that have changed using the Delta
Creator.
- Use the bulk file distributor project so that you can efficiently use
multiple client-based distributed servers. This lets you distribute to millions of customers using only a small
server.
- Instead of using simple HTTP file transfer protocols, use custom server software to let the client grab all the
updates in a single TCP/IP session.
- Use instantaneous update so that applications can use up-to-the-second information, even information that becomes available after the app has started. This requires storing data in specially structured files, usually an SQL (Structured Query Language) database. Ironically, this type of update is much more evolved than the simpler types described above. See Oracle Distributed Databases and Oracle Replicated Databases.
You can get the freshest copy of this page from http://mindprod.com/project/autoupdate.html or possibly from your local J: drive (Java virtual drive/mindprod.com website mirror) at J:\mindprod\project\autoupdate.html.
Please email your feedback for publication, letters to the editor, errors, omissions, typos, formatting errors, ambiguities, unclear wording, broken/redirected link reports, suggestions to improve this page or comments to Roedy Green. If you want your message kept confidential, not considered for posting, please explicitly specify that.