image provider

Hermit Crab Variable Length Record Files


Disclaimer

This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, when all the facts you need are included, nothing extraneous is mentioned, the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems where it is up to you to fully the define the end point, or a series of ever more difficult versions of this project and research the information yourself to solve them.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it; or give you any additional materials. I have too many other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

Please do not email me about this project without reading the disclaimer above.

hermit crabHermit crab files allow you to efficiently manage variable length records that can grow and shrink. You can access records by an integer relative record number. Hermit crab files do not provide access by key, though you might build an in-RAM index or Hashtable by key fields that gives you the relative record number. They are called hermit crab files because as records grow, they abandon space shells, which are then recycled for use by smaller records.

Here are some thoughts on how an efficient variable length record manager might work. This is a virtual product. There is no such utility. I first proposed this back in 1991 for DOS (Disk Operating System). I have dusted it off a little for Java. It would be useful for that intermediate territory where flat files are not quite smart enough and an SQL (Standard Query Language) database is overkill.

Purpose

The Hermit Crab variable length record manager is a way to efficiently handle variable length records. The strange name comes from the algorithm used. As records (crabs) grow, they cast off their shells and find larger ones to live in. Smaller records growing can then move into one of the cast-off shells.

The Interface

In DOS, Hermit would be a TSR (Terminate and Stay Resident) you load before your application. In Java it would be an ordinary class. It can handle files with records up to 64K bytes long.

You need to design your application to use the Hermit API (Application Programming Interface). Though you can Copy, Move and Delete Hermit files with ordinary operating system commands, you cannot edit them with ordinary word processors.

Hermit allows you to number your variable length records 0, 1, 2, .. 4,294,967,295. You can leave some numbers out, but you should attempt to keep the highest record number used as low as possible. Read the later Under the Hood to find out why.

Advantages

The advantages of hermit crab files are:

Disadvantages

The limitations of hermit crab files are:

Interface Details

File Create

File Open

Random Write

Random Read

Read Next

read Prev

Get Stats

Reorg

Under the Hood

For very large files, you might split the logical file up into several smaller physical files. This way they can span several physical hard disks with a read head dedicated to each. Windows has notoriously slow random access on large files because it chases linear FAT chains to find each cluster. Speed under Windows can thus be improved by keeping physical random access files small.

Of course, if you have lots of RAM, you can keep the entire structure in memory and just pickle it between incarnations. In that case almost none of the Hermit Crab structure is required. You rely on the garbage collection mechanism to find your growing and shrinking records new recycled homes.

What is Wrong With This Scheme

Indexing

You could build external hashing or Btree indexes to this file. They would convert a key to a record number. The design of these indexes would be simplified because they would not need to be updated when a record was moved, only when a key value changed.

You might use a dual index — the base and the changes. On every reorg you merge the changes back into the base. The base index can be a read-only structure which makes it simpler to generate and more efficient to scan. The changes index is small, so you can get away with less sophisticated indexing techniques, perhaps even something as crude as a linear table or hashtable kept in RAM.

With 100 MB RAMs (Random Access Memories) becoming commonplace, you might simply implement all your indexes as RAM-resident hashtables that cough up a record number to hand to Hermit. A purely RAM-resident BTree is a much simpler animal to code that a disk based one. You can use serialization to load/store your RAM-based hashtables or Btrees.

Compression

Records could be compressed and decompressed on write/read. Even with fixed length data, compression creates varying length compressed results.

Quick & Dirty

Perhaps you would use a Hermit Crab file if somebody else coded it for you, but you have not the patience to do it. Here is how you could whip up a quick and dirty one. Use a Hashtable or in-RAM array that tells where each record starts and its precise length in bytes. If a record changes value, but not length, write it back to the same spot where it was before. If it grows or shrinks, even so much as one byte, write a new record for it at the end of the file. When you close the file, copy the live records to a new file (which reclaims space). Then pickle the Hashtable or array in a companion file.

You might even use an external sort such as Opt-Tech Sort to rapidly reorder your records into the proper physical order.


This page is posted
on the web at:

http://mindprod.com/project/hermitcrab.html

Optional Replicator mirror
of mindprod.com
on local hard disk J:

J:\mindprod\project\hermitcrab.html
Canadian Mind Products
Please the feedback from other visitors, or your own feedback about the site.
Contact Roedy. Please feel free to link to this page without explicit permission.

IP:[65.110.21.43]
Your face IP:[18.119.255.94]
You are visitor number