Defragger

This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, when all the facts you need are included, nothing extraneous is mentioned, the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems where it is up to you to fully the define the end point, or a series of ever more difficult versions of this project and research the information yourself to solve them.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it; or give you any additional materials. I have too many other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

A defragger is a program that rearranges files on hard disk to speed up access. See defragger in the Java & Internet Glossary. You would want the defragger to put files in order by last access date so that the rarely used files are near the center (end) of the disk and the most used ones are near the outer edge (beginning). You want each file in one contiguous lump. Ideally, you want to move frequently used system files like the directories, free space tables and system swap file as well. Within the i-node table (on Linux) or MFT (Master File Table) (on Windows) you want to internally defrag as well, putting the directories together and ordering by last access date, within that by directory. You can seek to optimise either file open time or find-file if you don’t have indexes to rapidly search. Windows does not maintain last-access dates unless you turn them on with the fsutil utility. If you don’t have them, you would have to make do with last-changed dates.

A defragger may also do other tidying such as consolidating free space fragments, removing deleted dummy directory entries, sorting directories and collecting files from the same directory together when their access timestamps are on the same day.

There may also be quick variants of the defragger that just seek to quickly clean up the files with the most fragmentation.

Because undebugged defraggers trash hard disks, you will want to write a defragger simulator that is platform independent. You may want to write platform dependent data-gatherers and ask your friends to send you copies of their messy disk layouts — not the data, just the current file positions and fragmentation so you can test your program on real-world situations.

In later debugging stages, you will want a large hard disk with a free partition you can trash and recreate with a partition copier.

The simulator lets various people try coding the algorithm safely without trashing any disks or getting involved in the low level details of defragging. You can then plug in the most efficient defragger.

Your simulator’s scorekeeper should reward doing I/O in large contiguous lumps. It should reward keeping disk arm motions to a minimum both in number and length of leap. It should, as best as possible, simulate true elapsed time.

I have noticed that most of the I/O of defragging with the OS (Operating System) interface is not moving the file bodies, but rather updating the directory to point to the new location. So I suspect an algorithm that only moved files whose destination is already free will be faster, even if the arms jump all over. You want very hard to avoid moving a file to a temporary resting place.

Imagine the disk arms are like a bus picking up passengers and dropping them off as it sweeps back and forth across the disk moving clusters to their new locations. The clusters being moved by read/write are like passengers. Here are some rules of thumb.

Your algorithm can be adapted to work with any sort of partition. You could produce versions for various types of Unix partitions or even mainframe partitions as well. Your product might then sell for thousands of dollars per copy. You must have a way of locking the entire disk for your exclusive access and there must be a way to do low level physical i/o.

You might use your algorithms in proprietary firmware inside a hard disk to handle defragging/marthaing that was transparent to the operating system, thus creating a seemingly super-fast drive. It would appear to have almost 0 seek time. It might be implemented internally as two separate drives that spend their lives mirroring each other while simultaneously defragging to the other drive. By having two sets of heads, either on the same set of surfaces on separate sets, you would get away with much less head motion. The software can then be as messy as it likes. The hardware will clean up after it. In fact, the irony will be, defragging such a smart hard disk with software will cause the system to temporarily slow down! The hard drive internally is totally unaware of the file and directory structure the operating system is using. It works purely with a giant list of physical clusters. Because everything happens inside the hard disk, there is absolutely no problem with the disk being used simultaneously with being defragged.

What if you are a poor student with no access to such firmware? You can write a simulator that can be used to compute just how much faster such a smart disk would be in real life. Then perhaps you could persuade one of the high end SCSI (Small Computer System Interface) hard disk makers to build a prototype. They might be interested. It is extremely difficult to build a hard disk even a tiny bit faster than your competitors. If you can do it, you can charge a big premium.

You might build a realistic prototype using a PCI (Peripheral Component Interconnect) KVM (Kilobyte Virtual Machine) card that has hard disk support. You then need to write a device driver for your card that makes it look like an ordinary disk drive. The KVM runs your Martha software.

If you had an OS that would let you reserve additional physical space for a file beyond what is written, then you could monitor file growth rate and allocate additional space so that a file would stay in one fragment until the next defrag. I don’t think Windows lets you do this. You could do it in the old IBM (International Business Machines) OS/360. The problem with Windows is every log file instantly fragments again after a defrag.

You could also write a defragger than required a spare empty drive to copy the data to, tidying as it went sorting by last-access-date, then copy back, always in the background. The OS would be tricked into looking for a file on one of the two disks, treating the pair as if it were one logical disk. Even a file in the process of being moved could be simultaneously used by the actual apps, with writes always going to the target.

	This page is posted on the web at:	http://mindprod.com/project/defragger.html
	Optional Replicator mirror of mindprod.com on local hard disk J:	J:\mindprod\project\defragger.html
	Please read the feedback from other visitors, or send your own feedback about the site. Contact Roedy. Please feel free to link to this page without explicit permission.
	Canadian Mind Products IP:[65.110.21.43] Your face IP:[216.73.216.6]
Feedback	You are visitor number

Defragger

Disclaimer