copy : Java Glossary

Introduction

Computers spend a considerable amount of time copying files, often with a command called COPY built into the script processor. In Linux, the copy command is called cp. COPY uses the CWD (Current Working Directory) as both the default for the source of the files and the target. This is rarely what you want, but that is the tradition and hardly anyone questions it. Take Command also has a copy command. Another common copy command is xcopy, specialised for copying trees of files. Note the command is just plain copy, not copy.exe. It is an internal command built into the command processor, not a separate executable.

Case Sensitivity

Windows is a case-insensitive operating system, but that does not mean you can forget about case. For example, let us assume you have a file called Abc.txt in C:\temp and a file called aBc.txt in D:\temp, and you type copy C:\temp\abC.txt D:\temp. What is the name of the file in D:\temp when you are done?

(1) Abc.txt
(2) aBc.txt
(3) abC.txt
(4) abc.txt
(5) ABC.txt

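The answer is (2), aBc.txt. The copy overwrites the contents of the existing target file, but leaves its name, including its case, as it was.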

Under the Hood

Nothing could be simpler conceptually than copying a file, but to do it efficiently is surprisingly complex. It is easy and common to write appallingly bad copy code and I have seen a lot of it over the years. I think the problem is modern programmers treat the OS (Operating System) and the hardware like a black box. In the olden days, when I cut my teeth, every programmer had to be intimately aware of just what was going on physically inside hard disks and OSes (Operating Systems). With a realistic mental model of what is going on under the hood, you would never dream of writing some of the silly copying code I have seen. If you write your own COPY, or have a COPY function included in some other program, there are several things to consider:

How big a buffer should you use to hold the data in intermediate RAM (Random Access Memory) before you write it back to the target disk? The bigger the buffer the faster, but with diminishing returns as the buffer grows. Further, by grabbing too much RAM, you could slow down other tasks, or even your own program, since the buffer lives in virtual RAM, not necessarily real RAM. It is a Goldilocks problem: the copy slows down if the buffer is either too small or too big. It depends very much on what else is going on in the computer, completely unrelated to your copy. A smart copy would monitor performance and dynamically adjust the size of the buffer. For copying large files, I typically use either a 32K or 64K stream buffer. For small files, I read the entire file in one fell swoop, with no stream buffer at all and a precisely file-sized FileInputStream byte buffer. In the olden days, 512 bytes was common, the size of one sector.
If there is much chance the copy will fail from lack of room on the target, check if there is sufficient space before you start. Unfortunately, this check can be quite an expensive operation on some operating systems. Further, there is no guarantee some other task won’t allocate most of the free space a millisecond later. You must actually allocate the space ahead of time to reserve it.
Allocate the space for the target all at once, rather than implicitly, cluster by cluster as you go. The resulting target file will more likely be contiguous and the copying process will be faster. You know how big the source file is before you start, so you know how big the target will be too. Even when you modify the file slightly as you go, allocate the approximate amount of space you need, then add to or trim the target file later as needed.
What if the copy fails for some reason in the middle? It is best to copy the file to a temporary file and, only after the copy has completed, quickly delete the old target file and rename the temp to the target’s name. A power failure in the middle of the copy will leave you with the old target file intact. Of course, this technique requires more free space on the target drive than you would need to overwrite the old target. HTMLTidy does this, but has a habit of failing just after the delete but before the rename. If I detect the trouble, I can repair it manually by doing the rename myself.
Copying between two physical drives is more than twice as fast as copying to another spot on the same physical disk. The arms barely have to move. Because drives have transparent internal caching, you can even get reading and writing going on fully simultaneously. With one physical drive, only one I/O operation can be in progress at a time.
If you have an SSD (Solid State Disk), you can use it as your buffer. Copy the file to it, then to the target. Then your source and target disk arms will barely have to move. The whole trick to copying quickly is reducing arm motions and rotational latency (waiting for data you want to spin round under the read head). That’s why doing I/O in big chunks works. You don’t ask the arms to move as often, or to wait for a particular sector on disk to spin round as often.
Multitasking generally does not buy you anything. If you get several copies going at once all you do is fibrillate the disk heads. Disk hardware can do only one operation at a time.
Your application could just spawn/exec a command processor and feed it a script of COPY commands, trusting whoever wrote the code for the command processor was clever. The problem with that approach is your code will not be portable and the COPY code might have been written by a Microsoft intern.
If the file is compressed just copy across the raw compressed bytes. Don’t waste time decompressing and recompressing. Similarly for encoded files, just copy the raw bytes. Don’t decode and encode, unless you are trying to change the encoding.
Just when you thought you understood all this well enough to write a perfect COPY method, consider what happens when other unrelated tasks, at any time before or during your copy, lock or attempt to lock either the source or target file for read or write.
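
The buffer advice above can be sketched in Java roughly as follows. This is only a minimal sketch under stated assumptions: the class name BufferedCopy, the 64K buffer and the cutoff between "small" and "large" files are illustrative choices, not measured optimums, and a smarter copy would tune them to the machine.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class BufferedCopy {
    // Assumption: 64K chunks for large files, one of the sizes suggested in the text.
    static final int BUFFER_SIZE = 64 * 1024;

    // Hypothetical cutoff: files no bigger than one buffer are read whole.
    static final long SMALL_FILE_LIMIT = BUFFER_SIZE;

    static void copy(Path source, Path target) throws IOException {
        long size = Files.size(source);
        if (size <= SMALL_FILE_LIMIT) {
            // Small file: one read into an exactly file-sized array, one write.
            Files.write(target, Files.readAllBytes(source));
            return;
        }
        // Large file: reuse a single fixed-size buffer for every chunk.
        byte[] buffer = new byte[BUFFER_SIZE];
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java BufferedCopy source target");
            return;
        }
        copy(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The unbuffered streams are deliberate: the byte array already is the buffer, so wrapping the streams in BufferedInputStream or BufferedOutputStream would only add a second layer of copying.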
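
The free-space check and up-front allocation described above might look something like this in Java. Treat it as a sketch with caveats: PreallocatedCopy is a hypothetical name, the usable-space figure is only a hint that can be stale a millisecond later, and RandomAccessFile.setLength sets the file's length but, on file systems that support sparse files, may not physically reserve the clusters.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class PreallocatedCopy {
    static void copy(Path source, Path target) throws IOException {
        long size = Files.size(source);

        // Cheap sanity check. Another task may still grab the space afterwards.
        Path dir = target.toAbsolutePath().getParent();
        long usable = Files.getFileStore(dir).getUsableSpace();
        if (usable < size) {
            throw new IOException("need " + size + " bytes, only " + usable + " usable");
        }

        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             RandomAccessFile raf = new RandomAccessFile(target.toFile(), "rw")) {
            // Give the target its final length in one call instead of
            // letting it grow chunk by chunk as the data arrives.
            raf.setLength(size);
            FileChannel out = raf.getChannel();
            ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
            while (in.read(buffer) != -1) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    out.write(buffer);
                }
                buffer.clear();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PreallocatedCopy source target");
            return;
        }
        copy(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Because setLength also shrinks a target that was previously longer than the source, no separate trim is needed when the copy overwrites an old, bigger file.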
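
Here is one possible Java sketch of the copy-to-a-temporary technique above. Note that it swaps in a slightly different final step: instead of deleting the old target and then renaming, it asks for a single atomic rename over the old target, which closes the window between delete and rename. Whether an atomic move replaces an existing file is implementation specific according to the Files.move documentation, so this is an assumption about the platform, not a guarantee. The class name SafeCopy and the .tmp suffix are hypothetical.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class SafeCopy {
    static void copy(Path source, Path target) throws IOException {
        // Put the temp file beside the target, so the final rename
        // stays on the same drive and never has to copy bytes again.
        Path temp = target.resolveSibling(target.getFileName().toString() + ".tmp");
        try {
            // If this step dies part way, the old target is still untouched.
            Files.copy(source, temp, StandardCopyOption.REPLACE_EXISTING);
            try {
                Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                // Fallback: not atomic, but still replaces the old target.
                Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            // Remove a half-written temp file left behind by a failure.
            Files.deleteIfExists(temp);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java SafeCopy source target");
            return;
        }
        copy(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```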
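
As for the locking question in the last point, one partial defence is to ask for a lock on the target before touching it. The sketch below is only an illustration: LockedCopy and copyIfUnlocked are hypothetical names, and Java file locks are merely advisory on many platforms, so a task that never asks for a lock can still interfere.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class LockedCopy {
    // Returns false, without changing the target, if another process holds a lock on it.
    static boolean copyIfUnlocked(Path source, Path target) throws IOException {
        try (FileChannel out = FileChannel.open(target,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = out.tryLock()) {
            if (lock == null) {
                return false; // somebody else got there first
            }
            // Only discard the old contents once we hold the lock.
            out.truncate(0);
            try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
                long size = in.size();
                long position = 0;
                while (position < size) {
                    long n = in.transferTo(position, size - position, out);
                    if (n <= 0) {
                        break; // source shrank underneath us, give up quietly
                    }
                    position += n;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java LockedCopy source target");
            return;
        }
        System.out.println(copyIfUnlocked(Paths.get(args[0]), Paths.get(args[1]))
                ? "copied" : "target is locked");
    }
}
```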

Futures

Copying is such a fundamental operation, it should be built into the OS. Why?

Because only the OS knows how abundant RAM is. Only it can allocate an optimally sized buffer.
Because the copies can then be safely hidden from virus scanners which currently needlessly scan every copy, slowing things down.
So that copies can occur in the background, with the optimum number of threads, but appear to apps as if they were instantaneous.
So that copies can be done using hardware assist with very low overhead.
So we can retire all the incompetent copying code built into applications.
Because the copy can be highly optimised once, and that optimisation then applies to every copy done in every app.
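
In the meantime, Java code can at least hand the byte shuffling to the platform rather than looping over its own buffer. The following is a minimal sketch, assuming the standard java.nio FileChannel.transferTo method, whose documentation notes that many operating systems can transfer the bytes directly from the file system cache. How much help you actually get depends on the OS and the JVM. The class name ChannelCopy is hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class ChannelCopy {
    static void copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                 StandardOpenOption.CREATE,
                 StandardOpenOption.WRITE,
                 StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                long n = in.transferTo(position, size - position, out);
                if (n <= 0) {
                    break; // nothing more to transfer, e.g. the source was truncated
                }
                position += n;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ChannelCopy source target");
            return;
        }
        copy(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

No buffer size appears anywhere in this version. That choice is left to the layers underneath, which is the point of the argument above.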

Links

buffer
copy/paste
I/O
Take Command

	This page is posted on the web at:	http://mindprod.com/jgloss/copy.html
	Please read the feedback from other visitors, or send your own feedback about the site. Contact Roedy. Please feel free to link to this page without explicit permission.