FTP (File Transfer Protocol). A protocol built on top of TCP/IP (Transmission Control Protocol/Internet Protocol)
that lets you send or receive a file over the Internet. The URL for a file has this form: ftp://username:password@hostname/pub/mydir.
FTP works better than HTTP (Hypertext Transfer Protocol) for large or flaky transfers because it can pick up where it left off if the
connection is broken. HTTP in theory has this feature as well (via Range requests), but it is more frequently implemented in
FTP. If you simply want to download or copy a file, you can use HTTP, which is faster and simpler.
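Picking up where a transfer left off relies on the FTP REST command, which the server must support. Here is a minimal sketch using Python's standard ftplib, assuming ftp is an already logged-in ftplib.FTP object. The function name resume_download and the file names are illustrative only.

```python
import os


def resume_download(ftp, remote_name, local_path):
    """Download remote_name, continuing from any partial local copy.

    Returns the byte offset the transfer was resumed from (0 for a fresh start).
    """
    # Bytes already on disk from an earlier, interrupted attempt.
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    # Append to the partial file rather than truncating it.
    with open(local_path, "ab") as out:
        # rest=offset makes ftplib send "REST <offset>" before RETR;
        # rest=None means an ordinary download from the beginning.
        ftp.retrbinary("RETR " + remote_name, out.write, rest=offset or None)
    return offset
```

This only makes sense if the partial local file really is a prefix of the remote file. If the remote file may have changed in the meantime, it is safer to delete the partial copy and start over.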
- Normally you use PASV (passive FTP file transfer) mode, the newer style of FTP
where the client opens both the control connection and the data connection. This is more secure
than the old active style, where the server opened the data connection back to the client. Further,
it is easier to get through firewalls with PASV. The main reason not to use it is some very old server
that does not support it. Active connections can also be slightly faster.
- Typically FTP gives you access to different files than HTTP does, e.g. the logs of
website activity. You can typically both upload and download. HTTP upload is fairly rare and usually set up to handle only one file at a time.
- With FTP you can upload, download, delete, rename, make a directory, get a directory
listing or abort the transfer in progress. In contrast, with HTTP, typically all you can do is download
a file.
- If you are setting up an FTP server, you will have to tell your firewall/router to forward ports 20 and 21
for FTP, plus whatever port range you configure the server to use for PASV (e.g. 2000..2010), to your FTP server.
- If your FTP session aborts, the server may lock the file you were working on until the session
times out. The server can’t tell the session was aborted. Until the timeout expires, you won’t be
able to update or rename that file in a new session.
- Jon Postel and Joyce Reynolds devised the FTP standard, RFC 959, back in 1985. Like all
RFCs (Requests For Comments), it has not been revised since it was published; changes arrive as new RFCs.
Mr. Postel has since died. Follow-on RFCs include RFC 2228 (security extensions),
RFC 2389 (FEAT, FTP feature enumeration), RFC 2640 (internationalisation), RFC 2773 (encryption) and
RFC 3659 (extensions such as MDTM, SIZE and MLST).
- There are now secure alternatives. FTPS (FTP over SSL (Secure Sockets Layer) or its successor
TLS (Transport Layer Security)) is covered in RFC 2228 and RFC 4217. SFTP (SSH (Secure Shell) File Transfer Protocol)
is a separate protocol that runs over SSH and, despite the name, is not FTP at all.
Both encrypt the transmissions for privacy.
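The operations listed above (upload, download, delete, rename, make a directory, get a listing) map directly onto methods of Python's standard ftplib. Here is a hedged sketch, assuming ftp is an already logged-in ftplib.FTP object and that the account is allowed to write to the current directory. The function name and the directory and file names are made up for illustration.

```python
import io


def demo_session(ftp):
    """Exercise the basic FTP operations on a logged-in connection."""
    ftp.set_pasv(True)                       # passive mode (already ftplib's default)
    ftp.mkd("incoming")                      # make a directory
    ftp.storbinary("STOR incoming/a.txt", io.BytesIO(b"hello\n"))   # upload
    ftp.rename("incoming/a.txt", "incoming/b.txt")                  # rename
    names = ftp.nlst("incoming")             # directory listing, names only
    received = io.BytesIO()
    ftp.retrbinary("RETR incoming/b.txt", received.write)           # download
    ftp.delete("incoming/b.txt")             # delete the file
    ftp.rmd("incoming")                      # remove the now-empty directory
    return names, received.getvalue()
```

Against a real server you would first create the connection with ftplib.FTP(host) and call login(user, password). For FTPS, ftplib.FTP_TLS offers the same methods plus prot_p() to encrypt the data connection. The exact form of the names returned by nlst varies from server to server.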
Gotchas
- There is nothing in the FTP protocol to find out what time the server thinks it is, or what time zone it is
using. You can kludge by uploading a tiny dummy file, then doing a directory list on it to find out what time the
server stamped it with, which will not be the file’s original time, but the time it arrived at the
server. This tells you what time the server is using, accurate to a second or two. Typically
FTP just stomps uploaded files with the server time they were uploaded. FTP does not preserve the last modified time of the files. FTP software
must adjust for the difference between local time and server time to figure out which files have changed since they were last uploaded.
In a sane world, both FTP client and server would exchange timestamps in UTC (Coordinated Universal Time/Temps Universel Coordonné).
- When the server returns a directory list, the FTP protocol does not specify which file time to use: client,
server or UTC, daylight saving or standard, current DST (Daylight Saving Time) or historic DST.
- When you upload a file, the FTP protocol does not specify what time the server should date the files with,
client upload time, server upload time, client last modified time…
- There are a large number of parameters to configure, such as: PASV, whether the server supports
FEAT (feature detect), whether the server supports resuming transfers (the REST command), time zone of the server, name conversion to lower case,
port, URL (Uniform Resource Locator), default directory, login, password, account,
whether you are permitted multiple connections etc. The transfer won’t work if you don’t match the
server capabilities/expectations.
- If you upload a file while someone on the web is downloading it to read, the upload may fail, depending on how the server locks files.
With a busy webserver you need to upload the files to a temporary name, then wait for a lull in the traffic to delete
the old file and rename the temporary. The FTP protocol does nothing to help you with this.
- Uploads are not atomic. If, for example, you upload a file A that links to file B, and file A is uploaded
first, there will be a dangling link until B arrives. Ideally all the uploads should be done, then, once they have all
successfully completed, all the updates and adds should be revealed at once.
- FTP uploads a file at a time. If you have many small files, there is considerable time overhead in the handshaking.
Ideally they should be uploaded in compressed large chunks to reduce the handshaking overhead.
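The temporary-name workaround described in the gotchas above can be sketched with Python's standard ftplib. This is an illustration under stated assumptions, not a real atomic upload: ftp is assumed to be a logged-in ftplib.FTP object, the function name staged_upload and the ".tmp" suffix are invented here, and there is still a brief moment between the delete and the rename when a file is missing.

```python
import io
from ftplib import error_perm


def staged_upload(ftp, files, suffix=".tmp"):
    """Upload every file under a temporary name, then rename them all.

    files maps remote name -> bytes. Nothing live is touched unless every
    upload succeeded, which narrows (but does not close) the window in
    which a visitor can see a half-updated site.
    """
    staged = []
    try:
        for name, data in files.items():
            ftp.storbinary("STOR " + name + suffix, io.BytesIO(data))
            staged.append(name)
    except Exception:
        # An upload failed: remove the temporaries that did arrive
        # and leave the live files alone.
        for name in staged:
            ftp.delete(name + suffix)
        raise
    for name in staged:
        try:
            ftp.delete(name)      # some servers refuse to rename over an existing file
        except error_perm:
            pass                  # nothing to delete: the file is new
        ftp.rename(name + suffix, name)
```

A fuller version would also retry files that are busy and wait for a lull in traffic before the rename pass, as the text suggests. Neither is attempted here.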
There are more FTP packages than you can shake a stick at. Here are a few. To find even more, search
Google for "FTP upload automation". These are usually preposterously overpriced. They are
typically not smart enough to keep track of what they have already uploaded to avoid re-uploading or having to
download directory information. They are not clever enough to bundle the upload into a compressed archive. They
are not smart enough to retry files that are busy being downloaded. They are not smart enough to do atomic
uploads (where the uploads are not visible until all have been uploaded). We need a much more efficient, reliable way
to upload files to websites.
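Keeping track of what has already been uploaded does not need anything from the server. One possible approach, sketched here in Python, is a local manifest of content digests that is compared against the files on disk before each run. The function names and the JSON manifest format are assumptions made for this sketch, not features of any particular FTP package.

```python
import hashlib
import json
import os


def sha256_of(path):
    """Digest a file's contents in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_needing_upload(root, manifest_path):
    """Return (changed, digests).

    changed lists the relative paths under root whose content differs from
    what the manifest recorded after the last upload (or that are new).
    """
    try:
        with open(manifest_path) as f:
            old = json.load(f)
    except FileNotFoundError:
        old = {}                      # first run: everything counts as changed
    digests = {}
    for folder, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(folder, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            digests[rel] = sha256_of(full)
    changed = sorted(rel for rel, d in digests.items() if old.get(rel) != d)
    return changed, digests


def save_manifest(manifest_path, digests):
    """Record the digests; call this only after the uploads have succeeded."""
    with open(manifest_path, "w") as f:
        json.dump(digests, f, indent=2, sort_keys=True)
```

The manifest should live outside root, otherwise it would be treated as one more file to upload. Comparing contents rather than timestamps sidesteps the local-time versus server-time problem entirely, at the cost of reading every file on each run.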
See also [Java FTP] or [Java secure FTP].
FTP is a bit long in the tooth. It is being replaced by VPN (Virtual Private Network), where you just copy
files via an encrypted link to the server as if it were another drive on your desktop.
AutoMate: the vendor refuses to tell you the price until you have evaluated the
product.
Ports, PASV and Firewalls
FTP is an odd protocol in that a session requires two TCP/IP
socket connections. It starts out with a control channel then later adds a data channel. In an Active connection, it works like this:
- The client connects using a random port number on the client to port 21 on
the server to create the control connection.
- The client sends a PORT command over the control channel containing a random free port on the client to use for the data
connection.
- The server connects back to the client, from port 20 on the server to the port number given in the PORT command.
This third step is firewall-unfriendly. Firewalls don’t trust servers out on the net connecting in.
They want all connections to be created by the client. So FTP was modified to make it more
firewall-friendly, with PASV connections. In that case, the client sets up both
connections: it sends PASV, the server replies with an address and port it is listening on, and the client connects
to that. The only catch is not all servers and clients support it. When PASV is not available, you have to tell the firewall
the incoming connection is OK.
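Both PORT and the reply to PASV carry an IPv4 address and port as six comma-separated decimal numbers, h1,h2,h3,h4,p1,p2, where the port is p1 * 256 + p2. The following Python sketch shows the encoding in each direction. The function names and the sample addresses are invented for illustration.

```python
import re


def port_argument(host, port):
    """Encode an IPv4 address and port as the argument of a PORT command."""
    high, low = divmod(port, 256)     # the port travels as two bytes
    return ",".join(host.split(".") + [str(high), str(low)])


def parse_pasv_reply(reply):
    """Extract (host, port) from a '227 Entering Passive Mode (...)' reply."""
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if match is None:
        raise ValueError("not a PASV reply: " + reply)
    numbers = [int(n) for n in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]
    return host, port


# Active mode: the client tells the server where to connect back to.
print("PORT " + port_argument("192.168.1.10", 50000))
# Passive mode: the server tells the client where to connect.
print(parse_pasv_reply("227 Entering Passive Mode (10,0,0,5,7,208)"))
```

The embedded address is one reason FTP and NAT routers get along badly: a client behind NAT that sends its private address in a PORT command is asking the server to connect to an address it cannot reach.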
If your software is written in Java, and you are running it from a jar, the firewall considers the generic
java.exe to be the agent trying to tunnel through the firewall. On my machine
there are ten different copies of java.exe. Make sure you tell the firewall which one you are
using to run your program. I found it easier to compile my app with Jet so that I had a unique *.exe for the firewall to recognise.
Future FTP
FTP is long in the tooth, so it will likely be completely replaced rather than patchwork upgraded. Here are some things that will
need to be fixed.
- Run the FTP server and the protocol on UTC with millisecond or nanosecond timestamps.
- When you upload a file, preserve its timestamp accurately rather than using the server’s current local time.
- Provide a filesystem-like way to access FTP so that the remote files look just like local files.
- Provide batch atomicity so that uploaded files logically appear all at once on the server at the end of the batch.
- It should upload at least two files at a time without special request on the part of the client.
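Part of the UTC item on this wish list already exists on servers that implement RFC 3659: the MDTM command reports a file's last modified time in UTC as YYYYMMDDHHMMSS, optionally with fractional seconds. Here is a hedged Python sketch of reading it, assuming ftp is a logged-in ftplib.FTP object and that the server supports MDTM. The function names are illustrative.

```python
from datetime import datetime, timezone


def parse_mdtm(reply):
    """Turn a '213 YYYYMMDDHHMMSS[.sss]' MDTM reply into an aware UTC datetime."""
    code, _, stamp = reply.strip().partition(" ")
    if code != "213":
        raise ValueError("unexpected MDTM reply: " + reply)
    whole, _, fraction = stamp.partition(".")
    moment = datetime.strptime(whole, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    if fraction:
        # Optional fractional seconds, e.g. ".250" means 250 milliseconds.
        moment = moment.replace(microsecond=int(fraction[:6].ljust(6, "0")))
    return moment


def remote_is_newer(ftp, remote_name, local_mtime_epoch):
    """Compare the server's copy against a local mtime (seconds since the epoch)."""
    reply = ftp.sendcmd("MDTM " + remote_name)
    return parse_mdtm(reply).timestamp() > local_mtime_epoch
```

The local side of the comparison can come from os.path.getmtime, which is also independent of time zone. This does nothing about preserving the timestamp on upload, which standard FTP still leaves to the server.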