PDF (Portable Document Format). Adobe’s platform-independent format for distributing documents. You can recognize PDF files by the *.pdf extension. You will find them commonly on commercial websites, where they are used to distribute product literature
or complex technical documentation. PDF allows searching and scrolling, just like HTML (Hypertext Markup Language) in a browser.
Similarities to HTML
PDF is similar to HTML in that:
- Both let you prepare documents for the web, hard disk or CD (Compact Disk).
- You can convert HTML to PDF and PDF to HTML.
- Both support hypertext links.
- Both support embedded images, sound files and movies.
- Both support forms you fill in and submit. PDF calls its format for form data FDF (Forms Data Format).
- Both will display the first part of a document even before the whole document is downloaded.
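To make the forms item above a little more concrete, here is a minimal sketch in Python of what the data from a filled-in form might look like when serialized as FDF. The field names and values are invented for illustration, and the helper names escape_pdf_string and build_fdf are hypothetical. Real forms software produces richer files than this.

```python
def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def build_fdf(fields):
    """Serialize a dict of field name -> value as a minimal FDF document.

    /T holds a field's name and /V its value. This is an illustrative
    sketch, not a complete FDF writer.
    """
    entries = "\n".join(
        "<< /T (%s) /V (%s) >>" % (escape_pdf_string(name), escape_pdf_string(value))
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n"
        + entries + "\n"
        "] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )


# Hypothetical form data, as a reader might have typed it in.
sample = build_fdf({"name": "Pat (Sales)", "country": "Canada"})
print(sample)
```

Only the field names and values travel in the FDF. The layout of the form stays behind in the PDF itself.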
Differences From HTML
PDF is different from HTML in that:
- It is oriented around pages. HTML is oriented around documents with no page breaks. PDF does not have a way
of reflowing page breaks the way you would in PageMaker.
- PDF is not designed to be modified. You modify the original documents, perhaps in MS Word format, then
regenerate the PDF. With HTML you modify the master HTML documents directly.
- PDF looks identical on all machines. HTML adjusts itself to the resolutions and fonts available on the viewing
machine.
- PDF requires proprietary tools to create. HTML can be created with tools as simple as NOTEPAD.
- PDF documents are designed to be printed. HTML documents are not. Most printers cannot understand PDF.
Software must convert the PDF to the native printer language, possibly PostScript. A few printers, such as the
HP (Hewlett Packard) Color LaserJet 4650 series, can understand PDF directly.
- Adobe pulled off a near miracle, persuading the major font companies to allow their fonts to be embedded in
PDF files, without royalties. There is no way to do this with standard HTML, CSS (Cascading Style Sheets) or Java. If you include the
fonts, everyone will see the document with the proper fonts. Only the glyphs you actually use in the document
are included. If you don’t want the bulk, the distiller will save some information about the font metrics
instead. The viewer’s reader software can then warp one of the installed fonts, using Adobe’s morphing
master font technology, to create a substitute. People without the fonts installed will see only a rough
approximation, which may look absolutely nothing like the fonts you used to compose the document, but it will
at least have the same spacing.
- PDF documents may include Adobe PostScript Type 1 or Microsoft TrueType fonts. HTML documents normally
never embed fonts. I don’t even know what the format is when they do.
- With HTML, the person viewing an HTML document can override all your stylistic choices. With PDF he will
always view the document exactly as you designed it.
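The page orientation and fixed positioning described above are visible in the file format itself. The following is a rough sketch in Python, using only the standard library, that assembles a tiny one-page PDF by hand. The page size, coordinates and text are arbitrary choices for illustration, the helper name build_minimal_pdf is made up, and the sketch assumes Helvetica is one of the standard fonts the viewer supplies. It is not meant to show how Acrobat or the distiller work internally.

```python
def build_minimal_pdf(text):
    """Return the bytes of a one-page PDF showing `text` at a fixed spot."""
    # Content stream: select the font, move to an absolute (x, y) position
    # measured in points from the bottom-left corner, then show the text.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = ("BT /F1 24 Tf 72 720 Td (%s) Tj ET" % escaped).encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # MediaBox fixes the page at 612 x 792 points (US Letter).
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        # Helvetica is assumed to be a standard font: referenced, not embedded.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one fixed-width entry per object.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


pdf_bytes = build_minimal_pdf("Hello, PDF")
print(len(pdf_bytes), "bytes")
```

Writing pdf_bytes to a file in binary mode should give something a PDF viewer can open. Note that nothing in the file describes how to reflow the text: the page size and the text position are stated as absolute numbers.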
Advantages of PDF
The advantages of PDF format are:
- It is a lot less work to prepare a PDF document if you already have either a word processing document or a
printed page. Preparing the HTML document is often almost as much work as retyping from scratch.
- You can rapidly create documents for the web from printed materials, including graphic images. The raw
materials might be MS Word documents, PostScript files, books, or brochures. You don’t need a lot of
manual keying and touchup the way you would to create the equivalent HTML documents. All you have to do is scan
a page, or print a word processing document to a special printer driver, and Adobe Acrobat does the rest. The
conversion program is slow, but its results need little touching up.
- PDF gives you absolute control over the final look of the document, unlike HTML where you can only give
hints.
- You can control whether the reader is permitted to cut/paste or make hard copy printouts. You can control
whether others are permitted to modify the document. Clever users can bypass these restrictions.
- PDF documents print properly with page breaks in the right places. HTML documents break printed pages half
way through images or even half way through a line of text. HTML printouts are a mess!
- PDF is much more compact than the equivalent set of graphics images. PDF is essentially an OCRed document
in modified PostScript format. Part of the secret of the compactness is that standard fonts need not be
included as part of the document. They come with the viewer. However, PDF is not as compact as HTML. It
contains much more precise font and positioning information.
- PDF documents come with a miniature search engine built into every document.
- PDF documents have automatically generated tables of contents, thumbnails and indexes.
- If you buy Acrobat 4, you can add annotations and highlights to documents. You can have many people adding
comments to a document. You can filter annotations by person or date. You can sort them by person, date, type
or page number. Annotations may be comments, stamps (e.g. approved, confidential), highlighting, thumbtacks,
paperclips or scribbles.
- PDF has article threading. If you scan in a magazine, you can leave it exactly the same as the original
layout. The viewer program will automatically guide the reader through the disconnected pieces of an article.
With HTML you must manually cut and paste the various pieces into separate documents, one for each article. You
have to show Acrobat the flow of each article, but this is a tiny fraction of the work of cutting and pasting the
fragments into a new document for each article.
- PDF documents can be digitally signed. You can be sure the document is not a forgery and has not been
tampered with or damaged.
- It is easy to take an existing paper form and convert it for electronic use in PDF. With HTML you must
start from scratch and design the form electronically. Designing forms is much simpler in PDF. In HTML, it
requires a programmer. Anyone could do it in Acrobat/PDF. You can also design forms electronically in PDF, or
use any graphics, word processor or publishing software to design them.
- Apple Mac OS (Operating System) X uses PDF as the basis of its Quartz imaging model.
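The permission controls mentioned in the list above are, as far as I understand the format, stored as bit flags in a protected document's /P entry, which the viewer is expected to honour. The Python sketch below decodes such a value. The bit positions are my recollection of the standard security handler in the PDF specification, so treat them as an assumption, and the sample value is invented.

```python
# Permission bits (assumed from the PDF specification's standard
# security handler; bit 1 is the lowest-order bit).
PRINT = 1 << 2      # bit 3: print the document
MODIFY = 1 << 3     # bit 4: modify the contents
COPY = 1 << 4       # bit 5: copy or extract text and graphics
ANNOTATE = 1 << 5   # bit 6: add or change annotations and form fields


def decode_permissions(p_value):
    """Turn a signed 32-bit /P value into a dict of allowed actions."""
    bits = p_value & 0xFFFFFFFF  # view the signed integer as 32 raw bits
    return {
        "print": bool(bits & PRINT),
        "modify": bool(bits & MODIFY),
        "copy": bool(bits & COPY),
        "annotate": bool(bits & ANNOTATE),
    }


# Invented example value: printing and copying allowed, changes refused.
permissions = decode_permissions(-44)
for action, allowed in permissions.items():
    print(action, "allowed" if allowed else "refused")
```

The flags are a request to the viewing software rather than a lock. A program that skips the check behaves as though every bit were set, which is presumably how clever users bypass the restrictions.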
Disadvantages of PDF
The disadvantages of PDF format are:
- To be charitable, exporting data from PDF is incompetent. Data is scrambled, missing, not properly delimited, reordered…
You can try two ways: save as text, or copy and paste page by page. They give different results.
- PDF is supported under W95/W98/Me/NT/W2K/XP/W2003/Vista/W7-32/W7-64 and Mac only. HTML is supported on nearly
everything.
- PDF documents are bulkier than HTML documents.
- PDF is oriented around fixed size pages. You turn pages electronically much the way you would pages in a
magazine, or use a hand tool to drag the paper. You can’t scroll the way you are used to with the mouse
wheel or scrollbars, though you can still scroll after a fashion. Usually you navigate by clicking repeatedly,
by using the hand tool, or by following the table of contents. In HTML, the document is one giant page you can
scroll through continuously. I think HTML maps more naturally to the screen.
- Search engines generally do not index PDF documents. They can’t see inside them. However, in 2001 February, the Google
search engine started indexing PDF documents. Google is the fastest and most accurate of the search engines, so
the rest should follow suit.
- People who wish to read the documents must install the Acrobat viewer to be able to see the documents. It
is free, but that is still a hassle that will discourage novices from looking at your documents.
- PDF is proprietary to Adobe. You must purchase the Acrobat program before you can prepare PDF documents.
There are now some third party tools for creating PDF documents, and some add ons that work with Acrobat, but
chances are you will need Acrobat itself at the core. There are a number of tools that are optional, but highly
desirable, for preparing PDF documents, some costing over .
- PDF uses only the bulky AIF (Audio Interchange File) and WAV file sound formats. It does not support the much more compact MP3
format.
- PDF supports Apple QuickTime and AVI (Audio Video Interleave) movie formats only. People who view your documents still must
separately install the QuickTime and AVI engines to see the movies. HTML supports dozens of other movie
formats.
- PDF forms are for submitting data only, either to a database, to a CGI (Common Gateway Interface) server or to email. They are not
suitable for doing inquiries the way HTML forms are. I am not 100% sure of this
point. The Adobe representative I talked to did not seem to understand my questions in this area.
- PDF documents embed the fonts needed to view them, even if the user already has them installed. This makes
the PDF files bulkier than they need be, and slows downloading them. Ideally, the font files would be separate
and downloaded only if needed, and then cached so they would not need to be downloaded every time.
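One plausible reason exported text comes out scrambled and reordered is that a PDF page stores text as positioned fragments in drawing order, not reading order. The Python sketch below uses invented fragments and a hypothetical helper, reading_order, to show how an extractor might sort fragments back into lines. Real extractors also have to cope with columns, rotation and much else, so this is only an illustration of the idea.

```python
from itertools import groupby

# Invented fragments as (x, y, text); y is measured up from the bottom of
# the page, and the list is in the order the fragments were drawn.
fragments = [
    (300, 700, "world"),
    (72, 680, "Second"),
    (72, 700, "Hello"),
    (130, 680, "line"),
]


def reading_order(items, line_tolerance=2):
    """Group fragments into lines by y, then read each line left to right."""
    # Snap y to a coarse grid so fragments on nearly the same baseline
    # end up in the same group.
    def line_key(item):
        return round(item[1] / line_tolerance)

    # Higher y is nearer the top of the page, so sort lines by descending y.
    ordered = sorted(items, key=lambda item: (-line_key(item), item[0]))
    lines = []
    for _, group in groupby(ordered, key=line_key):
        lines.append(" ".join(text for _, _, text in group))
    return lines


print(" ".join(text for _, _, text in fragments))  # drawing order
print("\n".join(reading_order(fragments)))         # reconstructed order
```

Two extraction methods that make different guesses about line grouping and word spacing will give different results, which matches what you see when you compare save as text with copy and paste.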
PDF Tools
You don’t have to choose. You can prepare your documents in PDF format, then export in HTML, and post both
on your website, reaping the benefits of both. Let your users decide which they prefer to view. Search engines
will bring people to your site who then may choose to look at the PDF, especially if they want a printed hard copy.
Enfocus PitStop is a plug-in for Acrobat that solves many of the small irritations with Adobe Acrobat. It
comes highly recommended.
Linux PDF tools tend to be free.
You can create PDF files using an Adobe Web-Based Conversion Service for
a month. This is a reasonable alternative for low volumes or to experiment.
DocuDesk converts documents from MS Word and 250 other apps to PDF. I suspect it works by acting as a virtual printer driver.
JPedal is a Java library for reading and displaying PDF
files. It has a JWS (Java Web Start) PDF viewer that lets
you view a PDF file on any Java-supported platform. You can extract text or images from PDF files. You can
extract data from FDF forms. There are three versions: There is a stripped down free open source version. The
Enterprise version is
for a single seat, and
for a site licence. You need to negotiate a licence to include it in your distributed software.