Making and unmaking PDFs

Online Currents 2004 – 19(8) 25-27


When the World Wide Web appeared in the early 1990s, many users, myself included, assumed that most people would soon do their reading on-screen and that printed documents would slowly become a thing of the past. We soon learnt how wrong we were. Despite continuing improvements in screen displays, many users continue to print documents for marking up, for reference, and simply for reading. It was to serve this group that Adobe developed their Portable Document Format (PDF) in 1993. Since then PDF has gone through several versions, but it retains the same fundamental goal: to provide a way of electronically distributing documents that look the same and print the same.

Why is there a need for this? At the time PDF was being developed, there were two different word processing programs in wide use, each with its own format. Although files from WordPerfect could be opened in and printed from Word and vice versa, there was no guarantee that the printed version would look the same as the original. The situation with desktop publishing and graphics was – and remains – even worse, with each major DTP program unable to open, let alone print, files from any of the others. PDF provides a way of making a ‘virtual printout’; by distributing PDF files a content provider can ensure that, regardless of what other software they have on their systems, all recipients will be able to view and print out documents with the same pagination and layout. In the past PDF has also been a read-only format, ensuring that files can’t be inadvertently altered while passing along a chain of users.

PDF creation works by diverting and capturing the stream of information that would normally pass to the printer. Thus any software that can produce printed output can produce a PDF file, simply by passing that output to a PDF ‘pseudoprinter’. The best-known of these ‘pseudoprinters’ is Adobe’s own Acrobat software, but because PDF is an open format there are many others available, and I will review some of these below.

Reading a PDF file also requires special software. Here there is no competition, largely because Adobe have made their Adobe Reader program (previously Adobe Acrobat Reader) available free on the web; it also comes on many installation CDs for other software, so that users can read the PDF manuals, and it is regularly included in shareware distribution CDs from computer magazines. It was recently estimated that 95% of all computer users have Adobe Reader installed, making it the most widely distributed piece of software on the planet. Both Acrobat and the Reader have gone through several versions, adding features; I will briefly examine Adobe Reader 6, the latest version, below.

Finally, some PDF files can now be dismantled into HTML components using a simple command-line program, making it possible to change a document distributed as a PDF. I will examine this at the end of the article.

PDF Creation programs

1. Adobe Acrobat

Adobe’s own Acrobat program is available from most software suppliers: current prices for version 6.0 (June 2004) are $465 for a new program and $135 for an academic version. A free 30-day demo version is available from Adobe’s website (http://www.adobe.com/products/acrobatpro/tryout.html) but at 200Mb this is strictly for broadband subscribers. Installation is fairly quick and adds an additional entry to the Printers shown in the Control Panel. It also adds a small but obtrusive three-button toolbar and a new drop-down menu to each of the user’s Office applications; there appears to be no way to prevent this. Even for the demo version, all PDF files on the user’s system are associated with Acrobat rather than the Reader, making it awkward afterwards to open a file just for reading. An accompanying program, Acrobat Distiller, can be used to produce PDF from PostScript files, as produced by many graphics programs.

Acrobat has two modes of operation; firstly it controls how print output from the user’s other programs can be captured to a PDF file; secondly it allows the resulting PDF files to be linked, annotated, and combined. Even Acrobat cannot, however, directly edit the contents of a PDF file.

Illustration: Adobe Acrobat screen

Someone preparing a document in Word, Access, PowerPoint or Excel can use their new Acrobat toolbar or menu to convert the document to Adobe PDF (other options are converting plus emailing or converting plus sending for review – see below on this). Users in programs which don’t display the new toolbar can still run Acrobat by using the Print command and setting the printer name to ‘Adobe PDF’. After pseudoprinting in this way the user is given the option of opening the main Acrobat program for further work on their PDF file. Changes that can be made include:

  • Making the file smaller by compressing images and removing compatibility with earlier versions of Acrobat
  • Adding comments or ‘stamps’ to the page (e.g. ‘Sign Here’)
  • Highlighting, crossing out or underlining blocks of the text in a variety of colours
  • Creating boxes that act as hyperlinks to other files, webpages, or other parts of the document
  • Adding passwords barring unauthorised users from opening or printing the document
  • Inserting other PDF files into this one
  • Adding headers and footers
  • Adding a watermark or background based on an existing PDF file
  • Inserting ‘attachments’ of other non-PDF files
  • Adding bookmarks linking to specific pages
  • Controlling the type of transition between pages and setting the file to open in a full-screen view, allowing PDF files to function as ‘slide-shows’
  • Converting blocks of text into ‘articles’, allowing them to reflow as the size of the viewing window changes.

On the whole I was pleased with Acrobat’s capabilities, although it seems strange to have such an elaborate system for collecting comments and edits without having any direct way to incorporate them into the document. In this respect Acrobat offers very little advantage over circulating the document on paper. More impressive is the fact that hyperlinks in Word or HTML documents are retained when the document is converted to PDF, making it possible to carry across a Word table of contents or a web index intact into its new form. The options for minimising file size have also made it much more efficient to download and use PDF files on the web and on intranets. But the hefty price tag means that most Acrobat users will be corporate professionals or academics.

2. PDFCreator

PDFCreator from SourceForge is a free PDF production system available as a download from http://sourceforge.net/projects/pdfcreator. A slight tendency towards Linux jargon betrays its open source origins. The current version is 0.8 and the file size is a rather large 8245kb. Downloading and installation are fairly simple. Satisfied users are invited to make donations to the authors.

PDFCreator installs as a printer driver and can be selected from the drop-down list of printers in the Print dialog box for any application. After sending the document to be printed the user is presented with a dialog box where they can supply a title, creation date and other metadata for the document: selecting ‘Options’ gives a range of settings including compression levels and whether fonts are to be included. Finally the user specifies a file name and location. The program does a reasonable job on straightforward Word documents but fails to recognise and convert hyperlinks. A complex 6-page newsletter from MS-Publisher was also converted without any obvious losses, the resulting file being 300Kb in size against the original’s 525Kb. This free program is fine for occasional use by authors who can live without hyperlinks in their PDFs.

3. pdfFactory

pdfFactory from FinePrint Software (http://www.pdffactory.com) sits about halfway between Acrobat and PDFCreator in functionality and price. pdfFactory Version 2.24 is priced at $AU80 and a Pro version at $AU162, with discounts for multiple purchases. Trial versions (about 2.10Mb) of each program can be downloaded. The Australian reseller is listed as Avalanche (http://www.avalanche.com.au), but attempts to reach their website resulted in the browser timing out.

Again, pdfFactory acts as a pseudoprinter, intercepting output from other programs, but unlike PDFCreator it intercepts and previews the document in its own window, allowing the user to review and change settings before the PDF output file is finalised. Users are given the option to retain active hyperlinks in the PDF file, and can format the appearance of these in various ways. Both the simple Word document and the Publisher file were converted successfully, with the conversion appearing to take place more rapidly than with the other programs. Although there were no user-adjustable compression options in the basic pdfFactory program, the default settings did a good job, reducing the size of the Publisher file to 156Kb.

Illustration: Preview screen in pdfFactory

4. Amyuni PDF Converter

A similar product, this time from the US (http://www.amyuni.com), of a similar size (2.7Mb) at a comparable price ($US79 or $US99 for a Professional version). But here the settings are hidden away in the printer properties and must be set before the document is converted, making it impossible for the user to change their mind. This, for me, puts it in the second rank behind programs like Acrobat and pdfFactory which utilise a preview mode where the user can review their changes.

5. pdf995

A small, cheap and simple product, pdf995 is available from http://www.pdf995.com for free as ‘adware’ – i.e. with Web advertising attached – but can be upgraded to remove the advertising for $US10. With a download size of 1.2Mb it is relatively quick to obtain and install. A PDF editor – pdfEdit995 – which has many of the features of Acrobat is available on the same terms. The basic program gave no support for Word hyperlinks, but pdfEdit995 allowed for the conversion of these. Unfortunately pdfEdit995 has no built-in preview mode, so to review their changes the user must shuffle back and forth between Adobe Reader and pdfEdit995 in an awkward two-step, with the further complication of consulting help files obtained via the Web. Conversion was also slower than with the other programs, and the file size for the Publisher document was relatively large at 321Kb. I can’t recommend this system although it still represents excellent value for money.
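
The file sizes quoted above for the Publisher newsletter can be set side by side with a little arithmetic. The short Python sketch below is offered only as an illustration: the figures are those reported in the reviews (no size was recorded for Acrobat or Amyuni), and the function names are made up for this example.

```python
# Sizes in kilobytes as reported in the reviews above.
# The original MS-Publisher newsletter was 525Kb.
ORIGINAL_KB = 525
PDF_SIZES_KB = {
    "PDFCreator": 300,
    "pdfFactory": 156,
    "pdf995": 321,
}


def reduction_percent(original_kb, pdf_kb):
    """Return how much smaller the PDF is than the original, as a percentage."""
    return (original_kb - pdf_kb) / original_kb * 100


def summarise(original_kb, sizes_kb):
    """Return (program, size, reduction) rows, smallest PDF first."""
    rows = [
        (name, size, round(reduction_percent(original_kb, size), 1))
        for name, size in sizes_kb.items()
    ]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, size, saved in summarise(ORIGINAL_KB, PDF_SIZES_KB):
        print(f"{name:<12}{size:>5}Kb  {saved:>5.1f}% smaller than the original")
```

On these figures pdfFactory's output is about 70% smaller than the original, PDFCreator's about 43% and pdf995's about 39%. A single test document is of course only a rough guide to how each program will behave on other files.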

PDF Reading programs

The most prominent of these is the free and widely used Adobe Reader, formerly Adobe Acrobat Reader, now in Version 6.0. Some of the companies listed above, notably Amyuni, also provide their own PDF readers, but these are not widely used. Reader 6.0 itself is a hybrid of two earlier programs: Adobe Acrobat Reader, which survived to Version 5.0 before being renamed, and Adobe eBook Reader, a less-than-successful attempt to reach the ebook market. Reader 6.0 functions both as a stand-alone program and as a plug-in to web browsers, allowing downloaded PDF files to be displayed without leaving the browser window.

Adobe Reader is a big program, and relatively slow to load. The left side of the screen can be used to display a panel showing bookmarks, thumbnail views of the pages, or other navigation tools. The File menu allows the user to print the document or convert it to text (assuming the author has set permissions for users to do so), and to make minor changes to the metadata in the document properties. The user can select portions of the document for copying and pasting elsewhere, and can search for specific text. A full-screen view allows for the largest possible display and a set of ebook manipulation tools has been brought across from the old eBook Reader. The user can zoom in to 64 times the printed size, should such a thing be necessary, or out to one-twelfth size. One new feature of the Reader is that articles can be re-wrapped to fit a changing screen size – but only if the author has chosen to implement this; re-wrapping is not automatic. This is a boon to handheld users, who until now have only been able to read PDFs by scrolling from side to side and top to bottom. The Help system is fairly comprehensive and includes a How To… section.

For visually impaired users Reader 6.0 includes a ‘read aloud’ option which produces mechanical but intelligible spoken output from simple text documents. (Confusingly, this is under the ‘View’ menu.) Authors wanting more control over the spoken output can add hidden tags to the document which the software will interpret.

Apart from the time taken to load, Adobe Reader is an efficient and easy-to-use program. Versions are available for Windows and the Mac, as well as handheld systems including the Palm and Pocket PC ranges.

PDFtoHTML

I have mentioned the option in Adobe Reader to convert a PDF file to text; this is relatively new and may have been prompted by the recent development of third-party software which can convert a PDF file into HTML component files. This is PDFtoHTML, also available from SourceForge (http://pdftohtml.sourceforge.net). Although an author can still block this process by adding security to the PDF file, it does permit users who need to work on the contents of unprotected PDF files to extract both text and images, retaining some basic formatting, without making a tedious series of copies and pastes. PDFtoHTML is a DOS-based command-line program, but a Windows GUI is available from http://guiguy.wminds.com/downloads/pdf2htmlgui.
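
For readers who would rather script the conversion than type commands by hand, the following Python sketch shows one way the command-line program might be driven. It is an illustration under stated assumptions, not a tested recipe: it assumes the program is installed under the name pdftohtml and is on the system path, the file names are invented, and the -noframes option (which asks for a single HTML file rather than a frameset) should be checked against the help output of the installed version.

```python
import shutil
import subprocess


def build_pdftohtml_command(pdf_path, output_base=None, single_file=False):
    """Build the argument list for one pdftohtml run.

    pdf_path    -- the PDF file to convert
    output_base -- optional base name for the HTML output files
    single_file -- if True, pass -noframes (assumed to be supported by the
                   installed version) to get one HTML file without frames
    """
    command = ["pdftohtml"]
    if single_file:
        command.append("-noframes")
    command.append(pdf_path)
    if output_base is not None:
        command.append(output_base)
    return command


def convert(pdf_path, output_base=None, single_file=False):
    """Run pdftohtml if it can be found; return True if it reports success."""
    if shutil.which("pdftohtml") is None:
        # The program is not installed or not on the path.
        return False
    command = build_pdftohtml_command(pdf_path, output_base, single_file)
    return subprocess.run(command).returncode == 0


if __name__ == "__main__":
    # "newsletter.pdf" is a hypothetical file name used for illustration.
    print(build_pdftohtml_command("newsletter.pdf", "newsletter", single_file=True))
```

Keeping the command-building step separate from the step that runs it makes it easy to inspect the exact command before anything is executed. A PDF protected by its author's security settings will still refuse to convert.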