Appendix B: Data Conversion

B.1   A Case Study

B.2  PDF versus HTML

B.1  A Case Study

Publishing this report makes a good case study of the process of creating a project Web site and of the pitfalls that accompany it. We will look at what went into writing the report, publishing it on paper, and creating a Web site based on it.

At the outset of a research implementation project, the LRRB anticipated needing both a printed version to distribute and a version that could be placed on its Web site. The LRRB analyzed its current Web site and concluded that users visit it mostly to request copies of printed material. Therefore, the Board decided to require consultants to submit all reports both in printed form and in Adobe Acrobat Portable Document Format (PDF).

PDF was chosen as the Web publishing format for the LRRB because it allows them to distribute an exact electronic copy of printed material, and because the technology used to read the files is commonly available. PDF also puts less strain on consultant budgets, because the conversion and editing phases are shorter.

For this project, an additional copy is being presented in HTML format. HTML allows the report to illustrate some of the mechanics and principles it describes, and provides material for this case study. After all, it is a report about Web guidelines.

See Appendix B.2, PDF versus HTML.

At the end of each of the following sections, we have indicated what percentage of the total project effort that step represented. These figures should be used as a rough guideline to help plan budgets for a Web project. Keep in mind, however, that the consultants on this project have experience with publishing projects on the Web. If your team has less experience, you should adjust your planning accordingly.

Step One – Creating

The final report was drafted, revised and rewritten in a word processing application. In this case, we chose Microsoft Word. We had tested conversions with the Office 2000 series of products and found that Word produced the best HTML code. We took advantage of the style sheets that Word provides and of its ability to convert any graphics placed in the document to .gif files (one of the core technologies discussed in 2.3.4 Core Standards).

Since we planned to convert this document to HTML (actually several HTML files), we were careful to set up style sheets that we had previously experimented with and that had produced the best results. We made sure to test the conversion after a draft, and we improved the style sheets after viewing the results (see 2.5 Planning Web Publishing).

Other word processing applications have their own strengths and weaknesses. As these applications evolve, we will continue to evaluate them.

Creation was 70 percent of the project.

Step Two – Converting

Once all of the editing was done, the word processing document was handed off to two groups. Our desktop publishing group formatted the document for print. Once a print layout was approved, a PDF document was generated. To generate PDF files, it is necessary to have the full version of Adobe Acrobat (as opposed to the free Adobe Acrobat Reader). With the proper software, conversion to PDF is as easy as printing the document to a laser printer.

Desktop publishing is an extra step. PDF files could have been generated from the word processing document just as easily.

While the desktop publishing group was creating a layout, the word processing documents were saved as HTML files. The HTML style sheets (a format called Cascading Style Sheets – CSS) that the word processing package created were edited (simplified) and the conversion process was complete.

Converting was 10 percent of the project.

Step Three – Editing

There are many different HTML editors on the market, ranging from basic text-only editors to WYSIWYG editors, which allow editing of the HTML while viewing a representation of the Web page on the screen. We chose a popular WYSIWYG editor and began editing the HTML file.

The first step was to create several smaller HTML files from the single large file that our word processing package had created. Once that was done, the files had to be linked together. For that, we decided to create icons to represent the sections. We used a graphics package to create the icons and placed them on each of the HTML pages that we had created.
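We did this splitting and linking by hand in a WYSIWYG editor, but the same step can be scripted. The following Python sketch is only an illustration, not the procedure used for this report. It assumes each section of the large file begins with a top-level heading tag, and it links the pages with plain text links rather than icons. The function names and the `section1.html` naming pattern are hypothetical.

```python
import re


def split_html_by_heading(html: str, tag: str = "h1") -> list[tuple[str, str]]:
    """Split HTML into (title, fragment) pairs, one per opening heading tag.

    Anything before the first heading is dropped. This assumes the
    headings mark section boundaries, which may not hold for every export.
    """
    starts = [m.start() for m in re.finditer(rf"<{tag}\b", html, re.IGNORECASE)]
    if not starts:
        return [("Untitled", html)]
    bounds = starts + [len(html)]
    sections = []
    for begin, end in zip(bounds, bounds[1:]):
        fragment = html[begin:end]
        match = re.search(
            rf"<{tag}\b[^>]*>(.*?)</{tag}>", fragment, re.IGNORECASE | re.DOTALL
        )
        # Strip any inline markup from the heading to get a plain title.
        title = re.sub(r"<[^>]+>", "", match.group(1)).strip() if match else "Untitled"
        sections.append((title, fragment))
    return sections


def build_pages(sections: list[tuple[str, str]]) -> dict[str, str]:
    """Wrap each fragment in a page with Previous/Next links to its neighbors."""
    names = [f"section{i + 1}.html" for i in range(len(sections))]
    pages = {}
    for i, (title, fragment) in enumerate(sections):
        nav = []
        if i > 0:
            nav.append(f'<a href="{names[i - 1]}">Previous</a>')
        if i < len(names) - 1:
            nav.append(f'<a href="{names[i + 1]}">Next</a>')
        pages[names[i]] = (
            f"<html><head><title>{title}</title></head>"
            f"<body>{fragment}<p>{' | '.join(nav)}</p></body></html>"
        )
    return pages
```

Calling `build_pages(split_html_by_heading(text))` returns a dictionary that maps each file name to its page contents, which can then be written to disk. A regular expression is enough for a sketch like this, but HTML exported from a word processor is often irregular, so the result should still be reviewed in an editor.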

Editing was 15 percent of the project.

Step Four – Publishing

Publishing, in this case, meant collecting all of the information on a disk and presenting it to the LRRB for inclusion on their Web site. This is not always as simple as it sounds. The files need to be copied into a precise directory structure so that transferring them to a Web server is seamless. In this case, we “burned” the files onto a CD-ROM and tested them to make sure that all of the links created in the editing phase worked as expected. Some knowledge of the Web server’s platform is necessary to make sure there are no conflicts with file formats.
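The link test described above can also be automated. The following Python sketch is an illustration only. It assumes the site is a folder of `.html` files and reports every relative link or image reference whose target file is missing. The class and function names are hypothetical, and external links are skipped rather than checked.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and ((tag == "a" and name == "href") or (tag == "img" and name == "src")):
                self.links.append(value)


def find_broken_links(root):
    """Return (page, link) pairs for local links whose target does not exist."""
    root = Path(root)
    broken = []
    for page in sorted(root.rglob("*.html")):
        collector = LinkCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        for link in collector.links:
            parsed = urlparse(link)
            # Skip external links (http:, mailto:, ...) and same-page anchors.
            if parsed.scheme or parsed.netloc or not parsed.path:
                continue
            path = unquote(parsed.path)
            if path.startswith("/"):
                # Treat a leading slash as relative to the site folder.
                target = root / path.lstrip("/")
            else:
                target = page.parent / path
            if not target.exists():
                broken.append((str(page.relative_to(root)), link))
    return broken
```

Running `find_broken_links` on the folder that will be copied to the disk returns an empty list when every local link resolves. Note that on a file system that ignores case, this check will not catch a link whose capitalization differs from the file name, even though such a link can fail on a case-sensitive Web server. This is one example of the platform conflicts mentioned above.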

Publishing was 5 percent of the project.
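The four percentages reported above add up to 100 and can be turned into a rough budget estimate. The following Python sketch shows the arithmetic. The total of 200 hours in the usage note is a hypothetical figure chosen only for illustration, and the function name is likewise an assumption.

```python
# Share of total project effort for each step, as reported in this case study.
STEP_SHARES = {"Creating": 70, "Converting": 10, "Editing": 15, "Publishing": 5}


def allocate_hours(total_hours: float, shares: dict[str, int] = STEP_SHARES) -> dict[str, float]:
    """Split a total effort estimate across the steps according to their shares."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must add up to 100 percent")
    return {step: total_hours * pct / 100 for step, pct in shares.items()}
```

For example, `allocate_hours(200)` assigns 140 hours to creating, 20 to converting, 30 to editing and 10 to publishing. A team with less Web publishing experience should expect the converting and editing shares to be larger.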

B.2   PDF versus HTML

When to Use HTML

  • Large amounts of text that need to be searchable – HTML text can be searched from the Web browser, or indexed on the Web server. This makes the text of HTML pages available to a local search engine. The actual text of a PDF file is not available to a local search engine, nor can it contain any meta-information to help users locate the document (see Meta).
  • Projects that need to link to other projects – Both PDF and HTML files use hyperlinks. It is easier to change and maintain links in an HTML file, and easier to manage long lists of links in HTML.
  • Whenever file size is an issue – HTML files are usually more compact and designed to be delivered over the Internet. HTML files generally use less “bandwidth.”
  • When access to the text of the document is important – Text and graphics can be cut and pasted from HTML files.
  • Whenever feasible – HTML files are more universal, flexible and generally preferred for Web publishing.

When to Use PDF Files

  • When the layout and printing are important – PDF files retain all of the formatting, fonts and characteristics of the document from which they were created. A user with free Acrobat Reader software can print an exact copy from their own laser printer. PDF files are an excellent format for reducing reproduction costs.
  • When file size is NOT an issue – PDF files are generally much larger than HTML files.
  • If cost and development time are issues – Generally, PDF files are faster and easier to produce, provided the correct software is available.
  • When access to the text is NOT needed – PDF files are published "read only" and it is difficult to cut text and graphics from them.