Scanning Your Old Publications

If your club has been around for a few years, you probably have well over 100 issues of your publication. In a lot of cases, these publications exist only on paper. Either they were created before the desktop publishing revolution in the early 1990s, or perhaps the originals have been lost.

Whatever the case, your members would love to have access to this information. The best format to use is Adobe's ubiquitous PDF format.

Using the PDF image+text option, you can maintain an exact "photocopy" of the page with an invisible layer of searchable text. Acrobat can do the OCR (Optical Character Recognition) necessary for this format.

Scanner Recommendations

Many scanners can scan to PDF, but only a few are dedicated document scanners. A document scanner is designed to scan both sides of a page at once, at high speed.

One inexpensive scanner I like is the Fujitsu ScanSnap. The ScanSnap scans fifteen double-sided pages per minute, and its input bin can hold 50 pages. Better yet, it is under $400 and comes with a full version of Adobe Acrobat. That’s everything you need to create compact, searchable PDFs using the Acrobat Image+Text format.

On my desk is a Fujitsu fi-6140, which is a more expensive business scanner. It scans faster and can interface directly with Acrobat, which allows me to scan and OCR in one step.

Scan File Size

For best results, scan at 300 dpi, black and white. An 8.5 by 11 image+text PDF should be no more than 35 to 50 KB per page.

A 20-page pub would be about 1 MB.
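
To plan disk space for a whole archive, the same arithmetic can be wrapped in a few lines of Python. This is only a rough sketch: the function name is made up for illustration, and the 200-issue archive is a hypothetical example, not a measured result.

```python
def estimate_archive_mb(issues, pages_per_issue, kb_per_page):
    """Estimate total size in megabytes, taking 1 MB as 1024 KB."""
    total_kb = issues * pages_per_issue * kb_per_page
    return total_kb / 1024


# One 20-page issue at the upper figure of 50 KB per page
print(round(estimate_archive_mb(1, 20, 50), 2))

# A hypothetical archive of 200 such issues
print(round(estimate_archive_mb(200, 20, 50), 1))
```

At the lower figure of 35 KB per page, the same issue comes out at roughly two thirds of that size.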

Scanning Issues and Challenges

My original reason for getting the ScanSnap was to scan historical issues of the Greater Chicago Cichlid Association’s Cichlid Chatter, my local society’s publication. The club was started way back in 1971, and at six issues per year, that’s over 200 publications.

I’ve faced two challenges in scanning:

  1. Getting back issues. Only a limited number of people have back issues. Our club, unfortunately, never archived any.
  2. Scanning 5.5” X 8.5” format pubs

The second topic deserves a bit more explanation. So far, the ScanSnap has done a superb job of scanning in pretty much anything I’ve thrown at it. At some point, the Cichlid Chatter went to 5.5” by 8.5” format. It makes sense—this format is easier to mail and the pocket-sized format is handy.

To scan these smaller pubs, I simply removed the center staples, fed the pages into the ScanSnap and scanned them. They scanned fine, but the problem was that the pages were no longer in reading order.

If you’ve ever been a publication editor, you probably know that pages need to be imposed for the printer. In other words, the pages need to be arranged so that two 5.5” X 8.5” pages fit on each side of a single 8.5” X 11” piece of paper. So in a 36-page publication, the first two scanned pages contained four actual publication pages: 1, 36, 2, and 35. Ack!

The following diagram is a visual explanation of Reading Order compared with Print Order.

[Diagram: Print Order vs Reading Order]
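
The print order can also be worked out with a little arithmetic. The sketch below assumes a center-stapled booklet whose page count is a multiple of four, scanned from the outermost sheet inward with the front of each sheet first. The function name and the left/right convention are my own assumptions for illustration; your scanner may deliver the sides in a different order.

```python
def print_order(total_pages):
    """List the (left, right) publication pages on each scanned side.

    Assumes a center-stapled booklet, sheets taken from the outside in,
    front of each sheet before its back.
    """
    if total_pages <= 0 or total_pages % 4 != 0:
        raise ValueError("page count must be a positive multiple of 4")

    sides = []
    for sheet in range(total_pages // 4):
        # Front of the sheet: a high page on the left, a low page on the right
        sides.append((total_pages - 2 * sheet, 2 * sheet + 1))
        # Back of the sheet: the next low page on the left, next high on the right
        sides.append((2 * sheet + 2, total_pages - 2 * sheet - 1))
    return sides


# The first sheet of a 36-page publication
print(print_order(36)[:2])  # [(36, 1), (2, 35)]
```

That matches the 1, 36, 2, 35 grouping described above: the first sheet carries the first two and last two pages of the publication.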

There are a couple of ways to fix this:

  1. Cut the pages down the middle and scan them manually in order using a custom page size. This isn’t that hard with the ScanSnap, as it will patiently wait for the next page to be put in the bin if you set it to “Continue” mode. Unfortunately, I wasn’t the owner of the publications, so cutting them apart was not an option.
  2. Crop the pages and rearrange them in Acrobat. This option is time-consuming and treacherous, since many of the pubs don’t include page numbers.
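
Even without printed page numbers, the position of each half page in the scan determines where it belongs, so the rearranging itself can be planned ahead of time. The following sketch is not a finished fix, and it does no cropping. It assumes each scanned side has already been cropped into a left half and a right half, kept in scan order, and that the booklet was scanned from the outermost sheet inward, front side first. The function name is hypothetical.

```python
def halves_in_reading_order(total_pages):
    """Return, for pages 1..N, the scan position (0-based) of each half page.

    Scan positions run: side 1 left, side 1 right, side 2 left, and so on.
    """
    if total_pages <= 0 or total_pages % 4 != 0:
        raise ValueError("page count must be a positive multiple of 4")

    # Publication page number found at each scan position
    scanned = []
    for sheet in range(total_pages // 4):
        scanned.extend([
            total_pages - 2 * sheet,      # front, left half
            2 * sheet + 1,                # front, right half
            2 * sheet + 2,                # back, left half
            total_pages - 2 * sheet - 1,  # back, right half
        ])

    # Invert the list: page number -> scan position
    position = {page: index for index, page in enumerate(scanned)}
    return [position[page] for page in range(1, total_pages + 1)]


# An 8-page booklet: page 1 is the second half scanned, page 8 the first
print(halves_in_reading_order(8))  # [1, 2, 5, 6, 7, 4, 3, 0]
```

The resulting list tells you which cropped half to place first, second, and so on. It is worth spot-checking a few pages against the paper copy, since a sheet fed upside down or out of sequence will throw the order off.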

So, in effect, I haven’t solved this issue yet, but I’m working on it! I’ll keep you up to date if I find a quick fix.

by Rick Borstein