Using the Casio YC-430 as an OCR scanner
The Casio YC-430 projection camera system, with its 10-megapixel resolution, is the first system of its kind I have seen that I can recommend for its functionality and price. It was preceded by the YC-400, with a 4-megapixel resolution, and may be succeeded by models with higher resolution. However, I find that 10 megapixels is the lowest resolution that works for the OCR of most books.
From April 12 to 14, 2008, a unit was subjected to extensive testing to capture images for OCR by the FineReader OCR application. The results were comparable to those that can be obtained with a flatbed scanner at 300 dpi, with the advantage that this system is more compact and can capture documents that cannot be flattened on a flatbed scanner, such as antiquarian books that could be damaged by doing so.
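As a rough check on the comparison with a 300 dpi flatbed, the sketch below estimates the pixel density a capture yields on a page that just fills the frame. The sensor dimensions (3648 x 2736 for 10 megapixels, 2304 x 1728 for 4 megapixels) are assumed typical 4:3 values, not figures from Casio, and the function name is only illustrative.

```python
# Rough effective-resolution estimate for a camera capture of a page.
# ASSUMPTION: typical 4:3 sensor dimensions; the actual YC-430 and YC-400
# pixel dimensions are not given in the text.
SENSOR_10MP = (3648, 2736)  # long side, short side, in pixels
SENSOR_4MP = (2304, 1728)


def effective_dpi(page_long_in, page_short_in, sensor=SENSOR_10MP):
    """Return pixels per inch when the page just fills the frame.

    The limiting axis is whichever one gives the lower pixel density.
    """
    long_px, short_px = sensor
    return min(long_px / page_long_in, short_px / page_short_in)


if __name__ == "__main__":
    pages = [("letter page", 11.0, 8.5), ("6 x 9 inch book page", 9.0, 6.0)]
    for name, long_in, short_in in pages:
        dpi_10 = effective_dpi(long_in, short_in, SENSOR_10MP)
        dpi_4 = effective_dpi(long_in, short_in, SENSOR_4MP)
        print(f"{name}: about {dpi_10:.0f} dpi at 10 MP, {dpi_4:.0f} dpi at 4 MP")
```

Under these assumptions a letter-size page comes out a little above 300 dpi with 10 megapixels and near 200 dpi with 4 megapixels. Any border left around the page lowers the figure further.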
The YC-430 comes with Windows software that lets the user operate the unit in several modes. It can be connected via a USB cable to a computer to view and capture images and store them as JPEG files, and a projection monitor can be connected to the computer to show what the computer sees. Alternatively, the YC-430 can be connected directly to many projection monitors that accept USB video input.
For OCR, the unit was used in Scanner Mode with the resolution set to the maximum of 10 megapixels, the lights turned on, White Enhancement set to Enhanced White, and the shutter left on auto. Auto shutter requires the user to develop some dexterity, but that is worth doing.
In auto mode, the unit first aligns, focuses, and sets the white balance on the empty base. Then, whenever one lays a page to be scanned on the base and stops moving it for a couple of seconds, the shutter automatically captures the image and saves the file to the hard disk of the computer, in a folder that the user will probably want to change from the default. However, the software demands that it be a folder on a hard disk and not on an external drive such as a USB drive. The dexterity comes in laying the pages flat and holding them so until the shutter clicks. Although the depth of field of the camera is such that the pages don't have to be perfectly flat, one does need to avoid presenting the camera with wavy print lines, as FineReader may have trouble recognizing them if they become too wavy. One also wants to align the lines of text squarely, so that one doesn't have to rotate the image by some fraction of 90° in an image editor such as GIMP to get FineReader to accept it.
In auto mode the unit keeps the user busy and unable to pause or slow down. It is easy to mess up a page and have to shoot it over, creating duplicate images that later have to be deleted, or to inadvertently miss a page or two. It is also easy for one's fingers to cover portions of text, so one has to practice a little to get the hang of it. However, the steady pace is likely to be useful for maintaining the productivity of assistants. Projects can be stopped and restarted, but it may be a good idea to protect the user from distractions while scanning. The discipline of auto mode can move a project along faster than it would likely go with a manually operated flatbed scanner.
The Enhanced White setting applies to the image background and does a fairly good job of yielding a background color of a very light pastel. If one wants a better white, one can use an image editor's color pour tool to pour a background color of white on the areas that should be white.
No way was found to make small adjustments to the zoom of the camera lens to provide a close fit to the size of the document text area, so one may have to accept a fairly wide border that has to be cropped later, either with the image editor or with the eraser tool after loading into FineReader.
Files are stored with a name in the format YYMMDDTHHMMSS.jpg, so the file names sort in capture order. By capturing in page order without skipping any pages, one gets files that can all be loaded into FineReader by selecting Open Image and then selecting all the files in the folder with Ctrl-A. That will load and convert the files in order and assign them page numbers in sequence. The JPEG files tend to be about twice as large as the TIFF files produced by FineReader operating a flatbed scanner, so one needs to use a filesystem with enough space if one is going to do much scanning.
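Before loading a folder into the OCR application, it can help to confirm that the captures sort as expected and that their count matches the number of pages shot. The following is a minimal sketch, assuming the names follow the YYMMDDTHHMMSS.jpg pattern exactly; the function names and the sample file names are hypothetical and are not part of the Casio software.

```python
from datetime import datetime
from pathlib import Path

# ASSUMPTION: names look like 080412T101455.jpg (YYMMDDTHHMMSS.jpg), as
# described above. Anything that does not match is ignored.
STAMP_FORMAT = "%y%m%dT%H%M%S"


def capture_time(filename):
    """Parse the capture timestamp from a file name, or return None."""
    path = Path(filename)
    if path.suffix.lower() != ".jpg":
        return None
    try:
        return datetime.strptime(path.stem, STAMP_FORMAT)
    except ValueError:
        return None


def number_pages(filenames, first_page=1):
    """Return (page_number, name) pairs in capture order."""
    stamped = [(capture_time(name), name) for name in filenames]
    ordered = sorted((t, name) for t, name in stamped if t is not None)
    return [(first_page + i, name) for i, (_, name) in enumerate(ordered)]


if __name__ == "__main__":
    # Hypothetical sample names; replace with the contents of the capture
    # folder, e.g. [p.name for p in Path("captures").iterdir()].
    names = ["080412T101502.jpg", "080412T101455.jpg", "notes.txt"]
    pages = number_pages(names)
    for page, name in pages:
        print(page, name)
    print(f"{len(pages)} captures found")
```

If the count differs from the number of pages scanned, a page was probably missed or shot twice, and the listing shows where to look.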
The operator software for the YC-430 manages the capture and save process well enough that one will probably not need a TWAIN driver, although it would be helpful to have a cropping tool during capture to avoid having to use a mask or do cropping after capture.
The software for the YC-430, called "PJ Camera Software", can be obtained here, and the user manual here. To purchase a unit contact Jeff Phillips at the Projector Superstore.
How to render documents