I was asked to suggest an OCR program and have looked into what is available. I was not able to find a recent unbiased comparison of available programs, so here is the collection of what I have found.
Optical Character Recognition, or OCR, is a technology that enables you to convert different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera, into editable and searchable data.
Imagine you've got a paper document, for example a magazine article or a brochure, or a PDF contract your partner sent to you by email. Obviously, a scanner is not enough to make this information available for editing, say in Microsoft Word. All a scanner can do is create an image or a snapshot of the document that is nothing more than a collection of black and white or colour dots, known as a raster image. In order to extract and repurpose data from scanned documents, camera images or image-only PDFs, you need OCR software that singles out the letters in the image, puts them into words and then puts the words into sentences, thus enabling you to access and edit the content of the original document.
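As a rough illustration of that image-to-text step, here is a minimal Python sketch that hands a scanned image to the tesseract command-line program (one of the free programs listed further down). It assumes a reasonably recent tesseract is installed and on the PATH, and that English language data is available. The function names and the default output name "ocr_result" are made up for this example and are not part of tesseract.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract IMAGE OUTPUT_BASE -l LANG."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, output_base="ocr_result", lang="eng"):
    """Run tesseract on one image and return the recognised text.

    Returns None when the tesseract program cannot be found.
    The helper names and default output name are illustrative only.
    """
    if shutil.which("tesseract") is None:
        return None
    command = build_tesseract_command(image_path, output_base, lang)
    # check=True raises CalledProcessError if tesseract reports a failure.
    subprocess.run(command, check=True)
    # tesseract adds the ".txt" extension to the output base name itself.
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling ocr_image("scan.png"), where scan.png is a placeholder for your own scanned page, should return the recognised text as a string, or None if tesseract is not installed. How accurate the text is depends heavily on the quality of the scan.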
If you're using Microsoft Office XP or later, then OCR comes with it. From the Windows Start menu, burrow through Microsoft Office, Microsoft Office Tools, and you should find Microsoft Office Document Scanning. If you don't, then you need to use "Add or Remove Features" from the Office entry in Windows Control Panel, Add or Remove Programs. It's refreshingly simple to use. Stick the document in the scanner, run the program, choose from four colour options and hit the Scan button. Once your scanner software has finished, the image will appear in the Document Imaging Window. It will also have been OCR'd, and to see the result you select the "Send text to Word" button off the Tools menu.
However, what is free is usually fairly basic. There are other free OCR programs available, such as:
GOCR - http://jocr.sourceforge.net/
SimpleOCR - http://www.simpleocr.com/
ocre - http://lem.eui.upm.es/ocre.html
ocrad - http://www.gnu.org/software/ocrad/ocrad.html
tesseract - http://code.google.com/p/tesseract-ocr/
Softi FreeOCR (an enhancement of tesseract) - http://www.softi.co.uk/freeocr.htm
ocropus - http://code.google.com/p/ocropus/
But all of these are hard to use, inaccurate, lacking in technical support or limited in some other way. So that leaves commercial products.
OmniPage Professional is the gold standard OCR product, but there is not much change from £200, so some people will look for more modest applications.
TextBridge Pro comes from a good stable, and at a little over £40 it is a good-value product that can be tightly integrated with Microsoft Office. It is available from http://www.nuance.co.uk/textbridge/
ABBYY are another well-recognised company, producing FineReader Professional for about £89 and the less well-specified but cheaper ScanTo Office at about half the price.
Other commercial products that may be worthy of investigation include PrimeOCR, Readiris Pro and CuneiForm OCR.