Optical Character Recognition - OCR

08/01/08

03:11:11 pm by Eugene Gardner, Categories: Articles

I was asked to suggest an OCR program and have looked into what is available. I could not find a recent, unbiased comparison of the available programs, so here is a summary of what I found.

Optical Character Recognition, or OCR, is a technology that converts different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera, into editable and searchable data.

Imagine you’ve got a paper document - for example, a magazine article or a brochure - or a PDF contract your partner sent to you by email. A scanner alone is not enough to make this information available for editing, say in Microsoft Word. All a scanner can do is create an image, or snapshot, of the document that is nothing more than a collection of black and white or colour dots, known as a raster image. To extract and repurpose data from scanned documents, camera images or image-only PDFs, you need OCR software that singles out the letters in the image, assembles them into words and then the words into sentences, so that you can access and edit the content of the original document.
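To make the idea of "singling out letters and putting them into words" concrete, here is a minimal toy sketch in Python. It is an illustration only, not how any of the products mentioned here work internally: the 3x5 pixel "font", the gap widths and all function names are assumptions invented for the example, and real OCR engines use far more robust segmentation and recognition.

```python
# Toy OCR: a hypothetical 3x5 pixel "font" stands in for a trained recogniser.
TEMPLATES = {
    "A": (".#.", "#.#", "###", "#.#", "#.#"),
    "C": ("###", "#..", "#..", "#..", "###"),
    "O": ("###", "#.#", "#.#", "#.#", "###"),
    "R": ("##.", "#.#", "##.", "#.#", "#.#"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}
WIDTH, HEIGHT = 3, 5
WORD_GAP = 3  # blank columns between two words (letters are 1 column apart)


def render(text):
    """Draw text as a raster image: a list of rows made of '#' and '.'."""
    rows = []
    for y in range(HEIGHT):
        words = [".".join(TEMPLATES[ch][y] for ch in word)
                 for word in text.split(" ")]
        rows.append(("." * WORD_GAP).join(words))
    return rows


def split_glyphs(image):
    """Single out letters by cutting wherever a column contains no ink.

    Returns (glyph_rows, blank_columns_before_glyph) pairs.
    """
    width = len(image[0])
    glyphs, start, blanks = [], None, 0
    for x in range(width + 1):  # the extra step flushes the last glyph
        inked = x < width and any(row[x] == "#" for row in image)
        if inked:
            if start is None:
                start = x
        else:
            if start is not None:
                glyphs.append(([row[start:x] for row in image], blanks))
                start, blanks = None, 0
            blanks += 1
    return glyphs


def closest_letter(glyph):
    """Pick the template that differs from the glyph in the fewest pixels."""
    def distance(template):
        return sum(
            g != t
            for glyph_row, template_row in zip(glyph, template)
            for g, t in zip(glyph_row.ljust(WIDTH, "."), template_row)
        )
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch]))


def recognise(image):
    """Turn the raster image back into text, letter by letter."""
    text = ""
    for glyph, blanks_before in split_glyphs(image):
        if text and blanks_before >= WORD_GAP:
            text += " "  # a wide gap marks the start of a new word
        text += closest_letter(glyph)
    return text


if __name__ == "__main__":
    image = render("OCR CAT")
    # Flip one pixel inside the first letter to mimic scanner noise.
    image[2] = image[2][:1] + "#" + image[2][2:]
    print("\n".join(image))
    print(recognise(image))
```

Because each glyph is matched to its nearest template rather than an exact copy, the single flipped pixel does not change the result. Noise that bridges the gap between two letters would defeat this simple column-based segmentation, which hints at why accuracy varies so much between programs.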

If you’re using Microsoft Office XP or later, then OCR comes with it. From the Windows Start menu, burrow through Microsoft Office, Microsoft Office Tools, and you should find Microsoft Office Document Scanning. If you don’t, then you need to use ‘Add or Remove Features’ from the Office entry in Windows Control Panel, Add or Remove Programs. It’s refreshingly simple to use. Put the document in the scanner, run the program, choose one of the four colour options and hit the Scan button. Once your scanner software has finished, the image will appear in the Document Imaging window. It will also have been OCR’d, and to see the result you select ‘Send text to Word’ from the Tools menu.

However, what is free is usually fairly basic. There are other free OCR programs available, such as:
GOCR - http://jocr.sourceforge.net/
SimpleOCR - http://www.simpleocr.com/
ocre - http://lem.eui.upm.es/ocre.html
ocrad - http://www.gnu.org/software/ocrad/ocrad.html
tesseract - http://code.google.com/p/tesseract-ocr/
Softi FreeOCR (an enhancement of tesseract) - http://www.softi.co.uk/freeocr.htm
ocropus - http://code.google.com/p/ocropus/
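As an illustration of what driving one of the command-line tools in this list involves, the sketch below wraps tesseract from Python. It assumes tesseract is installed and on the PATH, and relies only on its basic command form, `tesseract imagename outputbase -l lang`. The function names and the file name `scan.tif` are hypothetical, and which image formats are accepted depends on the installed version.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_to_text(image_path, lang="eng"):
    """Run tesseract on an image file and return the recognised text.

    Returns None when the tesseract executable is not on the PATH.
    """
    if shutil.which("tesseract") is None:
        return None
    image_path = Path(image_path)
    output_base = image_path.with_suffix("")  # e.g. scan.tif -> scan
    subprocess.run(tesseract_command(image_path, output_base, lang), check=True)
    # tesseract appends ".txt" to the output base itself.
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # "scan.tif" is a hypothetical file name used for illustration.
    print(tesseract_command("scan.tif", "scan"))
    if Path("scan.tif").exists():
        print(ocr_to_text("scan.tif"))
```

There is no scanning step and no layout handling here: the image has to exist already, and the result is plain text only.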

But all of these are hard to use, inaccurate, lacking in technical support or limited in some other way. So that leaves commercial products.

OmniPage Professional is the gold-standard OCR product, but it leaves little change from £200, so some people will look for more modest applications.

TextBridge Pro comes from a good stable, and at a little over £40 it is a good value product that can be tightly integrated with Microsoft Office. Available from http://www.nuance.co.uk/textbridge/

ABBYY is another well-recognised company. It produces FineReader Professional for about £89, and the less well specified but cheaper ScanTo Office at about half the price.

Other commercial products that may be worth investigating include PrimeOCR, Readiris Pro and CuneiForm OCR.
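In the absence of a published comparison, one rough way to compare any of these programs yourself is to run each on the same page, type out the correct text by hand, and count how many character edits separate the two. The Python sketch below computes that character error rate using the standard Levenshtein edit distance. The function names and the sample strings are made up for illustration and are not results from any of the products above.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, ocr_output):
    """Edit distance divided by the length of the correct (reference) text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


if __name__ == "__main__":
    # Made-up sample: a typical confusion of 'O' with '0' and 'l' with '1'.
    correct = "Optical"
    scanned = "0ptica1"
    print(f"{character_error_rate(correct, scanned):.1%} of characters wrong")
```

A lower rate is better. A fair comparison would use the same scans, at the same resolution, for every program.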

1 comment

Comment from: OCR Researcher [Visitor]
Hey Eugene,

I agree Tesseract OCR and the other open source programs are difficult to install and use, but for Tesseract there is an easier way to use it. You can upload your document to this document management site and then there is an OCR button that will OCR it using Tesseract OCR: www.abillionbillion.com.
09/01/08 @ 17:32