Optical Character Recognition (OCR) has made great progress in the fight for paperless offices. It’s become a staple component in just about any document management software.
So what is OCR? Wikipedia offers this definition: “…the mechanical or electronic translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text.” (2008)
Fundamentally, a computer reads the document and creates a library of searchable information. This allows an electronic document management (EDM) solution to build a database of text, which makes searching for usable information within and across documents much easier.
While many argue that OCR engines can reach accuracy levels of 98 or 99 percent, small-to-medium businesses (SMBs) may find this hard to achieve with most commercially available software. Many variables can affect the accuracy of the output, ranging from document condition to readability.
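To put a 98 or 99 percent figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the percentage is a per-character accuracy and that errors are independent of one another, which real scans will not strictly obey; the function name `field_accuracy` is made up for illustration.

```python
def field_accuracy(char_accuracy: float, field_length: int) -> float:
    """Chance that every character in a field is read correctly.

    Assumption: each character is misread independently of the others.
    """
    return char_accuracy ** field_length


# How often a field of a given length comes through with no errors at all
# when the engine reads 99 percent of characters correctly.
for length in (5, 10, 20):
    print(length, round(field_accuracy(0.99, length), 3))
# 5 0.951
# 10 0.904
# 20 0.818
```

Under these assumptions, roughly one in ten 10-character values would contain at least one wrong character, even at 99 percent accuracy.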
Problems begin to occur when OCR is used not just to capture the text within the scanned document but to lift the index values themselves (e.g., customer name or number). This becomes dangerous if there are no quality assurance or stop-loss measures in place. Without them, a document is likely to be misplaced because a character is off here or there.
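One simple form such a stop-loss measure could take is checking each OCR-extracted index value against a list of known values before filing the document. The sketch below is only an illustration: the function `check_index_value`, the sample customer names, and the 0.8 similarity cutoff are all assumptions, not part of any particular EDM product. It uses Python's standard `difflib` module to suggest near matches for a human reviewer.

```python
import difflib


def check_index_value(ocr_value, known_values, cutoff=0.8):
    """Return (status, suggestions) for an OCR-extracted index value.

    An exact match is accepted; anything else is held for manual review,
    along with up to three similar known values as suggestions.
    """
    value = ocr_value.strip()
    if value in known_values:
        return "accepted", []
    suggestions = difflib.get_close_matches(value, known_values, n=3, cutoff=cutoff)
    return "needs_review", suggestions


# Hypothetical master list of customer names.
known_customers = ["Acme Corp", "Baker Supply", "Carlson Ltd"]

print(check_index_value("Acme Corp", known_customers))   # clean read
print(check_index_value("Acrne Corp", known_customers))  # "m" misread as "rn"
```

The point of the design is that a misread value is never filed silently: it is either an exact match or it goes to a person, with likely candidates attached.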