"OCR or Optical Character Recognition is the recognition of text from printed or handwritten documents and images in order to distinguish alphanumeric characters using technology."
That's the technical definition.
Let's look at a more practical definition.
What do you see in the image below?
Most likely, you would see the capitalized English character "A". Your mind has already done some preprocessing for you to identify light and dark regions, strokes and other features such as the triangle in the middle surrounded by darker regions.
However, this is what the computer sees when it sees the same image.
A computer simply 'sees' 1s and 0s. It has no notion of what those patterns of ones and zeros represent to humans. OCR is the technology that converts a pattern of ones and zeros into machine-readable data (e.g., ASCII, HTML, JSON).
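To make the contrast concrete, here is a minimal sketch of the same letter as raw numbers and as a shape. The 7x7 bitmap and the helper names `to_pixels` and `render` are made up for illustration; real images are far larger and usually grayscale or color.

```python
# A hypothetical 7x7 binary bitmap of a capital "A" (1 = dark pixel, 0 = light pixel).
BITMAP_A = [
    "0001000",
    "0010100",
    "0100010",
    "0100010",
    "0111110",
    "0100010",
    "0100010",
]

def to_pixels(rows):
    """Convert rows of '0'/'1' characters into a list of lists of ints."""
    return [[int(ch) for ch in row] for row in rows]

def render(pixels):
    """Draw the grid the way a person perceives it: '#' for dark, ' ' for light."""
    return "\n".join("".join("#" if p else " " for p in row) for row in pixels)

pixels = to_pixels(BITMAP_A)
print(pixels[0])       # the raw numbers the computer works with
print(render(pixels))  # the shape a human recognizes
```

The first print shows only a row of numbers; the second shows why a person immediately reads the same data as a letter.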
OCR technology helps computers understand printed and handwritten information by converting it to machine-readable data.
OCR technology has come a long way since the 1990s. Let's take an example. Suppose you're an OCR computer program presented with lots of different letters written in lots of different fonts. How do you pick out all the letter As if they all look slightly different?
You could use a rule like this: If you see two angled lines that meet in a point at the top, in the center, and there's a horizontal line between them about halfway down, that's a letter A.
Apply that rule and you'll recognize most capital letter As, no matter what font they're written in. Instead of recognizing the complete pattern of an A, you're detecting the individual component features (angled lines, crossed lines, or whatever) from which the character is made.
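The rule above can be sketched as a toy check on a tiny binary bitmap. This is only an illustration under simplifying assumptions: the 7x7 bitmaps, the thresholds and the function name `looks_like_capital_a` are all hypothetical, and real rule-based OCR worked on much richer stroke features.

```python
# Hypothetical 7x7 bitmaps (1 = dark pixel, 0 = light pixel).
LETTER_A = ["0001000", "0010100", "0100010", "0100010",
            "0111110", "0100010", "0100010"]
LETTER_H = ["0100010", "0100010", "0100010", "0111110",
            "0100010", "0100010", "0100010"]

def to_pixels(rows):
    return [[int(ch) for ch in row] for row in rows]

def looks_like_capital_a(pixels):
    """Toy feature rule: apex at top center, crossbar near the middle, two legs."""
    height, width = len(pixels), len(pixels[0])

    # Feature 1: the strokes meet in a single point near the top center.
    apex_cols = [i for i, p in enumerate(pixels[0]) if p]
    has_apex = len(apex_cols) == 1 and abs(apex_cols[0] - width // 2) <= 1

    # Feature 2: some row in the middle band is mostly dark (the horizontal bar).
    middle = pixels[height // 3 : 2 * height // 3 + 1]
    has_crossbar = any(sum(row) >= width // 2 + 1 for row in middle)

    # Feature 3: the bottom row shows two legs that are far apart.
    dark = [i for i, p in enumerate(pixels[-1]) if p]
    has_two_legs = len(dark) >= 2 and dark[-1] - dark[0] >= width // 2

    return has_apex and has_crossbar and has_two_legs

print(looks_like_capital_a(to_pixels(LETTER_A)))
print(looks_like_capital_a(to_pixels(LETTER_H)))
```

The "H" has a crossbar and two legs but no apex, so it fails the first feature test while the "A" passes all three.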
Most modern OCR programs work by feature detection. However, rather than relying on hand-written rules for each letter, they use neural networks. How neural networks work is more complicated and beyond the scope of this article. In short, a neural network learns to detect features automatically, provided it is trained on a large number of samples of the characters it needs to recognize.
OCR software tries to recognize characters in an image or document by slicing the image into smaller pieces and passing each piece through a neural network to check whether it contains a character and, if so, to find the closest matching character. Modern OCR programs such as Google Vision and Tesseract then combine these characters into words based on the spacing between them.
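The last step, combining characters into words by spacing, can be sketched as follows. This is a simplified illustration, not how Google Vision or Tesseract actually implement it: the input format `(char, x_left, x_right)`, the function `group_into_words` and the `max_gap` threshold are assumptions, and the characters are taken as already recognized on a single text line.

```python
def group_into_words(char_boxes, max_gap=5):
    """Group recognized characters into words using the horizontal gap between them.

    char_boxes: list of (char, x_left, x_right) tuples for one line of text.
    max_gap: hypothetical pixel threshold; a wider gap starts a new word.
    """
    words, current, prev_right = [], "", None
    for char, left, right in sorted(char_boxes, key=lambda box: box[1]):
        # A gap wider than the threshold is treated as a space between words.
        if prev_right is not None and left - prev_right > max_gap:
            words.append(current)
            current = ""
        current += char
        prev_right = right
    if current:
        words.append(current)
    return words

boxes = [("I", 0, 4), ("n", 6, 12), ("v", 14, 20), ("N", 32, 40), ("o", 42, 48)]
print(group_into_words(boxes))  # ['Inv', 'No']
```

In practice the threshold would depend on the font size, so a real system would typically derive it from the character widths rather than hard-code it.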
It is quite likely that you have already used OCR technology, for example if you have used an app such as CamScanner to take photos of business cards. When you upload photos and PDF files to Google Drive, Google automatically scans them using OCR technology to identify the text in them. Other applications of OCR are:
There are two main shortcomings of OCR technology: accuracy and text categorization.
One issue with OCR technology is that its accuracy may fall short of 100%. For example, in the image below, "21.08.2018" could be captured as "2I.O8.2OI8". Hence, you need a second system that validates the output of the OCR engine.
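As a rough illustration of such a second system, the sketch below repairs commonly confused characters in a field that is expected to be a date and then checks that the result is a real calendar date. The confusion map, the `DD.MM.YYYY` format and the function name `validate_date` are assumptions chosen for this example.

```python
import re
from datetime import datetime

# Letters that OCR engines may confuse with digits (illustrative and incomplete).
CONFUSABLE = str.maketrans({"I": "1", "l": "1", "O": "0", "o": "0", "S": "5", "B": "8"})

def validate_date(raw, fmt="%d.%m.%Y"):
    """Return a cleaned-up date string, or None if the text cannot be a valid date."""
    # Only safe because this field is expected to contain digits and dots.
    candidate = raw.translate(CONFUSABLE)
    if not re.fullmatch(r"\d{2}\.\d{2}\.\d{4}", candidate):
        return None
    try:
        datetime.strptime(candidate, fmt)  # rejects impossible dates such as 99.99.2018
    except ValueError:
        return None
    return candidate

print(validate_date("2I.O8.2OI8"))  # 21.08.2018
print(validate_date("99.99.2018"))  # None
```

Replacing letters with digits is only reasonable when the field type is known in advance; applied to free text it would corrupt ordinary words.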
OCR technology identifies characters and then combines those characters into words. However, for business use, it is important to identify what those words mean. For example, OCR technology will give the output “Invoice No: 12345”, where “Invoice No” represents the “invoice_number_key” and “12345” represents the “invoice_number_value”. This is where you need intelligence built on top of base OCR technology to make the identified text usable.
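A very simple form of this categorization is pattern matching on the OCR output. The sketch below is an assumption-laden illustration, not a description of any particular product: the `FIELD_PATTERNS` table, the field name `invoice_number` and the function `extract_fields` are hypothetical, and production systems generally need layout and language understanding rather than a single regular expression.

```python
import re

# Hypothetical mapping from a field name to a pattern that finds its value.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*No\b\.?\s*[:#]?\s*(\w+)", re.IGNORECASE),
}

def extract_fields(text):
    """Turn raw OCR text into a dictionary of named fields."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)  # the captured value next to the key
    return fields

print(extract_fields("Invoice No: 12345"))  # {'invoice_number': '12345'}
```

The output is structured data that downstream software can consume directly, instead of an undifferentiated string of words.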
At Docsumo, we solve both these issues. Docsumo automates data extraction from documents and makes the data actionable. Using advanced computer vision and natural language processing, it validates the extracted data so that it can be directly consumed by downstream software.
In today’s dynamic business world, filing and archiving official documents in digital form keeps them readily accessible, which pays off later on and in unforeseen circumstances.
With an automated data extraction solution, loan documents can be processed end-to-end without the errors and delays of manual handling. Automation in loan document processing reduces downtime, eliminates data redundancy, and allows companies to respond faster to client queries. By combining machine learning, including deep learning, with OCR, companies can cut costs, derive actionable insights, and streamline loan processing and approvals through efficient data extraction and analysis.
Mortgage lenders receive multiple identity and income verification documents, along with different forms, from loan applicants in a variety of formats and styles. Traditional OCR solutions often fail to extract data from these semi-structured documents, which is why more and more lenders are adopting intelligent document processing (IDP) solutions. IDP solutions not only extract data; they can also validate the extracted data against predefined rules in order to improve accuracy.
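Validation against predefined rules can be pictured as one check per extracted field. The following is a minimal sketch with made-up field names (`applicant_name`, `annual_income`, `document_date`) and made-up rules; it does not represent how any specific IDP product defines or applies its rules.

```python
from datetime import datetime

def is_positive_amount(value):
    """True if the text parses as a number greater than zero (commas allowed)."""
    try:
        return float(value.replace(",", "")) > 0
    except ValueError:
        return False

def is_date(value, fmt="%d.%m.%Y"):
    """True if the text is a real calendar date in the assumed DD.MM.YYYY format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

# Hypothetical rule set: one predicate per field expected in a loan document.
RULES = {
    "applicant_name": lambda v: bool(v.strip()),
    "annual_income": is_positive_amount,
    "document_date": is_date,
}

def validate_record(record):
    """Return the names of fields that are missing or fail their rule."""
    return [name for name, rule in RULES.items()
            if name not in record or not rule(record[name])]

record = {"applicant_name": "Jane Doe", "annual_income": "S4,OOO",
          "document_date": "21.08.2018"}
print(validate_record(record))  # ['annual_income']
```

Fields flagged this way can be routed to a human reviewer, while records that pass every rule can flow straight into downstream systems.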
Intelligent Document Processing is an automation technology that captures information from a myriad of documents and data sources, extracts data, and organizes it for further processing. IDP solutions enable businesses to integrate seamlessly with core processes, eliminate manual labour, address the challenges of reading different document layouts, and meet legal and compliance requirements. Accurate data is the foundation of every organization, and IDP helps businesses deal with the complexity of processing huge volumes of documents, automate manual data entry, and move away from traditional semi-automated OCR workflows.