"OCR or Optical Character Recognition is the recognition of text from printed or handwritten documents and images in order to distinguish alphanumeric characters using technology."
That's the technical definition.
Let's look at a more practical definition.
What do you see in the image below?
Most likely, you see the capital English letter "A". Your brain has already done some preprocessing for you: it has picked out light and dark regions, strokes, and other features such as the triangle in the middle surrounded by darker regions.
However, this is what a computer sees when it looks at the same image.
A computer simply 'sees' 1s and 0s. It has no understanding of what the pattern of ones and zeros represents to humans. OCR is the technology that converts this pattern of ones and zeros into machine-readable data (e.g. ASCII, HTML, JSON).
OCR technology helps computers understand printed and handwritten information by converting it into machine-readable data.
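To make this concrete, here is a minimal Python sketch of how a tiny black-and-white image of an "A" might be stored as ones and zeros. The 7x5 bitmap is a made-up illustration (1 = dark pixel, 0 = light pixel), not the image above; real scans are far larger.

```python
# A hypothetical 7x5 bitmap of a capital "A": 1 = dark pixel, 0 = light pixel.
A_BITMAP = [
    "00100",
    "01010",
    "10001",
    "11111",
    "10001",
    "10001",
    "10001",
]

def to_matrix(rows):
    """Convert rows of '0'/'1' characters into a list of lists of ints,
    which is all the computer actually stores."""
    return [[int(ch) for ch in row] for row in rows]

def render(matrix):
    """Draw the matrix with '#' for dark pixels so a human can see the shape."""
    return "\n".join("".join("#" if px else "." for px in row) for row in matrix)

matrix = to_matrix(A_BITMAP)
print(matrix[0])  # [0, 0, 1, 0, 0]
print(render(matrix))
```

The computer only ever holds the numbers in `matrix`; the shape you recognize in the rendered output is something OCR has to recover from them.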
OCR technology has come a long way since the 1990s. Let's take an example. Suppose you're an OCR program presented with lots of different letters written in lots of different fonts: how do you pick out all the letter As when they all look slightly different?
You could use a rule like this: if you see two angled lines that meet at a point at the top center, with a horizontal line between them about halfway down, that's a letter A.
Apply that rule and you'll recognize most capital As, no matter what font they're written in. Instead of recognizing the complete pattern of an A, you're detecting the individual component features (angled lines, crossing lines, and so on) from which the character is made.
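Here is a toy Python version of that rule, applied to hypothetical bitmaps where "1" marks a dark pixel. The function name `looks_like_A` and its exact thresholds are assumptions chosen for illustration. It shows the idea of checking component features rather than matching a whole template, and is not how production OCR engines work.

```python
def dark_columns(row):
    """Indices of dark ('1') pixels in one row of the bitmap."""
    return [x for x, ch in enumerate(row) if ch == "1"]

def looks_like_A(rows):
    """Toy rule for a capital A: an apex at the top centre, two strokes that
    spread apart going down, and a solid crossbar roughly halfway down."""
    if not rows:
        return False
    height, width = len(rows), len(rows[0])
    columns = [dark_columns(row) for row in rows]
    if any(not xs for xs in columns):
        return False  # every row of this kind of A contains some ink
    # Rule 1: the top row is dark only near the centre (the apex).
    if any(abs(x - width // 2) > 1 for x in columns[0]):
        return False
    # Rule 2: the distance between the outermost dark pixels never shrinks
    # moving down, i.e. two angled lines spreading apart from the apex.
    spans = [xs[-1] - xs[0] for xs in columns]
    if any(lower < upper for upper, lower in zip(spans, spans[1:])):
        return False
    # Rule 3: some row in the middle third is one unbroken run of at least
    # three dark pixels (the horizontal crossbar).
    middle = range(height // 3, 2 * height // 3 + 1)
    return any(
        len(columns[y]) >= 3 and columns[y][-1] - columns[y][0] + 1 == len(columns[y])
        for y in middle
    )

NARROW_A = ["00100", "01010", "10001", "11111", "10001", "10001", "10001"]
WIDE_A = ["0001000", "0010100", "0100010", "0111110", "1000001", "1000001", "1000001"]
H = ["10001", "10001", "10001", "11111", "10001", "10001", "10001"]

print(looks_like_A(NARROW_A), looks_like_A(WIDE_A), looks_like_A(H))  # True True False
```

The two "A" bitmaps differ in width, yet both pass, because the rule checks features rather than exact pixels. The "H" fails the first rule because its top row has ink at the edges instead of a single apex.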
Most modern OCR programs work by feature detection. However, rather than relying on hand-written rules for each letter, they use neural networks to learn those features. How neural networks work is beyond the scope of this article. In short, a neural network learns to detect features automatically, provided it is trained on a large number of samples of the characters it is meant to recognize.
OCR software tries to recognize characters in an image or document by slicing it into smaller pieces and passing each piece through a neural network, which checks whether the piece contains a character and finds the closest matching one. Modern OCR programs such as Google Vision and Tesseract then combine these characters into words based on the spacing between them.
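The last step, joining recognized characters into words by spacing, can be sketched in Python as follows. The `CharBox` format and the fixed pixel threshold are assumptions for illustration. The sketch handles a single line of left-to-right text and ignores multiple lines and varying font sizes, which real engines such as Tesseract or Google Vision have to deal with.

```python
from dataclasses import dataclass

@dataclass
class CharBox:
    """A recognized character and its horizontal position (hypothetical format)."""
    char: str
    x: int      # left edge in pixels
    width: int  # box width in pixels

def group_into_words(chars, gap_threshold):
    """Join characters on one line into words, starting a new word wherever
    the gap between neighbouring boxes is larger than gap_threshold pixels."""
    chars = sorted(chars, key=lambda c: c.x)  # read left to right
    words, current = [], ""
    prev_right = None
    for c in chars:
        if prev_right is not None and c.x - prev_right > gap_threshold:
            words.append(current)  # a wide gap ends the current word
            current = ""
        current += c.char
        prev_right = c.x + c.width
    if current:
        words.append(current)
    return words

boxes = [CharBox("O", 0, 10), CharBox("C", 12, 10), CharBox("R", 24, 10),
         CharBox("i", 50, 4), CharBox("s", 56, 8),
         CharBox("f", 80, 6), CharBox("u", 88, 8), CharBox("n", 98, 8)]
print(group_into_words(boxes, gap_threshold=5))  # ['OCR', 'is', 'fun']
```

Gaps inside each word here are 2 pixels and gaps between words are 16 pixels, so any threshold between those values separates the words correctly.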
You have quite likely used OCR already if you have used an app such as CamScanner to photograph business cards. When you upload photos and PDF files to Google Drive, Google automatically scans them with OCR to identify the text in them. Other applications of OCR are:
OCR technology has two main shortcomings: accuracy and text categorization.
The first is that accuracy is not always 100%. For example, in the image below, "21.08.2018" could be captured as "2I.O8.2OI8", with the digits 1 and 0 misread as the letters I and O. Hence, you need a second system that validates the output of the OCR engine.
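One simple form of that second system is a field-level check. The Python sketch below assumes the field is already known to be a date in DD.MM.YYYY format. It replaces a few letters that OCR commonly confuses with digits and then accepts the result only if it parses as a real calendar date. The lookalike table is an illustrative assumption rather than an exhaustive list, and blind substitution like this is only safe in fields known to be numeric.

```python
from datetime import datetime

# Illustrative (not exhaustive) letters that OCR often returns in place of digits.
DIGIT_LOOKALIKES = str.maketrans({"I": "1", "l": "1", "O": "0", "o": "0"})

def validate_date(raw, fmt="%d.%m.%Y"):
    """Return a datetime.date if raw is a valid date after fixing digit
    lookalikes, or None so the value can be flagged for human review."""
    cleaned = raw.strip().translate(DIGIT_LOOKALIKES)
    try:
        return datetime.strptime(cleaned, fmt).date()
    except ValueError:
        return None  # not a real date even after correction

print(validate_date("2I.O8.2OI8"))  # 2018-08-21
print(validate_date("31.02.2018"))  # None
```

Returning `None` instead of guessing keeps the check conservative: anything that still fails after correction is passed on for review rather than silently accepted.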
The second is categorization. OCR technology identifies characters and then combines them into words, but for business use it is important to know what those words mean. For example, OCR will output “Invoice No: 12345”, but it will not tell you that “Invoice No” is the “invoice_number_key” and “12345” is the “invoice_number_value”. This is where you need intelligence built on top of base OCR technology to make the recognized text usable.
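A very basic, rule-based way to attach meaning to OCR output is sketched below in Python. It splits each line at the first colon and maps known labels to field names. The regular expression and the `FIELD_NAMES` table are hypothetical, and this is not how Docsumo or any particular product does it; real documents need far more, since labels vary and values are often not on the same line as their label.

```python
import re

# Hypothetical pattern: a label, a colon, then the value on the same line.
KEY_VALUE = re.compile(r"^\s*(?P<key>[^:]+?)\s*:\s*(?P<value>.+?)\s*$")

# Hypothetical mapping from printed labels (lowercased) to field names.
FIELD_NAMES = {"invoice no": "invoice_number", "invoice number": "invoice_number"}

def extract_fields(ocr_lines):
    """Turn raw OCR text lines into labelled key/value fields."""
    fields = {}
    for line in ocr_lines:
        match = KEY_VALUE.match(line)
        if not match:
            continue  # no "label: value" structure on this line
        label = match.group("key").lower().rstrip(".")
        name = FIELD_NAMES.get(label)
        if name:
            fields[name + "_key"] = match.group("key")
            fields[name + "_value"] = match.group("value")
    return fields

print(extract_fields(["Invoice No: 12345", "Thank you!"]))
# {'invoice_number_key': 'Invoice No', 'invoice_number_value': '12345'}
```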
At Docsumo, we solve both of these issues. Docsumo automates data extraction from documents and makes the data actionable. Using advanced computer vision and natural language processing, it validates the extracted data so that downstream software can consume it directly.