"OCR or Optical Character Recognition is the recognition of text from printed or handwritten documents and images in order to distinguish alphanumeric characters using technology."
That's the technical definition.
Let's look at a more practical definition.
What do you see in the image below?
Most likely, you would see the capitalized English character "A". Your mind has already done some preprocessing for you to identify light and dark regions, strokes and other features such as the triangle in the middle surrounded by darker regions.
However, this is what the computer sees when it sees the same image.
A computer simply 'sees' 1s and 0s. It has no notion of what the patterns of ones and zeros represent to humans. OCR is the technology that converts those patterns of ones and zeros into machine-readable text data (e.g. ASCII, HTML, JSON).
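To make this concrete, here is a minimal Python sketch of how a tiny image of the letter "A" might be stored. The 7x5 bitmap and the names LETTER_A, to_pixels and render are made up for illustration; real scans are far larger and usually grayscale or color rather than pure black and white.

```python
# A hypothetical 7x5 black-and-white bitmap of a capital "A".
# 1 = dark pixel, 0 = light pixel.
LETTER_A = [
    "00100",
    "01010",
    "10001",
    "10001",
    "11111",
    "10001",
    "10001",
]


def to_pixels(rows):
    """Convert rows of '0'/'1' characters into a grid of integers."""
    return [[int(ch) for ch in row] for row in rows]


def render(pixels):
    """Draw the grid with '#' for dark and '.' for light pixels."""
    return "\n".join("".join("#" if p else "." for p in row) for row in pixels)


pixels = to_pixels(LETTER_A)
print(pixels)          # what the computer stores: nested lists of 0s and 1s
print(render(pixels))  # the same data drawn so a human can spot the "A"
```

Nothing in the list of numbers says "this is an A". That meaning has to be recovered by the OCR step.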
OCR technology helps computers understand printed and handwritten information by converting it to machine readable data.
OCR technology has come a long way since the 1990s. Let's take an example. Suppose you're an OCR computer program presented with lots of different letters written in lots of different fonts; how do you pick out all the letter As if they all look slightly different?
You could use a rule like this: If you see two angled lines that meet in a point at the top, in the center, and there's a horizontal line between them about halfway down, that's a letter A.
Apply that rule and you'll recognize most capital letter As, no matter what font they're written in. Instead of recognizing the complete pattern of an A, you're detecting the individual component features (angled lines, crossed lines, or whatever) from which the character is made.
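The rule above can be turned into a rough Python sketch. It assumes the letter arrives as a small black-and-white bitmap (rows of '0' and '1' characters, as in the hypothetical LETTER_A and LETTER_H below), and the thresholds are arbitrary choices for illustration. It is a toy, not how any production OCR engine is implemented.

```python
# Hypothetical 7x5 bitmaps: 1 = dark pixel, 0 = light pixel.
LETTER_A = ["00100", "01010", "10001", "10001", "11111", "10001", "10001"]
LETTER_H = ["10001", "10001", "10001", "11111", "10001", "10001", "10001"]


def looks_like_capital_a(rows):
    """Crude rule: apex at the top centre, strokes that spread apart, a crossbar."""
    height, width = len(rows), len(rows[0])
    dark = [[i for i, ch in enumerate(row) if ch == "1"] for row in rows]
    if any(not cols for cols in dark):
        return False  # every row of an "A" should contain some ink

    # Feature 1: the two strokes meet in a point at the top, in the centre.
    top = dark[0]
    apex = (max(top) - min(top) <= 1
            and abs(sum(top) / len(top) - (width - 1) / 2) <= 1)

    # Feature 2: going down, the left edge never moves right and the
    # right edge never moves left, i.e. the strokes are angled outwards.
    lefts = [min(cols) for cols in dark]
    rights = [max(cols) for cols in dark]
    spreading = (all(a >= b for a, b in zip(lefts, lefts[1:]))
                 and all(a <= b for a, b in zip(rights, rights[1:]))
                 and rights[-1] - lefts[-1] > max(top) - min(top))

    # Feature 3: a solid horizontal line joins the strokes about halfway down.
    middle_rows = range(height // 4, 3 * height // 4 + 1)
    crossbar = any(len(dark[r]) >= 3 and len(dark[r]) == rights[r] - lefts[r] + 1
                   for r in middle_rows)

    return apex and spreading and crossbar


print(looks_like_capital_a(LETTER_A))  # True
print(looks_like_capital_a(LETTER_H))  # False: no apex at the top
```

Hand-written rules like this break down quickly on noisy scans or unusual fonts, which is part of why modern engines learn their features instead.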
Most modern OCR programs work by feature detection. However, rather than relying on hand-written rules for each letter, they use neural networks. How neural networks work is much more complicated and out of the scope of this article. In short, a neural network learns to detect features automatically, provided that it is trained on a large number of samples of the characters it is trying to detect.
OCR software tries to recognize characters in the image or document by slicing the image into smaller pieces and then passing each piece through a neural network to check whether it contains a character and to find the closest matching character. Modern OCR programs such as Google Vision and Tesseract then combine these characters based on the spacing between them to give word representations.
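The last step, combining characters into words based on spacing, can be sketched in a few lines of Python. The CharBox type, the pixel coordinates and the max_gap threshold are all assumptions made for illustration; this is not the actual algorithm used by Google Vision or Tesseract.

```python
from dataclasses import dataclass


@dataclass
class CharBox:
    """A recognized character plus its horizontal extent in pixels (hypothetical)."""
    char: str
    left: int
    right: int


def group_into_words(boxes, max_gap=4):
    """Join characters on one text line; a gap wider than max_gap starts a new word."""
    words, current, prev_right = [], "", None
    for box in sorted(boxes, key=lambda b: b.left):
        if prev_right is not None and box.left - prev_right > max_gap:
            words.append(current)
            current = ""
        current += box.char
        prev_right = box.right
    if current:
        words.append(current)
    return words


boxes = [
    CharBox("I", 0, 3), CharBox("n", 5, 11), CharBox("v", 13, 19),
    CharBox("N", 30, 38), CharBox("o", 40, 46),
]
print(group_into_words(boxes))  # ['Inv', 'No']
```

A fixed threshold is the simplest possible choice. In practice the gap would more plausibly be scaled to the font size of the line being read.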
It is quite likely that you have used OCR technology in your life if you have used an app such as CamScanner to take photos of business cards. When you upload photos & PDF files to Google Drive, Google automatically scans them using OCR technology to identify text in them. Other applications of OCR are:
There are 2 main shortcomings of OCR technology: accuracy and text categorization.
One of the issues with OCR technology is that its accuracy may not be 100%. For example, in the image below, "21.08.2018" could be captured as "2I.O8.2OI8", with the digits 1 and 0 misread as the letters I and O. Hence, you need a second system that validates the output of the OCR engine.
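As a minimal sketch of what such a validation step might look like, the Python snippet below repairs a field that is expected to hold a date. The CONFUSABLE table and the repair_date function are hypothetical names, and the character mapping is only a small illustrative sample. It is safe only because the field is known to contain digits.

```python
from datetime import datetime

# Letters that OCR engines commonly confuse with digits (illustrative, not exhaustive).
CONFUSABLE = str.maketrans({"I": "1", "l": "1", "O": "0", "o": "0", "S": "5", "B": "8"})


def repair_date(raw, fmt="%d.%m.%Y"):
    """Swap look-alike letters for digits, then check the result is a real date.

    Returns the repaired string, or None if it still is not a valid date.
    """
    candidate = raw.translate(CONFUSABLE)
    try:
        datetime.strptime(candidate, fmt)
    except ValueError:
        return None
    return candidate


print(repair_date("2I.O8.2OI8"))  # 21.08.2018
print(repair_date("4O.O8.2OI8"))  # None: there is no 40th of August
```

Returning None rather than a best guess lets the calling system flag the field for human review.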
OCR technology identifies characters and then combines those characters into words. However, for business use, it is important to identify what those words mean. For example, OCR technology will give the output "Invoice No: 12345", where "Invoice No" represents the "invoice_number_key" and "12345" represents the "invoice_number_value". This is where you need intelligence built on top of base OCR technology to make the identified text usable.
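A very simple way to attach meaning to the text is a pattern per field, sketched below in Python. The FIELD_PATTERNS table and the extract_fields function are hypothetical, and a single regular expression is only an illustration. It breaks as soon as the label wording or layout changes, which is why real systems rely on more than pattern matching.

```python
import re

# One hypothetical pattern per field we want to pull out of the raw OCR text.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*No\.?\s*[:#]?\s*(\w+)", re.IGNORECASE),
}


def extract_fields(text):
    """Return a dict mapping field names to the values found in the OCR text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    return fields


print(extract_fields("Invoice No: 12345"))  # {'invoice_number': '12345'}
```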
At Docsumo, we solve both these issues. Docsumo automates data extraction from documents and makes the data actionable. Using advanced computer vision and natural language processing, it validates the extracted data so that it can be directly consumed by downstream software.
In today’s dynamic business world, filing and archiving official documents in digital form keeps them handy and pays off later on or in unforeseen circumstances.
Optical Character Recognition (OCR) is the technology to convert an image of text into machine-readable text. It is the underlying technology for various data extraction solutions including Intelligent Document Processing. However, OCR is not smart enough to figure out the context in a document - it works simply by distinguishing text pixels from the background and finding a pattern. This limitation could cause inaccuracy in captured data that could directly impact the output of your data extraction model.
Accounts payable is a key financial function for any business. Corporations can have thousands of suppliers; even for relatively smaller businesses, the number of suppliers could be in the hundreds. The invoices they receive from these suppliers come in multiple formats, layouts, and templates - some semi-structured, some unstructured. Therefore, firms expend time and resources to capture invoice information through manual data entry and verification of accounts payable. Manual data entry is not feasible in the long run, and certainly not on a large scale. Before we talk about how intelligent invoicing solves the problems associated with manual invoicing, let’s discuss the challenges in more detail.
As most of an organization's information is available in an unstructured format, processing it requires an automated system that can handle documents with minimum human interaction. OCR is one such technology, but its scope is limited as it still requires human interaction and is highly dependent on the layout and structure of the document to be processed. These limitations are overcome by Intelligent Data Extraction. Using artificial intelligence, Intelligent Data Extraction technology extracts data from documents and transforms it into useful information. It functions as a single tool for extracting information from any type of document and aids in optimizing company operations.