In this article, we will dive deep into automated invoice processing, a key back office task that can lead to a great deal of time and cost savings if automated correctly. We will look at how invoice scanning and data capture solutions work, the different solutions on the market and the factors to consider when choosing a vendor.

What is invoice capture or invoice processing?

Most businesses these days receive invoices from vendors and suppliers digitally (PDF files, scanned images, photos sent over email). To make payments, the data from these invoices needs to be extracted and matched against purchase orders (PO based invoices) or checked against received goods (non PO based invoices). Invoice capture is this step of extracting structured data from invoices so that they can be processed automatically. As such, invoice capture is often the first back office process that companies automate with AI.
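To make the matching step concrete, here is a minimal sketch of how captured invoice data might be routed once it is available as structured fields. The `Invoice` class, the `match_to_po` function, the tolerance value and the decision labels are all hypothetical names chosen for illustration; a real accounts payable system will have its own rules.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Optional


@dataclass
class Invoice:
    invoice_number: str
    po_number: Optional[str]  # None for non PO based invoices
    total: Decimal


def match_to_po(invoice: Invoice, open_pos: Dict[str, Decimal],
                tolerance: Decimal = Decimal("0.01")) -> str:
    """Return a routing decision for one captured invoice (illustrative rules only)."""
    if invoice.po_number is None:
        # Non PO based invoice: someone has to check it against received goods.
        return "check_against_received_goods"
    po_total = open_pos.get(invoice.po_number)
    if po_total is None:
        return "unknown_po"
    if abs(invoice.total - po_total) <= tolerance:
        return "matched"
    return "amount_mismatch"


open_pos = {"PO-1001": Decimal("250.00")}
print(match_to_po(Invoice("INV-1", "PO-1001", Decimal("250.00")), open_pos))
```

Amounts are kept as `Decimal` rather than floats so that currency comparisons do not suffer from rounding surprises.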

Why invest in automated invoice scanning software?

Invoice scanning and data capture software automates the mundane task of manual data entry. It tries to recognise all key value pairs and line items in your invoices and returns easy to handle structured data in formats such as JSON, CSV or XML. Once your PDF invoices are converted into structured data, you can easily use the data in your other applications such as accounting and ERP systems. There are several advantages to automating invoice processing for a business:

  1. Reduces back office cost by removing the need to hire more accounts payable clerks as the company grows.
  2. Helps employees focus on higher value activities by eliminating in-house data entry.
  3. Improves accuracy of invoice data extraction.
  4. Allows faster processing of invoices, which can lead to savings by taking advantage of favourable payment terms.
  5. Helps in audits, since some software stores bounding boxes that show where in the document each value was captured from.
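As an illustration of what "structured data" with bounding boxes can look like, the sketch below builds a made-up capture result and exports it as JSON and CSV. The field names, the `bbox` layout and the values are assumptions for this example, not the output format of any particular product.

```python
import csv
import io
import json

# Hypothetical capture result: each header field keeps its value plus the page
# and bounding box it was read from, which is what makes audits easier.
result = {
    "invoice_number": {"value": "INV-0042", "bbox": [410, 52, 530, 70], "page": 1},
    "total": {"value": "100.00", "bbox": [455, 700, 530, 718], "page": 1},
    "line_items": [
        {"description": "Widget", "quantity": 2, "unit_price": "25.00"},
        {"description": "Gadget", "quantity": 1, "unit_price": "50.00"},
    ],
}

# JSON keeps the nested structure, including the bounding boxes.
as_json = json.dumps(result, indent=2)


def line_items_to_csv(capture: dict) -> str:
    """Flatten the line items into CSV text for a spreadsheet or ERP import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["description", "quantity", "unit_price"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(capture["line_items"])
    return buffer.getvalue()


print(line_items_to_csv(result))
```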

What are the different types of invoice capture solutions?

There are two main kinds of invoice capture software, namely template based and machine learning based. The key difference between the two approaches is how they extract data from invoices.

  1. Template based software for recurring invoices from limited vendors

For the majority of companies, the number of vendors is limited (fewer than 500) and 80% of the invoices come from a relatively small set of vendors. When the format of the invoice is known, it is relatively easy to train an OCR solution to extract data.

You only need to process a couple of invoices per vendor to teach the software where in the document to find each piece of data. Since the format for a particular vendor doesn't change very often, this makes the system robust and very accurate for that vendor.
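A template can be as simple as one rule per field for a given vendor. The sketch below uses regular expressions over the OCR text, which is an assumption made for brevity: real template based tools often work with positional zones on the page instead. The vendor name, the `ACME_TEMPLATE` rules and the sample text are all invented.

```python
import re

# Hypothetical template for one vendor: one regex per field,
# each with a single capture group around the value.
ACME_TEMPLATE = {
    "invoice_number": re.compile(r"Invoice No\.\s*(\S+)"),
    "invoice_date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total Due\s*\$?([\d,]+\.\d{2})"),
}


def extract_with_template(text, template):
    """Apply every field rule to the text; missing fields come back as None."""
    fields = {}
    for name, pattern in template.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


sample_text = """ACME Supplies
Invoice No. A-1007
Date: 2021-03-15
Total Due $1,250.00
"""

print(extract_with_template(sample_text, ACME_TEMPLATE))
```

The same rules would silently return `None` for an invoice from a different vendor, which is why this approach only works when the layout is known in advance.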

  2. Machine learning based software for varying formats and unknown layouts

In many situations, however, you will have invoices coming in from a long tail of vendors with varying formats and invoice data. In such cases, it is necessary to use a machine learning based solution that can detect key value pairs and tables in unknown layouts.

If you happen to have a wide variety of vendors, it becomes important to train the software on your own dataset. Most invoice data extraction tools come with a pre-trained model, but you can get much higher accuracy by training on your own data.

  3. Combining template and machine learning based approaches

Software such as Docsumo combines the best of both worlds: it 'remembers' vendor formats without the user having to specify them, and defaults to a machine learning based algorithm when an invoice from a new vendor is detected.

This means that you don't need to create templates for each vendor; the software creates them for you in the background as you start processing invoices. Because such continuously learning solutions are corrected on your own invoices, data extraction accuracy tends to improve within a short period of use.
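The "remember a vendor, otherwise fall back to a general model" idea can be sketched as a small dispatcher. This is not how Docsumo or any other product is implemented; the `HybridExtractor` class and its methods are hypothetical, and the generic model is just a stand-in function.

```python
from typing import Callable, Dict, Optional

Fields = Dict[str, Optional[str]]
Extractor = Callable[[str], Fields]


class HybridExtractor:
    """Use a remembered per-vendor extractor when one exists, else a generic one."""

    def __init__(self, generic: Extractor) -> None:
        self.generic = generic
        self.remembered: Dict[str, Extractor] = {}

    def extract(self, vendor_id: str, text: str) -> Fields:
        # Known vendor: use what was learned for it. New vendor: generic model.
        extractor = self.remembered.get(vendor_id, self.generic)
        return extractor(text)

    def remember(self, vendor_id: str, extractor: Extractor) -> None:
        # Called after a user confirms or corrects a result for this vendor.
        self.remembered[vendor_id] = extractor


def generic_model(text: str) -> Fields:
    # Stand-in for a pre-trained machine learning model.
    return {"invoice_number": None}


hybrid = HybridExtractor(generic_model)
hybrid.remember("acme", lambda text: {"invoice_number": text.split()[-1]})
print(hybrid.extract("acme", "Invoice No. A-1007"))
```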

Who are the top companies that provide invoice scanning / invoice capture solutions?

Below is a list of companies that provide invoice capture software.

| Company | Focus | Type of Solution |
|---|---|---|
| Amazon AWS Textract | Document data extraction | Pre-trained ML |
| Coupa | B2B spend management | Template based |
| Datamolino | Bookkeeping automation | Not template based |
| Docparser | Document data extraction | Template based |
| Docucharm | Document data extraction | Continuously trained ML |
| Docsumo | Document data extraction | Template & Continuously trained ML |
| Hypatos | Document data extraction | Continuously trained ML |
| Infrrd | Document data extraction | |
| Instabase | Document data extraction | |
| Rossum | Document data extraction | Continuously trained ML |
| Tabula (open source) | Table extraction | Template based |
| Xtracta | Document data extraction | Continuously trained ML |

How accurate is invoice capture software?

Automated invoice data capture is still a problem that has not been fully solved. Since the type of data in invoices (invoice number, taxes, warehouse details, shipping details), the representation of this data ("Invoice No.", "Invoice #", "invoice number") and the format of the invoices all vary a lot, software has a hard time reaching 100% accuracy on data extraction tasks. Though machine learning techniques are evolving rapidly, capturing line items that span multiple pages is still challenging.
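To see why label variation alone is a problem, consider the simplest possible way of handling it: a hand-written synonym table that maps each label variant to one canonical field name. The table and the `normalize_label` function below are illustrative assumptions; any variant missing from the table is simply not recognised, which is the gap machine learning based approaches try to close.

```python
import re

# Illustrative synonym table; a real system would need far more variants.
LABEL_SYNONYMS = {
    "invoice_number": {"invoice no", "invoice #", "invoice number", "inv no"},
    "po_number": {"po number", "po #", "purchase order no"},
}


def normalize_label(raw_label):
    """Map a label as printed on the invoice to a canonical field name, or None."""
    cleaned = re.sub(r"[.:]", "", raw_label).strip().lower()
    cleaned = re.sub(r"\s+", " ", cleaned)  # collapse repeated whitespace
    for field, variants in LABEL_SYNONYMS.items():
        if cleaned in variants:
            return field
    return None


for label in ["Invoice No.", "Invoice #", "invoice number", "Ship To"]:
    print(label, "->", normalize_label(label))
```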

So how much accuracy can you expect from invoice capture software? In short, it really depends. For clean invoices with a narrow variety of layouts, you can get between 95% and 99% accuracy. In most practical situations, expect an accuracy between 80% and 95%. The only way to know for sure is to try such software and see how it works on your dataset.
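When you run such a trial, it helps to agree on how accuracy is counted. One simple option, sketched below, is field level accuracy: the share of hand-labelled fields whose extracted value exactly matches the label. The `field_accuracy` function and the sample data are assumptions for illustration; vendors may quote document level or character level figures instead, so check which definition a quoted percentage uses.

```python
def field_accuracy(predictions, ground_truth):
    """Share of labelled fields whose extracted value exactly matches the label."""
    correct = 0
    total = 0
    for predicted, expected in zip(predictions, ground_truth):
        for field, expected_value in expected.items():
            total += 1
            if predicted.get(field) == expected_value:
                correct += 1
    return correct / total if total else 0.0


# Two hand-labelled invoices with two fields each (made-up values).
ground_truth = [
    {"invoice_number": "A-1", "total": "100.00"},
    {"invoice_number": "A-2", "total": "55.10"},
]
predictions = [
    {"invoice_number": "A-1", "total": "100.00"},
    {"invoice_number": "A-2", "total": "65.10"},  # one wrong digit
]

print(field_accuracy(predictions, ground_truth))  # 3 of 4 fields correct
```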

A couple of things to consider while measuring accuracy:

  1. Are the invoices text based PDF files or scanned images? You get higher accuracy for text based PDF files since optical character recognition can introduce recognition errors.
  2. Are the invoices scanned using a good scanner? Try to get a resolution of 300 dpi or above for good OCR accuracy. A good scanner also helps to keep the invoices aligned.
  3. Do you need to capture line item details? Capturing line items, especially from multiple pages, adds to the complexity of the solution.

How to choose your invoice capture vendor?

When choosing a vendor, check for the following things:

  1. Data privacy policies - Choose a vendor whose data privacy policy is in line with your company policies. More often than not this can be a show stopper if your company policies do not allow the use of external APIs for processing invoices. Also, check with the vendor how long they store your data. In some cases, your company would need to keep the data for an extended period while in other cases, the data might need to be deleted after processing.
  2. Accuracy of data extraction - As no software is perfect, it is good to check the data extraction accuracy the software actually delivers. If you need to process thousands of invoices, it might make sense to run a pilot before purchasing.
  3. Pricing - Most invoice processing software is priced per document processed, plus a setup fee if you have special integration requirements. You can compare different providers based on pricing if everything else is equal.
  4. How the software learns - Check how the software learns from your invoice data. The best software (e.g. Docsumo) 'remembers' how you extracted data for a particular invoice and also learns using machine learning across all samples.
  5. Ease of use - Since your office staff will be using the software, it is important to check how easy it is to use the software and whether making minor modifications to the extracted data is convenient.
  6. Data entry service - Most software keeps a human in the loop to catch false positives (e.g. wrongly extracting the purchase order number as the invoice number). Check if the invoice capture vendor provides a data extraction service in addition to the software. This can lead to a completely automated solution for you, rather than validating the extracted data in-house.
  7. Integration with other software - Since the invoice data will be consumed by other software, ask the vendor about integration options. Most software, such as Docsumo, integrates directly using an API or provides a CSV/Excel download option.
  8. Software adoption and customer success stories - You can check if the vendor has good reviews online and case studies from other customers in your industry. This can help you understand the company background and help with the vendor selection.
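The human in the loop step from item 6 can be pictured as a simple routing rule. The sketch below assumes the software reports a confidence score between 0 and 1 for each field, which not every product exposes; the `route_for_review` function, the 0.9 threshold and the sample values are hypothetical.

```python
def route_for_review(fields, threshold=0.9):
    """Split extracted fields into auto-accepted values and ones needing a human check.

    `fields` maps a field name to a (value, confidence) pair.
    """
    accepted, needs_review = {}, {}
    for name, (value, confidence) in fields.items():
        target = accepted if confidence >= threshold else needs_review
        target[name] = value
    # A PO number captured as the invoice number is the false positive named above,
    # so identical values in both fields are sent to a reviewer even if confident.
    inv, po = accepted.get("invoice_number"), accepted.get("po_number")
    if inv is not None and inv == po:
        needs_review["invoice_number"] = accepted.pop("invoice_number")
    return accepted, needs_review


fields = {
    "invoice_number": ("PO-77", 0.97),
    "po_number": ("PO-77", 0.95),
    "total": ("10.00", 0.60),
}
accepted, needs_review = route_for_review(fields)
print("accepted:", accepted)
print("needs review:", needs_review)
```

Whether the review queue is worked by your own staff or by the vendor's data entry service is exactly the question item 6 asks you to settle.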

What are the alternatives to automated invoice scanning?

Electronic Data Interchange or EDI specifies standards by which businesses can exchange data. Since the data is exchanged in a structured, machine-readable format (for example ANSI X12, UN/EDIFACT or XML based formats), it is processed directly by the receiving software without the need for human intervention.

However, this requires that businesses at both ends use the same standard for data exchange. If you have a few really large vendors who invoice you regularly, you can look into EDI. In most cases, EDI alone is not sufficient: as long as even a few vendors send PDF files or paper documents periodically, you will still need another system to process those invoices.

Conclusion

As we have seen in this article, automating invoice processing is very much possible provided you are aware of the current technology, are able to define your use case properly and choose the right vendor.

We hope this article gives a good picture of the invoice capture software market and helps you make a decision. We at Docsumo have built document data extraction software just for this purpose. Why not give us a try?