An Introduction to Key-Value Pair Extraction and Automation
DATA-EXTRACTION
|
May 14, 2021
|
5 min

A key-value pair links a data item, the key, to an attribute value. The key identifies the item, while the value holds the content. You can think of a key-value pair as a title and paragraph, where the title is the key and the paragraph is the value.

Multiple key-value pairs together make up a key-value database. The keys in these databases are unique identifiers, each of which is paired with exactly one value. A value is located through the unique identifier of its key-value pair.
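As a minimal sketch of this idea, the snippet below models a key-value database as a thin wrapper around a Python dictionary. The class name `KeyValueStore` and the `invoice:1001:...` key scheme are illustrative assumptions, not part of any particular product.

```python
class KeyValueStore:
    """A tiny in-memory key-value store: every unique key maps to one value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Writing to an existing key replaces its value, so keys stay unique.
        self._data[key] = value

    def get(self, key, default=None):
        # The value is located purely through its key.
        return self._data.get(key, default)

    def __len__(self):
        return len(self._data)


store = KeyValueStore()
store.put("invoice:1001:vendor", "AB Grain")   # hypothetical key scheme
store.put("invoice:1001:total", "250.00")
store.put("invoice:1001:total", "275.00")      # same key: the value is replaced
print(store.get("invoice:1001:total"))
```

Because a key can hold only one value at a time, writing to the same key twice leaves a single entry behind.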

Intelligent machine learning models and AI algorithms have made it easier to annotate business documents and train on them. As a result, extracting key-value pairs from thousands of documents has become practical in recent times.

Examples of key-value pairs in different documents from different industries

A key value pair is essentially a set of two data items – a key and a value.

The value corresponds to the key, with the key being marked as the unique identifier.

For example, on a document from a grain company, the Vendor field would be the key and AB Grain would be the value.

Likewise, key-value pairs make up a collection of fields which provide key information about documents. These details are processed and entered into organizational databases for safe recordkeeping.

However, here lies the main challenge: extracting the data and entering it automatically into online forms for faster processing.

Below is a list of examples of different key-value pairs across different documents and industries.

 1. Invoices

Invoice Sample

Key-value pair fields for invoices would be data items such as:

  • Invoice Number
  • Date
  • Cashier
  • Total Amounts
  • Taxes
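To illustrate how such fields might be pulled out of text, here is a minimal sketch that assumes the invoice has already been converted to plain text with one `Key: Value` pair per line. The sample text, the values in it, and the `extract_pairs` helper are hypothetical; real invoices vary widely in layout and usually need more than a single regular expression.

```python
import re

# Hypothetical OCR output for an invoice (all values are made up).
SAMPLE_TEXT = """\
Invoice Number: INV-1001
Date: 2021-05-14
Cashier: J. Doe
Total Amount: 275.00
Taxes: 25.00
"""

# One "Key: Value" pair per line; the key is everything before the first colon.
PAIR_PATTERN = re.compile(r"^[ \t]*([^:\n]+?)[ \t]*:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def extract_pairs(text):
    """Return a dict mapping each key found in the text to its value."""
    return {key: value for key, value in PAIR_PATTERN.findall(text)}


pairs = extract_pairs(SAMPLE_TEXT)
for key, value in pairs.items():
    print(f"{key} -> {value}")
```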

2. Survey Forms

Survey forms consist of key value pairs in a question and answer format.

Survey Form Sample

The key would be the question, and the value would be the chosen answer.

If it’s a feedback survey, the values would be entered manually by the user instead of being selected from a list of options.

3. Government Documents

Government documents such as passports, driver’s licenses, voter IDs, and social security cards store sensitive data in the form of key-value pairs. A classic example would be a passport page, where the key-value pair fields would be:

  • Country Code
  • Date of Birth
  • Nationality
  • Passport Number
  • Issuing Authority
  • Gender

Limitations of manual key-value pair extraction

It is possible to extract key-value pairs manually, but there are limitations:

High Volumes – Sitting down and going through numerous documents is tedious, and the enormous volume of data can overwhelm administrators.

Lack of Accuracy – If the person extracting these fields and entering the information makes a mistake, the organization can end up losing customer trust.

Missing Data – Fields can be blurred out or left empty, and information can go missing from forms as a result of manual entry. Humans make mistakes when they least expect it, especially when going through so many documents.

Slower Processing Speeds – Manual key-value pair extraction is slow and time-consuming compared with automated mechanisms.

Lack of Formatting – When dealing with unstructured data, documents have to be formatted on top of manually extracting the fields. Manual extraction also carries a risk of data duplication and redundancy in records.
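Two of these problems, missing data and duplicated records, are easy to check for once the pairs are in a structured form. The sketch below assumes each record is a plain dictionary; the `REQUIRED_FIELDS` schema and both helper functions are hypothetical.

```python
# Hypothetical schema: the fields every invoice record is expected to carry.
REQUIRED_FIELDS = ("Invoice Number", "Date", "Total Amount")


def find_missing(record, required=REQUIRED_FIELDS):
    """Return the required keys that are absent or have an empty value."""
    missing = []
    for key in required:
        value = record.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


def find_duplicates(records, key="Invoice Number"):
    """Return identifiers that appear in more than one record."""
    seen, duplicates = set(), []
    for record in records:
        ident = record.get(key)
        if ident is None:
            continue
        if ident in seen:
            duplicates.append(ident)
        else:
            seen.add(ident)
    return duplicates


records = [
    {"Invoice Number": "A1", "Date": "2021-05-14", "Total Amount": "10.00"},
    {"Invoice Number": "A1", "Date": "", "Total Amount": "10.00"},
]
print(find_missing(records[1]))
print(find_duplicates(records))
```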

Automated data extraction technology for key-value pairs

Key-value pairs can now be extracted by using a combination of ICR (intelligent character recognition) and OCR (optical character recognition) technology. Methods to automate key-value pair extraction are listed below:

1. Named-Entity Recognition

Named-entity recognition is a sub-task of information extraction that tries to locate and classify named entities in unstructured text into predefined categories such as person names, ID numbers, addresses, and organizations. This comes in handy for key-value pair extraction from unstructured and semi-structured documents.

Approaches to named-entity recognition:

1. Classical Approaches (rule-based)

2. Machine Learning (ML) Approaches

i) Multi-class classification

ii) Conditional Random Field (CRF)

3. Deep Learning (DL) Approaches

i) Bidirectional LSTM-CRF

ii) Bidirectional LSTM-CNNs

iii) Bidirectional LSTM-CNNs-CRF

iv) Pre-trained language models (ELMo and BERT)

4. Hybrid Approaches (DL + ML)
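The classical, rule-based approach is the simplest of these to sketch. The example below pairs each entity label with a hand-written regular expression; the labels, the patterns, and the sample sentence are illustrative assumptions only, and the ML and DL approaches listed above replace such rules with learned models.

```python
import re

# Hypothetical rules: each entity label is paired with a hand-written pattern.
RULES = [
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
    ("ID_NUMBER", re.compile(r"\b[A-Z]{2}\d{6}\b")),
]


def recognize_entities(text):
    """Return (label, matched_text, start, end) tuples sorted by position."""
    entities = []
    for label, pattern in RULES:
        for match in pattern.finditer(text):
            entities.append((label, match.group(), match.start(), match.end()))
    return sorted(entities, key=lambda entity: entity[2])


text = "Passport AB123456 was issued on 2021-05-14; contact jane@example.com."
for label, value, start, end in recognize_entities(text):
    print(label, value, start, end)
```

Each recognized entity can then be stored as a key-value pair, with the label as the key and the matched text as the value.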

2. Object Detection

The Fast Region-Based Convolutional Network (Fast R-CNN) method can be used for object detection on forms, with a reported precision rate of 66%. Object detection techniques are used primarily in computer vision but are being increasingly adopted in automated document extraction. Bounding boxes are drawn around entities, and neural networks automatically interpret document layouts.

Text such as locations, addresses, company names, and person names can be extracted from images using intelligent OCR, and these details can be organized into structured data.
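A common building block when working with bounding boxes is intersection over union (IoU), which measures how well a predicted box overlaps a labeled one. The sketch below assumes boxes are given as `(x1, y1, x2, y2)` tuples with the origin at the top-left corner; it is a generic illustration and is not tied to any specific detector.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes, in [0.0, 1.0]."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 10, 10)   # hypothetical detector output
labeled = (5, 5, 15, 15)     # hypothetical annotation
print(round(iou(predicted, labeled), 3))
```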

How to extract KV pairs with Docsumo

1. Visit app.docsumo.com and log in using your user credentials.

Docsumo Login

2. Click on APIs & Services to get a list of pre-trained APIs that you can use for extracting key-value pairs. For example, if you want to extract information from a driver’s license, select the Driver’s License module.

API and Services - Docsumo

3. After selecting your pre-trained API, go to Document Types. If there is no pre-trained API for your document and you need to define a new document type, click on 'Create New Document Type'. You will get this screen.

Doc Type

4. After the document is uploaded, define new fields that are your key identifiers.

Edit fields

5. Click on “Add field” and specify your data type. You will get different options, such as String, Date, and Numbers.

Edit Field 2

6. Select “click to edit” and draw a bounding box around the value you want to capture.

Edit Fields 3

7. Repeat the process for all the key-value pairs. After that, click on ‘Save and Close’ and decide whether you want these changes to be applied to new documents only or to the existing ones as well.

Save and Close

Your custom API is now trained. All you have to do now is upload documents of that type to the API, and it will automatically extract the KV pairs for you. After the automated data extraction is done, you can click on review for reference.
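To give a rough sense of what drawing a bounding box around a value accomplishes, the sketch below collects the OCR words whose centers fall inside a field's box and joins them into the field value. This is a generic illustration under assumed inputs, not Docsumo's implementation: the `Word` class, the coordinates, and the `words_in_box` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with its bounding box (hypothetical structure)."""
    text: str
    x1: float
    y1: float
    x2: float
    y2: float


def words_in_box(words, box):
    """Join the words whose center point lies inside the (x1, y1, x2, y2) box."""
    bx1, by1, bx2, by2 = box
    selected = []
    for word in words:
        center_x = (word.x1 + word.x2) / 2
        center_y = (word.y1 + word.y2) / 2
        if bx1 <= center_x <= bx2 and by1 <= center_y <= by2:
            selected.append(word)
    # Reading order: top to bottom, then left to right. This assumes words on
    # the same line share the same top coordinate.
    selected.sort(key=lambda word: (word.y1, word.x1))
    return " ".join(word.text for word in selected)


words = [
    Word("Vendor:", 10, 10, 60, 20),
    Word("AB", 70, 10, 90, 20),
    Word("Grain", 95, 10, 130, 20),
    Word("Total:", 10, 40, 50, 50),
]
vendor_box = (65, 5, 135, 25)  # box drawn around the Vendor value
print(words_in_box(words, vendor_box))
```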

Conclusion

Extracting key-value pairs using Docsumo takes users a matter of minutes. With a simple drag-and-drop approach, users can train custom APIs or select from a list of pre-trained APIs. Be it invoices, shipping documents, bills of lading, property papers, or government records, Docsumo’s intelligent OCR and automated key-value pair extraction offer organizations versatile use cases for obtaining data from a variety of documents.

Pankaj Tripathi