Jun 27, 2022

Optical Character Recognition (OCR): How to Easily Scan ID Documents

What is OCR and how can businesses use it to speed up their KYC routine?

When it comes to Know Your Customer (KYC) checks, many businesses process identity documents manually or with solutions that offer only limited automation. This leads to inaccuracies in customer identity data and slower user flows. Optical Character Recognition (OCR) technology can help.
In this article, we’ll dive into how OCR works and how it can help digitize customer experiences in banking, fintech, and other industries.

Processing ID documents: what are the challenges?

Processing ID documents can be challenging for businesses, especially if they operate globally. During onboarding, businesses not only have to validate documents, but also extract identity information for further analysis. Processing this data can be burdensome for several reasons:

  • ID documents aren’t standardized across countries. Below you’ll find national identity cards from Switzerland, Singapore, Mexico, and South Africa, which all have completely different formats. The Swiss card labels its fields in five languages, Singapore’s indicates race, Mexico’s includes the home address, and South Africa’s states residence status.

Images by Sumsub design team

  • Identity documents use different languages and writing systems. Documents are issued in hundreds of languages and in many scripts, including the Cyrillic, Latin, Greek, Georgian, and Armenian alphabets, as well as Chinese, Japanese, Korean, and Arabic scripts. Here’s an example of a driver’s license issued in China:
Chinese Driver License

Image by Sumsub design team

How OCR helps with the KYC process

OCR allows businesses to scan identity documents and recognize the data printed on them. When powered by AI algorithms, it can handle complex ID documents regardless of differences in their formats and structures.

How OCR actually works

OCR scans ID documents and extracts user data in a format that is optimized for analysis and electronic storage. This is done by:

  1. Scanning an image of the document via mobile or web camera;
  2. Extracting the necessary information;
  3. Converting this information into a machine-readable format;
  4. Uploading the data for storage and further analysis.
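
The four steps above can be sketched roughly as follows. This is a minimal illustration, not any vendor’s implementation: the `recognize` and `store` callables, the `FIELD_LABELS` mapping, the "Label: value" text layout, and the DD.MM.YYYY date format are all assumptions made for the example. A real system would plug in an actual OCR engine and storage backend.

```python
import json
from datetime import datetime
from typing import Callable, Dict

# Hypothetical mapping from labels printed on a document to record keys.
FIELD_LABELS = {
    "document no": "document_number",
    "name": "name",
    "date of birth": "date_of_birth",
    "expiry date": "expiry_date",
}


def extract_fields(raw_text: str) -> Dict[str, str]:
    """Step 2: pick out known 'Label: value' lines from raw OCR text."""
    fields = {}
    for line in raw_text.splitlines():
        label, sep, value = line.partition(":")
        key = FIELD_LABELS.get(label.strip().lower())
        if sep and key:
            fields[key] = value.strip()
    return fields


def to_machine_readable(fields: Dict[str, str]) -> str:
    """Step 3: normalize dates to ISO 8601 and serialize the record as JSON."""
    record = dict(fields)
    for key in ("date_of_birth", "expiry_date"):
        if key in record:
            # Assumes dates are printed as DD.MM.YYYY; real documents vary.
            parsed = datetime.strptime(record[key], "%d.%m.%Y")
            record[key] = parsed.date().isoformat()
    return json.dumps(record, sort_keys=True)


def process_document(
    image: bytes,
    recognize: Callable[[bytes], str],
    store: Callable[[str], None],
) -> str:
    """Run steps 1-4 on one captured image and return the stored payload."""
    raw_text = recognize(image)            # step 1: OCR on the captured image
    fields = extract_fields(raw_text)      # step 2: extract the needed data
    payload = to_machine_readable(fields)  # step 3: machine-readable format
    store(payload)                         # step 4: hand off for storage
    return payload
```

Passing the OCR engine and the storage function in as parameters keeps the sketch independent of any particular product; unrecognized lines in the OCR output are simply ignored.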

OCR can transliterate text containing non-Latin characters into Latin script during the scanning process. This standardizes data uploaded to the dashboard and prepares it for further analysis.
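
As a rough idea of what converting non-Latin text into Latin characters involves, here is a minimal sketch for Cyrillic. The mapping table is simplified and illustrative only; it is an assumption for this example, not an official standard, and production systems typically follow the transliteration rules of the issuing country or of the relevant travel-document specification.

```python
# Simplified, illustrative Cyrillic-to-Latin table (lowercase keys only).
# This is an assumption for the example, not an official standard.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ё": "e", "ж": "zh", "з": "z", "и": "i", "й": "i", "к": "k",
    "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh", "ц": "ts",
    "ч": "ch", "ш": "sh", "щ": "shch", "ы": "y", "э": "e", "ю": "iu",
    "я": "ia", "ъ": "ie", "ь": "",
}


def transliterate(text: str) -> str:
    """Replace each Cyrillic letter with its Latin equivalent."""
    out = []
    for ch in text:
        latin = CYRILLIC_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)                  # not Cyrillic: keep unchanged
        elif ch.isupper():
            out.append(latin.capitalize())  # preserve leading capital
        else:
            out.append(latin)
    return "".join(out)
```

With this table, `transliterate("Иванов")` returns `"Ivanov"`, while text that is already in Latin script passes through unchanged.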

What ID documents can OCR check?

OCR can extract data from any type of identity document, including passports, ID cards, driving licenses, residence permits and others. Here are some common data points that are read:

  • Document number;
  • Name;
  • Nationality;
  • Date of birth;
  • Gender;
  • Date and place of issue;
  • Expiry date.
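
One way to hold these data points after extraction is a small typed record. The sketch below is an assumption for illustration: the class name `IdentityRecord`, its field names, and the two helper methods are hypothetical and do not describe any specific product’s schema. Every field is optional because not every document carries every data point.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional


@dataclass
class IdentityRecord:
    """Hypothetical container for data points read from an ID document."""
    document_number: Optional[str] = None
    name: Optional[str] = None
    nationality: Optional[str] = None
    date_of_birth: Optional[date] = None
    gender: Optional[str] = None
    date_of_issue: Optional[date] = None
    place_of_issue: Optional[str] = None
    expiry_date: Optional[date] = None

    def missing_fields(self) -> List[str]:
        """Names of data points the OCR step did not fill in."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def is_expired(self, today: date) -> bool:
        """True if an expiry date was read and lies before `today`."""
        return self.expiry_date is not None and self.expiry_date < today
```

A record like this makes it easy to flag incomplete extractions for manual review before the data is used further.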

Example of OCR data extraction from an ID card

Scanning MRZs

OCR can read machine-readable zones (MRZs): coded blocks made up of lines of letters, digits, and separator characters, located at the bottom of a passport’s identity page. The information extracted from the MRZ is then compared with the data that’s visible on the document to verify the customer’s identity.
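
The MRZ comparison can be sketched as below. The example assumes the two-line, 44-character passport layout (TD3) described in ICAO Doc 9303, where several fields are followed by a check digit computed with repeating weights 7, 3, 1. Only the second MRZ line is parsed, the composite check digit is not verified, and the function names are hypothetical. The sample line is the fictional specimen commonly used to illustrate the format.

```python
from typing import Dict, List

# Second MRZ line of a fictional specimen passport (44 characters).
SPECIMEN_LINE_2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"


def char_value(ch: str) -> int:
    """Numeric value of an MRZ character: 0-9, A=10..Z=35, filler '<'=0."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Weighted sum of character values (weights 7, 3, 1 repeating) mod 10."""
    weights = (7, 3, 1)
    return sum(char_value(c) * weights[i % 3] for i, c in enumerate(field)) % 10


def parse_td3_line2(line: str) -> Dict[str, str]:
    """Parse the second line of a TD3 MRZ and verify three check digits."""
    if len(line) != 44:
        raise ValueError("a TD3 MRZ line must be 44 characters long")
    checked = {
        "document_number": (line[0:9], line[9]),
        "date_of_birth": (line[13:19], line[19]),   # YYMMDD
        "expiry_date": (line[21:27], line[27]),     # YYMMDD
    }
    result = {}
    for name, (value, digit) in checked.items():
        if not digit.isdigit() or check_digit(value) != int(digit):
            raise ValueError(f"check digit mismatch for {name}")
        result[name] = value.rstrip("<")
    result["nationality"] = line[10:13].replace("<", "")
    result["sex"] = line[20]
    return result


def find_mismatches(mrz: Dict[str, str], visual: Dict[str, str]) -> List[str]:
    """Fields present in both sources whose values disagree."""
    return [k for k, v in mrz.items() if k in visual and visual[k] != v]
```

A check digit failure usually points to a misread character, while a mismatch between the MRZ and the printed fields is a signal that the document needs closer review.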

The benefits of using OCR in KYC

By using OCR in their KYC routine, businesses can reduce costs, save time, and improve the user experience. This includes:

  • Accelerated customer onboarding processes;
  • Improved data accuracy;
  • Less time spent on data input and verification;
  • Scalable workflows;
  • Reduced costs.

Using OCR to extract data from ID documents reduces human error and the risks that come with it, contributing to proper AML/KYC compliance.

Limitations of OCR in KYC

Although OCR is a cost-effective addition to the KYC process, the technology still has its limitations:

  • Data privacy. Some OCR services store data in the cloud, which can conflict with the General Data Protection Regulation (GDPR) and other data privacy regulations if personal data is stored or transferred without adequate safeguards.
  • Capturing conditions. Camera angles, distortions, or lighting can affect the quality of OCR output.

When choosing an OCR service provider, businesses should also check that its tools comply with the regulations that apply to them.

Choosing the right OCR service for KYC

Businesses can consider the following criteria when choosing an OCR service:

  1. Mobile/web camera integration. SDKs for ID scanning on different platforms can reduce drop-offs during customer onboarding and increase conversion rates.
  2. Technology suitable for any shooting or scanning conditions. AI-based OCR can perform properly in real-world shooting conditions, including poor lighting.
  3. Number of supported writing systems. OCR should be trained to work with different scripts, including Cyrillic, Latin, Greek, Georgian, Armenian, Japanese, Korean, and Chinese.
  4. Compliance with GDPR and other data privacy regulations. The OCR service should provide a means of storing data securely, in compliance with GDPR, CCPA, CPRA, and other data privacy regulations.

To fully benefit from OCR technology, businesses should integrate it with their AML and data protection compliance programs.

Save time, lower costs, reduce manual work, and eliminate friction both for clients and your team with Sumsub’s highly accurate optical character recognition (OCR).
