.NET Core OCR Library

Optical Character Recognition (OCR) is the process of converting images of characters into machine-encoded text.

Some popular use cases include:

  • Data entry for business documents, e.g. cheques, passports, invoices, bank statements, and receipts
  • Automatic number plate recognition from a photo
  • Automatic extraction of data from form documents with text fields
  • Extracting business card information into a contact list
  • Producing text versions of printed documents more quickly, e.g. book scanning
  • Making electronic images of printed documents searchable
  • Assistive technology for blind and visually impaired users
  • Making scanned documents searchable by converting them to searchable PDFs

OCR Module

To use OCR with the PDFTron SDK, you need the OCR Module, an optional add-on that is downloaded separately. It is currently available on Windows, Linux, and macOS.

The module works with the SDK to create searchable and selectable text from images. The OCR engine is based on the open source LSTM neural network from Tesseract 4 and supports the 100+ languages provided by the Tesseract distribution.

The module uses pdftron.PDF.Convert.ToPdf internally and accepts multiple image formats, as well as PDFs that contain only raster images. The quality of the result depends on the image supplied. The ideal image is greyscale with a resolution of around 300 DPI.
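
To make the 300 DPI guideline concrete, here is a minimal sketch of the arithmetic behind it. It is independent of the SDK and written in Python only for brevity. The helper names `pixels_for_dpi` and `effective_dpi` are hypothetical, invented for this illustration, and are not part of the PDFTron API.

```python
from typing import Tuple


def pixels_for_dpi(width_in: float, height_in: float, dpi: int = 300) -> Tuple[int, int]:
    """Pixel dimensions needed to capture a page of the given physical size at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def effective_dpi(width_px: int, width_in: float) -> float:
    """Resolution an existing image actually provides across a known physical width."""
    return width_px / width_in


# A US Letter page (8.5 x 11 inches) scanned at 300 DPI.
print(pixels_for_dpi(8.5, 11.0))   # (2550, 3300)

# A 1275-pixel-wide scan of the same page has half the suggested resolution.
print(effective_dpi(1275, 8.5))    # 150.0
```

If a scan falls well below the suggested resolution, rescanning at a higher setting is likely to help recognition more than upscaling the existing image.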

Get started

OCR workflow
In this section, we showcase a typical OCR workflow.
