
Content extraction

Content extraction gives you access to specific content within a document.

PDFTron SDK benefits include:

  • Extraction of digital signatures (timestamps, etc.)
  • Intuitive page content extraction based on a concept of graphical elements
  • High-quality and efficient text recognition engine (pdftron.PDF.TextExtractor). TextExtractor extracts structured Unicode text, including style and positioning information, from any PDF document. The API is simple to use and offers advanced options for handling hidden or duplicated text, ligature expansion, etc.
  • Low-level text extraction (including positioning information for text runs and individual characters)
  • Complete access to the graphics state (color spaces and colorants, dash properties, etc.)
  • Full access to fonts, including glyph outlines
  • Image extraction. All compression filters allowed in PDF are supported, and images can optionally be extracted in RAW format
  • Image color-conversion and normalization filters
  • Full access to marked content (e.g. used in tagged PDF documents to preserve logical structure or to mark transparency groups)
  • Full access to page form fields and annotations
  • Extraction of embedded fonts, ICC color profiles, U3D streams, embedded files, etc.
  • Access to a document's metadata
  • High-level Logical Structure API and support for 'Tagged' PDF documents
  • Extract and render PDF layers (also known as Optional Content Groups, or OCGs)

Get started

Extract text from a PDF
How to extract text from a PDF document.
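
As a rough illustration of the TextExtractor workflow mentioned in the feature list, the sketch below reads one page and returns its text line by line. It assumes the SDK's Python package is installed as PDFNetPython3 and that the calls shown (PDFNet.Initialize, PDFDoc, TextExtractor.Begin, GetFirstLine, GetFirstWord, GetString) match your SDK version; check them against the API reference for your release. The function names extract_page_lines and lines_from_extractor, and the license_key parameter, are hypothetical names chosen for this example.

```python
def lines_from_extractor(extractor):
    """Walk a TextExtractor-like object and return one string per text line.

    Only relies on the line/word iteration methods (GetFirstLine, GetNextLine,
    GetFirstWord, GetNextWord, IsValid, GetString).
    """
    lines = []
    line = extractor.GetFirstLine()
    while line.IsValid():
        words = []
        word = line.GetFirstWord()
        while word.IsValid():
            words.append(word.GetString())
            word = word.GetNextWord()
        # Join the words of this line with single spaces.
        lines.append(" ".join(words))
        line = line.GetNextLine()
    return lines


def extract_page_lines(pdf_path, license_key, page_number=1):
    """Extract the text lines of one page (1-based page number).

    Assumption: the package name and call names below follow the SDK's
    Python bindings; verify them against your installed version.
    """
    # Imported here so lines_from_extractor can be used without the SDK.
    from PDFNetPython3.PDFNetPython import PDFDoc, PDFNet, TextExtractor

    PDFNet.Initialize(license_key)
    doc = PDFDoc(pdf_path)
    doc.InitSecurityHandler()

    extractor = TextExtractor()
    extractor.Begin(doc.GetPage(page_number))
    # Collect the text before closing the document.
    lines = lines_from_extractor(extractor)

    doc.Close()
    return lines
```

Calling extract_page_lines("input.pdf", "YOUR_LICENSE_KEY") would return a list of strings, one per line of text on the first page. Keeping the traversal in its own function means the line and word iteration can be reused for other pages, or exercised without the SDK installed.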

Tools & Utilities

PDF2Text
A command-line tool for text extraction from PDF documents.

PDFGenie
A command-line tool for text and table data extraction.
