
Java PDF Extraction Library

Use Java PDF extraction to access and extract specific content from a document. Extract text and images, or work with low-level content such as individual graphical elements and decoded byte streams.
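To give a sense of what a "decoded byte stream" is, here is a minimal sketch that uses only the Java standard library, not the PDFTron SDK. It assumes the stream was compressed with FlateDecode (zlib/deflate), one of the standard PDF filters, and that no predictor was applied. The class and method names (FlateStreamSketch, inflate, deflate) are hypothetical and exist only for illustration; the sample content stream is made up.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of any SDK: decodes zlib-compressed bytes,
// the format used by the PDF FlateDecode filter (predictors are not handled).
final class FlateStreamSketch {

    // Decompress a complete zlib stream held in memory.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and no way to continue: the input is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported stream");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a valid zlib stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Compress bytes with zlib; used here only to build sample input.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up page content stream, compressed and then decoded again.
        byte[] content = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] decoded = inflate(deflate(content));
        System.out.println(new String(decoded, StandardCharsets.US_ASCII));
    }
}
```

In a real document the compressed bytes would come from a stream object in the file rather than from the deflate helper, and a stream may chain several filters, which this sketch does not attempt to cover.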






Get started

Extract Text from a PDF

How to extract text from a PDF document.

Extract embedded fonts from a PDF

How to extract embedded fonts from a PDF document.

PDFTron SDK benefits include:

  • Extraction of digital signature information (timestamps, etc.)
  • Intuitive page content extraction based on a concept of graphical elements
  • High-quality and efficient text recognition engine (pdftron.PDF.TextExtractor). Use TextExtractor to extract structured Unicode text including style and positioning information from any PDF document. The simple-to-use API has advanced options related to hidden or duplicated text, ligature expansion, etc.
  • Low-level text extraction (including positioning information for text runs and individual characters)
  • Complete access to the graphics state (for color spaces and colorants, dash properties, etc.)
  • Full access to fonts, including glyph outlines
  • Image extraction. All compression filters allowed in PDF are supported, and images can optionally be extracted in RAW format
  • Image color-conversion and normalization filters
  • Full access to marked content (e.g. used in tagged PDF documents to preserve logical structure or to mark transparency groups)
  • Full access to page form fields and annotations
  • Extraction of embedded fonts, ICC color profiles, U3D streams, embedded files, etc.
  • Access to a document's metadata
  • High-level Logical Structure API and support for 'Tagged' PDF documents
  • Extract and render PDF layers (also known as Optional Content Groups, or OCGs)
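
The ligature expansion mentioned in the list above can be illustrated without the SDK. The following is a minimal sketch using only the Java standard library. The class name LigatureExpansion and the method expand are hypothetical and are not part of the PDFTron API, and Unicode NFKC normalization is used here as one common way to expand ligatures, which is not necessarily what TextExtractor does internally.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of any SDK: expands ligature code points
// found in extracted text into their component letters.
final class LigatureExpansion {

    // NFKC compatibility normalization maps presentation-form ligatures
    // (for example the single "fi" glyph) to plain letter sequences.
    // It also folds other compatibility characters, such as full-width
    // forms, so apply it only where that behavior is acceptable.
    static String expand(String extracted) {
        return Normalizer.normalize(extracted, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Made-up sample containing the "fi" and "ffi" ligature code points.
        String raw = "\uFB01nal o\uFB03ce";
        System.out.println(expand(raw)); // prints "final office"
    }
}
```

Expanding ligatures this way makes extracted text searchable with ordinary strings, at the cost of losing the information that the original document used a single ligature glyph.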

Tools and Utilities


A utility for text extraction from PDF documents.



A tool for text and table data extraction.







© 2022 PDFTron Systems Inc. All rights reserved.

