Extracts text from any PDF document as plain text or as structured XML.
Offers a choice of Unicode output encodings (UTF-8 and UTF-16).
Provides positioning, font, and styling information for every paragraph, line, word, or glyph on a page.
Offers options to control the level of detail and the formatting in the output XML.
Offers advanced options to control ligature expansion, hyphen removal, and removal of duplicate text (which is sometimes used for drop-shadow effects).
Allows extracting text only from within a clip rectangle, or hiding text in specific regions of a page.
Offers an option to remove hidden text or text that is obscured by other page elements (such as images or rectangles).
Supports all versions of the PDF format (PDF 1.0 to ISO 32000).
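
To make the clip-rectangle and duplicate-text options above more concrete, here is a minimal Python sketch of how such filtering could work on already-extracted word records. It is an illustration only and not the product's API: the `Word` record, the `inside`, `drop_duplicates`, and `extract` helpers, and the 1.0-unit tolerance are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical word record: text plus its bounding box in page units.
@dataclass(frozen=True)
class Word:
    text: str
    x: float       # left edge
    y: float       # bottom edge
    width: float
    height: float

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

def inside(word: Word, clip: Rect) -> bool:
    """True if the word's bounding box lies entirely within the clip rectangle."""
    x0, y0, x1, y1 = clip
    return (word.x >= x0 and word.y >= y0
            and word.x + word.width <= x1
            and word.y + word.height <= y1)

def drop_duplicates(words: List[Word], tolerance: float = 1.0) -> List[Word]:
    """Drop words that repeat an earlier word's text at nearly the same
    position, as happens when text is drawn twice for a drop shadow.
    The tolerance is an assumed value, not a documented default."""
    kept: List[Word] = []
    for w in words:
        is_dup = any(k.text == w.text
                     and abs(k.x - w.x) <= tolerance
                     and abs(k.y - w.y) <= tolerance
                     for k in kept)
        if not is_dup:
            kept.append(w)
    return kept

def extract(words: List[Word], clip: Optional[Rect] = None,
            remove_duplicates: bool = True) -> str:
    """Apply the optional clip filter, then the optional duplicate filter,
    and join the surviving words with spaces."""
    if clip is not None:
        words = [w for w in words if inside(w, clip)]
    if remove_duplicates:
        words = drop_duplicates(words)
    return " ".join(w.text for w in words)
```

For example, given a title drawn twice with a half-unit offset and a footer near the bottom of the page, calling `extract` with a clip rectangle that excludes the bottom margin keeps a single copy of the title and drops the footer.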