Complete Unicode support. PDF2Text can process PDF files from any part of the world (including those in Asian languages) and represent the extracted text as UTF-8 or UTF-16. To improve Unicode output, PDF2Text can recognize vendor-specific Unicode character assignments (in the Private Use Area) and map them to the corresponding public Unicode code points. Similarly, Unicode ligatures and PDF-specific ligatures can be broken into sequences of individual Unicode characters. Characters that can't be mapped to Unicode are mapped predictably into the Private Use Area.
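The ligature splitting and Private Use Area remapping described above can be illustrated with Python's standard unicodedata module. This is a minimal sketch under stated assumptions, not PDF2Text's API: the names normalize_text, split_ligature and VENDOR_PUA_MAP are hypothetical, and the single vendor mapping shown is invented for illustration (a real extractor would derive such tables from font data).

```python
import unicodedata

# Private Use Area blocks as defined by the Unicode Standard.
PUA_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))

# Hypothetical vendor table: assumes a font placed the euro sign at PUA U+E000
# even though a public code point (U+20AC) exists for it.
VENDOR_PUA_MAP = {"\uE000": "\u20AC"}


def is_private_use(ch: str) -> bool:
    """Return True if ch lies in one of the Private Use Area blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def split_ligature(ch: str) -> str:
    """Expand a compatibility ligature (e.g. U+FB01) into its component letters."""
    is_ligature = "LIGATURE" in unicodedata.name(ch, "")
    if is_ligature and unicodedata.decomposition(ch).startswith("<compat>"):
        return unicodedata.normalize("NFKC", ch)
    # Ligatures without a compatibility decomposition (such as U+0153) are kept.
    return ch


def normalize_text(text: str) -> str:
    out = []
    for ch in text:
        if is_private_use(ch):
            # Remap known vendor assignments; unknown PUA code points pass through.
            out.append(VENDOR_PUA_MAP.get(ch, ch))
        else:
            out.append(split_ligature(ch))
    return "".join(out)


text = normalize_text("\uFB01ve \uE000 \uFB00")  # "five", euro sign, "ff"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")
```

Producing UTF-8 or UTF-16 output is then a single str.encode call; in this example the euro sign takes three bytes in UTF-8 and two in UTF-16.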
Intelligent Text Recognition. PDF2Text uses an intelligent text recognition and logical structure engine to recognize words, lines, paragraphs, and the reading order in PDF documents. The engine can remove duplicated text commonly used to create drop shadows, as well as text that is obscured by other page content. The text extractor also handles PDF documents that contain rotated text, and documents where the information is stored in arbitrary order or is scattered across the page.
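To make the drop-shadow and reading-order ideas concrete, here is a deliberately simple geometric heuristic in Python. It is an assumed illustration, not PDF2Text's actual engine: the TextRun type, both function names, and the tolerance values are hypothetical. A run is treated as a shadow copy when an already-kept run has the same text at nearly the same position, and the remaining runs are grouped into lines by baseline and ordered top to bottom, left to right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRun:
    text: str
    x: float  # left edge, in page units
    y: float  # baseline, measured downward from the top of the page (assumption)


def remove_shadow_duplicates(runs: list[TextRun], tolerance: float = 2.0) -> list[TextRun]:
    """Drop a run if a previously kept run has identical text within tolerance."""
    kept: list[TextRun] = []
    for run in runs:
        is_dup = any(
            k.text == run.text
            and abs(k.x - run.x) <= tolerance
            and abs(k.y - run.y) <= tolerance
            for k in kept
        )
        if not is_dup:
            kept.append(run)
    return kept


def reading_order(runs: list[TextRun], line_tolerance: float = 3.0) -> list[str]:
    """Group runs into lines by baseline, then join each line left to right."""
    lines: list[list[TextRun]] = []
    for run in sorted(runs, key=lambda r: r.y):
        # Compare against the first run of the current line to decide membership.
        if lines and abs(lines[-1][0].y - run.y) <= line_tolerance:
            lines[-1].append(run)
        else:
            lines.append([run])
    return [" ".join(r.text for r in sorted(line, key=lambda r: r.x)) for line in lines]


# Content stream order differs from visual order, and "Hello" has a shadow copy.
runs = [
    TextRun("World", 60, 10),
    TextRun("Hello", 10, 10.5),
    TextRun("Hello", 11, 11.5),  # shadow offset by about one unit
    TextRun("Second", 10, 30),
]
deduped = remove_shadow_duplicates(runs)
lines = reading_order(deduped)
```

A production engine would also account for rotation, font size, columns and occlusion; fixed tolerances like these only work for simple single-column layouts.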
Highest Reliability and Robustness. PDF2Text was designed from the ground up to run in high-throughput, server-based, and multi-threaded applications. A regular and rigorous QA process sets high standards for the reliability of all PDFTron products.
Top Performance. Advanced text recognition and content analysis algorithms, coupled with low memory usage and native code efficiency, make PDF2Text the ideal choice for high-traffic servers as well as for interactive applications. For a quick test of the library's PDF processing performance, download and run the fully functional demo version.