PDF2Text

Benefits

Why PDF2Text?

Complete Unicode support. PDF2Text can process PDF files from any part of the world (including documents in Asian languages) and represent the extracted text as UTF-8 or UTF-16. To improve Unicode output, PDF2Text can recognize vendor-specific character assignments in the Unicode Private Use Area and map them to standard (public) Unicode code points. Similarly, Unicode ligatures and PDF-specific ligatures can be broken into sequences of individual Unicode characters. Characters that cannot be mapped to Unicode are mapped predictably into the Private Use Area.
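To illustrate the kind of post-processing described above, here is a minimal Python sketch. It is not the PDF2Text API. The vendor mapping table `VENDOR_PUA_MAP` and the function `normalize_extracted` are hypothetical. Ligature splitting here relies on the standard `unicodedata` module (NFKD compatibility decomposition of characters whose Unicode name contains "LIGATURE"). PDF-specific ligatures, which are identified by glyph names in the font, would need a separate table that this sketch does not cover.

```python
import unicodedata

# Hypothetical vendor-specific Private Use Area assignments. Real mappings
# depend on the font and vendor; this entry is illustrative only.
VENDOR_PUA_MAP = {
    0xF041: "A",  # e.g. a symbol font that placed 'A' at U+F041
}

# Private Use Area in the Basic Multilingual Plane.
PUA_START, PUA_END = 0xE000, 0xF8FF


def normalize_extracted(text: str) -> str:
    """Map known PUA code points to public Unicode and split ligatures."""
    out = []
    for ch in text:
        cp = ord(ch)
        if PUA_START <= cp <= PUA_END:
            # Known vendor assignment -> public code point; unknown PUA
            # characters keep their (predictable) code point unchanged.
            out.append(VENDOR_PUA_MAP.get(cp, ch))
        elif "LIGATURE" in unicodedata.name(ch, ""):
            # NFKD decomposes compatibility ligatures, e.g. U+FB03 -> "ffi".
            out.append(unicodedata.normalize("NFKD", ch))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    text = normalize_extracted("e\ufb03cient")
    print(text)                      # efficient
    print(text.encode("utf-8"))      # b'efficient'
    print(text.encode("utf-16-le"))  # same text as UTF-16 (little-endian)
```

Only characters whose names mark them as ligatures are decomposed. This avoids the broader rewriting that full NFKC/NFKD normalization would apply, such as turning full-width letters or superscript digits into their ASCII forms.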

Intelligent Text Recognition. An intelligent text recognition and logical structure engine recognizes words, lines, paragraphs, and the reading order in PDF documents. The engine can remove duplicated text, which is commonly used to create drop shadows, as well as text that is obscured by other page content. The text extractor also handles PDF documents that contain rotated text, or in which the information is stored in a random order or scattered across the page.
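The drop-shadow case mentioned above can be pictured with a small Python sketch. It is not PDFTron's algorithm. It assumes that a shadow is a second copy of the same word drawn at a slightly offset position. The `Word` record and the tolerance `tol` (in page units) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # left edge, in page units
    y: float  # baseline, in page units


def remove_shadow_duplicates(words, tol=2.0):
    """Drop words that repeat an already-kept word at nearly the same position.

    Words are visited in content-stream order, so the first copy drawn wins.
    The pairwise check is O(n^2), which is acceptable for a sketch.
    """
    kept = []
    for w in words:
        is_dup = any(
            k.text == w.text and abs(k.x - w.x) <= tol and abs(k.y - w.y) <= tol
            for k in kept
        )
        if not is_dup:
            kept.append(w)
    return kept


if __name__ == "__main__":
    page = [Word("Sale", 100, 700), Word("Sale", 101.5, 698.5), Word("Today", 140, 700)]
    for w in remove_shadow_duplicates(page):
        print(w.text, w.x, w.y)
```

A production engine would compare glyph bounding boxes and rendering order instead of a fixed offset. The fixed tolerance only keeps the idea visible.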

Highest Reliability and Robustness. PDF2Text was designed from the ground up to run in high-throughput, server-based, and multi-threaded applications. A regular and rigorous QA process sets high standards for the reliability of all PDFTron products.

Top Performance. Advanced text recognition and content analysis algorithms, coupled with low memory usage and the efficiency of native code, make PDF2Text the ideal choice for high-traffic servers as well as interactive applications. For a quick test of the library's PDF processing performance, simply download and run the fully functional demo version.

Sample Use-Case Scenarios

  • Extract text from a large PDF repository for text indexing or content retrieval (e.g., to implement a PDF search engine).
  • Classify or summarize PDF documents based on their content. Find specific words for content editing purposes (such as splitting pages based on keywords).
  • Convert PDF pages to text or XML for content repurposing.
  • Search PDF pages for specific words or keywords and return their positions (e.g., to highlight every instance of a given word).
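The keyword-search scenario above can be sketched in Python. The sketch assumes a text extractor has already produced a list of words, each with a page number and a bounding box. The `WordBox` record and the `find_keyword` function are hypothetical and are not part of the PDF2Text API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    text: str
    page: int
    bbox: tuple  # (x1, y1, x2, y2) in page units


PUNCTUATION = ".,;:!?\"'()"


def find_keyword(words, keyword):
    """Return (page, bbox) for every word that matches keyword.

    Matching ignores case and leading/trailing punctuation, so "Total:"
    matches "total". The returned boxes could drive highlight annotations.
    """
    target = keyword.casefold()
    hits = []
    for w in words:
        if w.text.strip(PUNCTUATION).casefold() == target:
            hits.append((w.page, w.bbox))
    return hits


if __name__ == "__main__":
    words = [
        WordBox("Invoice", 1, (72, 700, 120, 712)),
        WordBox("total:", 1, (72, 680, 100, 692)),
        WordBox("TOTAL", 2, (300, 100, 340, 112)),
    ]
    for page, bbox in find_keyword(words, "total"):
        print(page, bbox)
```

The same word list could feed an inverted index (word to list of pages) for the search-engine use case. Keeping the bounding boxes is what makes highlighting possible.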
2010 PDFTRON SYSTEMS, INC, ALL RIGHTS RESERVED