PDFGenie is a simple-to-use utility that can extract tables and text from existing PDF documents as HTML or XML.
PDF is a hugely popular format, and for good reason: with a PDF, you can be virtually assured that a document will display and print exactly the same way on different computers.
However, PDF documents have a drawback: they usually lack information specifying which content constitutes paragraphs, tables, figures, headers, footers, and so on. This lack of 'logical structure' information makes it difficult to edit files, to view documents on small screens, or to extract meaningful data from a PDF. In a sense, the content becomes 'trapped'.
PDFGenie is a simple-to-use command-line tool that recovers tables, text, and reading order from existing PDF documents.
After you unzip the archive, you are ready to go. For example, 'pdfgenie my.pdf' will convert my.pdf to HTML.
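To convert more than one file, the single-file invocation above can be wrapped in a short script. The following is a minimal sketch, not part of PDFGenie itself: it assumes the 'pdfgenie' executable is on the PATH and uses only the 'pdfgenie my.pdf' form shown above. The helper names build_command and convert_all are hypothetical, and where the tool writes its output is not assumed here.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, exe="pdfgenie"):
    # Only the documented form "pdfgenie <file>" is used; no extra flags are assumed.
    return [exe, str(pdf_path)]


def convert_all(folder, runner=subprocess.run, exe="pdfgenie"):
    # Run the tool once per PDF in `folder`, in sorted order.
    # `runner` is injectable so the loop can be exercised without the tool installed.
    converted = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        runner(build_command(pdf, exe), check=True)  # raises if the tool exits non-zero
        converted.append(pdf.name)
    return converted
```

Calling convert_all("reports") would run 'pdfgenie' on each .pdf file in a folder named reports and return the list of file names it processed.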
The only limitation of the demo version is that certain words in the extracted text are replaced with a 'demo' string.