Converting documents

Converting documents to PDF

The ToPdf method

The pdftron.PDF.Convert.ToPdf method (ToPdf for short; ToXod and ToXps are related) converts files in several input formats to the requested output format. Supported inputs include docx, xlsx, rtf, txt, html, and pub, among others. To convert these formats, ToPdf sends a print job to the PDFNet printer and uses the resulting print output (in XPS) for the actual conversion.

Keep in mind that the PDFNet printer works like any other printer: it cannot open documents by itself. An application that can open the document must be installed on the machine so that it can send the print job to the PDFNet printer.

For example, printing a text file requires a text file viewer to perform the print task. On Windows, this is normally notepad.exe.

ToPdf automates this process. It issues the print command on a document, and expects that an installed application will process the command. If the application does not support printing of the opened document, then the Convert.ToPdf method will fail.
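The flow described above can be sketched in C# roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the file names are hypothetical, Convert.RequiresPrinter and the PDFDoc/Save calls are assumed to behave as in the Convert sample, and the argument list of PDFNet.Initialize differs between SDK releases (newer ones expect a license key), so check each call against your SDK version.

```csharp
using System;
using pdftron;
using pdftron.Common;
using pdftron.PDF;
using pdftron.SDF;

class ToPdfSketch
{
    static void Main()
    {
        // Assumption: the parameterless overload is available (older releases,
        // demo mode). Newer releases expect a license key argument.
        PDFNet.Initialize();

        string inputPath = "report.docx";  // hypothetical input file
        string outputPath = "report.pdf";  // hypothetical output file

        try
        {
            // "Convert" is fully qualified to avoid a clash with System.Convert.
            if (pdftron.PDF.Convert.RequiresPrinter(inputPath) &&
                !pdftron.PDF.Convert.Printer.IsInstalled())
            {
                // This input needs the print path, but the PDFNet printer is missing.
                Console.WriteLine("The PDFNet printer is required but not installed.");
                return;
            }

            using (PDFDoc doc = new PDFDoc())
            {
                // Appends the converted pages of the input file to the empty document.
                pdftron.PDF.Convert.ToPdf(doc, inputPath);
                doc.Save(outputPath, SDFDoc.SaveOptions.e_linearized);
            }
        }
        catch (PDFNetException e)
        {
            // ToPdf fails if no installed application can print the input file.
            Console.WriteLine("Conversion failed: " + e.Message);
        }
    }
}
```

Checking RequiresPrinter first makes a missing printer show up as a clear message instead of a failed conversion.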

Note: On a 64-bit operating system, the 64-bit PDFNet printer driver must be installed; the 32-bit PDFNet printer will not work there. To install the 64-bit PDFNet printer driver, use the 64-bit version of PDFNet.dll and run the following code in a .NET application:

if (!pdftron.PDF.Convert.Printer.IsInstalled())
{
    pdftron.PDF.Convert.Printer.Install();
}
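Because the driver has to match the operating system, it may help to check the bitness of the running process before installing. The sketch below is one way to do that, under stated assumptions: ProcessMatchesOs is a hypothetical helper, not part of PDFNet, and the call to PDFNet.Initialize before using the printer API is an assumption about your SDK version. Environment.Is64BitOperatingSystem and Environment.Is64BitProcess are standard .NET properties (.NET Framework 4.0 and later).

```csharp
using System;

class PrinterInstallSketch
{
    // Hypothetical helper: true when the process bitness matches the OS,
    // i.e. a 64-bit process on a 64-bit OS or a 32-bit process on a 32-bit OS.
    static bool ProcessMatchesOs(bool osIs64Bit, bool processIs64Bit)
    {
        return osIs64Bit == processIs64Bit;
    }

    static void Main()
    {
        // Assumption: PDFNet must be initialized before other calls are made.
        pdftron.PDFNet.Initialize();

        if (!ProcessMatchesOs(Environment.Is64BitOperatingSystem,
                              Environment.Is64BitProcess))
        {
            // A 32-bit process on a 64-bit OS would load the 32-bit PDFNet.dll.
            Console.WriteLine("Run this as a 64-bit process with the 64-bit PDFNet.dll.");
            return;
        }

        if (!pdftron.PDF.Convert.Printer.IsInstalled())
        {
            pdftron.PDF.Convert.Printer.Install();
        }
    }
}
```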

If you have MS Office installed, we have a short-cut

If Microsoft Office 2007 SP2 or later is installed, the ToPdf method takes advantage of Microsoft Office’s OLE interop automation library to convert Microsoft Office documents to PDF or XPS. Using Microsoft Office guarantees high-quality PDF or XPS output files.

The PDFNet printer is not used when a document can be converted through the available Microsoft Office interop libraries. However, the converter falls back to the PDFNet printer if an earlier version of Microsoft Office (anything prior to 2007 SP2) is installed, because the SaveAsPDFandXPS extension is not available in those older versions.

Converting from PDF

The PDFNet SDK also supports converting from PDF to other formats such as EMF, EPUB, XOD, HTML, and XPS. In addition to these document formats, exporting to image formats such as TIFF, SVG, PNG, and JPEG is supported. When converting from PDF to other document formats, keeping the document's high-level structure intact is of primary importance. This may seem trivial, but there are many instances where structure extraction from a PDF document is not completely satisfactory. To do this better, we at PDFTron have created a tool called PDFGenie, a command-line utility for extracting tables and text from existing PDF documents as HTML or XML.
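As an illustration of converting from PDF, the sketch below calls a few of the Convert methods. It is a minimal sketch under stated assumptions: the file names are hypothetical, and the overloads used (string paths for ToXps, ToHtml, and ToEpub, and a PDFDoc for ToSvg) are assumed to match the Convert sample, so verify them against the API reference for your SDK version.

```csharp
using System;
using pdftron;
using pdftron.Common;
using pdftron.PDF;

class FromPdfSketch
{
    static void Main()
    {
        // Assumption: parameterless Initialize; newer releases expect a license key.
        PDFNet.Initialize();

        string input = "newsletter.pdf";  // hypothetical input file

        try
        {
            // Document formats: each call reads the PDF and writes the named output.
            pdftron.PDF.Convert.ToXps(input, "newsletter.xps");
            pdftron.PDF.Convert.ToHtml(input, "newsletter_html");
            pdftron.PDF.Convert.ToEpub(input, "newsletter.epub");

            // SVG export from an already opened document.
            using (PDFDoc doc = new PDFDoc(input))
            {
                pdftron.PDF.Convert.ToSvg(doc, "newsletter.svg");
            }
        }
        catch (PDFNetException e)
        {
            Console.WriteLine("Conversion failed: " + e.Message);
        }
    }
}
```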

PDFGenie

One of the more difficult things to do with a PDF document is extracting tabular data. PDFGenie can extract tables, text, and reading order from existing PDF documents as HTML or XML output. Please see our detailed blog post to learn more about PDFGenie. You can also grab a copy of PDFGenie from our downloads section.

Samples

The Convert sample is a good starting point to see how PDFNet can be used to convert between formats.

Special cases

Converting documents with PDFTron inside a Windows service or an ASP.NET application requires some changes to the default settings. Please read further to understand this better.