PDFTron PDF2Text is a command-line application designed to convert PDF documents to text or XML. This section covers the basic usage of PDF2Text and explains the available options.
The basic command-line syntax is:
pdf2text [options] file1 file2 folder1 file3 ...
See more options in Command-Line Summary for PDF2Text
pdf2text -o ex1 test/importantdoc.pdf
pdf2text --output ex2 -a 3-10 -f xml --xml_output_styles --noligatures --remove_hidden_text test/importantdoc.pdf
pdf2text -f textruns -o ex3 --c 0,0,595,842 test/blue_secret.pdf
PDF2Text can process multiple input documents in a single run. You can specify several folders, and PDF2Text will automatically process all documents in them that match a given file extension. For example, the following command line processes all PDF documents in folders 'test1' and 'test2':
c:\>pdf2text -o c:/output_folder c:/test1 c:/test2
Wildcard characters can also be used to process multiple input files.
For example, if a directory contains the following PDF documents:
C:\test1>dir
Directory of C:\test1

01/04/2007  03:35 PM  <DIR>  .
01/04/2007  03:35 PM  <DIR>  ..
05/21/2004  02:27 PM         A1.pdf
05/03/2005  09:38 AM         A2.pdf
05/20/2003  08:46 AM         B1.pdf
05/15/2003  12:50 PM         B2.pdf
To process all PDF documents in this folder, you could specify:
pdf2text -o c:/output_folder c:/test1/*.pdf
To process all PDF documents starting with 'A', you could specify:
pdf2text -o c:/output_folder c:/test1/A*.pdf
Or to process all PDF documents ending with '1', you could specify:
pdf2text -o c:/output_folder c:/test1/*1.pdf
You can use either of the two standard wildcards, the question mark (?) and the asterisk (*), to specify filename and path arguments on the command line.
The wildcards are expanded in the same manner as in operating system commands. (Please refer to your operating system user's guide if you are unfamiliar with wildcards.) Enclosing an argument in double quotation marks (" ") suppresses wildcard expansion. Within quoted arguments, you can represent a quotation mark literally by preceding the double-quotation-mark character with a backslash (\). If no matches are found for a wildcard argument, the argument is passed literally.
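The matching rules above can be illustrated with a short Python sketch. It does not call PDF2Text and is not PDF2Text's own expansion code; it uses the standard-library fnmatch module to show which of the four example file names each pattern would select. The helper name expand is hypothetical, and the sketch assumes case-sensitive matching, which may differ from your operating system (fnmatch also accepts [...] character sets, which are not covered in this section).

```python
from fnmatch import fnmatchcase

# File names taken from the example directory listing (C:\test1).
FILES = ["A1.pdf", "A2.pdf", "B1.pdf", "B2.pdf"]


def expand(pattern, names=FILES):
    """Return the names matching a '?' / '*' wildcard pattern.

    Hypothetical helper for illustration only. It mirrors the documented
    fallback: when nothing matches, the argument is passed literally.
    """
    matches = [name for name in names if fnmatchcase(name, pattern)]
    return matches if matches else [pattern]


if __name__ == "__main__":
    for pattern in ("*.pdf", "A*.pdf", "*1.pdf", "?2.pdf", "C*.pdf"):
        print(pattern, "->", expand(pattern))
```

Here '*' stands for any run of characters and '?' for exactly one character, so "?2.pdf" selects A2.pdf and B2.pdf, while "C*.pdf" matches nothing and is returned unchanged.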
To provide additional feedback, PDF2Text returns an exit code when processing completes. Exit codes can be used for user feedback, logging, and similar purposes. This is particularly important for applications running in an unattended environment.
The following table lists possible exit codes and their description:
Exit Code   Description
---------   ----------------------------------------------------------------
0           All files converted successfully.
1           Document is secured. Need a valid password to open the document.
2           Error opening the input file(s).
3           An unknown exception encountered.

All codes other than '0' indicate that there was an error during the conversion process.
The following illustrates a sample Windows batch script that processes exit codes:
@echo off
rem Convert all PDF files in the 'data' folder
pdf2text ./data
rem 'if errorlevel N' is true when the exit code is N or greater,
rem so the highest code must be tested first.
if errorlevel 3 goto othererror
if errorlevel 2 goto inputerr
if errorlevel 1 goto passwd
goto exit

:passwd
echo Document is protected. Need a valid password to open the document.
goto exit

:inputerr
echo Error opening the input file(s).
goto exit

:othererror
echo An error was encountered during processing.
goto exit

:exit
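The same exit-code handling can be sketched in Python for unattended or cross-platform use. This is a minimal sketch, not part of PDF2Text: the names EXIT_CODES, describe_exit_code, and run_pdf2text are hypothetical, the descriptions are copied from the table above, and the code assumes the pdf2text executable is on the PATH.

```python
import subprocess

# Descriptions copied from the exit code table in this section.
EXIT_CODES = {
    0: "All files converted successfully.",
    1: "Document is secured. Need a valid password to open the document.",
    2: "Error opening the input file(s).",
    3: "An unknown exception encountered.",
}


def describe_exit_code(code):
    """Map an exit code to its documented description (hypothetical helper)."""
    return EXIT_CODES.get(code, f"Undocumented exit code: {code}")


def run_pdf2text(args, executable="pdf2text"):
    """Run PDF2Text and return (exit code, description).

    Assumes 'pdf2text' can be found on the PATH; pass 'executable'
    to point at a different location.
    """
    result = subprocess.run([executable, *args])
    return result.returncode, describe_exit_code(result.returncode)
```

For instance, run_pdf2text(["-o", "ex1", "test/importantdoc.pdf"]) would return the exit code together with its description, which a calling script could log or use to decide whether to retry.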