public class OCROptions extends OptionsBase
java.lang.Object
   ↳ com.pdftron.pdf.OptionsBase
     ↳ com.pdftron.pdf.OCROptions

Summary

Public Constructors
OCROptions()
Creates an options object with default settings.
OCROptions(String json_string)
Creates an options object from a JSON string.
Public Methods
OCROptions addDPI(int dpi)
Knowing proper image resolution is important, as it enables the OCR engine to translate pixel heights of characters to their respective font sizes.
OCROptions addIgnoreZonesForPage(RectCollection regions, int page_index)
Adds a collection of ignorable regions for the given page. This is an optional list of page areas that will not be processed.
OCROptions addLang(String lang_code)
Adds a language to the list of languages to be considered when processing this document.
OCROptions addTextZonesForPage(RectCollection regions, int page_index)
Adds a collection of known text regions for the given page.
OCROptions setIgnoreExistingText(boolean value)
Sets the value for IgnoreExistingText in the options object. The default value is false, so that areas with existing text are automatically skipped during OCR.
OCROptions setUsePDFPageCoords(boolean value)
Sets the value for UsePDFPageCoords in the options object. Sets the origin of the coordinate system for input/output.
Inherited Methods
From class java.lang.Object

Public Constructors

public OCROptions ()

Creates an options object with default settings.

public OCROptions (String json_string)

Creates an options object from a JSON string.

Public Methods

public OCROptions addDPI (int dpi)

Knowing the proper image resolution is important, as it enables the OCR engine to translate pixel heights of characters to their respective font sizes. We do our best to retrieve resolution information from the input's metadata; however, it can occasionally be corrupt or missing. Hence we allow a manual override of the source's resolution, which supersedes any metadata found (both explicit, as in image metadata, and implicit, as in PDF).

Returns
  • this object, for call chaining
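
To illustrate why the resolution passed to addDPI matters, here is a minimal sketch of the usual pixel-to-point arithmetic (1 point = 1/72 inch). The class DpiFontSizeSketch and its method pixelsToPoints are hypothetical names invented for this example and are not part of the OCROptions API; the sketch also treats a character's pixel height as a rough stand-in for its font size, which is an assumption rather than a description of how the OCR engine works internally.

```java
// Hypothetical helper for illustration only; not part of com.pdftron.pdf.
final class DpiFontSizeSketch {
    // PDF and typographic points are defined as 1/72 of an inch.
    static final double POINTS_PER_INCH = 72.0;

    // Converts a height in pixels to a height in points for a given resolution.
    static double pixelsToPoints(double pixelHeight, int dpi) {
        if (dpi <= 0) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        return pixelHeight * POINTS_PER_INCH / dpi;
    }

    public static void main(String[] args) {
        double glyphHeightPx = 50.0;
        // The same 50-pixel glyph maps to very different sizes
        // depending on the resolution that is assumed.
        System.out.println(pixelsToPoints(glyphHeightPx, 300)); // prints 12.0
        System.out.println(pixelsToPoints(glyphHeightPx, 72));  // prints 50.0
    }
}
```

If the metadata wrongly reported 72 DPI for a 300 DPI scan, the estimated size in this sketch would be roughly four times too large, which is the kind of error a manual override is meant to prevent.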

public OCROptions addIgnoreZonesForPage (RectCollection regions, int page_index)

Adds a collection of ignorable regions for the given page. This is an optional list of page areas that will not be processed.

Returns
  • this object, for call chaining

public OCROptions addLang (String lang_code)

Adds a language to the list of languages to be considered when processing this document.

Returns
  • this object, for call chaining

public OCROptions addTextZonesForPage (RectCollection regions, int page_index)

Adds a collection of known text regions for the given page. This information will be used as a hint to improve OCR quality.

Returns
  • this object, for call chaining

public OCROptions setIgnoreExistingText (boolean value)

Sets the value for IgnoreExistingText in the options object. The default value is false, so that areas with existing text are automatically skipped during OCR. Setting it to true probably only makes sense when used with GetOCRJson/XML, as pre-existing text might end up being duplicated in the document when used with ImageToPDF and ProcessPDF.

Returns
  • this object, for call chaining

public OCROptions setUsePDFPageCoords (boolean value)

Sets the value for UsePDFPageCoords in the options object. Sets the origin of the coordinate system for input/output.

Returns
  • this object, for call chaining
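
The text above does not say which origins the setUsePDFPageCoords flag switches between. As a sketch under the assumption that the two candidates are the image convention (origin at the top-left, y growing downward) and the PDF page convention (origin at the bottom-left, y growing upward), converting a rectangle between them looks like this. PageCoordsSketch and topLeftToBottomLeft are hypothetical names for this example only and are not part of the OCROptions API.

```java
// Hypothetical helper for illustration only; not part of com.pdftron.pdf.
final class PageCoordsSketch {
    // Converts a rectangle {x1, y1, x2, y2} expressed with a top-left origin
    // into the same rectangle expressed with a bottom-left origin.
    // pageHeight must be in the same units as the rectangle.
    static double[] topLeftToBottomLeft(double[] rect, double pageHeight) {
        double yA = pageHeight - rect[1];
        double yB = pageHeight - rect[3];
        // Flipping the axis swaps which edge is lower, so re-order the y values.
        return new double[] { rect[0], Math.min(yA, yB), rect[2], Math.max(yA, yB) };
    }

    public static void main(String[] args) {
        // A zone 50 units below the top edge of a 792-unit-tall page.
        double[] zone = { 100, 50, 300, 150 };
        double[] flipped = topLeftToBottomLeft(zone, 792);
        System.out.println(java.util.Arrays.toString(flipped));
        // prints [100.0, 642.0, 300.0, 742.0]
    }
}
```

Only the y values change; the x values are identical in both conventions. Zones and results should be expressed in the same convention that the options object is configured to use.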