Enum TextExtractor.ProcessingFlags

Processing options that can be passed to the Begin() method to direct the flow of the content recognition algorithms.

Namespace: pdftron.PDF
Assembly: PDFNet.dll
Syntax
public enum ProcessingFlags

Fields

Name Description
e_extract_using_zorder

Use Z-order as the reading order for text.

e_no_dup_remove

Disables removal of duplicated text, which is frequently used to achieve visual effects such as drop shadow and fake bold.

e_no_invisible_text

Enables removal of text that uses rendering mode 3 (i.e. invisible text). Invisible text is usually used in 'PDF Searchable Images' (i.e. scanned pages with corresponding OCR text). For this reason, invisible text is extracted by default.

e_no_ligature_exp

Disables expansion of ligatures using a predefined mapping. Default ligatures are: fi, ff, fl, ffi, ffl, ch, cl, ct, ll, ss, fs, st, oe, OE.

e_no_watermarks

Enables removal of text that is marked as part of a Watermark layer.

e_none
e_punct_break

Treat punctuation (e.g. full stop, comma, semicolon, etc.) as word break characters.

e_remove_hidden_text

Enables removal of text that is obscured by images or rectangles. Because this option incurs a small performance penalty on text extraction, it is not enabled by default.
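Example

The following is a minimal sketch of how these flags might be passed to Begin(), not a definitive implementation. It assumes the three-argument overload Begin(Page, Rect, ProcessingFlags) accepts null for the clip rectangle, and that the fields can be combined with the bitwise OR operator, as the name ProcessingFlags suggests. The file name "input.pdf" is a placeholder, and the form of the PDFNet.Initialize() call depends on the PDFNet version in use (newer releases may require a license key argument).

```csharp
using System;
using pdftron;
using pdftron.PDF;

class ProcessingFlagsExample
{
    static void Main(string[] args)
    {
        // Assumption: some PDFNet versions require a license key here,
        // e.g. PDFNet.Initialize("your-license-key").
        PDFNet.Initialize();

        // "input.pdf" is a placeholder path used for illustration only.
        using (PDFDoc doc = new PDFDoc("input.pdf"))
        {
            doc.InitSecurityHandler();
            Page page = doc.GetPage(1);

            // Combine two options: drop text obscured by images or rectangles,
            // and drop invisible (rendering mode 3) text.
            // Assumption: the enum values act as a bitmask.
            TextExtractor.ProcessingFlags flags =
                TextExtractor.ProcessingFlags.e_remove_hidden_text |
                TextExtractor.ProcessingFlags.e_no_invisible_text;

            using (TextExtractor txt = new TextExtractor())
            {
                // Assumption: null means "no clip rectangle" (whole page).
                txt.Begin(page, null, flags);
                Console.WriteLine(txt.GetAsText());
            }
        }
    }
}
```

Omitting the flags argument leaves the defaults described above in effect: invisible text is extracted and hidden-text removal stays off.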
