public class TextExtractor.Line extends Object
java.lang.Object
   ↳ com.pdftron.pdf.TextExtractor.Line

Summary

Public Methods
void destroy()
Frees the native memory of the object.
boolean endsWithHyphen()
Checks whether this line ends with a hyphen.
boolean equals(Object other)
Rect getBBox()
Get the bounding box.
int getCurrentNum()
Get the index of this line on the current page.
TextExtractor.Word getFirstWord()
Get the first word in the line.

Note: To traverse the list of all words on this line use getNextWord().

int getFlowID()
Get the flow ID.
TextExtractor.Line getNextLine()
Get the next line.
int getNumWords()
Get the number of words.
int getParagraphID()
Get the paragraph ID.
double[] getQuad()
Get the quadrilateral as an array of doubles.
TextExtractor.Style getStyle()
Get the style for this line.
TextExtractor.Word getWord(int word_idx)
Get the word at the specified index.
boolean isSimpleLine()
Checks if this line is a simple line.
boolean isValid()
Checks if this line is valid.
Protected Methods
void finalize()
Inherited Methods
From class java.lang.Object

Public Methods

public void destroy ()

Frees the native memory of the object. This can be called explicitly to control the deallocation of native memory and avoid situations where the garbage collector does not free the object in a timely manner.

public boolean endsWithHyphen ()

Checks whether this line ends with a hyphen.

Returns
  • true if this line ends with a hyphen.
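
A common use of this flag is rejoining words that were split across lines. The following is a minimal sketch, not part of the PDFTron API: HyphenJoin, lineTexts and hyphenated are hypothetical names, with hyphenated.get(i) standing for the value endsWithHyphen() returned for line i. It assumes the extracted line text still carries the trailing hyphen.

```java
import java.util.List;

final class HyphenJoin {
    // lineTexts.get(i): text of line i (hypothetical input).
    // hyphenated.get(i): what endsWithHyphen() reported for line i.
    static String join(List<String> lineTexts, List<Boolean> hyphenated) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lineTexts.size(); i++) {
            String text = lineTexts.get(i);
            boolean last = (i == lineTexts.size() - 1);
            if (hyphenated.get(i) && !last) {
                // Assumption: the trailing hyphen is still present in the text.
                if (text.endsWith("-")) {
                    text = text.substring(0, text.length() - 1);
                }
                out.append(text); // no space: the word continues on the next line
            } else {
                out.append(text);
                if (!last) {
                    out.append(' ');
                }
            }
        }
        return out.toString();
    }
}
```

Dropping the hyphen is a heuristic: a genuine compound such as "well-known" that happens to break at its hyphen would lose it.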

public boolean equals (Object other)

public Rect getBBox ()

Get the bounding box.

Note: To account for the effect of the page's '/Rotate' attribute, transform all points using the page's getDefaultMatrix().

Returns
  • The bounding box for this line (in unrotated page coordinates).

public int getCurrentNum ()

Get the index of this line on the current page.

Returns
  • The index of this line on the current page.

public TextExtractor.Word getFirstWord ()

Get the first word in the line.

Note: To traverse the list of all words on this line, use getNextWord().

Returns
  • The first word in the line.

public int getFlowID ()

Get the flow ID.

Returns
  • The unique identifier for a paragraph or column that this line belongs to. This information can be used to identify which lines/paragraphs belong to which flows.

public TextExtractor.Line getNextLine ()

Get the next line.

Returns
  • The next line on the page.
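
Lines are typically walked by calling getNextLine() repeatedly until isValid() returns false. The sketch below shows that loop against a stand-in interface so that it runs without the PDFTron library: LineWalk, LineLike and chain are hypothetical names, and the assumption that the line after the last one reports isValid() == false is not stated on this page.

```java
final class LineWalk {
    // Stand-in for the two TextExtractor.Line methods this sketch relies on.
    interface LineLike {
        boolean isValid();
        LineLike getNextLine();
    }

    // Counts lines by following getNextLine() until an invalid line is reached.
    static int countLines(LineLike first) {
        int n = 0;
        for (LineLike line = first; line.isValid(); line = line.getNextLine()) {
            n++;
        }
        return n;
    }

    // In-memory chain of `count` valid lines followed by an invalid sentinel,
    // used only to exercise the loop above.
    static LineLike chain(int count) {
        return new LineLike() {
            public boolean isValid() { return count > 0; }
            public LineLike getNextLine() { return chain(count - 1); }
        };
    }
}
```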

public int getNumWords ()

Get the number of words.

Returns
  • The number of words in this line.

public int getParagraphID ()

Get the paragraph ID.

Returns
  • The unique identifier for a paragraph or column that this line belongs to. This information can be used to identify which lines belong to which paragraphs.
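
As an illustration of identifying which lines belong to which paragraphs, the sketch below groups line indices by paragraph ID. ParagraphGroups and paragraphIds are hypothetical names; paragraphIds[i] stands for the value getParagraphID() returned for the i-th line visited.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParagraphGroups {
    // Maps each paragraph ID to the indices of the lines that reported it,
    // keeping paragraphs in the order in which they were first seen.
    static Map<Integer, List<Integer>> group(int[] paragraphIds) {
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < paragraphIds.length; i++) {
            groups.computeIfAbsent(paragraphIds[i], k -> new ArrayList<>()).add(i);
        }
        return groups;
    }
}
```

The same grouping could be keyed on getFlowID() to collect lines per flow instead.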

public double[] getQuad ()

Get the quadrilateral as an array of doubles.

Returns
  • The quadrilateral representing a tight bounding box for this line (in unrotated page coordinates).
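
To relate the quadrilateral to an axis-aligned rectangle, its extents can be computed from the returned array. This is a sketch only: QuadBounds is a hypothetical helper, and it assumes the array holds eight values laid out as four (x, y) pairs, which this page does not specify.

```java
final class QuadBounds {
    // Returns {minX, minY, maxX, maxY} for a quad given as x1, y1, ..., x4, y4.
    // Assumption: the array holds four (x, y) pairs in that interleaved order.
    static double[] of(double[] quad) {
        if (quad.length != 8) {
            throw new IllegalArgumentException("expected 8 values (4 points)");
        }
        double minX = quad[0], maxX = quad[0];
        double minY = quad[1], maxY = quad[1];
        for (int i = 2; i < quad.length; i += 2) {
            minX = Math.min(minX, quad[i]);
            maxX = Math.max(maxX, quad[i]);
            minY = Math.min(minY, quad[i + 1]);
            maxY = Math.max(maxY, quad[i + 1]);
        }
        return new double[] {minX, minY, maxX, maxY};
    }
}
```

For a rotated line the resulting rectangle is larger than the quad itself, which is why the tight quadrilateral is reported separately from the bounding box.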

public TextExtractor.Style getStyle ()

Get the style for this line.

Returns
  • The predominant style for this line.

public TextExtractor.Word getWord (int word_idx)

Get the word at the specified index.

Parameters
word_idx the index of the word
Returns
  • The word at index word_idx in this line.
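
Together with getNumWords(), this method allows index-based access to the words of a line. The sketch below uses a stand-in interface so that it runs without the PDFTron library: WordsByIndex, LineLike and of are hypothetical names, and the assumption that valid indices run from 0 to getNumWords() - 1 is not confirmed on this page.

```java
import java.util.ArrayList;
import java.util.List;

final class WordsByIndex {
    // Stand-in for the two TextExtractor.Line methods this sketch relies on;
    // W plays the role of TextExtractor.Word.
    interface LineLike<W> {
        int getNumWords();
        W getWord(int wordIdx);
    }

    // Collects every word of the line, assuming zero-based indices.
    static <W> List<W> collect(LineLike<W> line) {
        int count = line.getNumWords();
        List<W> words = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            words.add(line.getWord(i));
        }
        return words;
    }

    // In-memory line used only to exercise collect().
    static LineLike<String> of(String... words) {
        return new LineLike<String>() {
            public int getNumWords() { return words.length; }
            public String getWord(int wordIdx) { return words[wordIdx]; }
        };
    }
}
```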

public boolean isSimpleLine ()

Checks if this line is a simple line.

Returns
  • true if this line is not rotated (i.e. if the quadrilaterals returned by getBBox() and getQuad() coincide).

public boolean isValid ()

Checks if this line is valid.

Returns
  • true if the line is valid.

Protected Methods

protected void finalize ()

Throws
Throwable