PDFNet SDK
User Manual
Copyright 2002-2007 by PDFTron Systems,
Inc. All rights reserved. All information contained herein is the
property of PDFTron Systems, Inc. No part of this publication (whether
in hardcopy or electronic form) may be reproduced or transmitted,
in any form or by any means, electronic, mechanical, photocopying,
recording, or otherwise, without the prior written consent of the
PDFTron Systems, Inc. The information in this document is furnished
for informational use only, is subject to change without notice,
and should not be construed as a commitment by PDFTron Systems,
Inc. PDFNet SDK is available under license and may only be used
or copied in accordance with the terms of such license.
Contents
- What is PDFNet?
- A short introduction to PDF file format
- Low-level PDF API
- SDF/COS Object Model
- SDF::Obj
- SDF::Doc
- Streams and Filters
- Input Filters/Streams
- Output Filters/Streams
- Implementing custom Filters
- Security
- Securing a document
- Working with secured/encrypted documents
- Implementing custom security
- High-Level PDF API
- Opening a document
- Serializing (saving) a document
- Working with Pages
- Document's page sequence
- Creating a blank new page
- Copying/Merging Pages
- Removing/Deleting Pages
- Reordering Pages
- Rotating the Page
- Cropping the Page
- Media Box Adjustments
- Shifting Page Content
- Working with Page Content
- What is an Element?
- Graphics State
- Reading Page Content
- Processing Forms, Type3 glyphs, tiling patterns, Processing changes in Graphics State, Paths, Text, Fonts, Images, Shadings, Patterns
- Writing Page Content
- Paths, Text, Font Embedding, Type3 Fonts, Images, Shading, Patterns
- Editing Page Content

- Working with Bookmarks
- Working with Interactive Forms (AcroForms)
- Accessing Interactive Fields
- Understanding Field Types
- Creating Form Fields
- Filling Form Fields
- FDF Merge and FDF Extract

- Form Flattening
- Color Conversion

- Image Normalization/Conversion

- PDF Security
- Exception and Error Handling
- C/C++ Implementation Notes
- Microsoft.Net Implementation Notes
PDFNet is a high-quality, industry-strength PDF
library meeting requirements of the most demanding and diverse applications.
Using PDFNet you can write stand-alone, cross-platform
and reliable commercial applications that can read, write, and edit
PDF documents.
PDFNet is offered on a wide range of platforms
(e.g. Windows, Mac, Linux, and Solaris) and programming
environments (C/C++, C#, VB, J#, and other .Net languages).
The PDFNet API is divided into three namespaces: PDF, SDF,
and Filters.

Figure. PDFNet API Modules.
The PDF namespace is a set of high-level APIs
that can be used to manipulate high-level PDF constructs such as
pages, interactive forms, bookmarks, and graphical elements on the page.
The SDF API is a powerful low-level API
that can be used to manipulate every aspect of a PDF document. In
order to use SDF, you need to be familiar with PDF file structure
(documented in the PDF
Reference Manual). Using this powerful API, it is possible
to implement any functionality that is not present in the PDF API.
The Filters namespace deals with various
compression and encryption schemes used in PDF. Unless you are planning
to implement a custom encryption or compression scheme on top of
PDF, you only need very basic knowledge of the Filters API.
In this section we present the basic structure of a PDF document.
For details please refer to the PDF
Reference Manual. Below is a listing of a very simple PDF document
that displays the string "Hello World" on a single page.
0000 %PDF-1.4
0001 1 0 obj <<
0002 /Parent 5 0 R
0003 /Resources 3 0 R
0004 /Contents 2 0 R
0005 >>
0006 endobj
0007 2 0 obj
0008 <<
0009 /Length 51
0010 >>
0011 stream
0012 BT
0013 /F1 24 Tf
0014 1 0 0 1 260 330 Tm
0015 (Hello World)Tj
0016 ET
0017 endstream
0018 endobj
0019 3 0 obj
0020 <<
0021 /ProcSet [/PDF/Text]
0022 /Font <</F1 4 0 R >>
0023 >>
0024 endobj
0025 4 0 obj <<
0026 /Type /Font
0027 /Subtype /Type1
0028 /Name /F1
0029 /BaseFont/Helvetica
0030 >>
0031 endobj
0032 5 0 obj
0033 <<
0034 /Type /Pages
0035 /Kids [ 1 0 R ]
0036 /Count 1
0037 /MediaBox [0 0 612 714]
0038 >>
0039 endobj
0040 6 0 obj
0041 <<
0042 /Type /Catalog
0043 /Pages 5 0 R
0044 >>
0045 endobj
0046 xref
0047 0 7
0048 0000000000 65535 f
0049 0000000009 00000 n
0050 0000000103 00000 n
0051 0000000204 00000 n
0052 0000000275 00000 n
0053 0000000361 00000 n
0054 0000000452 00000 n
0055 trailer
0056 <<
0057 /Size 7
0058 /Root 6 0 R
0059 >>
0060 startxref
0061 532
A PDF file consists of four sections:
- A one-line header identifying the version of the PDF specification
to which the file conforms (Line 0). In the above sample the header
string is "%PDF-1.4" which identifies this file as a
PDF file that adheres to the 1.4 specification.
- A body containing the objects that make up the document contained
in the file (Lines 1-45). Our sample file shows 6 objects each
beginning with "obj" and ending with "endobj".
Each object is identified by an object number followed by a zero. The zero
is the revision level (also known as the generation number); it exists
because PDF allows a file to be updated without rewriting the whole file.
- A cross-reference table containing information about the indirect
objects in the file (Lines 46-54). The cross reference in our
sample notes that it contains 7 entries; a dummy for object zero
and one for each of the 6 objects. The table maps each object's
implicit index to the byte offset, measured from the beginning of
the file, at which the object starts. For example, Object 1 is
represented by the second entry, indicating that it begins at byte 9;
Object 3 is represented by the fourth entry, indicating that it begins
at byte 204.
- A trailer giving the location of the cross-reference table and
of certain special objects within the body of the file (Lines
55-61).
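To make the cross-reference mapping concrete, the following C++ sketch parses a single table entry in the format shown in the listing above: a 10-digit byte offset, a 5-digit generation number, and an "n" (in use) or "f" (free) flag. The names XrefEntry and parse_xref_entry are hypothetical and introduced only for illustration; they are not part of the PDFNet API, which handles this parsing internally.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical illustration type; not part of the PDFNet API.
struct XrefEntry {
    std::uint64_t offset;      // byte offset of the object from the start of the file
    unsigned      generation;  // generation (revision) number
    bool          in_use;      // true for 'n' entries, false for free ('f') entries
};

// Parses one entry such as "0000000204 00000 n".
// Returns false (and leaves 'out' untouched) if the text does not
// have the expected "offset generation flag" shape.
bool parse_xref_entry(const std::string& line, XrefEntry& out) {
    std::istringstream in(line);
    XrefEntry e = XrefEntry();
    char flag = 0;
    if (!(in >> e.offset >> e.generation >> flag)) return false;
    if (flag != 'n' && flag != 'f') return false;
    e.in_use = (flag == 'n');
    out = e;
    return true;
}
```

Applied to the fourth entry of the sample table ("0000000204 00000 n"), this yields offset 204, generation 0, and an in-use flag, which matches the position of Object 3 described above.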
Note that the objects refer to each other using a notation like
"5 0 R". The "R" stands for reference and it
uses the two preceding numbers to know which object and revision
we wish to reference.
Therefore, the file body consists of a collection of objects that
refer to each other forming an object graph. We could represent
the "Hello World" sample file using the following abstract
graph representation.

Figure. Object Graph.
Each object in the graph is represented with an ellipse and the
object cross references as arrows.
All PDF files must have a "Root" node. It must reference
a "Catalog" node which must reference a "Pages"
node. The "Pages" node further branches and points to all the pages
in the document. Note that a "Pages" node points to a
group of pages whereas the "Page" node represents a single
page.
The "Page" node references the page "Contents"
and the page "Resources". The resource dictionary in turn
references "Fonts" used on the page. The resource dictionary
can reference many other resource types such as Color Spaces, Patterns,
Shadings, Images, Forms, etc. The page contents stream contains markup
operators used to draw the page.
All PDF files obey this basic object structure to represent a PDF
document.
Before going into details of PDFNet SDF/COS object model, we should
review the basics. For a detailed description of the SDF syntax and
semantics, please refer to Chapter 3 (Syntax) of the PDF
Reference Manual.
In PDF there are five atomic objects:
| Object Type | Description | Samples |
| Number | PDF provides two types of numeric object: integer and real. | 1.03 612 |
| Bool | Boolean objects are identified by the keywords true and false. | true false |
| Name | A name object is an atomic symbol uniquely defined by a sequence of characters. Names always begin with "/" and can contain letters, numbers, and a few special characters. | /Font /Info /PDFNet |
| String | In PDF, strings of bytes are enclosed in "(" and ")". | (Hello World!) |
| Null | The null object has a type and value that are unequal to those of any other object. It usually refers to a missing object. | null |
Also, there are three compound objects:
| Object Type | Description | Samples |
| Array | An array object is a one-dimensional collection of objects arranged sequentially. Unlike arrays in many other computer languages, PDF arrays may be heterogeneous; that is, an array's elements may be any combination of numbers, strings, dictionaries, or any other objects, including other arrays. | []   [ true /Name ]   [ (Hello) [1] false 54.3 /Font ] |
| Dictionary | A dictionary object is a map containing pairs of objects, known as the dictionary's entries. The first element of each entry is the key and the second element is the value. The key must be a name. The value can be any kind of object, including another dictionary. | <</key /value >>   << /first (Str Value) /second [true false] /third << /yes /no >> >> |
| Stream | A stream is essentially a dictionary followed by a binary stream. PDF streams are always indirect objects so they can be shared. | 1 0 obj << /Length 144 >> stream ........... endstream endobj |
Objects can be arbitrarily nested using the dictionary and array
compounding operations.
With the exception of the stream sample, all of the objects in the above
tables are "direct objects" because they are not surrounded by the "obj"
and "endobj" keywords. The body of a PDF document is made up of a sequence
of "indirect objects". An indirect object
is created by taking a single direct object (atomic or compound)
and enclosing it between an object header such as "1 0 obj" and the
"endobj" keyword.
Note that, since direct objects are not numbered, they can't
be shared. However, because indirect objects are numbered and can
be referenced by other objects, they can be shared (i.e. referenced
by more than one other object).
In the above PDF example, the object '3 0 obj' is an indirect object
because "obj" and "endobj" keywords wrap a dictionary
object containing two entries.
3 0 obj
<<
/ProcSet [/PDF /Text]
/Font << /F1 4 0 R >>
>>
endobj
"ProcSet" key is mapped to an array which is a direct
object containing atomic direct objects. In a similar way, the "Font"
key is mapped to a direct dictionary. On the other hand, "F1"
in the inner dictionary is mapped to an indirect object with the
object number 4 and the generation number 0. Because the Font object
is indirect, the same font resource can be shared across many different
pages.
Real life PDF documents are much more complex than the "Hello
World" sample from the previous section. Streams in a PDF document
can be compressed and encrypted, objects can form complex networks,
and in PDF 1.5 parts of the object graph can be compressed and embedded
in so-called 'object streams'. All this makes manual editing of
PDF files extremely difficult or impossible. The good news is that
PDFTron Systems has released a utility called CosEdit
that can be used to browse and edit PDF files at the object level.
PDFNet also provides a full SDF/COS level API
making it very easy to read, write, and edit PDF and FDF at the
atomic level. Furthermore, PDFNet provides a high-level API that
can be used to read, write, and edit PDF documents in terms of pages,
bookmarks, graphical primitives, etc.
SDF (Structured Document Format) and COS
(Carousel Object System; Carousel was a codename for Acrobat 1.0)
are synonyms for the low-level PDF object model. SDF is the acronym
used in PDFNet, whereas COS is a legacy term used in the Acrobat SDK.
In many ways, SDF is to PDF what XML
and DOM are to SVG
(Scalable Vector Graphics). The SDF/COS object system provides the low-level
object types and file structure used in PDF files. PDF documents
are graphs of SDF objects. SDF objects can represent document components
such as bookmarks, pages, fonts, and annotations.
PDF is not the only document format built on top of SDF/COS. FDF
(Form Data Format) and PJTF (Portable Job Ticket Format) are also
built on top of SDF/COS.
The SDF layer deals directly with the data that is in a PDF document.
The data types are referred to as SDF objects. There are eight data
types found in PDF files. They are arrays, dictionaries, numbers,
boolean values, names, strings, streams, and a null object. PDFNet
implements these objects as shown in the following graph:

Figure. SDF Obj Hierarchy.
Obj is the base class for all SDF objects. Obj hierarchy implements
a composite pattern so you can invoke a member function of any derived
object through the base class interface (i.e. Obj implements methods
for all derived classes). This is illustrated in the following C#
sample code.
Doc doc = new Doc("in.pdf");
// Get the trailer
Obj trailer = doc.GetTrailer();
// Get the info dictionary.
Obj info = trailer.Get("Info").Value();
// Replace the Producer entry
info.Put("Producer", Obj.CreateString("PDFNet"));
// Create a custom inline dictionary within
// Info dictionary
Obj custom_dict = Obj.CreateDict();
info.Put("My Direct Dict", custom_dict);
// Add some key/value pairs
custom_dict.Put("My Number", Obj.CreateNumber(100));
Obj my_array = Obj.CreateArray();
custom_dict.Put("My Array", my_array);
// Create a custom indirect array within Info dictionary
Obj custom_array = doc.CreateIndirectArray();
info.Put("My Indirect Array", custom_array);
// Create indirect link to root
custom_array.PushBack(trailer.Get("Root").Value());
// Embed a custom stream (file myfile.txt).
StdFile embed_file = new StdFile("myfile.txt",
PDFNet.StdFile.OpenMode.e_read_mode);
FilterReader mystm = new FilterReader(embed_file);
custom_array.PushBack(doc.CreateIndirectStream(mystm));
doc.Save("out.pdf", 0, "%PDF-1.4"); // Save PDF
If a member function is not
supported on a given object (e.g. if you are invoking obj.GetName()
on a Bool object), an Exception will be thrown. Learn more about
PDFNet exception handling under the Error
handling section.
In order to find out type-information at run-time, use obj.GetType()
or obj.Is???() methods (where ??? represent the Type in question;
e.g. Array, Number, Bool, Str, Dict, Stream). Most of the time the
object type can be inferred from PDF/FDF specification. For example,
when you call doc.GetTrailer(), you can assume that the returned
object is a dictionary object because this is mandated by PDF specification.
If the object is not a dictionary, an exception will be thrown when
a dictionary method is called on the object. This way the code is
both efficient and elegant since unnecessary type casts and type
checks are not required. In case there is an ambiguity in PDF/FDF
specification, you can use GetType() or Is???() methods.
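The design described above, in which the base class declares the accessors of every derived type and throws when an accessor does not apply, can be sketched in a few lines of self-contained C++. This is only a miniature model under stated assumptions: the classes Node, NumberNode, and NameNode and the helper name_or are hypothetical and are not PDFNet's Obj classes.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical miniature of the composite design; not PDFNet's Obj.
class Node {
public:
    virtual ~Node() {}
    // Run-time type queries default to false.
    virtual bool IsNumber() const { return false; }
    virtual bool IsName() const { return false; }
    // Accessors for every derived type live on the base class and
    // throw unless the concrete type overrides them.
    virtual double GetNumber() const { throw std::runtime_error("object is not a Number"); }
    virtual std::string GetName() const { throw std::runtime_error("object is not a Name"); }
};

class NumberNode : public Node {
public:
    explicit NumberNode(double v) : value_(v) {}
    bool IsNumber() const override { return true; }
    double GetNumber() const override { return value_; }
private:
    double value_;
};

class NameNode : public Node {
public:
    explicit NameNode(const std::string& n) : name_(n) {}
    bool IsName() const override { return true; }
    std::string GetName() const override { return name_; }
private:
    std::string name_;
};

// Checks the type first, which is the approach suggested for cases
// where the specification leaves the object type ambiguous.
std::string name_or(const Node& obj, const std::string& fallback) {
    return obj.IsName() ? obj.GetName() : fallback;
}
```

Calling GetName() on a NumberNode throws, while name_or() returns the fallback instead. This mirrors the choice between relying on the specification and checking the type explicitly.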
As mentioned in the previous section, SDF objects can be either direct or indirect. Direct
objects can be created using Obj.Create???() methods. The
following example illustrates how to create a direct number/name
object inside Dict/Array object.
// Create a direct number/name/dict
Obj direct_num = Obj.CreateNumber(100);
Obj direct_name = Obj.CreateName("My Name");
Obj direct_dict = Obj.CreateDict();
// you can insert newly created direct objects
// into other container objects.
direct_dict.Put("My Number", direct_num);
doc.GetRoot().Put("My Dict", direct_dict);
doc.GetRoot().Put("My Name", direct_name);
New indirect objects can be created
using doc.CreateIndirect???() methods on a SDF document. The following
code shows how to create a new Number and new Dictionary indirect
object:
Obj mynumber = doc.CreateIndirectNumber(100);
Obj mydict = doc.CreateIndirectDict();
PDFNet SDF provides many utility methods that can be
used to efficiently traverse the SDF object graph. Here is an example
of how to get to the document's page root:
Obj pages = doc.GetTrailer()
.Get("Root").Value()
.Get("Pages").Value();
Note that because the PDF specification
mandates that "Root" is always a dictionary, we can directly
reference the "Pages" object using Get("key").
If "Root" were not a dictionary object, an exception would
be thrown.
In order to retrieve an object that may or may not be present in
a dictionary, use dict.Find("key") method. For example,
DictIterator itr = dict.Find("My Key");
Obj my_value = null;
if (itr != dict.DictEnd())
{
my_value = itr.Current().Value();
// ...
}
Note that dict.Find("key") returns a DictIterator
object. If the given key is not present in the dictionary, DictIterator
would be equal to dict.DictEnd(), otherwise DictIterator refers
to the key-value pair that was found.
You can use DictIterator in order to traverse key-value pairs within
a dictionary:
DictIterator itr = dict.DictBegin();
DictIterator end = dict.DictEnd();
while (itr!=end)
{
// itr.Current().Key();
// itr.Current().Value();
itr.Next();
}
In order to retrieve objects from an Array object, use array.GetAt(idx)
method:
// C++ sample
Obj* obj;
for (int i=0; i<array->Size(); ++i)
{
obj = array->GetAt(i);
// ...
}
The Obj hierarchy also implements a visitor
pattern, so you can derive objects from the ObjVisitor class. A visitor object can
traverse the Obj graph structure and perform certain operations on graph
nodes based on their type (read more about this later).
Using PDFNet API, it is easy to create new links to direct or indirect
objects. In the previous section we mentioned that a new direct
object can be created using Obj.Create???() methods, whereas a new
indirect object can be created using Doc.CreateIndirect???() methods.
The remaining question is how to create links (references) to
indirect objects. In PDFNet, creating new indirect references is
simple and transparent:
// C# sample
// Create indirect dictionary containing (/Key, /Foo)
// entry that will be shared.
Obj shared_dict = doc.CreateIndirectDict();
shared_dict.Put("Key", Obj.CreateName("Foo"));
// shared_dict.IsIndirect() returns true ...
// Get document's info dictionary.
Obj trailer = doc.GetTrailer();
Obj info = trailer.Get("Info").Value();
// Add indirect reference to 'shared_dict'.
info.Put("MyDict", shared_dict);
// Get document's root dictionary.
Obj root = trailer.Get("Root").Value();
// Add a second indirect reference to 'shared_dict'.
root.Put("MyDict", shared_dict);
Note that indirect links are created in
exactly the same way as direct links, (i.e. using dict.Put("key",
obj), array.Insert(idx, obj), and array.PushBack/PushFront(obj)
methods).
Multiple objects can refer to the same object; however, the shared
object must be indirect (i.e. it was
created using Doc.CreateIndirect???(), or obj.IsIndirect() returns
true).
Because in PDF creating multiple links to direct objects is not
allowed, PDFNet will throw an exception when you attempt to create
multiple links/references to the same direct object. This is shown
below:
// C# sample
try
{
// Create a direct Boolean object
Obj direct_obj = Obj.CreateBool(true);
Obj trailer = doc.GetTrailer();
Obj info = trailer.Get("Info").Value();
// Insert the direct object into info dictionary
info.Put("Link1", direct_obj);
Obj root = trailer.Get("Root").Value();
// Attempt to create a second link to direct_obj.
// This will throw an exception. If you want to
// share objects create them using
// Doc::CreateIndirect???() methods
root.Put("Link2", direct_obj);
}
catch (PDFNetException e)
{
Console.WriteLine(e.Message);
}
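The sharing rule can also be modelled independently of PDFNet. The following C++ sketch assumes a simple bookkeeping scheme in which a direct value remembers whether it already sits inside a container. The types Value and Dict are hypothetical, and PDFNet's actual internal bookkeeping may differ.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical model of a value; not PDFNet's Obj.
struct Value {
    bool indirect;    // true if the value has its own object number
    bool has_parent;  // true once a direct value was placed in a container
    explicit Value(bool is_indirect) : indirect(is_indirect), has_parent(false) {}
};

class Dict {
public:
    // Stores 'v' under 'key'. A direct value may belong to one container
    // only; an indirect value is stored by reference and can be shared.
    void Put(const std::string& key, const std::shared_ptr<Value>& v) {
        if (!v->indirect) {
            if (v->has_parent)
                throw std::logic_error("direct object already has a parent");
            v->has_parent = true;
        }
        entries_[key] = v;
    }
    std::size_t Size() const { return entries_.size(); }
private:
    std::map<std::string, std::shared_ptr<Value> > entries_;
};
```

In this model, putting the same indirect value into two dictionaries succeeds, whereas the second Put() of a direct value throws before the dictionary is modified.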
In addition to basic objects mentioned so far,
PDF also supports stream objects. A stream object is essentially
a dictionary with an attached binary stream. In PDFNet, all methods
that apply to dictionaries apply to streams as well.
In addition to methods provided by Dict, streams provide an interface
used to access an associated data-stream. You can use stm.GetDecodedStream()
to get decoded data or stm.GetRawStream() to get the raw data without
any Decode filters applied. GetRawStreamLength() returns the length
of the raw data-stream. This number is the same as the one stored
under “Length” key in the stream dictionary.
PDFNet supports all compressions and encryption schemes used in
PDF and the access to decoded data is transparent. The following
code decodes and extracts the contents of a given stream to an external
file:
// C# sample
Obj stream = ...
Filter dec_stm = stream.GetDecodedStream();
FilterReader reader = new FilterReader(dec_stm);
// Write decoded data to the output file.
// First open the file
StdFile out_file = new StdFile("out.bin",
PDFNet.StdFile.OpenMode.e_write_mode);
FilterWriter writer = new FilterWriter(out_file);
writer.WriteFilter(reader);
writer.Flush();
For a more complete discussion on PDFNet Filters
see PDFNet Streams and Filters.
The overview of the SDF object model is not complete without mentioning
SDF Doc. SDF Doc brings together document security, document utility
methods, and all SDF objects.
An SDF document can be created from scratch using the default constructor:
Doc mydoc = new Doc();
Obj trailer = mydoc.GetTrailer();
An SDF document can also be created from an existing
file (e.g. an external PDF document):
Doc mydoc = new Doc("in.pdf");
Obj trailer = mydoc.GetTrailer();
or from a memory buffer or some other Filter/Stream
such as an HTTP Filter connection:
MemoryFilter memory = ....
Doc mydoc = new Doc(memory);
Obj trailer = mydoc.GetTrailer();
Finally, an SDF document can be accessed from a high-level
PDF document as follows:
PDFDoc doc = new PDFDoc("in.pdf");
Doc mydoc = doc.GetSDFDoc();
Obj trailer = mydoc.GetTrailer();
Note that the examples above used doc.GetTrailer()
in order to access document trailer, the starting SDF object (root
node) in every document. Following the trailer links, we can visit
all low-level objects in a document (e.g. all pages, outlines, fonts,
etc).
SDF Doc also provides utility methods used to import objects and
object collections from one document to another. These methods can
be useful for copy operations between documents such as a high-level
page merge and document assembly.
One of the basic building blocks of a PDF document is the SDF stream
object. For example, in a PDF document all page content, images,
embedded fonts, and files are represented using stream objects that
can be compressed and encrypted using various Filter chains. See the
"Stream Objects" and "Filters" chapters in the PDF
Reference Manual for more details.
PDFNet supports an efficient and flexible architecture for processing
streams using Filter pipelines.
A Filter is an abstraction of a sequence of bytes, such as a file,
an input/output device, an inter-process communication pipe, or
a TCP/IP socket. A filter can also perform certain transformations
of input/output data (e.g. data compression/decompression, color
conversion, etc.)
PDFNet provides a generic input/output filter for external files
using StdFile class. Use the StdFile class to read from, write to,
open, and close files on a file system. For example,
StdFile myfile = new StdFile("in.jpg",
StdFile.OpenMode.e_read_mode);
opens an external image file for reading. StdFile
buffers input and output for better performance. Although it is
possible to read input data directly through the Filter interface (StdFile
is a Filter), it is more convenient to attach a FilterReader to the
filter and then read data through FilterReader interface:
// C#
FilterReader reader = new FilterReader(myfile);
byte[] buffer = new byte[1024];
int bytes;
while((bytes = reader.Read(buffer)) != 0)
{
// Process 'bytes' bytes of data from 'buffer'.
}
Data associated with SDF stream objects can be accessed
using Stream.GetRawStream() or Stream.GetDecodedStream() methods.
// C# sample
void Extract(Obj stream)
{
Filter dec_stm = stream.GetDecodedStream();
FilterReader reader = new FilterReader(dec_stm);
byte[] buffer = new byte[1024];
int bytes;
while((bytes = reader.Read(buffer)) != 0)
{
// Process 'bytes' bytes of decoded data from 'buffer'.
}
}
Stream.GetRawStream() creates a Filter used to extract
raw data as it appears in the serialized document (or a decrypted version
of the stream if the document is secured). Stream.GetDecodedStream()
creates a Filter pipeline and returns the last filter in the chain.
For example, a given stream may be compressed using JPEG (DCTDecode)
compression and then encoded using ASCII85 into an ASCII stream. When
GetDecodedStream() is invoked on this SDF stream, it will return
the last filter in a chain that contains three filters (the file segment
input filter, the ASCII85Decode filter, and the DCTDecode filter, in that order).
Data extracted from the returned Filter will be raw image data (i.e.
RGB triplets).
It is possible to iterate through the Filter chain using Filter.GetAttachedFilter()
method. For example, the following code snippet prints out all the
Filter names in the filter chain.
// C# sample
// Walk from the last filter in the chain back to the first one.
Filter cur_flt = dec_stm;
while (cur_flt != null)
{
Console.WriteLine(cur_flt.GetName());
cur_flt = cur_flt.GetAttachedFilter();
}
It is also possible to construct new filter chains and edit existing
ones using the Filter.AttachFilter(flt) method.
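The idea of a decode chain, in which each filter pulls bytes from the filter attached to it, can be illustrated without PDFNet. The C++ sketch below is a simplified model: ByteFilter, MemorySource, and HexDecode are hypothetical classes, and HexDecode is a deliberately reduced hexadecimal decoder (it skips every non-hex byte and does not handle an end-of-data marker), not PDFNet's filter implementation.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pull-style filter interface; not the PDFNet Filter class.
class ByteFilter {
public:
    explicit ByteFilter(ByteFilter* attached = nullptr) : attached_(attached) {}
    virtual ~ByteFilter() {}
    virtual std::string Name() const = 0;
    virtual int Next() = 0;  // next byte, or -1 at end of data
    ByteFilter* Attached() const { return attached_; }
private:
    ByteFilter* attached_;   // upstream filter, not owned
};

// First filter in the chain: serves bytes from memory.
class MemorySource : public ByteFilter {
public:
    explicit MemorySource(const std::string& data) : data_(data), pos_(0) {}
    std::string Name() const override { return "MemorySource"; }
    int Next() override {
        return pos_ < data_.size() ? static_cast<unsigned char>(data_[pos_++]) : -1;
    }
private:
    std::string data_;
    std::size_t pos_;
};

// Decodes pairs of hexadecimal digits pulled from the attached filter.
class HexDecode : public ByteFilter {
public:
    explicit HexDecode(ByteFilter* in) : ByteFilter(in) {}
    std::string Name() const override { return "HexDecode"; }
    int Next() override {
        int hi = Digit();
        if (hi < 0) return -1;
        int lo = Digit();
        return hi * 16 + (lo < 0 ? 0 : lo);  // a missing last digit counts as 0
    }
private:
    // Returns the next hex digit value, skipping non-hex bytes; -1 at end.
    int Digit() {
        for (int c = Attached()->Next(); c >= 0; c = Attached()->Next()) {
            if (std::isdigit(c)) return c - '0';
            if (std::isxdigit(c)) return std::tolower(c) - 'a' + 10;
        }
        return -1;
    }
};

// Drains the chain through its last filter.
std::string read_all(ByteFilter& last) {
    std::string out;
    for (int c = last.Next(); c >= 0; c = last.Next()) out.push_back(static_cast<char>(c));
    return out;
}

// Lists filter names from the last filter back to the first.
std::vector<std::string> chain_names(const ByteFilter* last) {
    std::vector<std::string> names;
    for (const ByteFilter* f = last; f != nullptr; f = f->Attached())
        names.push_back(f->Name());
    return names;
}
```

In this model, the caller holds only the last filter, just as GetDecodedStream() returns the last filter in the chain. Reading from it drives every upstream filter.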
In order to open an external file Filter for writing using PDFNet,
use the StdFile class as follows:
StdFile myfile = new StdFile("out.txt",
StdFile.OpenMode.e_write_mode);
After the output file filter/stream is opened you
can output data using FilterWriter class:
FilterWriter writer = new FilterWriter(myfile);
writer.WriteString("Hello World");
writer.Flush();
Output filters can also be chained in order to compress
and transform data (e.g. data encoding, color conversion, image
resampling, etc). The following sample creates an output filter
chain that compresses data using the Flate compression method and then
encodes the compressed data using ASCII85. The last filter in the chain
is the output file that will contain the resulting data:
Filter myfile = new StdFile("out.bin",
StdFile.OpenMode.e_write_mode);
// attach the output of ascii85 to myfile
Filter ascii85 = new ASCII85Encode(myfile);
// attach the output of flate to ascii85
Filter flate = new FlateEncode(ascii85);
FilterWriter writer = new FilterWriter(flate);
writer.WriteString("Hello World");
writer.Flush();
PDFNet provides full support for all common Filters used in PDF.
Although included Filters should cover all common use case scenarios,
advanced users may want to provide custom implementations for certain
filters (e.g. custom color conversion, or a new compression method).
PDFNet provides an open and expandable architecture for creation
of custom filters. To implement a custom Filter, derive a new class
from the Filter base class and implement the required interface. A more
detailed guide for implementing custom Filters is available through
the PDFTron Systems developer program. Please contact support@pdftron.com
for more details.
PDF documents can be secured and encrypted using various encryption
schemes. PDFNet provides support for standard security handler and
provides an extension mechanism through which users can register
custom security handlers.
The code that performs user authorization and sets permissions
is known as a security handler. The core API has one built-in security
handler known as Standard Security Handler (StdSecurityHandler).
The Standard Security Handler supports two passwords:
- A user password that allows a user to open and read a protected
document with whatever permissions the owner chose
- An owner password that allows a document’s owner to also
change the permissions granted to users.
Applications can also implement their own implementations of SecurityHandler.
For example, a custom implementation of a SecurityHandler may perform
user authorization that requires the
presence of a hardware dongle, or a hardware key, file, etc.
A Security Handler is used when:
- A document is opened. The security handler must determine whether
a user is authorized to open the file and set up RC4 decryption
key that is used to decrypt the file.
- A document is saved. The security handler must set up RC4 encryption
key and write the required security information into the PDF file’s
encryption dictionary.
- A user tries to change a document’s security settings.
Note that the Standard Security Handler in PDFNet does not enforce
current permissions. For example, it is possible to edit a document
even though the document modification permission is not granted. Therefore
it is up to the application to respect PDF permissions.
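Since permissions are not enforced automatically, an application has to test them itself before performing a restricted operation. The C++ sketch below is independent of PDFNet: it checks the permission flags stored as the /P integer in a standard encryption dictionary, using the bit positions given in the PDF Reference Manual (bit 3 print, bit 4 modify, bit 5 extract, bit 6 annotate, counting from 1 at the least significant bit). The enum PermissionBit and the function is_permitted are hypothetical names, and an application using PDFNet would normally query the security handler rather than read /P directly.

```cpp
#include <cstdint>

// Bit masks for the /P entry (bit 1 is the least significant bit).
// Names are hypothetical; PDFNet exposes its own Permission enum.
enum PermissionBit : std::uint32_t {
    kPrint    = 1u << 2,  // bit 3: print the document
    kModify   = 1u << 3,  // bit 4: modify document contents
    kExtract  = 1u << 4,  // bit 5: copy or extract text and graphics
    kAnnotate = 1u << 5   // bit 6: add or modify annotations, fill form fields
};

// True if the document's /P value grants the requested operation.
// /P is stored as a signed 32-bit integer, so it is reinterpreted
// as an unsigned bit field before masking.
bool is_permitted(std::int32_t p_value, PermissionBit what) {
    return (static_cast<std::uint32_t>(p_value) & what) != 0;
}
```

For example, a /P value of -60 has bit 3 set and bits 4 to 6 clear, so a well-behaved application would allow printing but refuse to modify the document or extract its content.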
A document may have zero, one, or two security handlers associated
with it. A document has zero security handlers if the file is not
secured. When security is applied to a file, or the user selects
a different security handler for a secured file, the newly-chosen
security handler is not put in place immediately. Instead this new
security handler is a pending security handler until the document
is saved.
A document may have both a current and a new security handler associated
with it because a PDF document is not fully loaded into memory and decrypted
when it is opened, so the original security handler is still required
to decrypt the content.
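The relationship between the current and the pending handler can be summarised with a small state model. The following C++ sketch is purely illustrative: SecuredDocModel is a hypothetical class that tracks handler names as strings, and it does not reflect how PDFNet stores handlers internally.

```cpp
#include <string>

// Hypothetical model of the behaviour described above; not PDFNet code.
class SecuredDocModel {
public:
    explicit SecuredDocModel(const std::string& current)
        : current_(current), has_pending_(false) {}

    // The new handler is only recorded; the current one keeps decrypting.
    void SetSecurityHandler(const std::string& handler) {
        pending_ = handler;
        has_pending_ = true;
    }

    // Saving re-encrypts the file, after which the pending handler
    // becomes the current one.
    void Save() {
        if (has_pending_) {
            current_ = pending_;
            has_pending_ = false;
        }
    }

    const std::string& Current() const { return current_; }
    bool HasPending() const { return has_pending_; }

private:
    std::string current_;
    std::string pending_;
    bool has_pending_;
};
```

Between SetSecurityHandler() and Save(), the model holds two handlers at once, which corresponds to the "two security handlers" case mentioned above.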
To secure a document, create a new SecurityHandler, set permissions
and authentication data, and set it as the new handler using doc.SetSecurityHandler(handler).
For example,
// C# Sample
StdSecurityHandler new_handler = new StdSecurityHandler();
// Set a user password required to open a document
byte[] user_password = new byte [4];
user_password[0] = (byte)'t';
user_password[1] = (byte)'e';
user_password[2] = (byte)'s';
user_password[3] = (byte)'t';
new_handler.ChangeUserPassword(user_password);
// Set Permissions
new_handler.SetPermission(
SecurityHandler.Permission.e_print, true);
new_handler.SetPermission(
SecurityHandler.Permission.e_extract_content, false);
// Associate the new_handler with the document.
Doc doc = new Doc("in.pdf");
doc.SetSecurityHandler(new_handler);
PDFNet fully supports reading of encrypted PDF documents. You can
check whether a document is encrypted using the doc.IsEncrypted() method.
If the document is encrypted, you should initialize the security handler
using the doc.InitializeSecurityHandler() method.
// Open a potentially encrypted document
Doc doc = new Doc("in.pdf");
doc.InitializeSecurityHandler();
Because InitializeSecurityHandler() doesn't have any side effects
on documents that are not encrypted you can always invoke this method
after constructing a document.
If a document doesn't require authentication data (e.g. a user
password) in order to view the content, InitializeSecurityHandler()
is enough to work with encrypted documents. On the other hand, if
a document requires a user password or other authorization data
in order to open it and view the content, you need to implement
the user interface methods that will perform authentication (e.g.
a method that will collect the user password through a dialog box).
The default security handler does not collect authorization data
and will throw an exception if a document requires a user password.
In order to define a custom UI that implements an application-specific
authorization procedure, derive a class from StdSecurityHandler and
implement the UI callback methods as in the following example:
// C++ sample
class MySecurityHandler : public StdSecurityHandler
{
public:
MySecurityHandler (int key_len, int enc_code)
: StdSecurityHandler(key_len, enc_code) {}
MySecurityHandler (const MySecurityHandler& s)
: StdSecurityHandler(s) {}
// In this callback ask authorization data.
// This may involve a GUI dialog used to collect
// the password.
bool GetAuthorizationData (Permission p)
{
string pass;
// collect the password from standard input
cin >> pass;
InitPassword(pass);
return true;
}
// This callback could be used to customize
// security handler preferences.
bool EditSecurityData(SDF::Doc& doc) {
return false;
}
// This callback is invoked when authorization
// process fails.
void AuthorizeFailed()
{
// Display the error message.
cout << "Authorize failed...." << endl;
}
SecurityHandler* Clone() const {
return new MySecurityHandler(*this);
}
static SecurityHandler* Create (const char* name,
int key_len, int enc_code) {
return new MySecurityHandler (key_len, enc_code);
}
};
In this sample GetAuthorizationData() callback
was used to collect the user password from the standard input followed
by a call to InitPassword(). AuthorizeFailed() callback is called
if the password supplied in InitPassword() is invalid.
In order for MySecurityHandler to take effect when opening secured
documents you need to register the new handler with SecurityManager
using RegisterSecurityHandler(...) static method:
SecurityManagerSingleton::Instance()
.RegisterSecurityHandler("Standard",
SDF::SecurityDescriptor("Standard Security",
MySecurityHandler::Create));
Security handler registration is usually done once
upon program startup. Security handlers can also be registered and
removed dynamically at any point during program lifetime. A more
complete example that registers and initializes a security handler
is given below:
SecurityManager& sec_mgr =
SecurityManagerSingleton::Instance();
sec_mgr.RegisterSecurityHandler("Standard",
SDF::SecurityDescriptor("Standard Security",
MySecurityHandler::Create));
// Open a secured document
Doc doc("file_in.pdf");
if (!doc.InitializeSecurityHandler())
{
cout << "Document authentication error....";
return;
}
The first step was to get a reference to
the SecurityManager. SecurityManager is a global object (singleton)
that keeps track of all registered SecurityHandlers. By default,
only StdSecurityHandler with no UI interaction is registered. The
second step was to register MySecurityHandler, a standard security
handler that provides UI authorization functions. The first argument to
RegisterSecurityHandler() is the name of the security handler as
it appears in the document's Encrypt dictionary ("Standard")
and the second argument is a SecurityDescriptor. SecurityDescriptor
accepts the handler's descriptive name, which may be used in a user
interface ("Standard Security"), and a pointer to a factory
method that will be used to instantiate the security handler when
it is required. MySecurityHandler's callback functions are invoked
during the call to doc.InitializeSecurityHandler(). InitializeSecurityHandler()
attempts to collect the authorization data by calling GetAuthorizationData()
on MySecurityHandler. If the correct authorization information is
not obtained after several attempts, InitializeSecurityHandler()
calls MySecurityHandler's AuthorizeFailed() callback.
After the SecurityHandler is initialized, you can access the security
handler associated with the document using the GetSecurityHandler()
method. You can edit permissions and authorization data on the existing
handler, or set a completely new security handler using the
doc.SetSecurityHandler(handler) method.
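The retry behavior described above can be pictured with a small
stand-alone model. This is only a sketch and does not use the PDFNet
classes: ToyCallbacks and ToyInitialize are hypothetical names, and
max_attempts is an assumption, because the manual only says "several
attempts".

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for a handler's UI callbacks (not the PDFNet classes).
struct ToyCallbacks {
    std::function<std::string()> get_password;  // plays the role of GetAuthorizationData()
    std::function<void()> authorize_failed;     // plays the role of AuthorizeFailed()
};

// Ask for a password up to max_attempts times and report failure once at the end.
// max_attempts is an assumption: the manual does not state the actual limit.
inline bool ToyInitialize(const ToyCallbacks& cb, const std::string& expected,
                          int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (cb.get_password() == expected) return true;
    }
    cb.authorize_failed();
    return false;
}
```

In this model the failure callback runs exactly once, after the last
unsuccessful attempt, which matches the order of events described for
InitializeSecurityHandler().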
To remove PDF security, set the current SecurityHandler
to null:
// C#
PDFDoc doc = new PDFDoc("encrypted.pdf");
doc.InitializeSecurityHandler();
doc.SetSecurityHandler(null);
// C++
PDFDoc doc("encrypted.pdf");
doc.InitializeSecurityHandler();
doc.SetSecurityHandler(AutoPtr<SecurityHandler>(0));
Besides providing full support for standard PDF security, PDFNet
allows users to work with custom security handlers and proprietary
encryption algorithms. To define a custom security handler, derive
a class from SecurityHandler and implement SecurityHandler's interface.
The registration and use of a custom security handler are identical
to the procedure outlined for the Standard Security handler in the previous
section. Please contact support@pdftron.com for more details.
High-level PDF constructs such as pages, interactive forms, bookmarks,
and graphical elements on the page are implemented in a namespace called
PDF. PDF classes contain methods that can be used to copy pages
between documents, to read/write graphical Elements such as images,
paths, and text, to manipulate interactive forms, etc. Although PDF
implements the most commonly used PDF functionality, you can at any point
access the underlying SDF objects and have full control of the low-level
object model.
A PDF document can be created from scratch using a default constructor:
PDFDoc new_doc = new PDFDoc();
The new document does not contain any pages. See the Working
with Pages section for details on how to create new pages and how
to work with existing pages.
Using PDFNet you can open a document from a serialized file, from
a memory buffer, and from a Filter stream.
To open an existing PDF file, specify the file path in the PDFDoc constructor:
PDFDoc mydoc = new PDFDoc("in.pdf");
You can also open an existing PDF document
from a memory buffer:
FileStream stm = new FileStream("in.pdf",
FileMode.Open, FileAccess.Read);
BinaryReader reader = new BinaryReader(stm);
byte[] buffer = reader.ReadBytes(
(int) reader.BaseStream.Length);
reader.Close();
PDFDoc mydoc = new PDFDoc(buffer);
You can also provide a MemoryFilter
or a custom Filter such as HTTPFilter in order to provide alternative
ways to access existing PDF data.
If the existing document is encrypted (i.e. doc.IsEncrypted()
returns true), you need to call doc.InitializeSecurityHandler() after
constructing the document. In practice you may always call
doc.InitializeSecurityHandler(), since the method has no side effects
on documents that are not secured.
PDFDoc doc = new PDFDoc("in.pdf");
if (!doc.InitializeSecurityHandler())
{
Console.WriteLine("Document authentication error...");
return;
}
The PDFNet security API is explained
in detail in the Security and PDF
Security sections.
A PDF document can be serialized (or saved) to a file on disk,
to a memory buffer, or to an arbitrary data stream such as MemoryFilter
or HTTPFilter.
To save a file to disk, use the PDFDoc::Save(...) method, e.g.
doc.Save("out.pdf", 0);
The second argument is a bitwise combination of flags that are used
as options during serialization.
PDFNet allows a document to be saved incrementally (see section 2.2.7
"Incremental Update" in the PDF
Reference Manual). Because applications may allow users to modify
PDF documents, users should not have to wait for the entire file
(which can contain hundreds of pages) to be rewritten each time
modifications to the document are saved. PDFNet allows modifications
to be appended to a file, leaving the original data intact. The
addendum appended when a file is incrementally updated contains
only those objects that were actually added or modified. Incremental
update allows an application to save modifications to a PDF document
in an amount of time proportional to the size of the modification
rather than the size of the file. In addition, because the original
contents of the document are still present in the file, it is possible
to undo saved changes by deleting one or more file updates.
Changes can be appended to an existing document using e_incremental
flag:
doc.Save("in.pdf", PDFDoc.e_incremental);
Note that the file output name matches the input name.
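The difference between a full save and an incremental save can be
illustrated with a toy model. The sketch below is not PDFNet code and
not the PDF file format: ToyDoc, ToyFullSave and ToyIncrementalSave are
hypothetical names, and the "file" is simply a list of serialized
objects.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy document: object number -> serialized body. Not the PDFNet object model.
struct ToyDoc {
    std::map<int, std::string> objects;
    std::set<int> dirty;  // objects added or modified since the last save

    void Put(int num, const std::string& body) {
        objects[num] = body;
        dirty.insert(num);
    }
};

// Full save: rewrite every object. Returns the number of objects written.
inline std::size_t ToyFullSave(ToyDoc& doc, std::vector<std::string>& file) {
    file.clear();
    for (const auto& kv : doc.objects) file.push_back(kv.second);
    doc.dirty.clear();
    return doc.objects.size();
}

// Incremental save: append only the dirty objects, leaving earlier data intact.
inline std::size_t ToyIncrementalSave(ToyDoc& doc, std::vector<std::string>& file) {
    std::size_t written = 0;
    for (int num : doc.dirty) {
        file.push_back(doc.objects[num]);
        ++written;
    }
    doc.dirty.clear();
    return written;
}
```

In this model, editing one object of a 100-object document appends a
single entry, so the cost of the save follows the size of the change,
and the earlier entries remain in the file.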
Some PDF files over time accumulate objects that are not used (e.g.
old updates, modifications, unused fonts, images, etc). To trim
down the file size use e_remove_unused flag:
doc.Save("out.pdf", PDFDoc.e_remove_unused);
In order to provide user feedback, the PDFDoc::Save(...) method accepts
an optional object derived from the ProgressMonitor base class.
ProgressMonitor provides a callback interface that keeps the client
application up to date about the progress of the function.
A PDF document can also be serialized in a memory buffer as follows:
byte[] buf = null;
int buf_sz = 0;
doc.Save(ref buf, ref buf_sz, 0);
A high-level PDF document (PDF::PDFDoc) contains a sequence of
PDF::Pages as illustrated in the following figure:

Figure. PDFDoc Page sequence.
PDF::PDFDoc::PageBegin() returns a PageIterator to the first Page
in the document, whereas PDF::PDFDoc::PageEnd() returns a PageIterator
to null or non-existent page. If doc.PageBegin() iterator equals
doc.PageEnd() the document has no pages. You can also determine
the number of pages in the document using PDFDoc::GetPageNumber()
method. The following code snippet shows how to print out the media
box coordinates (i.e. page size) for every page in document page
sequence:
PageIterator itr = doc.PageBegin();
PageIterator end = doc.PageEnd();
for (; itr != end; itr.Next())
{
    Rect mediabox = new Rect(itr.Current().GetMediaBox());
    Console.WriteLine("Media box: {0}, {1}, {2}, {3}",
        mediabox.x1, mediabox.y1,
        mediabox.x2, mediabox.y2);
}
In this code we used itr.Next() in order to move
to the next page in the sequence (in a similar fashion you can use
itr.Prev() in order to move to the previous page) and itr.Current()
in order to access the Page object referenced by the iterator.
Another way to achieve the same result as in previous code sample
is using GetPageNumber() and PageFind(page_num) methods:
int page_num = doc.GetPageNumber();
for (int i=1; i<=page_num; ++i)
{
PageIterator itr = doc.PageFind(i);
Page page = itr.Current();
Rect mediabox = new Rect(page.GetMediaBox());
Console.WriteLine("Media box: {0}, {1}, {2}, {3}",
mediabox.x1, mediabox.y1,
mediabox.x2, mediabox.y2);
}
Note that because pages in the document sequence
are indexed starting from 1, another way to access the first page
in the document is using doc.PageFind(1). If the given page number
cannot be found in the document's page sequence, PageFind(page_num)
returns a PageIterator to a null or non-existent page. Therefore:
PageIterator itr = doc.PageFind(page_num);
if (itr != doc.PageEnd())
{
    Console.WriteLine(
        "PageFind returned an iterator to an existing page");
}
else
{
    Console.WriteLine(
        "Document does not contain page#: {0}", page_num);
}
To create a new page, use the PDFDoc::PageCreate(media_box)
method. The function has an optional Rect argument that can be used
to specify the page size or, more specifically, its "media box".
A media box is a rectangle, expressed in default user space units,
defining the boundaries of the physical medium on which the page
is intended to be displayed or printed. A user space unit is 1/72
of an inch. If media_box is not specified, the default dimensions
of the page are 8.5 x 11 inches (8.5*72 = 612 by 11*72 = 792 units).
Page x = doc.PageCreate();
doc.PagePushBack(x);
In this code snippet we created a new 8.5x11 page and
added it at the end of the document's page sequence.
Note that after the page is created it does not belong to the document's
page sequence and needs to be placed at a specific location within
the sequence in order to be 'visible'. This is illustrated in the
above figure, where page 'x' is
shown to be outside of the document's page sequence. PagePushBack()
places page 'x' at the end of the page sequence.
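The relationship between inches and user space units can be checked
with a few lines of stand-alone code. This is a sketch only: ToyRect,
InchesToUnits and MediaBoxFromInches are hypothetical helpers and are
not part of PDFNet.

```cpp
// One default user space unit is 1/72 of an inch.
const double kUnitsPerInch = 72.0;

// Hypothetical plain rectangle; it only mirrors the x1/y1/x2/y2 layout used above.
struct ToyRect { double x1, y1, x2, y2; };

inline double InchesToUnits(double inches) { return inches * kUnitsPerInch; }

// Media box for a page of the given size with its lower-left corner at the origin.
inline ToyRect MediaBoxFromInches(double width_in, double height_in) {
    return ToyRect{0.0, 0.0, InchesToUnits(width_in), InchesToUnits(height_in)};
}
```

For the default 8.5 x 11 inch page, MediaBoxFromInches(8.5, 11) gives a
rectangle of 612 by 792 units.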
A Page can be copied from one document to another (or replicated
within an existing document) using PDFDoc.PageInsert(where, pg),
PDFDoc.PagePushFront(pg), PDFDoc.PagePushBack(pg) and PDFDoc.ImportPages(list)
methods.
PagePushBack(page) appends the given page at the end of the page sequence,
whereas PagePushFront(page) inserts the page at the front of the
sequence. PageInsert(where, page) inserts the page in front of the
page pointed to by the 'where' iterator.
// Add three copies of the page to the document:
// two at the end and one at the front.
doc.PagePushBack(x);
doc.PagePushBack(x);
doc.PagePushFront(x);
// Create a new page and insert it just before
// the second page
doc.PageInsert(doc.PageFind(2), doc.PageCreate());
Note that it is possible to replicate a given page
within a document by repeatedly adding the same page.
The same methods can also be used to merge documents or copy pages
from one document to another.
In PDF every page object references various resource objects such
as images, fonts, and color spaces that are used to render the page.
In order to accurately copy a page from one document to another
PageInsert / PagePushFront / PagePushBack methods copy all referenced
resources.
If you are copying several pages between two documents it is better
to use PDFDoc.ImportPages(page_list) because the resulting document
will be much smaller and the copy operation will be faster.
ImportPages() is better than other methods for multi page copy
because it preserves resource sharing in the target document. This
is illustrated in following figures.

Figure. Copying pages between two documents using PageInsert/PagePushFront/PagePushBack
In a PDF document, page resources (e.g. fonts, images, color spaces,
forms, etc) can be shared across several pages in order to reduce
file size and speed up page processing. This is shown in 'Document
1' in the above figure, where all three pages share the same font
and color space object. 'Document 2' was created by direct page
copy using the PageInsert, PagePushFront or PagePushBack methods. Note
that every page now refers to a separate instance of each resource object.
On the other hand the result of page copy using ImportPages() is
identical to the original document. Note that in 'Document 2' resource
objects are shared across pages.

Figure. Copying pages between two documents
using ImportPages()
Also note that if pages are copied/replicated within the same document
(not between two different documents) all methods behave the same
and resources are always shared.
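The effect on resource sharing can be reproduced with a toy model. The
sketch below is not PDFNet code, and all names in it (ToyPage, ToyDoc,
ToyPushBackCopy, ToyImportPages) are hypothetical. It only shows why a
batch import that remembers which resources it has already copied ends
up with fewer resource objects than a page-at-a-time copy.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct ToyPage { std::vector<int> resources; };  // indices into the owning document's table

struct ToyDoc {
    std::vector<std::string> resources;
    std::vector<ToyPage> pages;
};

// Page-at-a-time copy: every call duplicates all resources the page references.
inline void ToyPushBackCopy(ToyDoc& dst, const ToyDoc& src, const ToyPage& page) {
    ToyPage copy;
    for (int r : page.resources) {
        dst.resources.push_back(src.resources[static_cast<std::size_t>(r)]);
        copy.resources.push_back(static_cast<int>(dst.resources.size()) - 1);
    }
    dst.pages.push_back(copy);
}

// Batch import: one old->new map for the whole batch keeps shared resources shared.
// As in the text, the copies are returned and are not yet part of dst.pages.
inline std::vector<ToyPage> ToyImportPages(ToyDoc& dst, const ToyDoc& src,
                                           const std::vector<ToyPage>& pages) {
    std::map<int, int> remap;
    std::vector<ToyPage> imported;
    for (const ToyPage& page : pages) {
        ToyPage copy;
        for (int r : page.resources) {
            auto it = remap.find(r);
            if (it == remap.end()) {
                dst.resources.push_back(src.resources[static_cast<std::size_t>(r)]);
                it = remap.emplace(r, static_cast<int>(dst.resources.size()) - 1).first;
            }
            copy.resources.push_back(it->second);
        }
        imported.push_back(copy);
    }
    return imported;
}
```

With three pages that share one font and one color space, the
page-at-a-time copy in this model produces six resource objects, while
the batch import produces two.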
Code that copies all pages from one document to another may look
like this:
PDFDoc in_doc = new PDFDoc("in.pdf");
PDFDoc new_doc = new PDFDoc();
PageIterator i = in_doc.PageBegin();
PageIterator end = in_doc.PageEnd();
for (; i!=end; i.Next())
{
new_doc.PagePushBack(i.Current());
}
but as we explained above it is better to keep sharing
resources in 'new_doc' by importing all pages first (using ImportPages()
method):
PDFDoc in_doc = new PDFDoc("in.pdf");
PDFDoc new_doc = new PDFDoc();
// Create a list of pages to copy.
ArrayList copy_pages = new ArrayList();
PageIterator itr = in_doc.PageBegin();
PageIterator end = in_doc.PageEnd();
for (; itr!=end; itr.Next())
{
copy_pages.Add(itr.Current());
}
// Import all the pages in 'copy_pages' list
ArrayList imported_pages = new_doc.ImportPages(copy_pages);
// Note that pages in the 'imported_pages' list are not
// placed in the document's page sequence. This is done
// in the following step.
for (int i=0; i!=imported_pages.Count; ++i)
{
    new_doc.PagePushBack((Page)imported_pages[i]);
}
ImportPages(page_list) creates a copy of the pages given
in the argument list, preserving shared resources. Note that the
pages in the returned list are ordered in the same way as the pages
in the argument list and that, although the pages are copied, they are
not inserted into the document's page sequence. Therefore, in order to
be visible, imported/copied pages should be appended or inserted
at a specific location within the document's page sequence.
A page can be deleted using the PDFDoc::PageRemove(itr) method, where
itr is the PageIterator to the page that should be deleted. A PageIterator
for the given page can be obtained using PDFDoc::PageFind(page_num)
or by direct iteration through the document's page sequence. This
is shown in the examples below:
// Remove the fifth page from the page sequence.
doc.PageRemove(doc.PageFind(5));
// Remove the third page.
PageIterator i = doc.PageBegin();
i.Next();
i.Next();
doc.PageRemove(i);
PDFDoc::PageRemove(itr) only removes the page from
document's page sequence. The page and its resources are still available
until the document is saved in 'full save mode' with 'remove unused
objects' flag or until next garbage collect operation. If you are
saving the file in 'incremental mode' the serialized document may
contain the content of the removed page.
Given the copy and delete
page operations described in previous sections it is easy to re-arrange
and sort pages. For example, the order of pages in the document
can be reversed as follows.
int page_num = doc.GetPageNumber();
for (int i=1; i<=page_num; ++i)
{
PageIterator itr = doc.PageFind(i);
Page page = itr.Current();
doc.PageRemove(itr);
doc.PagePushFront(page);
}
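Why this loop reverses the order may not be obvious at first glance.
The following sketch replays the same three steps (find, remove, push
to front) on a plain std::vector of page labels. It is a model only:
ToyReverse is a hypothetical name and no PDFNet types are involved.

```cpp
#include <cstddef>
#include <vector>

// Mirror of the loop above on a plain vector: for each 1-based position i,
// remove the element found there and push it to the front.
inline void ToyReverse(std::vector<int>& pages) {
    const std::size_t count = pages.size();
    for (std::size_t i = 1; i <= count; ++i) {
        const auto pos = pages.begin() + static_cast<std::ptrdiff_t>(i - 1);
        const int page = *pos;              // PageFind(i) followed by Current()
        pages.erase(pos);                   // PageRemove(itr)
        pages.insert(pages.begin(), page);  // PagePushFront(page)
    }
}
```

After step i the first i entries are already in reverse order and the
remaining entries are untouched, so position i+1 still holds the next
original page.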
A page can be rotated when displayed or printed by specifying 'Rotate'
attribute in page dictionary. The value of 'Rotate' attribute is
the number of degrees by which the page should be rotated clockwise
when displayed or printed. The value must be a multiple of 90.
// Rotate the first page 90 degrees clockwise.
PageIterator itr = doc.PageBegin();
Page page = itr.Current();
Obj page_dict = page.GetSDFObj();
page_dict.Put("Rotate", Obj.CreateNumber(90));
To find out the rotation for an existing page use
the following code:
int deg = 0;
Obj rotate = page.GetRotation();
if (rotate != null)
deg = (int) rotate.GetNumber();
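When a rotation is adjusted relative to its current value, the result
still has to be a multiple of 90. The helper below is a stand-alone
sketch with a hypothetical name (ToyAddRotation). Reducing the result
to the range 0..270 is a choice made for this example and is not
something the text above requires.

```cpp
#include <stdexcept>

// Add a clockwise rotation to an existing one.
// Both values must be multiples of 90; the result is reduced to 0, 90, 180 or 270.
inline int ToyAddRotation(int current_deg, int delta_deg) {
    if (current_deg % 90 != 0 || delta_deg % 90 != 0) {
        throw std::invalid_argument("rotation must be a multiple of 90");
    }
    const int sum = (current_deg + delta_deg) % 360;
    return sum < 0 ? sum + 360 : sum;  // map negative results into 0..270
}
```

The returned value is the number that would then be stored under the
'Rotate' key, as in the sample above.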
The crop box defines the region to which the contents of the page
are to be clipped (cropped) when displayed or printed. Unlike the
other boxes, the crop box has no defined meaning in terms of physical
page geometry or intended use; it merely imposes clipping on the
page contents. The default value is the page’s media box.
A new crop box can be specified as follows.
Obj page_dict = page.GetSDFObj();
page_dict.Put("CropBox",
Rect.CreateSDFRect(0, 0, 500, 600));
To find out the crop box of an existing
page:
Rect rect = null;
Obj cb = page.GetCropBox();
if (cb != null)
{
rect = new Rect(cb);
// Crop box is:
// rect.x1, rect.y1,
// rect.x2, rect.y2
}
else
{
// The crop box is equal to the media box
}
The media box defines the boundaries of the physical medium on
which the page is to be printed. It may include any extended area
surrounding the finished page for bleed, printing marks, or other
such purposes. It may also include areas close to the edges of the
medium that cannot be marked because of physical limitations of
the output device. Content falling outside this boundary can safely
be discarded without affecting the meaning of the PDF file. A new
value for page media box can be specified as follows:
Obj page_dict = page.GetSDFObj();
page_dict.Put("MediaBox",
Rect.CreateSDFRect(0, 0, 400, 700));
or by editing the existing media box:
Rect media_box = new Rect(page.GetMediaBox());
media_box.x1 = 0;
media_box.y1 = 0;
media_box.x2 = 400;
media_box.y2 = 700;
media_box.Update();
Page content can be translated horizontally and vertically by adjusting
the media box. For example, the following code translates
all page contents 2 inches (2 inches * 72 units per inch = 144 units)
to the left.
Rect media_box = new Rect(page.GetMediaBox());
// translate the page 2 inches horizontally
media_box.x1 += 144;
media_box.x2 += 144;
media_box.Update();
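The reason that moving the media box to the right shifts the content to
the left can be seen in a small stand-alone model. ToyBox, ToyShiftX
and ToyOffsetFromLeftEdge are hypothetical names used only in this
sketch. The content coordinates stay fixed, and what changes is the
position of the page edge they are measured against.

```cpp
// Hypothetical plain rectangle with the same x1/y1/x2/y2 layout as a media box.
struct ToyBox { double x1, y1, x2, y2; };

// Move the box horizontally without changing its width.
inline void ToyShiftX(ToyBox& box, double dx) {
    box.x1 += dx;
    box.x2 += dx;
}

// Horizontal distance from the left page edge (media box x1) to a content point.
inline double ToyOffsetFromLeftEdge(const ToyBox& media_box, double content_x) {
    return content_x - media_box.x1;
}
```

A point drawn at x = 200 on a box that starts at 0 sits 200 units from
the left edge. After the box is shifted by 144 units, the same point
sits 56 units from the edge, that is, 2 inches further to the left.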
PDFNet provides an advanced and easy to use API that can be used
to read, write and edit text, images, and other graphical Elements.
Because the API is very efficient, PDFNet is a good match for interactive
applications (e.g. PDF viewers and editors) and content extraction
applications (e.g. conversion, preflight, etc), as well as for dynamic
PDF generation.
Page content is a major component of a PDF document. It represents
the visible marks on a page that are drawn by a set of PDF marking
operators. For details on PDF content streams and detailed operator
descriptions please refer to Section 3.7.1, “Content Streams,”
in the PDF
Reference Manual.
Although the PDFNet SDF and Filters APIs provide everything that is
required to decode and parse low-level content streams, using the Element
API is much easier and more intuitive. In short, the PDFNet Element API
allows you to treat a page's contents as a list of objects
(i.e. a display list or a sequence of Elements) rather than manipulating
sets of cryptic marking operators.
A set of marking operators from the page content stream builds
an Element (such as text, path, or an image) and the set of Elements
represents a display list.

Figure. A sequence of page marking operators
represents an Element.
Therefore PDFNet Element interface allows user to treat page contents
as a list of objects whose values and attributes can be modified.
Using Element interface applications can read, write, edit, and
create
page contents and page resources, which may contain fonts, images,
shadings, patterns, extended graphics states, and so on.
Your application may use Element methods to modify the appearance
of a page or it can create pages from scratch.
Elements are independent of each other. Every Element therefore
encapsulates all the relevant information about itself. A text object,
for instance, contains all of its font attributes.
Element is the concrete base class for all Elements. PDFNet supports
all content elements occurring in PDF, namely: path, text_begin,
text, text_new_line, text_end, image, inline_image, shading, form,
group_begin, group_end, marked_content_begin, and marked_content_end.
Note that some Elements such as path, text, image, inline-image,
and shading represent concrete graphical elements, whereas Elements
such as text_begin/end, text_new_line, group_begin/end, and marked_content_begin/end
don't have graphical representation but are used for logical grouping
of Element sequences or to provide meta-data associated with Element
groups.
The Element hierarchy implements a composite pattern; that is, the Element
class implements the methods of all derived types.

Figure. Element hierarchy. Only methods listed in the Element
group or base class can be invoked for the given type.
To find the type of a given Element, use the element.GetType() method.
It is illegal to call methods that are not related to the given element
type, and the behavior is undefined. For example, it is illegal to
call element.GetImageData() on an e_path element.
Note that in the above figure e_group_begin/end and e_text_begin/end
don't add any functionality to the common Element interface (i.e.
GetType()/GetGState()/GetCTM()). The main purpose of these Elements
is to mark sequences of Elements into logical groups. Element e_group_begin
corresponds to PDF 'q' operator (saveState), e_group_end corresponds
to 'Q' operator, e_text_begin corresponds to 'BT' (begin text) operator
and e_text_end corresponds to 'ET' operator.
e_text_begin initializes a text object, initializing the text matrix
and the text line matrix to the identity matrix. Because PDF text
objects cannot be nested a second e_text_begin element cannot appear
before e_text_end. A text object contains one or more text runs
(i.e. e_text elements) and new line markers (e_text_new_line elements).
e_text and e_text_new_line are not allowed outside of the text group
(i.e. outside element sequence surrounded by e_text_begin/end).
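The grouping rules for text Elements can be expressed as a simple check
over a sequence of element types. The sketch below is a stand-alone
model: ToyType and ToyTextGroupsValid are hypothetical names, not the
PDFNet enumeration. Treating an unmatched text_end or an unterminated
text group as invalid is an assumption of this example.

```cpp
#include <vector>

// Hypothetical element kinds, loosely following the names used in the text.
enum class ToyType { path, text_begin, text, text_new_line, text_end, image };

// Returns true if text groups are not nested and text runs appear only inside a group.
inline bool ToyTextGroupsValid(const std::vector<ToyType>& elements) {
    bool in_text = false;
    for (ToyType t : elements) {
        switch (t) {
            case ToyType::text_begin:
                if (in_text) return false;   // text objects cannot be nested
                in_text = true;
                break;
            case ToyType::text_end:
                if (!in_text) return false;  // assumption: an end needs a matching begin
                in_text = false;
                break;
            case ToyType::text:
            case ToyType::text_new_line:
                if (!in_text) return false;  // not allowed outside a text group
                break;
            default:
                break;                       // other kinds are not checked in this sketch
        }
    }
    return !in_text;  // assumption: a group left open at the end is invalid
}
```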
Every Element has an associated CTM (current transformation matrix)
and graphics state. Element.GetCTM() returns the transformation
matrix in effect while processing the current Element. Element.GetGState()
returns the associated graphics state. GState keeps track of a
number of style attributes used to visually define graphical Elements.
The methods available through the GState class are listed below:

Figure. Graphics State.
For a detailed description of graphics state attributes refer to
section 4.3 "Graphics State" in
PDF Reference Manual.
Page content is represented as a sequence of graphical Elements
such as paths, text, images, forms, etc. The only effect of the
ordering of Elements in the display list is paint order. Elements
that occur later in the display list can obscure earlier elements.
A display list can be traversed using ElementReader as in the following
example:
void ReadDoc()
{
// Open an existing document
PDFDoc doc = new PDFDoc("in.pdf");
ElementReader reader = new ElementReader();
// Read page content on every page in the document
PageIterator itr;
PageIterator end = doc.PageEnd();
for (itr=doc.PageBegin(); itr!=end; itr.Next())
{
// Read the page
reader.Begin(itr.Current());
ProcessElements(reader);
}
}
void ProcessElements(ElementReader reader)
{
Element element;
// Traverse the page display list
while ((element = reader.Next()) != null)
{
switch (element.GetType())
{
case Element.ElementType.e_path:
{
if (element.IsClippingPath())
{}
// ...
break;
}
case Element.ElementType.e_text:
{
Matrix2D text_mtx = element.GetTextMatrix();
// ...
break;
}
case Element.ElementType.e_form:
{
reader.FormBegin();
ProcessElements(reader);
reader.End();
break;
}
}
}
}
In order to begin display list traversal, call reader.Begin().
reader.Next() will then return subsequent Elements until NULL/null
is returned, marking the end of the display list.
Note that ElementReader works with one page at a time, although
the same ElementReader may be reused to process multiple pages.
Note that PDF page display list may contain children display lists
of Form XObjects, Type3 font glyphs, and tiling patterns. A form
XObject is a self-contained description of any sequence of graphics
objects (including path objects, text objects, and sampled images),
defined as a PDF content stream. It may be painted multiple times—either
on several pages or at several locations on the same page—and
will produce the same results each time, subject only to the graphics
state at the time it is invoked. In order to open a child display
list for Form XObject call reader.FormBegin() method and to return
processing to the parent display list call reader.End(). Processing
of the form XObject display list is illustrated in the following figure:

Figure. Traversing the child display list.
Note that in the above example a child display list is opened when
element with type Element.ElementType.e_form is encountered using
reader.FormBegin() method. The child display list becomes the current
display list until it is closed using reader.End(). At this point
the processing is returned to the parent display list and the next
Element returned will be the Element following the Form XObject.
Also note that sub-display lists may also have children display
lists because the Form XObjects may be nested. In the above example
support for nesting is implemented using recursion.
Analogous to a Form XObject, a pattern display list can be opened using
reader.PatternBegin(), whereas a Type3 glyph display list can be opened
using the reader.Type3FontBegin() method.
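The recursive traversal of nested display lists can be modeled without
any PDF machinery. In the sketch below ToyElement and ToyCollect are
hypothetical names. A "form" entry simply owns a child list, and the
recursive call plays the role of the FormBegin()/End() pair.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical display-list entry: a type name plus an optional child list.
struct ToyElement {
    std::string type;                  // e.g. "path", "text", "form"
    std::vector<ToyElement> children;  // child display list, used when type == "form"
};

// Visit every element depth-first and record its type, indented by nesting depth.
inline void ToyCollect(const std::vector<ToyElement>& list, int depth,
                       std::vector<std::string>& out) {
    for (const ToyElement& e : list) {
        out.push_back(std::string(static_cast<std::size_t>(depth) * 2, ' ') + e.type);
        if (e.type == "form") {
            // Descend into the child list, then continue with the next sibling.
            ToyCollect(e.children, depth + 1, out);
        }
    }
}
```

Because the function calls itself for each form, forms nested inside
other forms are handled with no extra bookkeeping, and traversal resumes
with the element that follows the form once the child list is exhausted.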
After reading an Element using the ElementReader.Next() method, it is
possible to access all graphical attributes of the Element through
its graphics state. Some applications are
more interested in changes in the graphics state than in attribute
values. For example, a transition from one Element to another may
not involve any changes in the graphics state, or there may be changes
to only a couple of attributes. In these cases it is not efficient
to make memberwise comparisons between the old and the current
graphics state.
PDFNet offers an efficient and easy to use API that can be used
to enumerate the list of changes between subsequent Elements.
The list of changes in the graphics state can be traversed using the
ElementReader.ChangesBegin()/ChangesEnd() methods, as in the following example:
GSChangesIterator itr = reader.ChangesBegin();
GSChangesIterator end = reader.ChangesEnd();
for (; itr != end; itr.Next())
{
switch(itr.Current())
{
case GState.GStateAttribute.e_transform:
// Get transform matrix for this element.
// Unlike path.GetCTM() that returns full
// transformation matrix gs.GetTransform()
// returns only the transformation matrix
// that was installed for this element (a
// cm operator preceding this Element).
// gs.GetTransform();
break;
case GState.GStateAttribute.e_line_width:
// gs.GetLineWidth();
break;
case GState.GStateAttribute.e_line_cap:
// gs.GetLineCap();
break;
case GState.GStateAttribute.e_line_join:
// gs.GetLineJoin();
break;
case GState.GStateAttribute.e_miter_limit:
// gs.GetMiterLimit();
break;
case GState.GStateAttribute.e_dash_pattern:
break;
// Etc.
}
}
It is also possible to query ElementReader for changes in a given
attribute:
if (reader.IsChanged(
GState.GStateAttribute.e_line_width))
{
// line width was changed.
}
Note that the list of modified attributes is accumulated
when calling ElementReader.Next(). To clear the list of modified
attributes use ElementReader.ClearChangeList() method. A call to
ClearChangeList() serves as a marker in the display list from which
further changes in the graphics state are tracked.
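The accumulate-then-clear behavior of the change list can be sketched
as a small class. This is a model only and says nothing about how
PDFNet implements it. ToyChangeTracker is a hypothetical name,
attributes are identified by plain strings, and values are reduced to a
single double for brevity.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Toy change tracker: remembers which attributes changed since the last clear.
class ToyChangeTracker {
public:
    // Record a new attribute value; only actual changes enter the change list.
    void Set(const std::string& attr, double value) {
        auto it = state_.find(attr);
        if (it == state_.end() || it->second != value) {
            state_[attr] = value;
            changed_.insert(attr);
        }
    }
    bool IsChanged(const std::string& attr) const { return changed_.count(attr) != 0; }
    std::size_t ChangeCount() const { return changed_.size(); }
    // Marks the point from which further changes are tracked.
    void ClearChangeList() { changed_.clear(); }

private:
    std::map<std::string, double> state_;
    std::set<std::string> changed_;
};
```

A caller interested only in line width would clear the list, process
some elements, and then test IsChanged("line_width") instead of
comparing whole graphics states.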
New page content can be added to an existing page or a blank
new page using ElementBuilder and ElementWriter. ElementBuilder
is used to instantiate Element(s) that
can be written to one or more pages using ElementWriter:

Figure. Adding new content to a page.
The following example illustrates how to write page content to
a new document.
PDFDoc doc = new PDFDoc();
// ElementBuilder is used to build new Element objects
ElementBuilder f = new ElementBuilder();
// ElementWriter is used to write Elements to the page
ElementWriter writer = new ElementWriter();
// Start a new page
// Position an image stream on several places on the page
Page page = doc.PageCreate();
// Begin writing to this page
writer.Begin(page);
// Attach ElementBuilder to the page
f.Begin(page);
// Import an Image that can be reused multiple
// times in the document or multiple times on the
// same page.
StdFile img_file = new StdFile("peppers.jpg",
StdFile.OpenMode.e_read_mode);
FilterReader img_data = new FilterReader(img_file);
Image img = Image.Create(doc.GetSDFDoc(),
img_data,
Image.ImageCompression.e_jpeg,
400, 600, 8,
ColorSpace.CreateDeviceRGB());
Element element = f.CreateImage(img,
new Matrix2D(200, -145, 20, 300, 200, 150));
writer.WritePlacedElement(element);
GState gstate = element.GetGState();
// Use the same image (just change its matrix)
gstate.SetTransform(200, 0, 0, 300, 50, 450);
writer.WritePlacedElement(element);
// Use the same image (just change its matrix)
writer.WritePlacedElement(
f.CreateImage(img, 300, 600, 200, -150));
// save changes to the current page
writer.End();
// Add a new page to the document sequence
doc.PagePushBack(page);
// Start a new page
page = doc.PageCreate();
writer.Begin(page);
f.Begin(page);
// Construct and draw a path object using
// different GState attributes
f.PathBegin();
f.MoveTo(306, 396);
f.CurveTo(681, 771, 399.75, 864.75, 306, 771);
f.CurveTo(212.25, 864.75, -69, 771, 306, 396);
f.ClosePath();
// path is now constructed
element = f.PathEnd();
element.SetPathFill(true);
// Set the path color space and color
gstate = element.GetGState();
gstate.SetFillColorSpace(
ColorSpace.CreateDeviceCMYK());
gstate.SetFillColor(
new ColorPt(1, 0, 0, 0)); // cyan
gstate.SetTransform(
0.5, 0, 0, 0.5, -20, 300);
writer.WritePlacedElement(element);
// Draw the same path using a different
// stroke color.
// This path should be filled and stroked
element.SetPathStroke(true);
gstate.SetFillColor(
new ColorPt(0, 0, 1, 0)); // yellow
gstate.SetStrokeColorSpace(
ColorSpace.CreateDeviceRGB());
gstate.SetStrokeColor(new ColorPt(1, 0, 0)); // red
gstate.SetTransform(0.5, 0, 0, 0.5, 280, 300);
gstate.SetLineWidth(20);
writer.WritePlacedElement(element);
// Draw the same path with a given dash pattern
// This path should be only stroked
element.SetPathFill(false);
gstate.SetStrokeColor(new ColorPt(0, 0, 1)); // blue
gstate.SetTransform(0.5, 0, 0, 0.5, 280, 0);
double[] dash_pattern = {30};
gstate.SetDashPattern(ref dash_pattern, 0);
writer.WritePlacedElement(element);
writer.End(); // save changes to the current page
doc.PagePushBack(page);
doc.Save("out.pdf", Doc.SaveOptions.e_remove_unused);
Note that once the Element
is instantiated using ElementBuilder you have full control over
its properties and its graphics state.
Page content can also come from existing pages. For example, you
can use ElementReader to read paths, text, and images from existing
pages and copy them to the current page. Note that along the way
you can fully modify Element properties and its graphics state.
This is the basis of page content editing, which will be discussed in
the next section. The following example copies all Elements except
images from an existing page and changes the text color to blue:
ElementWriter writer = new ElementWriter();
ElementReader reader = new ElementReader();
Element element;
reader.Begin(doc.PageBegin().Current());
Page new_page = doc.PageCreate(new Rect(0, 0, 612, 794));
doc.PagePushBack(new_page);
writer.Begin(new_page);
while ((element = reader.Next()) != null)
{
if (element.GetType() == Element.ElementType.e_text)
{
// Set all text to blue color.
GState gs = element.GetGState();
gs.SetFillColorSpace(
ColorSpace.CreateDeviceRGB());
gs.SetFillColor(new ColorPt(0, 0, 1));
}
else if (element.GetType()
== Element.ElementType.e_image)
{
// remove all images
continue;
}
writer.WriteElement(element);
}
writer.End();
reader.End();
A PDF document may optionally display a document outline on the
screen, allowing the user to navigate interactively from one part
of the document to another. The outline consists of a tree-structured
hierarchy of Bookmarks (sometimes called outline items), which serve
as a 'visual table of contents' to display the document’s
structure to the user.
Each Bookmark has a title that appears on screen, and an Action
that specifies what happens when a user clicks on the Bookmark.
The typical Action for a user-created Bookmark is to move to another
location in the current document, although any Action can be specified.
Although it is possible to work with outline items using the SDF/Cos
API (see section 8.2.2 'Document Outline' in the PDF Reference Manual
for more details), PDFNet simplifies this work by providing
a high-level utility class, PDF::Bookmark.
You can use Bookmark.GetNext(), Bookmark.GetPrev(),
Bookmark.GetFirstChild(), and Bookmark.GetLastChild() to navigate
the whole outline tree.
This is shown in the following code snippet.
// C# Sample:
// Prints out the outline tree to the standard output
static void PrintIndent(Bookmark item)
{
    int indent = item.GetIndent() - 1;
    for (int i = 0; i < indent; ++i)
        Console.Write(" ");
}
static void PrintOutlineTree(Bookmark item)
{
    for (; item.IsValid(); item = item.GetNext())
    {
        PrintIndent(item);
        Console.WriteLine("{0}{1}",
            (item.IsOpen() ? "- " : "+ "), item.GetTitle());
        if (item.HasChildren())
        {
            // Recursively print children sub-trees
            PrintOutlineTree(item.GetFirstChild());
        }
    }
}
static void Main(string[] args)
{
    PDFDoc doc = new PDFDoc("../../../Data/out1.pdf");
    doc.InitializeSecurityHandler();
    Bookmark root = doc.GetFirstBookmark();
    PrintOutlineTree(root);
}
Note that the root Bookmark was obtained
using PDFDoc.GetFirstBookmark(). If GetFirstBookmark() returns
a Bookmark that is not valid (i.e. GetFirstBookmark().IsValid()
returns false), the document has no outline tree.
A new outline tree can be created as follows:
PDFDoc doc("../Data/in.pdf");
doc.InitializeSecurityHandler();
Bookmark myitem = Bookmark::Create(doc, "My Item");
doc.AddRootBookmark(myitem);
Sub-items can be added using
Bookmark.AddChild(…) method:
Bookmark sub_item = myitem.AddChild("My Sub-Item");
myitem.AddChild("My Sub-Item 2");
Note that a Bookmark can be
associated with different kinds of Actions. The most common Action
is to move to another location in the current document. This type
of Action is called a Destination Action (see section 8.2.1 'Destinations'
in the PDF Reference Manual for more details). The following code creates
a new page Destination and sets the Bookmark’s action:
// The following example creates an 'explicit' destination
Destination dest = Destination::CreateFit(*doc.PageBegin());
Action action= Action::Create(dest);
myitem.SetAction(action);
Using PDFNet it is also possible to quickly create
‘named’ destinations (see section 8.2.1 'Destinations'
in the PDF Reference for more details). Named destinations have an advantage
over explicit destinations because they allow the location of the
destination to change without invalidating existing links.
To create a named destination, pass the key under which the destination
will be stored to the Action::Create(…) method:
Action blue_action = Action::Create("blue1",
    Destination::CreateFit(*doc.PageBegin()));
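The benefit of the extra level of indirection can be illustrated with a small stand-alone model. The sketch below does not use PDFNet; `Dest`, `NamedLink`, `NameTable`, `ResolvePage`, and `Retarget` are hypothetical names introduced only for illustration. It assumes that links store a key and that the document keeps one table mapping keys to destinations.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a destination: just a target page number.
struct Dest {
    int page;
};

// A named link stores only a key, never the destination itself.
struct NamedLink {
    std::string key;
};

// Document-wide table mapping destination names to destinations.
using NameTable = std::map<std::string, Dest>;

// Resolve a named link through the table. Returns -1 if the key is unknown.
int ResolvePage(const NameTable& names, const NamedLink& link) {
    auto it = names.find(link.key);
    return it == names.end() ? -1 : it->second.page;
}

// Point an existing name at a new page. Links are not touched at all;
// every link that uses the key follows the change automatically.
void Retarget(NameTable& names, const std::string& key, int new_page) {
    assert(new_page >= 1);  // page numbers in this model are 1-based
    names[key] = Dest{new_page};
}
```

In this model, moving the "blue1" destination requires a single table update, whereas an explicit destination would have to be rewritten in every link that embeds it.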
The Bookmark class also allows you to quickly find
Bookmarks based on their title text. For example, the following code
snippet looks for a Bookmark called “foo” and then removes
it from the outline tree:
Bookmark foo = doc.GetFirstBookmark().Find("foo");
if (foo.IsValid())
{
    foo.Delete();
}
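For readers who want to see what a find-and-remove operation over a tree involves, here is a minimal stand-alone sketch. It does not use PDFNet: `OutlineNode`, `FindByTitle`, and `RemoveByTitle` are hypothetical names, and the depth-first search order is an assumption, since this manual does not state how Bookmark.Find() traverses the outline.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical outline node; this is not PDFNet's Bookmark class.
struct OutlineNode {
    std::string title;
    std::vector<std::unique_ptr<OutlineNode>> children;

    // Append a child with the given title and return a pointer to it.
    OutlineNode* AddChild(const std::string& child_title) {
        children.push_back(std::make_unique<OutlineNode>());
        children.back()->title = child_title;
        return children.back().get();
    }
};

// Depth-first search for the first node whose title matches exactly.
// Returns nullptr when no node matches.
OutlineNode* FindByTitle(OutlineNode* node, const std::string& title) {
    if (node == nullptr) return nullptr;
    if (node->title == title) return node;
    for (auto& child : node->children) {
        if (OutlineNode* hit = FindByTitle(child.get(), title)) return hit;
    }
    return nullptr;
}

// Remove the first descendant of 'parent' with the given title.
// The whole subtree below the removed node is deleted with it.
bool RemoveByTitle(OutlineNode* parent, const std::string& title) {
    assert(parent != nullptr);
    auto& kids = parent->children;
    for (auto it = kids.begin(); it != kids.end(); ++it) {
        if ((*it)->title == title) {
            kids.erase(it);
            return true;
        }
        if (RemoveByTitle(it->get(), title)) return true;
    }
    return false;
}
```

Note that in this model removing a node also removes all of its children, which is worth keeping in mind when deleting outline items that have sub-items.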
The Bookmark API allows you to set and change
any property on outline items, including title text, action, color,
and formatting. Color and other formatting can help readers get
around more easily in large PDF documents. The following code adjusts
color and formatting properties on three Bookmark items:
red.SetColor(1, 0, 0);
green.SetColor(0, 1, 0);
// use bold font for green title text
green.SetFlags(2);
blue.SetColor(0, 0, 1);
// use bold and italic font for blue title text
blue.SetFlags(3);
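The integer values 2 and 3 used above are bit combinations: in the PDF Reference, the outline item flags define bit 1 (value 1) as italic and bit 2 (value 2) as bold. The following stand-alone sketch shows how such values can be composed and tested. The constant and function names are illustrative assumptions and are not PDFNet identifiers.

```cpp
#include <cassert>

// Outline item style bits as described in the PDF Reference.
// The names below are illustrative only.
enum OutlineFlag : int {
    kItalic = 1,  // bit 1
    kBold   = 2   // bit 2
};

// Combine style choices into the integer passed to a SetFlags-style call.
constexpr int MakeFlags(bool bold, bool italic) {
    return (bold ? kBold : 0) | (italic ? kItalic : 0);
}

constexpr bool IsBold(int flags)   { return (flags & kBold) != 0; }
constexpr bool IsItalic(int flags) { return (flags & kItalic) != 0; }

// Reject values that set bits other than the two defined above.
inline int CheckedFlags(int flags) {
    assert((flags & ~(kItalic | kBold)) == 0);
    return flags;
}
```

With these helpers, `MakeFlags(true, false)` yields the value used for the green item and `MakeFlags(true, true)` the value used for the blue item.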
An interactive form (sometimes referred to as an AcroForm) is a
collection of fields such as text boxes, checkboxes, radio buttons,
drop-down lists, pushbuttons, etc. for gathering information interactively
from the user. A PDF document may contain any number of Fields appearing
on any combination of pages, all of which make up a single, global
interactive form spanning the entire document. PDF forms are similar
to HTML forms but there are some important differences:
- Unlike HTML pages, a PDF document has a single, global interactive
form spanning the entire document.
- In PDF the field and value appearance can be completely customized.
Although field appearances give incredible customization power
to PDF forms, developers need to learn to work with forms where
field value and appearance are two different entities.
- PDF supports combo boxes with text editing.
- In PDF it is possible to associate fields with different kinds
of Actions (or Action chains).
PDFNet fully supports reading, writing, and editing PDF forms and
provides many utility methods so that working with forms is simple
and efficient. Using the PDFNet forms API, arbitrary subsets of form
fields can be imported or exported from the document, new forms
can be created from scratch, and the appearance of existing forms
can be modified.
The form shown in the following figure consists of a number
of Fields:

Every field has a name and a value, as well as an annotation appearance.
In PDFNet, Fields are accessed through FieldIterators.
For example, the list of all Fields present in the document can
be traversed using the following code snippet:
FieldIterator itr = doc.InteractiveFieldBegin();
FieldIterator end = doc.InteractiveFieldEnd();
for (; itr != end; itr.Next())
{
    Field field = itr.Current();
    Console.WriteLine("Field name: {0}", field.GetName());
}
You can also search for a given field by
name:
// Search for a specific field
FieldIterator itr = doc.InteractiveFieldFind("name");
if (itr != doc.InteractiveFieldEnd())
{
    Console.WriteLine("Field {0} was found.",
        itr.Current().GetName());
}
else
{
    Console.WriteLine("Field was not found.");
}
If a given field name was not found, or if the end of the field list
was reached, the iterator will be equal to doc.InteractiveFieldEnd().
If you have a valid iterator you can access the Field using the Current()
method (or the dereference operator in C/C++):
Field field = itr.Current(); // C#
Field field = *itr; // C++
PDF offers seven different field types. Each type of form field
is used for a different purpose, and they have different properties,
appearances, options, and actions that can be associated with the
fields. In this section, we will explain how to create all seven
field types and some attributes specific to each one.
Common field types are text-box, checkbox, radio-button, combo-box,
and push-button. To find out the type of the Field use Field.GetType()
method:
Field.FieldType type = field.GetType();
switch (type)
{
    case Field.FieldType.e_button:
        Console.WriteLine("Button");
        break;
    case Field.FieldType.e_text:
        Console.WriteLine("Text");
        break;
    case Field.FieldType.e_choice:
        Console.WriteLine("Choice");
        break;
    case Field.FieldType.e_signature:
        Console.WriteLine("Signature");
        break;
}
Regardless of which field type you create, you need to provide a
Field name:
Field myfield = doc.InteractiveFieldCreate("address",
    Field.FieldType.e_text);
Under most circumstances, all field names must be unique. If you
name one field "address" and then create a second
field that you also call "address", you cannot supply different
data in the two fields.
Field names can use alphabetic characters, numbers, or both to identify
the field. All field names are case-sensitive. For example, you
can use names such as empFirstName, empSecondName, empNumber, and
so on for a group of fields that relate to the same concept
(in our sample, an employee entity).
Another means of naming fields is to use a parent and child name.
For example, you can name the above fields as follows: employee.name.first,
employee.name.second, employee.number. The period (i.e. a decimal
point) separates the parent from the child.
This naming convention is not only useful for organizational purposes
but is also better suited to automated operations on Fields.
In PDFNet, Field.GetName() returns a string representing the fully
qualified name of the field (e.g. "employee.name.first").
To get the child name (i.e. "first"), use the Field.GetPartialName()
method.
After the new Field is created, you can add it to a given Page (or Pages)
using the Page.CreateWidget(Field, Rect) method:
page.CreateWidget(myfield, new Rect(50, 550, 350, 600));
The second argument represents the rectangle where the widget annotation
should be placed on the page.
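The parent and child naming convention described above lends itself to simple string processing. The sketch below is a stand-alone illustration, not PDFNet code: `SplitFieldName`, `PartialName`, and `IsUnder` are hypothetical helpers that operate on plain strings, and they assume that a period never appears inside a partial name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a fully qualified field name such as "employee.name.first"
// into its partial names: {"employee", "name", "first"}.
std::vector<std::string> SplitFieldName(const std::string& full_name) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type dot = full_name.find('.', start);
        if (dot == std::string::npos) {
            parts.push_back(full_name.substr(start));
            break;
        }
        parts.push_back(full_name.substr(start, dot - start));
        start = dot + 1;
    }
    return parts;
}

// The last component, i.e. what the text above calls the child name.
std::string PartialName(const std::string& full_name) {
    std::vector<std::string> parts = SplitFieldName(full_name);
    assert(!parts.empty());  // SplitFieldName always yields at least one part
    return parts.back();
}

// True if 'full_name' lies below 'parent' in the hierarchy, e.g.
// "employee.name.first" is under both "employee.name" and "employee".
bool IsUnder(const std::string& full_name, const std::string& parent) {
    return full_name.size() > parent.size() &&
           full_name.compare(0, parent.size(), parent) == 0 &&
           full_name[parent.size()] == '.';
}
```

A helper like `IsUnder` is one way to select every field that belongs to the employee group when processing field names in bulk, which is the kind of automated operation the hierarchical convention makes easier.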
Form Fields can be populated using the Field.SetValue() method:
field.SetValue(Obj.CreateString("New Value"));
// Regenerate appearance stream.
field.RefreshAppearance();
Note that after modifying the Field's value we refreshed its appearance
stream. In PDF, a Field's value and its appearance are two different
entities, so if you don't call RefreshAppearance() the appearance shown
on the PDF page will be unchanged (e.g. it may show the old value or be
blank).
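The separation between value and appearance can be pictured with a small stand-alone model. `ModelField` and `SetValueAndRefresh` below are hypothetical and only mimic the behaviour described in the text: the stored value and the cached appearance are independent until a refresh copies one into the other. In a real PDF the appearance is a content stream, not a string.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a text field; not PDFNet's Field class.
class ModelField {
public:
    // Changing the value does not touch the cached appearance.
    void SetValue(const std::string& v) { value_ = v; }

    // Rebuild the cached appearance from the current value.
    void RefreshAppearance() { appearance_ = value_; }

    const std::string& Value() const { return value_; }

    // What a viewer would draw on the page in this model.
    const std::string& RenderedText() const { return appearance_; }

    bool AppearanceIsStale() const { return appearance_ != value_; }

private:
    std::string value_;       // logical field value
    std::string appearance_;  // stand-in for the appearance stream
};

// Set a value and immediately regenerate the appearance.
void SetValueAndRefresh(ModelField& f, const std::string& v) {
    f.SetValue(v);
    f.RefreshAppearance();
    assert(!f.AppearanceIsStale());
}
```

In this model, a field whose value was set without a refresh still renders its previous text, which corresponds to the stale or blank widget described above.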
Another approach used in low-end PDF libraries is to let the PDF
viewer automatically generate appearance streams by setting the
'NeedAppearances' flag in the AcroForm dictionary:
doc.GetAcroForm()->Put("NeedAppearances",
    new Bool(true));
This forces the viewer application
to auto-generate appearance streams every time the document is opened.
This method is not reliable because Acrobat does not always generate
appearance streams correctly. Another disadvantage is that the user
will always be prompted to save the document even if the document
was not modified.
In addition to form appearance auto-generation and refresh, PDFNet
provides several other means to deal with appearance streams. Some
of these techniques are discussed in the section on Advanced Forms.
Field.GetValue() returns the field's value (an SDF::Obj), whose
type varies depending on the field type. For example, a text field
might be associated with string values (SDF::Str strings) or null:
if (type == Field.FieldType.e_text
    && field.GetValue() != null)
{
    Console.WriteLine("Field value: {0}",
        field.GetValue().GetStr());
}
else
{
    Console.WriteLine("Field is blank");
}
Form 'flattening' refers to the operation that changes active
form fields into a static area that is part of the PDF document,
just like the other text and images in the document. A completely
flattened PDF form does not have any widget annotations or interactive
fields. Using the Field.Flatten() or Page.FlattenField() method,
it is possible to merge individual field appearances with the page
content. PDFNet also allows you to flatten all forms in the document
in a single function call (PDFDoc.FlattenFields()). Note
that it is not possible to undo a Field.Flatten() operation. An alternative
that can be reversed programmatically is to mark the field as read-only
using the Field.SetFlag(Field::e_read_only, true) method.
The security mechanism for the high-level document works in the
same way as for the SDF document. To secure a document, use the
PDFDoc::SetSecurityHandler() method; to open a secured document, call
PDFDoc::InitializeSecurityHandler() after opening it. To provide GUI
feedback, you can optionally derive a class from the StdSecurityHandler
class. For details on how to secure and read encrypted PDF documents,
refer to the section on SDF security.
The following table lists security permissions available through
StdSecurityHandler (a standard security handler for PDF documents):
| Permission | Description |
| e_all | All permissions are granted. |
| e_doc_open | A permission to open a document. |
| e_doc_secure | A permission to change security settings on a document. |
| e_doc_modify | Modify the contents of the document. |
| e_print | Print the document. |
| e_print_high | Print the document to a representation from which a faithful digital copy of the PDF content could be generated. When this permission is not set, printing is limited to a low level representation of the appearance, possibly of degraded quality. |
| e_extract_content | Copy or otherwise extract text and graphics from the document. |
| e_mod_annot | Add or modify text annotations, fill in interactive form fields. |
| e_fill_forms | Fill in existing interactive form fields (including signature fields). |
| e_access_support | Extract text and graphics (in support of accessibility to disabled users or for other purposes). |
| e_assemble_doc | Assemble the document (insert, rotate, or delete pages and create bookmarks or thumbnail images), even if e_doc_modify is not set. |
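Most of the permissions in the table correspond to individual bits of the integer permission word that the PDF Reference defines for the standard security handler. The sketch below decodes such a word using the bit positions from the PDF Reference. It is a stand-alone illustration: the constant names are assumptions, and the numeric values are those of the PDF permission word, not necessarily the values of PDFNet's own enumeration.

```cpp
#include <cassert>
#include <cstdint>

// Bit values of the permission word as defined by the PDF Reference
// for the standard security handler. Names are illustrative only.
enum PermBit : std::uint32_t {
    kPrint         = 1u << 2,   // bit 3
    kModify        = 1u << 3,   // bit 4
    kExtract       = 1u << 4,   // bit 5
    kModAnnot      = 1u << 5,   // bit 6
    kFillForms     = 1u << 8,   // bit 9
    kAccessSupport = 1u << 9,   // bit 10
    kAssemble      = 1u << 10,  // bit 11
    kPrintHigh     = 1u << 11   // bit 12
};

bool Has(std::uint32_t p, PermBit bit) { return (p & bit) != 0; }

// Grant one permission; returns the updated permission word.
std::uint32_t Grant(std::uint32_t p, PermBit bit) {
    std::uint32_t updated = p | bit;
    assert(Has(updated, bit));
    return updated;
}

enum class PrintLevel { kNone, kLow, kHigh };

// High-quality printing only has an effect when printing is allowed at all.
PrintLevel PrintLevelOf(std::uint32_t p) {
    if (!Has(p, kPrint)) return PrintLevel::kNone;
    return Has(p, kPrintHigh) ? PrintLevel::kHigh : PrintLevel::kLow;
}

// Form filling is allowed by either the annotation bit or the
// dedicated form-fill bit, matching the two table rows above.
bool CanFillForms(std::uint32_t p) {
    return Has(p, kModAnnot) || Has(p, kFillForms);
}
```

The two helper functions show why the table contains overlapping entries: some permissions refine or partially duplicate others, so a check usually has to look at more than one bit.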
PDFNet uses the standard exception mechanism of C++ and .NET languages
(C#, VB, Java) to report illegal program states and to provide a
transparent and clean way to handle errors.
C++ Example:
try
{
    PDFDoc doc("file.pdf");
    doc.PageBegin();
    // ...
}
catch (Exception& e)
{
    cout << e << endl;
}
catch (...)
{
    cout << "Unknown Exception" << endl;
}
C# Example:
try
{
    PDFDoc doc = new PDFDoc("file.pdf");
    doc.PageBegin();
    // ...
}
catch (PDFNetException e)
{
    Console.WriteLine(e.Message);
}
This section is being reviewed by selected clients and is currently
not publicly available. Please visit the page at a later time or
contact support at pdftron.com for up-to-date documentation.