Copyright 2001-2010 by PDFTron Systems, Inc. All rights reserved. All information contained herein is the property of PDFTron Systems, Inc. No part of this publication (whether in hardcopy or electronic form) may be reproduced or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written consent of the PDFTron Systems, Inc. The information in this document is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by PDFTron Systems, Inc. PDFNet SDK is available under license and may only be used or copied in accordance with the terms of such license.
PDFNet is a high-quality, industrial-strength PDF library that meets the requirements of the most demanding and diverse applications. Using PDFNet you can write stand-alone, cross-platform, and reliable commercial applications that read, write, and edit PDF documents.
PDFNet is offered on a wide range of platforms (e.g. Windows, Mac, Linux, and Solaris) and programming environments (C/C++, C#, VB, J#, and other .NET languages).
The PDFNet API is divided into three namespaces: PDF, SDF, and Filters.

Figure. PDFNet API Modules.
PDF is a set of high-level APIs that can be used to manipulate high-level PDF constructs such as pages, interactive forms, bookmarks, and graphical elements on the page.
SDF API is a powerful low-level API that can be used to manipulate every aspect of a PDF document. In order to use SDF, you need to be familiar with PDF file structure (documented in the PDF Reference Manual). Using this powerful API, it is possible to implement any functionality that is not present in the PDF API.
Filters namespace deals with various compression and encryption schemes used in PDF. Unless you are planning to implement a custom encryption or compression scheme on top of PDF, you only need very basic knowledge of the Filters API.
In this section we present the basic structure of a PDF document. For details, please refer to the PDF Reference Manual. Below is a listing of a very simple PDF document that displays the string "Hello World" on a single page.
%PDF-1.4
1 0 obj <<
/Parent 5 0 R
/Resources 3 0 R
/Contents 2 0 R
>>
endobj
2 0 obj
<<
/Length 51
>>
stream
BT
/F1 24 Tf
1 0 0 1 260 330 Tm
(Hello World)Tj
ET
endstream
endobj
3 0 obj
<<
/ProcSet [/PDF/Text]
/Font <</F1 4 0 R >>
>>
endobj
4 0 obj <<
/Type /Font
/Subtype /Type1
/Name /F1
/BaseFont/Helvetica
>>
endobj
5 0 obj
<<
/Type /Pages
/Kids [ 1 0 R ]
/Count 1
/MediaBox [0 0 612 714]
>>
endobj
6 0 obj
<<
/Type /Catalog
/Pages 5 0 R
>>
endobj
xref
0 7
0000000000 65535 f
0000000009 00000 n
0000000103 00000 n
0000000204 00000 n
0000000275 00000 n
0000000361 00000 n
0000000452 00000 n
trailer
<<
/Size 7
/Root 6 0 R
>>
startxref
532
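A PDF file consists of four sections: the header (the "%PDF-1.4" line), the body (the sequence of objects delimited by "obj" and "endobj"), the cross-reference table (starting at the "xref" keyword), and the trailer (starting at the "trailer" keyword).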
Note that the objects refer to each other using a notation like "5 0 R". The "R" stands for reference, and the two preceding numbers identify the object number and the generation (revision) number of the object being referenced.
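To make the reference notation concrete, here is a minimal standalone C++ sketch that splits a reference such as "5 0 R" into its two numbers. It does not use PDFNet; the ObjRef type and the ParseIndirectRef function are hypothetical names introduced only for this illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type: the two numbers that identify an indirect object.
struct ObjRef {
    int obj_num;  // object number
    int gen_num;  // generation (revision) number
};

// Parses text such as "5 0 R" into its object and generation numbers.
// Returns false if the text is not a well-formed reference.
bool ParseIndirectRef(const std::string& text, ObjRef& out) {
    std::istringstream in(text);
    int obj = 0, gen = 0;
    std::string keyword, extra;
    if (!(in >> obj >> gen >> keyword)) return false;
    if (keyword != "R" || obj < 0 || gen < 0) return false;
    if (in >> extra) return false;  // reject trailing tokens
    out.obj_num = obj;
    out.gen_num = gen;
    return true;
}
```

Calling ParseIndirectRef("5 0 R", ref) fills ref with object number 5 and generation 0, while text such as "5 0 obj" is rejected because it ends with a different keyword.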
Therefore, the file body consists of a collection of objects that refer to each other forming an object graph. We could represent the "Hello World" sample file using the following abstract graph representation.

Figure. Object Graph.
Each object in the graph is represented with an ellipse and the object cross references as arrows.
All PDF files must have a "Root" node. It must reference a "Catalog" node which must reference a "Pages" node. The "Pages" node further branches and points to all the pages in the document. Note that a "Pages" node points to a group of pages whereas the "Page" node represents a single page.
The "Page" node references the page "Contents" and the page "Resources". The resource dictionary in turn references "Fonts" used on the page. The resource dictionary can reference many other resource types such as Color Spaces, Patterns, Shadings, Images, Forms, etc. The page contents stream contains markup operators used to draw the page.
All PDF files obey this basic object structure to represent a PDF document.
Before going into details of PDFNet SDF/COS object model, we should review the basics. For a detailed description of the SDF syntax and semantics, please refer to Chapter 3 (Syntax) of the PDF Reference Manual.
In PDF there are five atomic objects:
| Object Type | Description | Samples |
| Number | PDF provides two types of numeric object: integer and real. | 1.03 612 |
| Bool | Boolean objects are identified by the keywords true and false. | true false |
| Name | A name object is an atomic symbol uniquely defined by a sequence of characters. Names always begin with "/" and can contain letters and numbers and a few special characters. | /Font /Info /PDFNet |
| String | Strings of bytes are in PDF enclosed in "(" and ")" | (Hello World!) |
| Null | The null object has a type and value that are unequal to those of any other object. Usually refers to a missing object. | null |
Also, there are three compound objects:
| Object Type | Description | Samples |
| Array | An array object is a one-dimensional collection of objects arranged sequentially. Unlike arrays in many other computer languages, PDF arrays may be heterogeneous; that is, an array's elements may be any combination of numbers, strings, dictionaries, or any other objects, including other arrays. | [] [ true /Name ] [ (Hello) [1] false 54.3 /Font ] |
| Dictionary | A dictionary object is a map containing pairs of objects, known as the dictionary's entries. The first element of each entry is the key and the second element is the value. The key must be a name. The value can be any kind of object, including another dictionary. | <</key /value >> << /first (Str Value) /second [true false] /third << /yes /no >> >> |
| Stream | A stream is essentially a dictionary followed by a binary stream. PDF streams are always indirect objects so they can be shared. | 1 0 obj << /Length 144 >> stream ........... endstream endobj |
Objects can be arbitrarily nested using the dictionary and array compounding operations.
All of the objects in the above tables are "direct objects" because they are not surrounded by "obj" and "endobj" keywords. The body of the PDF document is actually made up of a sequence of "indirect objects". An indirect object is created by taking a single direct object (atomic or compound) and enclosing it between an object header such as "1 0 obj" (object number, generation number, and the "obj" keyword) and the "endobj" keyword.
Note that, since direct objects are not numbered, they can't be shared. However, because indirect objects are numbered and can be referenced by other objects, they can be shared (i.e. referenced by more than one other object).
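The relationship between a direct object, its indirect wrapper, and a reference to it can be sketched in a few lines of standalone C++. This is only an illustration of the serialized syntax; MakeIndirectObject and MakeReference are hypothetical helpers and are not part of PDFNet.

```cpp
#include <sstream>
#include <string>

// Wraps the serialized text of one direct object in "N G obj ... endobj",
// which turns it into an indirect object definition.
std::string MakeIndirectObject(int obj_num, int gen_num,
                               const std::string& direct_body) {
    std::ostringstream out;
    out << obj_num << ' ' << gen_num << " obj\n"
        << direct_body << "\nendobj\n";
    return out.str();
}

// Produces the reference that any number of other objects can use
// to point at the indirect object with the given numbers.
std::string MakeReference(int obj_num, int gen_num) {
    std::ostringstream out;
    out << obj_num << ' ' << gen_num << " R";
    return out.str();
}
```

Because the wrapped object now has a number, the short text returned by MakeReference can appear in several dictionaries or arrays, which is what makes sharing possible.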
In the above PDF example, the object '3 0 obj' is an indirect object because "obj" and "endobj" keywords wrap a dictionary object containing two entries.
3 0 obj << /ProcSet [/PDF /Text] /Font << /F1 4 0 R >> >> endobj
"ProcSet" key is mapped to an array which is a direct object containing atomic direct objects. In a similar way, the "Font" key is mapped to a direct dictionary. On the other hand, "F1" in the inner dictionary is mapped to an indirect object with the object number 4 and the generation number 0. Because the Font object is indirect, the same font resource can be shared across many different pages.
Real-life PDF documents are much more complex than the "Hello World" sample from the previous section. Streams in a PDF document can be compressed and encrypted, objects can form complex networks, and in PDF 1.5 parts of the object graph can be compressed and embedded in so-called 'object streams'. All this makes manual editing of PDF files extremely difficult or impossible. The good news is that PDFTron Systems released a utility called CosEdit that can be used to browse and edit PDF at the object level with unprecedented ease and control. PDFNet also provides a full SDF/COS level API, making it very easy to read, write, and edit PDF and FDF at the atomic level. Furthermore, PDFNet provides a high-level API that can be used to read, write, and edit PDF documents in terms of pages, bookmarks, graphical primitives, etc.
SDF (Structured Document Format) and COS (Carousel Object System; Carousel was a codename for Acrobat 1.0) are synonyms for PDF low-level object model. SDF is the acronym used in PDFNet, whereas COS is a legacy word used in Acrobat SDK.
In many ways, SDF is to PDF what XML and DOM are to SVG (Scalable Vector Graphics). The SDF/COS object system provides the low-level object types and file structure used in PDF files. PDF documents are graphs of SDF objects. SDF objects can represent document components such as bookmarks, pages, fonts, and annotations.
PDF is not the only document format built on top of SDF/COS. FDF (Form Data Format) and PJTF (Portable Job Ticket Format) are also built on top of SDF/COS.
The SDF layer deals directly with the data that is in a PDF document. The data types are referred to as SDF objects. There are eight data types found in PDF files. They are arrays, dictionaries, numbers, boolean values, names, strings, streams, and a null object. PDFNet implements these objects as shown in the following graph:

Figure. SDF Obj Hierarchy.
Obj is the base class for all SDF objects. Obj hierarchy implements a composite pattern so you can invoke a member function of any derived object through the base class interface (i.e. Obj implements methods for all derived classes). This is illustrated in the following C# sample code.
Doc doc = new Doc("in.pdf");
// Get the trailer
Obj trailer = doc.GetTrailer();
// Get the info dictionary.
Obj info = trailer.Get("Info").Value();
// Replace the Producer entry
info.Put("Producer", Obj.CreateString("PDFNet"));
// Create a custom inline dictionary within
// Info dictionary
Obj custom_dict = Obj.CreateDict();
info.Put("My Direct Dict", custom_dict);
// Add some key/value pairs
custom_dict.Put("My Number", Obj.CreateNumber(100));
Obj my_array = Obj.CreateArray();
custom_dict.Put("My Array", my_array);
// Create a custom indirect array within Info dictionary
Obj custom_array = doc.CreateIndirectArray();
info.Put("My Indirect Array", custom_array);
// Create indirect link to root
custom_array.PushBack(trailer.Get("Root").Value());
// Embed a custom stream (file myfile.txt).
StdFile embed_file = new StdFile("myfile.txt",
    PDFNet.StdFile.OpenMode.e_read_mode);
FilterReader mystm = new FilterReader(embed_file);
custom_array.PushBack(doc.CreateIndirectStream(mystm));
doc.Save("out.pdf", 0, "%PDF-1.4"); // Save PDF
If a member function is not supported on a given object (e.g. if you are invoking obj.GetName() on a Bool object), an Exception will be thrown. Learn more about PDFNet exception handling under the Error handling section.
In order to find out type-information at run-time, use obj.GetType() or obj.Is???() methods (where ??? represent the Type in question; e.g. Array, Number, Bool, Str, Dict, Stream). Most of the time the object type can be inferred from PDF/FDF specification. For example, when you call doc.GetTrailer(), you can assume that the returned object is a dictionary object because this is mandated by PDF specification. If the object is not a dictionary, an exception will be thrown when a dictionary method is called on the object. This way the code is both efficient and elegant since unnecessary type casts and type checks are not required. In case there is an ambiguity in PDF/FDF specification, you can use GetType() or Is???() methods.
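The "call the accessor and let it throw on a type mismatch" behavior described above can be modeled with a small standalone C++ class. This is a simplified sketch and not the PDFNet Obj class; MiniObj and all of its members are hypothetical.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an SDF object; not the PDFNet Obj class.
class MiniObj {
public:
    enum Type { e_bool, e_number, e_name };

    static MiniObj Bool(bool v) { MiniObj o(e_bool); o.bool_ = v; return o; }
    static MiniObj Number(double v) { MiniObj o(e_number); o.number_ = v; return o; }
    static MiniObj Name(const std::string& v) { MiniObj o(e_name); o.name_ = v; return o; }

    // Run-time type queries, for cases where the type is ambiguous.
    Type GetType() const { return type_; }
    bool IsBool() const { return type_ == e_bool; }
    bool IsNumber() const { return type_ == e_number; }
    bool IsName() const { return type_ == e_name; }

    // Accessors throw when invoked on an object of the wrong type.
    bool GetBool() const { Require(e_bool, "GetBool"); return bool_; }
    double GetNumber() const { Require(e_number, "GetNumber"); return number_; }
    const std::string& GetName() const { Require(e_name, "GetName"); return name_; }

private:
    explicit MiniObj(Type t) : type_(t), bool_(false), number_(0.0) {}

    void Require(Type expected, const char* method) const {
        if (type_ != expected)
            throw std::logic_error(std::string(method) +
                                   " is not supported on this object type");
    }

    Type type_;
    bool bool_;
    double number_;
    std::string name_;
};
```

Code that knows the expected type from the specification simply calls the accessor; code that cannot know it checks IsBool(), IsNumber(), or IsName() first.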
As mentioned in the previous section, SDF objects can be either direct or indirect. Direct objects can be created using Obj.Create???() methods. The following example illustrates how to create a direct number/name object inside Dict/Array object.
// Create a direct number/name/dict
Obj direct_num = Obj.CreateNumber(100);
Obj direct_name = Obj.CreateName("My Name");
Obj direct_dict = Obj.CreateDict();
// you can insert newly created direct objects
// into other container objects.
direct_dict.Put("My Number", direct_num);
doc.GetRoot().Put("My Dict", direct_dict);
doc.GetRoot().Put("My Name", direct_name);
New indirect objects can be created using doc.CreateIndirect???() methods on a SDF document. The following code shows how to create a new Number and new Dictionary indirect object:
Obj mynumber = doc.CreateIndirectNumber(100);
Obj mydict = doc.CreateIndirectDict();
PDFNet SDF provides many utility methods that can be used to efficiently traverse SDF object graph. Here is an example on how to get to document's page root:
Obj pages = doc.GetTrailer()
.Get("Root").Value()
.Get("Pages").Value();
Note that because PDF specification mandates that "Root" is always a dictionary, we can directly reference the "Pages" object using a Get("key") . If "Root" was not a dictionary object, an exception would be thrown.
In order to retrieve an object that may or may not be present in a dictionary, use dict.Find("key") method. For example,
DictIterator itr = dict.Find("My Key");
Obj my_value = null;
if (itr != dict.DictEnd())
{
my_value = itr.Current().Value();
// ...
}
Note that dict.Find("key") returns a DictIterator object. If the given key is not present in the dictionary, DictIterator would be equal to dict.DictEnd(), otherwise DictIterator refers to the key-value pair that was found.
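The find-then-compare-with-end idiom is the same one used by the C++ standard containers. The standalone sketch below shows it with std::map instead of a PDFNet dictionary; MiniDict and FindOr are hypothetical names used only for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a dictionary: names mapped to string values.
typedef std::map<std::string, std::string> MiniDict;

// Returns the value stored under 'key', or 'fallback' when the key is absent.
std::string FindOr(const MiniDict& dict, const std::string& key,
                   const std::string& fallback) {
    MiniDict::const_iterator itr = dict.find(key);
    if (itr == dict.end()) return fallback;  // key not present
    return itr->second;                      // value of the entry that was found
}
```

As with dict.Find("key"), the lookup never throws for a missing key; the caller decides what an absent entry means.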
You can use DictIterator in order to traverse key-value pairs within a dictionary:
DictIterator itr = dict.DictBegin();
DictIterator end = dict.DictEnd();
while (itr!=end)
{
// itr.Current().Key();
// itr.Current().Value();
itr.Next();
}
In order to retrieve objects from an Array object, use array.GetAt(idx) method:
// C++ sample
Obj* obj;
for (int i=0; i<array->Size(); ++i)
{
obj = array->GetAt(i);
// ...
}
Obj hierarchy also implements a visitor pattern, so you can derive objects from the ObjVisitor class. A visitor object can traverse the Obj graph structure and perform certain operations on graph nodes based on their type. (read more about this later).
Using PDFNet API, it is easy to create new links to direct or indirect objects. In the previous section we mentioned that a new direct object can be created using Obj.Create???() methods, whereas a new indirect object can be created using Doc.CreateIndirect???() methods. The remaining question is how to create links/references linking indirect objects? In PDFNet, creating new indirect references is very simple and transparent:
// C# sample
// Create indirect dictionary containing (/Key, /Foo)
// entry that will be shared.
Obj shared_dict = doc.CreateIndirectDict();
shared_dict.Put("Key", Obj.CreateName("Foo"));
// shared_dict.IsIndirect() returns true ...
// Get document's info dictionary.
Obj trailer = doc.GetTrailer();
Obj info = trailer.Get("Info").Value();
// Add indirect reference to 'shared_dict'.
info.Put("MyDict", shared_dict);
// Get document's root dictionary.
Obj root = trailer.Get("Root").Value();
// Add a second indirect reference to 'shared_dict'.
root.Put("MyDict", shared_dict);
Note that indirect links are created in exactly the same way as direct links, (i.e. using dict.Put("key", obj), array.Insert(idx, obj), and array.PushBack/PushFront(obj) methods).
Multiple objects can refer to the same object; however, the shared object must be indirect (i.e. it was created using Doc.CreateIndirect???() or obj.IsIndirect() returns true).
Because in PDF creating multiple links to direct objects is not allowed, PDFNet will throw an exception when you attempt to create multiple links/references to the same direct object. This is shown below:
// C# sample
try
{
// Create a direct Boolean object
Obj direct_obj = Obj.CreateBool(true);
Obj trailer = doc.GetTrailer();
Obj info = trailer.Get("Info").Value();
// Insert the direct object into info dictionary
info.Put("Link1", direct_obj);
Obj root = trailer.Get("Root").Value();
// Attempt to create a second link to direct_obj.
// This will throw an exception. If you want to
// share objects create them using
// Doc::CreateIndirect???() methods
root.Put("Link2", direct_obj);
}
catch (PDFNetException e)
{
Console.WriteLine(e.Message);
}
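The rule that a direct object can have only one parent while an indirect object can be linked many times can be modeled without PDFNet. The following standalone C++ sketch is an assumption about how such a check could work, not a description of the library's internals; Node and Dict are hypothetical types.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical model, not PDFNet code: a node is either direct or indirect.
struct Node {
    bool indirect;    // true if the node is numbered and can be referenced
    bool has_parent;  // true once some container links to the node
    explicit Node(bool is_indirect) : indirect(is_indirect), has_parent(false) {}
};

class Dict {
public:
    // Links 'child' under 'key'. A direct child may be linked only once.
    void Put(const std::string& key, const std::shared_ptr<Node>& child) {
        if (!child->indirect && child->has_parent)
            throw std::logic_error("direct object already has a parent");
        child->has_parent = true;
        entries_[key] = child;
    }

    std::size_t Size() const { return entries_.size(); }

private:
    std::map<std::string, std::shared_ptr<Node> > entries_;
};
```

In this model, putting the same indirect node into two dictionaries succeeds, whereas the second Put of a direct node throws, mirroring the two C# samples above.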
In addition to basic objects mentioned so far, PDF also supports stream objects. A stream object is essentially a dictionary with an attached binary stream. In PDFNet, all methods that apply to dictionaries apply to streams as well.
In addition to methods provided by Dict, streams provide an interface used to access an associated data-stream. You can use stm.GetDecodedStream() to get decoded data or stm.GetRawStream() to get the raw data without any Decode filters applied. GetRawStreamLength() returns the length of the raw data-stream. This number is the same as the one stored under “Length” key in the stream dictionary.
PDFNet supports all compression and encryption schemes used in PDF, and access to decoded data is transparent. The following code decodes and extracts the contents of a given stream to an external file:
// C# sample
Obj stream = ...
Filter dec_stm = stream.GetDecodedStream();
FilterReader reader = new FilterReader(dec_stm);
// Write decoded data to the output file.
// First open the file
StdFile out_file = new StdFile("out.bin",
    PDFNet.StdFile.OpenMode.e_write_mode);
FilterWriter writer = new FilterWriter(out_file);
writer.WriteFilter(reader);
writer.Flush();
For a more complete discussion on PDFNet Filters see PDFNet Streams and Filters.
The overview of the SDF object model is not complete without mentioning SDF Doc. SDF Doc brings together document security, document utility methods, and all SDF objects.
A SDF document can be created from scratch using a default constructor:
Doc mydoc = new Doc();
Obj trailer = mydoc.GetTrailer();
SDF document can be also created from an existing file (e.g. an external PDF document):
Doc mydoc = new Doc("in.pdf");
Obj trailer = mydoc.GetTrailer();
or from a memory buffer or some other Filter/Stream such as an HTTP Filter connection:
MemoryFilter memory = ....
Doc mydoc = new Doc(memory);
Obj trailer = mydoc.GetTrailer();
Finally SDF document can be accessed from a high-level PDF document as follows:
PDFDoc doc = new PDFDoc("in.pdf");
Doc mydoc = doc.GetSDFDoc();
Obj trailer = mydoc.GetTrailer();
Note that the examples above used doc.GetTrailer() in order to access document trailer, the starting SDF object (root node) in every document. Following the trailer links, we can visit all low-level objects in a document (e.g. all pages, outlines, fonts, etc).
SDF Doc also provides utility methods used to import objects and object collections from one document to another. These methods can be useful for copy operations between documents such as a high-level page merge and document assembly.
One of the basic building blocks of a PDF document is the SDF stream object. For example, in a PDF document all page content, images, embedded fonts, and files are represented using stream objects that can be compressed and encrypted using various Filter chains. See the "Stream Objects" and "Filters" chapters in the PDF Reference Manual for more details.
PDFNet supports an efficient and flexible architecture for processing streams using Filter pipelines.
A Filter is an abstraction of a sequence of bytes, such as a file, an input/output device, an inter-process communication pipe, or a TCP/IP socket. A filter can also perform certain transformations of input/output data (e.g. data compression/decompression, color conversion, etc.)
PDFNet provides a generic input/output filter for external files using StdFile class. Use the StdFile class to read from, write to, open, and close files on a file system. For example,
StdFile myfile = new StdFile("in.jpg",
StdFile.OpenMode.e_read_mode);
opens an external image file for reading. StdFile buffers input and output for better performance. Although it is possible to read input data directly through the Filter interface (StdFile is a Filter), it is more convenient to attach a FilterReader to the filter and then read data through FilterReader interface:
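// C#
FilterReader reader = new FilterReader(myfile);
// Scratch buffer for the data read so far; the size is arbitrary.
byte[] buffer = new byte[4096];
int bytes;
while((bytes = reader.Read(buffer)) != 0)
{
    // Process the first 'bytes' bytes of buffer here.
}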
Data associated with SDF stream objects can be accessed using Stream.GetRawStream() or Stream.GetDecodedStream() methods.
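// C# sample
void Extract(Obj stream)
{
    Filter dec_stm = stream.GetDecodedStream();
    FilterReader reader = new FilterReader(dec_stm);
    // Scratch buffer for the decoded data; the size is arbitrary.
    byte[] buffer = new byte[4096];
    int bytes;
    while((bytes = reader.Read(buffer)) != 0)
    {
        // Process the first 'bytes' bytes of buffer here.
    }
}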
Stream.GetRawStream() creates a Filter used to extract raw data as it appears in the serialized document (or a decrypted version of the stream if the document is secured). Stream.GetDecodedStream() creates a Filter pipeline and returns the last filter in the chain. For example, a given stream may be compressed using JPEG (DCTDecode) compression and then encoded using ASCII85 into an ASCII stream. When GetDecodedStream() is invoked on this SDF stream, it returns the last filter in a chain of three filters (the file segment input filter, the ASCII85Decode filter, and the DCTDecode filter, in that order), because the ASCII85 encoding was applied last and must be undone first. Data extracted from the returned Filter will be raw image data (i.e. RGB triplets).
It is possible to iterate through the Filter chain using Filter.GetAttachedFilter() method. For example, the following code snippet prints out all the Filter names in the filter chain.
// C# sample
// Walk the chain starting from the last filter (dec_stm)
// and print the name of every filter in it.
Filter cur_flt = dec_stm;
while (cur_flt != null)
{
    Console.WriteLine(cur_flt.GetName());
    cur_flt = cur_flt.GetAttachedFilter();
}
It is also possible to construct new filter chains and edit existing ones using the Filter.AttachFilter(flt) method.
In order to open an external file Filter for writing, use the StdFile class as follows:
StdFile myfile = new StdFile("out.txt",
StdFile.OpenMode.e_write_mode);
After the output file filter/stream is opened you can output data using FilterWriter class:
FilterWriter writer = new FilterWriter(myfile);
writer.WriteString("Hello World");
writer.Flush();
Output filters can also be chained in order to compress and transform data (e.g. data encoding, color conversion, image resampling, etc). The following sample creates an output filter chain that compresses data using the Flate compression method and then encodes the compressed data using ASCII85. The last filter in the chain is the output file that will contain the resulting data:
Filter myfile = new StdFile("out.bin",
StdFile.OpenMode.e_write_mode);
// attach the output of ascii85 to myfile
Filter ascii85 = new ASCII85Encode(myfile);
// attach the output of flate to ascii85
Filter flate = new FlateEncode(ascii85);
FilterWriter writer = new FilterWriter(flate);
writer.WriteString("Hello World");
writer.Flush();
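The way data flows through a chain of output filters can be sketched in standalone C++. This is not the PDFNet Filter interface: OutFilter, MemorySink, HexEncode, and ChainNames are hypothetical names, and simple hexadecimal encoding is swapped in for Flate and ASCII85 so that the example stays self-contained.

```cpp
#include <string>

// Hypothetical output filter interface; names are illustrative only.
class OutFilter {
public:
    explicit OutFilter(OutFilter* attached) : attached_(attached) {}
    virtual ~OutFilter() {}
    virtual const char* GetName() const = 0;
    virtual void Write(const std::string& data) = 0;
    OutFilter* GetAttachedFilter() const { return attached_; }
protected:
    OutFilter* attached_;  // next filter in the chain, or null at the end
};

// End of the chain: collects bytes in memory instead of writing a file.
class MemorySink : public OutFilter {
public:
    MemorySink() : OutFilter(0) {}
    const char* GetName() const { return "MemorySink"; }
    void Write(const std::string& data) { bytes += data; }
    std::string bytes;
};

// Encodes each byte as two hex digits and forwards the result downstream.
class HexEncode : public OutFilter {
public:
    explicit HexEncode(OutFilter* next) : OutFilter(next) {}
    const char* GetName() const { return "ASCIIHexEncode"; }
    void Write(const std::string& data) {
        static const char digits[] = "0123456789ABCDEF";
        std::string encoded;
        for (std::string::size_type i = 0; i < data.size(); ++i) {
            unsigned char c = static_cast<unsigned char>(data[i]);
            encoded += digits[c >> 4];
            encoded += digits[c & 0x0F];
        }
        attached_->Write(encoded);
    }
};

// Lists the filter names from the head of the chain to its end.
std::string ChainNames(const OutFilter* head) {
    std::string names;
    for (const OutFilter* f = head; f != 0; f = f->GetAttachedFilter()) {
        if (!names.empty()) names += " -> ";
        names += f->GetName();
    }
    return names;
}
```

Writing to the head of the chain pushes the data through every stage in turn, so the sink only ever sees the output of the stage attached directly to it. Stacking two HexEncode stages on a MemorySink, for instance, leaves the doubly encoded text in the sink.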
PDFNet provides full support for all common Filters used in PDF. Although the included Filters should cover all common use cases, advanced users may want to provide custom implementations for certain filters (e.g. custom color conversion, or a new compression method). PDFNet provides an open and expandable architecture for the creation of custom filters. To implement a custom Filter, derive a new class from the Filter base class and implement the required interface. A more detailed guide for implementing custom Filters is available through the PDFTron Systems developer program. Please contact support@pdftron.com for more details.
PDF documents can be secured and encrypted using various encryption schemes. PDFNet provides support for standard security handler and provides an extension mechanism through which users can register custom security handlers.
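The code that performs user authorization and sets permissions is known as a security handler. The core API has one built-in security handler known as the Standard Security Handler (StdSecurityHandler). The Standard Security Handler supports two passwords: a user password, which allows the document to be opened subject to its permission settings, and an owner (master) password, which grants full access, including the right to change the passwords and permissions.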
Applications can also provide their own implementations of SecurityHandler. For example, a custom implementation of a SecurityHandler may perform user authorization that requires the presence of a hardware dongle, a hardware key, a file, etc.
A Security Handler is used when:
A document may have zero, one, or two security handlers associated with it. A document has zero security handlers if the file is not secured. When security is applied to a file, or the user selects a different security handler for a secured file, the newly-chosen security handler is not put in place immediately. Instead this new security handler is a pending security handler until the document is saved.
A document may have both a current and a new security handler associated with it. Because a PDF document is not fully loaded into memory and decrypted when it is opened, the original security handler is still required to decrypt the content.
To secure a document, create a new SecurityHandler, set its permissions and authentication data, and set it as the new handler using doc.SetSecurityHandler(handler). For example,
// C# Sample
StdSecurityHandler new_handler = new StdSecurityHandler();
// Set a user password required to open a document
byte[] user_password = new byte [4];
user_password[0] = (byte)'t';
user_password[1] = (byte)'e';
user_password[2] = (byte)'s';
user_password[3] = (byte)'t';
new_handler.ChangeUserPassword(user_password);
// Set Permissions
new_handler.SetPermission(
SecurityHandler.Permission.e_print, true);
new_handler.SetPermission(
SecurityHandler.Permission.e_extract_content, false);
// Associate the new_handler with the document.
Doc doc = new Doc("in.pdf");
doc.SetSecurityHandler(new_handler);
Working with Secured/Encrypted Documents
PDFNet fully supports reading encrypted PDF documents. You can check whether a document is encrypted using the doc.IsEncrypted() method. If the document is encrypted, you should initialize the security handler using the doc.InitializeSecurityHandler() method.
// Open a potentially encrypted document
Doc doc = new Doc("in.pdf");
doc.InitializeSecurityHandler();
Because InitializeSecurityHandler() doesn't have any side effects on documents that are not encrypted you can always invoke this method after constructing a document.
If a document doesn't require any authentication data (e.g. a user password) in order to view the content, InitializeSecurityHandler() is enough to work with encrypted documents. On the other hand, if a document requires a user password or other authorization data in order to open it and view the content, you need to implement the user interface methods that will perform authentication (e.g. a method that collects the user password through a dialog box). The default security handler does not collect authorization data and will throw an exception if a document requires a user password. In order to define a custom UI that implements an application-specific authorization procedure, derive a class from StdSecurityHandler and implement the UI callback methods as in the following example:
// C++ sample
class MySecurityHandler : public StdSecurityHandler
{
public:
MySecurityHandler (int key_len, int enc_code)
: StdSecurityHandler(key_len, enc_code) {}
MySecurityHandler (const MySecurityHandler& s)
: StdSecurityHandler(s) {}
// In this callback ask authorization data.
// This may involve a GUI dialog used to collect
// the password.
bool GetAuthorizationData (Permission p)
{
string pass;
// collect the password from standard input
cin >> pass;
InitPassword(pass);
return true;
}
// This callback could be used to customize
// security handler preferences.
bool EditSecurityData(SDF::Doc& doc) {
return false;
}
// This callback is invoked when authorization
// process fails.
void AuthorizeFailed()
{
// Display the error message.
cout << "Authorize failed...." << endl;
}
SecurityHandler* Clone() const {
return new MySecurityHandler(*this);
}
static SecurityHandler* Create (const char* name,
int key_len, int enc_code) {
return new MySecurityHandler (key_len, enc_code);
}
};
In this sample GetAuthorizationData() callback was used to collect the user password from the standard input followed by a call to InitPassword(). AuthorizeFailed() callback is called if the password supplied in InitPassword() is invalid.
In order for MySecurityHandler to take effect when opening secured documents you need to register the new handler with SecurityManager using RegisterSecurityHandler(...) static method:
SecurityManagerSingleton::Instance()
.RegisterSecurityHandler("Standard",
SDF::SecurityDescriptor("Standard Security",
MySecurityHandler::Create));
Security handler registration is usually done once upon program startup. Security handlers can also be registered and removed dynamically at any point during the program lifetime. A more complete example that registers and initializes a security handler is given below:
SecurityManager& sec_mgr =
SecurityManagerSingleton::Instance();
sec_mgr.RegisterSecurityHandler("Standard",
SDF::SecurityDescriptor("Standard Security",
MySecurityHandler::Create));
// Open a secured document
Doc doc("file_in.pdf");
if (!doc.InitializeSecurityHandler())
{
cout << "Document authentication error....";
return;
}
The first step was to get a reference to the SecurityManager. SecurityManager is a global object (singleton) that keeps track of all registered SecurityHandlers. By default only StdSecurityHandler with no UI interaction is registered. The second step was to register a standard security handler called MySecurityHandler that provides UI authorization functions. The first argument to RegisterSecurityHandler() was the name of the security handler as it appears in document Encrypt dictionary ("Standard") and the second parameter is SecurityDescriptor. SecurityDescriptor accepts handler’s descriptive name that may be used in a UI interface ("Standard Security"), and a pointer to a factory method that will be used to instantiate the security handler when it is required. MySecurityHandler's callback functions will be invoked during a call to doc.InitializeSecurityHandler(). InitializeSecurityHandler will attempt to collect the authorization data by calling GetAuthorizationData() on MySecurityHandler. If the correct authorization information is not obtained after several attempts InitializeSecurityHandler() will call MySecurityHandler's AuthorizeFailed callback.
After SecurityHandler is initialized you can access the security handler associated with the document using GetSecurityHandler() method. You can edit permissions, and authorization data on existing handler or set a completely new security handler using doc.SetSecurityHandler(handler) method.
To remove PDF security set the current SecurityHandler to null:
// C#
PDFDoc doc = new PDFDoc("encrypted.pdf");
doc.InitializeSecurityHandler();
doc.SetSecurityHandler(null);
// C++
PDFDoc doc("encrypted.pdf");
doc.InitializeSecurityHandler();
doc.SetSecurityHandler(AutoPtr<SecurityHandler>(0));
Besides providing full support for standard PDF security, PDFNet allows users to work with custom security handlers and proprietary encryption algorithms. To define a custom security handler, derive a class from SecurityHandler and implement SecurityHandler's interface. The registration and use of a custom security handler is identical to the procedure outlined for the Standard Security handler in the previous section. Please contact support@pdftron.com for more details.
High-level PDF constructs such as pages, interactive forms, bookmarks, graphical elements on the page are implemented in namespace called PDF. PDF classes contain methods that can be used to copy pages between documents, to read/write graphical Elements such as images, paths, and text, to manipulate interactive forms etc. Although PDF implements most commonly used PDF functionality you can at any point access underlying SDF objects and have full control of the low-level object model.
A PDF document can be created from scratch using a default constructor:
PDFDoc new_doc = new PDFDoc();
The new document does not contain any pages. See the Working with Pages section for details on how to create new pages and how to work with existing ones.
Using PDFNet you can open a document from a serialized file, from a memory buffer, and from a Filter stream.
To open an existing PDF file specify the file-path in PDFDoc constructor:
PDFDoc mydoc = new PDFDoc("in.pdf");
You can also open an existing PDF document from a memory buffer:
FileStream stm = new FileStream("in.pdf",
FileMode.Open, FileAccess.Read);
BinaryReader reader = new BinaryReader(stm);
byte[] buffer = reader.ReadBytes(
(int) reader.BaseStream.Length);
reader.Close();
PDFDoc mydoc = new PDFDoc(buffer);
You can also provide a MemoryFilter or a custom Filter such as HTTPFilter in order to provide alternative ways to access existing PDF data.
If the existing document is encrypted (i.e. doc.IsEncrypted() returns true), you need to call doc.InitializeSecurityHandler() after constructing the document. In practice you may always call doc.InitializeSecurityHandler(), since the method has no side effects on documents that are not secured.
PDFDoc doc = new PDFDoc("in.pdf");
if (!doc.InitializeSecurityHandler())
{
Console.WriteLine("Document authentication error...");
return;
}
The PDFNet security API is explained in detail in the Security and PDF Security sections.
A PDF document can be serialized (or saved) to a file on disk, to a memory buffer, or to an arbitrary data stream such as MemoryFilter or HTTPFilter.
To save a file on a disk use PDFDoc::Save(...) method, i.e.
The second argument is a bitwise combination of flags that are used as options during serialization.
PDFNet allows a document to be saved incrementally (see section 2.2.7 "Incremental Update" in the PDF Reference Manual). Because applications may allow users to modify PDF documents, users should not have to wait for the entire file (which can contain hundreds of pages) to be rewritten each time modifications to the document are saved. PDFNet allows modifications to be appended to a file, leaving the original data intact. The addendum appended when a file is incrementally updated contains only those objects that were actually added or modified. Incremental update allows an application to save modifications to a PDF document in an amount of time proportional to the size of the modification rather than the size of the file. In addition, because the original contents of the document are still present in the file, it is possible to undo saved changes by deleting one or more file updates.
Changes can be appended to an existing document using e_incremental flag:
Note that the file output name matches the input name.
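As a rough mental model of incremental update (a sketch only, not PDFNet's serializer), a saved file can be pictured as the original body followed by appended update sections. The type SavedFile and the functions AppendUpdate and UndoLastUpdate are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: sections[0] is the original file body and every
// later entry is one appended incremental update.
struct SavedFile {
    std::vector<std::string> sections;
};

// Appends only the changed objects; earlier sections are left intact.
// Returns the number of bytes written by this save, which depends on the
// size of the modification and not on the size of the whole file.
std::size_t AppendUpdate(SavedFile& file, const std::string& changed_objects) {
    file.sections.push_back(changed_objects);
    return changed_objects.size();
}

// Undoes the most recent save by dropping the last appended update.
// The original body (sections[0]) is never removed.
bool UndoLastUpdate(SavedFile& file) {
    if (file.sections.size() <= 1) return false;
    file.sections.pop_back();
    return true;
}
```

In this model the cost of a save is the size of the appended section, and undoing a save amounts to truncating the file back to the end of the previous section.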
Some PDF files over time accumulate objects that are not used (e.g. old updates, modifications, unused fonts, images, etc). To trim down the file size use e_remove_unused flag:
In order to provide user feedback, the PDFDoc::Save(...) method accepts an optional object derived from the ProgressMonitor base class. ProgressMonitor provides a callback interface that keeps the client application up to date about the function's progress.
A PDF document can also be serialized in a memory buffer as follows:
byte[] buf = null;
int buf_sz = 0;
doc.Save(ref buf, ref buf_sz, 0);
Document's page sequence
A high-level PDF document (PDF::PDFDoc) contains a sequence of PDF::Pages as illustrated in the following figure:
PDF::PDFDoc::PageBegin() returns a PageIterator to the first Page in the document, whereas PDF::PDFDoc::PageEnd() returns a PageIterator to null or non-existent page. If doc.PageBegin() iterator equals doc.PageEnd() the document has no pages. You can also determine the number of pages in the document using PDFDoc::GetPageNumber() method. The following code snippet shows how to print out the media box coordinates (i.e. page size) for every page in document page sequence:
PageIterator itr=doc.PageBegin();
PageIterator end=doc.PageEnd();
for (; itr!=end; itr.Next())
{
Rect mediabox = new Rect(itr.Current().GetMediaBox());
Console.WriteLine("Media box: {0}, {1}, {2}, {3}",
mediabox.x1, mediabox.y1,
mediabox.x2, mediabox.y2);
}
In this code we used itr.Next() in order to move to the next page in the sequence (in a similar fashion you can use itr.Prev() in order to move to the previous page) and itr.Current() in order to access the Page object referenced by the iterator.
Another way to achieve the same result as in previous code sample is using GetPageNumber() and PageFind(page_num) methods:
int page_num = doc.GetPageNumber();
for (int i=1; i<=page_num; ++i)
{
PageIterator itr = doc.PageFind(i);
Page page = itr.Current();
Rect mediabox = new Rect(page.GetMediaBox());
Console.WriteLine("Media box: {0}, {1}, {2}, {3}",
mediabox.x1, mediabox.y1,
mediabox.x2, mediabox.y2);
}
Note that because pages in the document sequence are indexed starting from 1, another way to access the first page in the document is doc.PageFind(1). If the given page number cannot be found in the document's page sequence, PageFind(page_num) returns a PageIterator to a null or non-existent page. Therefore:
PageIterator itr = doc.PageFind(page_num);
if (itr!=doc.PageEnd())
{
Console.WriteLine(
"PageFind returned an iterator to an existing page");
}
else
{
Console.WriteLine(
"Document does not contain page#: {0}", page_num);
}
In order to create a new page use PDFDoc::PageCreate(media_box) method. The function has an optional Rect argument that can be used to specify page size or more specifically its "media box".
A media box is a rectangle, expressed in default user space units, defining the boundaries of the physical medium on which the page is intended to be displayed or printed. A user space unit is 1/72 of an inch. If media_box is not specified, the default dimensions of the page are 8.5 x 11 inches (or 8.5*72, 11*72 units).
Page x = doc.PageCreate();
doc.PagePushBack(x);
In this code snippet we created a new 8.5x11 page and have added it at the end of document's page sequence.
Note that after the page is created it does not belong to the document's page sequence and needs to be placed at a specific location within the sequence in order to be 'visible'. This is illustrated in the above figure, where page 'x' is shown outside of the document's page sequence. PagePushBack() places page 'x' at the end of the page sequence.
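The unit arithmetic behind the default page size can be sketched as follows. This is a minimal illustration, not PDFNet code: PageBox, InchesToUnits and MakeBoxInches are hypothetical names, and the box is assumed to be anchored at the origin.

```cpp
#include <cassert>

// One default user space unit is 1/72 of an inch.
constexpr double kUnitsPerInch = 72.0;

// Hypothetical helper: converts a length in inches to user space units.
constexpr double InchesToUnits(double inches) {
    return inches * kUnitsPerInch;
}

// Plain stand-in for a media box given by two opposite corners.
struct PageBox {
    double x1, y1, x2, y2;
};

// Builds a box anchored at the origin for a page of the given size in inches.
constexpr PageBox MakeBoxInches(double width_in, double height_in) {
    return PageBox{0.0, 0.0, InchesToUnits(width_in), InchesToUnits(height_in)};
}
```

For the default 8.5 x 11 inch page this gives a box of 612 x 792 units.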
A Page can be copied from one document to another (or replicated within an existing document) using PDFDoc.PageInsert(where, pg), PDFDoc.PagePushFront(pg), PDFDoc.PagePushBack(pg) and PDFDoc.ImportPages(list) methods.
PagePushBack(page) appends the given page at the end of the page sequence, whereas PagePushFront(page) inserts the page at the front of the sequence. PageInsert(where, page) inserts the page in front of the page pointed to by the 'where' iterator.
// Add three copies of the page to the document.
doc.PagePushBack(x);
doc.PagePushBack(x);
doc.PagePushFront(x);
// Create a new page and insert it just before
// the second page
doc.PageInsert(doc.PageFind(2), doc.PageCreate());
Note that it is possible to replicate a given page within a document by repeatedly adding the same page.
The same methods can also be used to merge documents or copy pages from one document to another.
In PDF every page object references various resource objects such as images, fonts, and color spaces that are used to render the page. In order to accurately copy a page from one document to another PageInsert / PagePushFront / PagePushBack methods copy all referenced resources.
If you are copying several pages between two documents it is better to use PDFDoc.ImportPages(page_list) because the resulting document will be much smaller and the copy operation will be faster.
ImportPages() is better than other methods for multi page copy because it preserves resource sharing in the target document. This is illustrated in following figures.

Figure. Copying pages between two documents using PageInsert/PagePushFront/PagePushBack
In PDF document page resources (e.g. fonts, images, color-spaces, forms, etc) can be shared across several pages in order to reduce file-size and speed up page processing. This is shown in 'Document 1' in the above figure where all three pages share the same font and color space object. 'Document 2' was created by direct page copy using PageInsert, PagePushFront or PagePushBack methods. Note that every page now refers to a separate instance of resource object.
On the other hand the result of page copy using ImportPages() is identical to the original document. Note that in 'Document 2' resource objects are shared across pages.

Figure. Copying pages between two documents using ImportPages()
Also note that if pages are copied/replicated within the same document (not between two different documents) all methods behave the same and resources are always shared.
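The difference in the resulting file size can be illustrated with a small counting model. This is only a sketch of the idea described above, not PDFNet's implementation; SourcePage, CopyPagesOneByOne and ImportPagesTogether are hypothetical names, and resources are represented by plain integer ids.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical model of a source page: the ids of the resource objects
// (fonts, color spaces, ...) that the page references.
struct SourcePage {
    std::vector<int> resource_ids;
};

// Page-by-page copy: each page is copied on its own, so every referenced
// resource is duplicated for that page. Returns the number of resource
// objects created in the target document.
std::size_t CopyPagesOneByOne(const std::vector<SourcePage>& pages) {
    std::size_t created = 0;
    for (const SourcePage& page : pages) {
        std::map<int, std::size_t> copied;  // fresh table for every page
        for (int id : page.resource_ids) {
            if (copied.emplace(id, created).second) ++created;
        }
    }
    return created;
}

// Batch import: one table is kept for the whole batch, so a resource
// shared by several pages is copied only once.
std::size_t ImportPagesTogether(const std::vector<SourcePage>& pages) {
    std::map<int, std::size_t> copied;  // shared across all pages
    std::size_t created = 0;
    for (const SourcePage& page : pages) {
        for (int id : page.resource_ids) {
            if (copied.emplace(id, created).second) ++created;
        }
    }
    return created;
}
```

For three pages that share one font and one color space, the first function creates six resource objects while the second creates two, which matches the two figures above.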
Code that copies all pages from one document to another may look like this:
PDFDoc in_doc = new PDFDoc("in.pdf");
PDFDoc new_doc = new PDFDoc();
PageIterator i = in_doc.PageBegin();
PageIterator end = in_doc.PageEnd();
for (; i!=end; i.Next())
{
new_doc.PagePushBack(i.Current());
}
but as we explained above it is better to keep sharing resources in 'new_doc' by importing all pages first (using ImportPages() method):
PDFDoc in_doc = new PDFDoc("in.pdf");
PDFDoc new_doc = new PDFDoc();
// Create a list of pages to copy.
ArrayList copy_pages = new ArrayList();
PageIterator itr = in_doc.PageBegin();
PageIterator end = in_doc.PageEnd();
for (; itr!=end; itr.Next())
{
copy_pages.Add(itr.Current());
}
// Import all the pages in 'copy_pages' list
ArrayList imported_pages = new_doc.ImportPages(copy_pages);
// Note that pages in 'imported_pages' list are not
// placed in document's page sequence. This is done
// in the following step.
for (int i=0; i!=imported_pages.Count; ++i)
{
new_doc.PagePushBack((Page)imported_pages[i]);
}
ImportPages(page_list) creates a copy of pages given in the argument list preserving shared resources. Note that the pages in the returned list are ordered in the same way as pages in the argument list and that although pages are copied they are not inserted into document's page sequence. Therefore in order to be visible imported/copied pages should be appended or inserted at a specific location within document's page sequence.
A page can be deleted using the PDFDoc::PageRemove(itr) method, where itr is the PageIterator to the page that should be deleted. A PageIterator for the given page can be obtained using PDFDoc::PageFind(page_num) or by direct iteration through the document's page sequence. This is shown in the examples below:
// Remove the fifth page from the page sequence.
doc.PageRemove(doc.PageFind(5));
// Remove the third page.
PageIterator i = doc.PageBegin();
i.Next(); i.Next();
doc.PageRemove(i);
PDFDoc::PageRemove(itr) only removes the page from document's page sequence. The page and its resources are still available until the document is saved in 'full save mode' with 'remove unused objects' flag or until next garbage collect operation. If you are saving the file in 'incremental mode' the serialized document may contain the content of the removed page.
Given the copy and delete page operations described in previous sections it is easy to re-arrange and sort pages. For example, the order of pages in the document can be reversed as follows.
int page_num = doc.GetPageNumber();
for (int i=1; i<=page_num; ++i)
{
PageIterator itr = doc.PageFind(i);
Page page = itr.Current();
doc.PageRemove(itr);
doc.PagePushFront(page);
}
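To see why this loop reverses the order, the same steps can be replayed on a plain list. The sketch below uses std::list as a stand-in for the page sequence; PageSeq and ReversePages are hypothetical names and no PDFNet types are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>

// Stand-in for a page sequence; each string plays the role of a page.
using PageSeq = std::list<std::string>;

// Mirrors the loop above: find the i-th page (1-based), remove it from
// the sequence and push it to the front.
void ReversePages(PageSeq& pages) {
    const std::size_t page_num = pages.size();
    for (std::size_t i = 1; i <= page_num; ++i) {
        auto itr = std::next(pages.begin(), static_cast<std::ptrdiff_t>(i - 1));
        std::string page = *itr;  // keep the page before removing it
        pages.erase(itr);
        pages.push_front(page);
    }
}
```

After step i the first i pages are already in reverse order, so once the loop has visited every position the whole sequence is reversed.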
A page can be rotated when displayed or printed by specifying 'Rotate' attribute in page dictionary. The value of 'Rotate' attribute is the number of degrees by which the page should be rotated clockwise when displayed or printed. The value must be a multiple of 90.
// Rotate the first page 90 degrees clockwise.
PageIterator itr = doc.PageBegin();
Page page = itr.Current();
Obj page_dict = page.GetSDFObj();
page_dict.Put("Rotate", Obj.CreateNumber(90));
To find out the rotation for an existing page use the following code:
int deg = 0;
Obj rotate = page.GetRotation();
if (rotate != null) deg = (int) rotate.GetNumber();
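Because the 'Rotate' value must be a multiple of 90, it can be convenient to validate and normalize an angle before writing it to the page dictionary. The helpers below are a hypothetical sketch (none of these names belong to PDFNet), and mapping angles onto 0, 90, 180 and 270 is an assumption based on the fact that angles differing by 360 degrees describe the same orientation.

```cpp
#include <cassert>

// Returns true when 'degrees' is a legal value for the 'Rotate' attribute.
constexpr bool IsValidRotation(int degrees) {
    return degrees % 90 == 0;
}

// Maps an angle onto the equivalent value in the range [0, 360).
constexpr int NormalizeRotation(int degrees) {
    return ((degrees % 360) + 360) % 360;
}

// Adds a number of clockwise quarter turns to the current rotation.
constexpr int RotateClockwise(int current, int quarter_turns) {
    return NormalizeRotation(current + 90 * quarter_turns);
}
```

A caller would check IsValidRotation() first and store the normalized value, for example when adding another quarter turn to the angle read with the code above.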
The crop box defines the region to which the contents of the page are to be clipped (cropped) when displayed or printed. Unlike the other boxes, the crop box has no defined meaning in terms of physical page geometry or intended use; it merely imposes clipping on the page contents. The default value is the page’s media box. A new crop box can be specified as follows.
Obj page_dict = page.GetSDFObj();
page_dict.Put("CropBox",
Rect.CreateSDFRect(0, 0, 500, 600));
To find the crop box of an existing page:
Rect rect = null;
Obj cb = page.GetCropBox();
if (cb != null)
{
rect = new Rect(cb);
// Crop box is:
// rect.x1, rect.y1,
// rect.x2, rect.y2
}
else
{
// The crop box is equal to the media box
}
The media box defines the boundaries of the physical medium on which the page is to be printed. It may include any extended area surrounding the finished page for bleed, printing marks, or other such purposes. It may also include areas close to the edges of the medium that cannot be marked because of physical limitations of the output device. Content falling outside this boundary can safely be discarded without affecting the meaning of the PDF file. A new value for page media box can be specified as follows:
Obj page_dict = page.GetSDFObj();
page_dict.Put("MediaBox",
Rect.CreateSDFRect(0, 0, 400, 700));
or by editing the existing media box:
Rect media_box = new Rect(page.GetMediaBox());
media_box.x1 = 0;
media_box.y1 = 0;
media_box.x2 = 400;
media_box.y2 = 700;
media_box.Update();
Page content can be translated horizontally and vertically by adjusting the media box. For example, the following code translates all page contents 2 inches (2 inches * 72 units per inch = 144 units) to the left.
Rect media_box = new Rect(page.GetMediaBox());
// translate the page 2 inches horizontally
media_box.x1 += 144;
media_box.x2 += 144;
media_box.Update();
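The reason that adding 144 to both x coordinates moves the content to the left can be checked with simple arithmetic. The sketch below is an illustration with hypothetical names (Box, ShiftBox, OffsetFromLeftEdge) and does not use PDFNet types.

```cpp
#include <cassert>

// Plain stand-in for a media box given by two opposite corners.
struct Box {
    double x1, y1, x2, y2;
};

// Shifts the box by (dx, dy) without changing its size.
constexpr Box ShiftBox(Box b, double dx, double dy) {
    return Box{b.x1 + dx, b.y1 + dy, b.x2 + dx, b.y2 + dy};
}

// Distance from the left edge of the visible page to a content point
// whose user space x coordinate stays fixed.
constexpr double OffsetFromLeftEdge(Box b, double content_x) {
    return content_x - b.x1;
}
```

The content keeps its user space coordinates while the visible window moves right, so a point that was 200 units from the left edge ends up 56 units from it, that is, 144 units (2 inches) further left on the page.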
PDFNet provides an advanced and easy to use API that can be used to read, write and edit text, images, and other graphical Elements. Because the API is very efficient, PDFNet is a good match for interactive applications (e.g. PDF viewers and editors) and content extraction applications (e.g. conversion, preflight, etc.), as well as for dynamic PDF generation.
Page content is a major component of a PDF document. It represents the visible marks on a page that are drawn by a set of PDF marking operators. For details on PDF content streams and detailed operator descriptions please refer to Section 3.7.1, “Content Streams,” in the PDF Reference Manual.
Although the PDFNet SDF and Filters APIs provide everything that is required to decode and parse low-level content streams, using the Element API is much easier and more intuitive. In short, the PDFNet Element API allows you to treat a page’s contents as a list of objects (i.e. a display list or a sequence of Elements) rather than manipulating sets of cryptic marking operators.
A set of marking operators from the page content stream builds an Element (such as text, path, or an image) and the set of Elements represents a display list.

Figure. A sequence of page marking operators represents an Element.
Therefore PDFNet Element interface allows user to treat page contents as a list of objects whose values and attributes can be modified.
Using the Element interface, applications can read, write, edit, and create page contents and page resources, which may contain fonts, images, shadings, patterns, extended graphics states, and so on.
Your application may use Element methods to modify the appearance of a page or it can create pages from scratch.
Elements are independent of each other: every Element encapsulates all the relevant information about itself. A text object contains all font attributes, for instance.
Element is the concrete base class for all Elements. PDFNet supports all content elements occurring in PDF, namely: path, text_begin, text, text_new_line, text_end, image, inline_image, shading, form, group_begin, group_end, marked_content_begin, and marked_content_end.
Note that some Elements such as path, text, image, inline-image, and shading represent concrete graphical elements, whereas Elements such as text_begin/end, text_new_line, group_begin/end, and marked_content_begin/end don't have graphical representation but are used for logical grouping of Element sequences or to provide meta-data associated with Element groups.
The Element hierarchy implements a composite pattern; that is, the Element class implements methods for all derived classes.

Figure. Element hierarchy. Only methods listed in the Element group or base class can be invoked for the given type.
To find the type of a given Element, use the element.GetType() method. It is illegal to call methods that are not related to the given element type; the behavior is undefined. For example, it is illegal to call element.GetImageData() on an e_path element.
Note that in the above figure e_group_begin/end and e_text_begin/end don't add any functionality to the common Element interface (i.e. GetType()/GetGState()/GetCTM()). The main purpose of these Elements is to mark sequences of Elements into logical groups. Element e_group_begin corresponds to PDF 'q' operator (saveState), e_group_end corresponds to 'Q' operator, e_text_begin corresponds to 'BT' (begin text) operator and e_text_end corresponds to 'ET' operator.
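The save and restore semantics behind e_group_begin ('q') and e_group_end ('Q') can be pictured as a stack of graphics states. The sketch below is a simplified model with hypothetical names (SketchGState, GStateStack) that tracks a single attribute; it is not how PDFNet represents GState.

```cpp
#include <cassert>
#include <vector>

// Simplified graphics state holding one attribute; 1.0 is the default
// line width defined by PDF.
struct SketchGState {
    double line_width = 1.0;
};

class GStateStack {
public:
    // Corresponds to 'q': remember a copy of the current state.
    void GroupBegin() { saved_.push_back(current_); }

    // Corresponds to 'Q': restore the most recently saved state.
    // Returns false if there is no matching GroupBegin().
    bool GroupEnd() {
        if (saved_.empty()) return false;
        current_ = saved_.back();
        saved_.pop_back();
        return true;
    }

    SketchGState& Current() { return current_; }

private:
    SketchGState current_;
    std::vector<SketchGState> saved_;
};
```

Any attribute changed between GroupBegin() and GroupEnd() reverts to its earlier value once the group is closed, which is why these Elements matter even though they draw nothing.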
e_text_begin initializes a text object, initializing the text matrix and the text line matrix to the identity matrix. Because PDF text objects cannot be nested a second e_text_begin element cannot appear before e_text_end. A text object contains one or more text runs (i.e. e_text elements) and new line markers (e_text_new_line elements). e_text and e_text_new_line are not allowed outside of the text group (i.e. outside element sequence surrounded by e_text_begin/end).
Every Element has an associated CTM (current transformation matrix) and graphics state. Element.GetCTM() returns the transformation matrix in effect while processing the current Element. Element.GetGState() returns the associated graphics state. GState keeps track of a number of style attributes used to visually define graphical Elements. The methods available through the GState class are listed below:

Figure. Graphics State.
For a detailed description of graphics state attributes refer to section 4.3 "Graphics State" in PDF Reference Manual.
Page content is represented as a sequence of graphical Elements such as paths, text, images, forms, etc. The only effect of the ordering of Elements in the display list is paint order. Elements that occur later in the display list can obscure earlier elements.
A display list can be traversed using ElementReader as in the following example:
void ReadDoc()
{
// Open an existing document
PDFDoc doc = new PDFDoc("in.pdf");
ElementReader reader = new ElementReader();
// Read page content on every page in the document
PageIterator itr;
PageIterator end = doc.PageEnd();
for (itr=doc.PageBegin(); itr!=end; itr.Next())
{
// Read the page
reader.Begin(itr.Current());
ProcessElements(reader);
}
}
void ProcessElements(ElementReader reader)
{
Element element;
// Traverse the page display list
while ((element = reader.Next()) != null)
{
switch (element.GetType())
{
case Element.ElementType.e_path:
{
if (element.IsClippingPath())
{}
// ...
break;
}
case Element.ElementType.e_text:
{
Matrix2D text_mtx = element.GetTextMatrix();
// ...
break;
}
case Element.ElementType.e_form:
{
reader.FormBegin();
ProcessElements(reader);
reader.End();
break;
}
}
}
}
In order to begin display list traversal, call reader.Begin(). reader.Next() will then return subsequent Elements until NULL/null is returned, marking the end of the display list.
Note that ElementReader works with one page at a time, although the same ElementReader may be reused to process multiple pages.
Processing Forms, Type3 glyphs, tiling patterns.
Note that PDF page display list may contain children display lists of Form XObjects, Type3 font glyphs, and tiling patterns. A form XObject is a self-contained description of any sequence of graphics objects (including path objects, text objects, and sampled images), defined as a PDF content stream. It may be painted multiple times—either on several pages or at several locations on the same page—and will produce the same results each time, subject only to the graphics state at the time it is invoked. In order to open a child display list for Form XObject call reader.FormBegin() method and to return processing to the parent display list call reader.End(). Processing of the form XObject display is illustrated in the following figure:

Figure. Traversing the child display list.
Note that in the above example a child display list is opened when element with type Element.ElementType.e_form is encountered using reader.FormBegin() method. The child display list becomes the current display list until it is closed using reader.End(). At this point the processing is returned to the parent display list and the next Element returned will be the Element following the Form XObject. Also note that sub-display lists may also have children display lists because the Form XObjects may be nested. In the above example support for nesting is implemented using recursion.
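The recursive handling of nested child display lists can be sketched on a plain tree. The types Kind and Node and the function CountKind below are hypothetical stand-ins for Elements and ElementReader; the point is only the shape of the recursion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Kind { Path, Text, Image, Form };

// Hypothetical display list node. 'children' is the child display list
// and is only used when kind == Kind::Form.
struct Node {
    Kind kind;
    std::vector<Node> children;
};

// Counts the nodes of one kind in a display list, descending into every
// nested form, in the same way ProcessElements() calls itself for e_form.
std::size_t CountKind(const std::vector<Node>& display_list, Kind wanted) {
    std::size_t count = 0;
    for (const Node& node : display_list) {
        if (node.kind == wanted) ++count;
        if (node.kind == Kind::Form) {
            // Open the child list, process it, then continue with the
            // node that follows the form in the parent list.
            count += CountKind(node.children, wanted);
        }
    }
    return count;
}
```

Each recursive call plays the role of a FormBegin()/End() pair: when it returns, iteration resumes with the element after the form in the parent list.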
Analogous to a Form XObject, a pattern display list can be opened using reader.PatternBegin(), whereas a Type3 glyph display list can be opened using the reader.Type3FontBegin() method.
Processing changes in Graphics State
After reading an Element using the ElementReader.Next() method, it is possible to access all graphical attributes of the Element through its graphics state. Some applications are more interested in changes in the graphics state than in attribute values. For example, a transition from one Element to another may not involve changes in the graphics state, or there may be changes to only a couple of attributes. In these cases it is not efficient to make memberwise comparisons between the old and the current graphics state.
PDFNet offers an efficient and easy to use API that can be used to enumerate the list of changes between subsequent Elements.
The list of changes in graphics state can be traversed using ElementReader.ChangesBegin/End() method as in the following example:
GSChangesIterator itr = reader.ChangesBegin();
GSChangesIterator end = reader.ChangesEnd();
for (; itr != end; itr.Next())
{
switch(itr.Current())
{
case GState.GStateAttribute.e_transform:
// Get transform matrix for this element.
// Unlike path.GetCTM() that returns full
// transformation matrix gs.GetTransform()
// returns only the transformation matrix
// that was installed for this element (a
// cm operator preceding this Element).
// gs.GetTransform();
break;
case GState.GStateAttribute.e_line_width:
// gs.GetLineWidth();
break;
case GState.GStateAttribute.e_line_cap:
// gs.GetLineCap();
break;
case GState.GStateAttribute.e_line_join:
// gs.GetLineJoin();
break;
case GState.GStateAttribute.e_miter_limit:
// gs.GetMiterLimit();
break;
case GState.GStateAttribute.e_dash_pattern:
break;
// Etc.
}
}
It is also possible to query ElementReader for changes in a given attribute:
if (reader.IsChanged(
GState.GStateAttribute.e_line_width))
{
// line width was changed.
}
Note that the list of modified attributes is accumulated when calling ElementReader.Next(). To clear the list of modified attributes use ElementReader.ClearChangeList() method. A call to ClearChangeList() serves as a marker in the display list from which further changes in the graphics state are tracked.
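The accumulate-and-clear behaviour of the change list can be modeled with a small tracker. This is a sketch of the concept only: Attr and ChangeTracker are hypothetical names, attribute values are reduced to doubles, and nothing here reflects how ElementReader is implemented.

```cpp
#include <cassert>
#include <map>
#include <set>

enum class Attr { LineWidth, LineCap, LineJoin, MiterLimit };

class ChangeTracker {
public:
    // Records a new attribute value; the attribute is added to the change
    // list only if the value differs from the one already stored.
    void Set(Attr attr, double value) {
        auto it = values_.find(attr);
        if (it != values_.end() && it->second == value) return;  // unchanged
        values_[attr] = value;
        changed_.insert(attr);
    }

    // True if the attribute changed since the last ClearChangeList().
    bool IsChanged(Attr attr) const { return changed_.count(attr) != 0; }

    // Marks the point from which further changes are tracked.
    void ClearChangeList() { changed_.clear(); }

private:
    std::map<Attr, double> values_;
    std::set<Attr> changed_;
};
```

Changes keep accumulating across calls to Set() until the list is cleared, so a client that clears the list after each Element sees only the attributes that differ from the previous Element.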
New page content can be added to an existing page or a blank new page using ElementBuilder and ElementWriter. ElementBuilder is used to instantiate Element(s) that can be written to one or more pages using ElementWriter:

Figure. Adding new content to a page.
The following example illustrates how to write page content to a new document.
PDFDoc doc = new PDFDoc();
// ElementBuilder is used to build new Element objects
ElementBuilder f = new ElementBuilder();
// ElementWriter is used to write Elements to the page
ElementWriter writer = new ElementWriter();
// Start a new page
// Position an image stream on several places on the page
Page page = doc.PageCreate();
// Begin writing to this page
writer.Begin(page);
// Attach ElementBuilder to the page
f.Begin(page);
// Import an Image that can be reused multiple
// times in the document or multiple times on the
// same page.
StdFile img_file = new StdFile("peppers.jpg",
StdFile.OpenMode.e_read_mode);
FilterReader img_data = new FilterReader(img_file);
Image img = Image.Create(doc.GetSDFDoc(),
img_data,
Image.ImageCompression.e_jpeg,
400, 600, 8,
ColorSpace.CreateDeviceRGB());
Element element = f.CreateImage(img,
new Matrix2D(200, -145, 20, 300, 200, 150));
writer.WritePlacedElement(element);
GState gstate = element.GetGState();
// Use the same image (just change its matrix)
gstate.SetTransform(200, 0, 0, 300, 50, 450);
writer.WritePlacedElement(element);
// Use the same image (just change its matrix)
writer.WritePlacedElement(
f.CreateImage(img, 300, 600, 200, -150));
// save changes to the current page
writer.End();
// Add a new page to the document sequence
doc.PagePushBack(page);
// Start a new page
page = doc.PageCreate();
writer.Begin(page);
f.Begin(page);
// Construct and draw a path object using
// different GState attributes
f.PathBegin();
f.MoveTo(306, 396);
f.CurveTo(681, 771, 399.75, 864.75, 306, 771);
f.CurveTo(212.25, 864.75, -69, 771, 306, 396);
f.ClosePath();
// path is now constructed
element = f.PathEnd();
element.SetPathFill(true);
// Set the path color space and color
gstate = element.GetGState();
gstate.SetFillColorSpace(
ColorSpace.CreateDeviceCMYK());
gstate.SetFillColor(
new ColorPt(1, 0, 0, 0)); // cyan
gstate.SetTransform(
0.5, 0, 0, 0.5, -20, 300);
writer.WritePlacedElement(element);
// Draw the same path using a different
// stroke color.
// This path should be filled and stroked
element.SetPathStroke(true);
gstate.SetFillColor(
new ColorPt(0, 0, 1, 0)); // yellow
gstate.SetStrokeColorSpace(
ColorSpace.CreateDeviceRGB());
gstate.SetStrokeColor(new ColorPt(1, 0, 0)); // red
gstate.SetTransform(0.5, 0, 0, 0.5, 280, 300);
gstate.SetLineWidth(20);
writer.WritePlacedElement(element);
// Draw the same path with a given dash pattern
// This path should be only stroked
element.SetPathFill(false);
gstate.SetStrokeColor(new ColorPt(0, 0, 1)); // blue
gstate.SetTransform(0.5, 0, 0, 0.5, 280, 0);
double[] dash_pattern = {30};
gstate.SetDashPattern(ref dash_pattern, 0);
writer.WritePlacedElement(element);
writer.End(); // save changes to the current page
doc.PagePushBack(page);
doc.Save("out.pdf", Doc.SaveOptions.e_remove_unused);
Note that once the Element is instantiated using ElementBuilder you have full control over its properties and its graphics state.
Page content can also come from existing pages. For example, you can use ElementReader to read paths, text, and images from existing pages and copy them to the current page. Note that along the way you can fully modify Element properties and its graphics state. This is a basis of page content editing that will be discussed in the next section. The following example copies all Elements except images from an existing page and changes text color to blue:
ElementWriter writer = new ElementWriter();
ElementReader reader = new ElementReader();
Element element;
reader.Begin(doc.PageBegin().Current());
Page new_page = doc.PageCreate(new Rect(0, 0, 612, 794));
doc.PagePushBack(new_page);
writer.Begin(new_page);
while ((element = reader.Next()) != null)
{
if (element.GetType() == Element.ElementType.e_text)
{
// Set all text to blue color.
GState gs = element.GetGState();
gs.SetFillColorSpace(
ColorSpace.CreateDeviceRGB());
gs.SetFillColor(new ColorPt(0, 0, 1));
}
else if (element.GetType()
== Element.ElementType.e_image)
{
// remove all images
continue;
}
writer.WriteElement(element);
}
writer.End();
reader.End();
A PDF document may optionally display a document outline on the screen, allowing the user to navigate interactively from one part of the document to another. The outline consists of a tree-structured hierarchy of Bookmarks (sometimes called outline items), which serve as a 'visual table of contents' to display the document’s structure to the user.
Each Bookmark has a title that appears on screen, and an Action that specifies what happens when a user clicks on the Bookmark. The typical Action for a user-created Bookmark is to move to another location in the current document, although any Action can be specified.
Although it is possible to work with outline items using SDF/Cos API (See section 8.2.2 'Document Outline' in PDF Reference Manual for more details), this work is simplified using PDFNet which provides a high-level utility class PDF::Bookmark.
You can use Bookmark.GetNext(), Bookmark.GetPrev(), Bookmark.GetFirstChild() and Bookmark.GetLastChild() in order to navigate the whole outline tree.
This is shown in the following code snippet.
// C# Sample:
// Prints out the outline tree to the standard output
void PrintIdent(Bookmark item)
{
int ident = item.GetIdent() - 1;
for (int i=0; i < ident; ++i)
Console.Write(" ");
}
void PrintOutlineTree(Bookmark item)
{
for (; item.IsValid(); item=item.GetNext())
{
PrintIdent(item);
Console.WriteLine("{0:s}{1:s}",
(item.IsOpen() ? "- " : "+ "), item.GetTitle());
if (item.HasChildren())
{
// Recursively print children sub-trees
PrintOutlineTree(item.GetFirstChild());
}
}
}
static void Main(string[] args)
{
PDFDoc doc = new PDFDoc("../../../Data/out1.pdf");
doc.InitializeSecurityHandler();
Bookmark root = doc.GetFirstBookmark();
PrintOutlineTree(root);
}
Note that the root Bookmark was obtained using PDFDoc.GetFirstBookmark(). If GetFirstBookmark() returns a Bookmark that is not valid (i.e. GetFirstBookmark().IsValid() returns false), the document has no outline tree.
A new outline tree can be created as follows:
PDFDoc doc("../Data/in.pdf");
doc.InitializeSecurityHandler();
Bookmark myitem = Bookmark::Create(doc, "My Item");
doc.AddRootBookmark(myitem);
Sub-items can be added using Bookmark.AddChild(…) method:
Bookmark sub_item = myitem.AddChild("My Sub-Item");
myitem.AddChild("My Sub-Item 2");
Note that a Bookmark can be associated with different kinds of Actions. The most common Action is to move to another location in the current document. This type of Action is called a Destination Action (see section 8.2.1 'Destinations' in the PDF Reference Manual for more details). The following code creates a new page Destination and sets the Bookmark’s action:
// The following example creates an 'explicit' destination
Destination dest = Destination::CreateFit(*doc.PageBegin());
Action action = Action::Create(dest);
myitem.SetAction(action);
Using PDFNet it is also possible to quickly create ‘named’ destinations (see section 8.2.1 'Destinations' in PDF Reference for more details). Named destinations have an advantage over explicit destinations because they allow the location of the destination to change without invalidating existing link(s).
To create a named destination pass in the key under which the destination will be stored in Action::Create(…) method:
Action blue_action = Action::Create("blue1",
Destination::CreateFit(*doc.PageBegin()));
The Bookmark class also allows you to quickly find Bookmarks based on the title text. For example, the following code snippet looks for a Bookmark called “foo” and then removes it from the outline tree:
Bookmark foo = doc.GetFirstBookmark().Find("foo");
if (foo.IsValid())
{
foo.Delete();
}
Bookmark API allows you to set and change any property on outline items including title text, action, color, and formatting. Color and other formatting can help readers get around more easily in large PDF documents. The following code adjusts color and formatting properties on three Bookmark items:
red.SetColor(1, 0, 0);
green.SetColor(0, 1, 0);
// use bold font for green title text
green.SetFlags(2);
blue.SetColor(0, 0, 1);
// use bold and italic font for blue title text
blue.SetFlags(3);
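The values passed to SetFlags() above behave like bit flags: 2 selects bold and 3 selects bold and italic, which implies that italic corresponds to the value 1. The constants and helpers below are hypothetical names used to make that arithmetic explicit; they are not part of the Bookmark API.

```cpp
#include <cassert>

// Bit values inferred from the example above (2 = bold, 3 = bold + italic).
constexpr int kItalic = 1;
constexpr int kBold = 2;

// Combines the two style bits into a single flags value.
constexpr int MakeTitleFlags(bool italic, bool bold) {
    return (italic ? kItalic : 0) | (bold ? kBold : 0);
}

constexpr bool IsBold(int flags) { return (flags & kBold) != 0; }
constexpr bool IsItalic(int flags) { return (flags & kItalic) != 0; }
```

With these helpers, MakeTitleFlags(false, true) produces the value used for the green item and MakeTitleFlags(true, true) the value used for the blue item.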
An interactive form (sometimes referred to as an AcroForm) is a collection of fields such as text boxes, checkboxes, radio buttons, drop-down lists, pushbuttons, etc. for gathering information interactively from the user. A PDF document may contain any number of Fields appearing on any combination of pages, all of which make up a single, global interactive form spanning the entire document. PDF forms are similar to HTML forms but there are some important differences:
PDFNet fully supports reading, writing, and editing PDF forms and provides many utility methods so that working with forms is simple and efficient. Using the PDFNet forms API, arbitrary subsets of form fields can be imported or exported from the document, new forms can be created from scratch, and the appearance of existing forms can be modified.
The form shown in the following figure consists of a number of Fields:

Every field has its name and value, as well as its annotation appearance.
In PDFNet, Fields are accessed through FieldIterators.
For example, the list of all Fields present in the document can be traversed using the following code snippet:
FieldIterator itr = doc.InteractiveFieldBegin();
FieldIterator end = doc.InteractiveFieldEnd();
for(; itr != end; itr.Next())
{
Field field = itr.Current();
Console.WriteLine("Field name: {0}",field.GetName());
}
You can also search for a given field by name:
// Search for a specific field
FieldIterator itr = doc.InteractiveFieldFind("name");
if (itr != doc.InteractiveFieldEnd())
{
Field field = itr.Current();
Console.WriteLine("Field {0} was found.",
field.GetName());
}
else {
Console.WriteLine("Field was not found.");
}
If a given field name was not found, or if the end of the field list was reached, the iterator will be equal to doc.InteractiveFieldEnd().
If you have a valid iterator, you can access the Field using the Current() method (or the dereference operator in C/C++):
Field field = itr.Current(); // C#
Field field = *itr; // C++
PDF offers seven different field types. Each type of form field is used for a different purpose, and each has different properties, appearances, options, and actions that can be associated with it. In this section, we will explain how to create all seven field types and describe some attributes specific to each one.
Common field types are text-box, checkbox, radio-button, combo-box, and push-button. To find out the type of the Field use Field.GetType() method:
Field.FieldType type = field.GetType();
switch(type)
{
case Field.FieldType.e_button:
Console.WriteLine("Button");
break;
case Field.FieldType.e_text:
Console.WriteLine("Text");
break;
case Field.FieldType.e_choice:
Console.WriteLine("Choice");
break;
case Field.FieldType.e_signature:
Console.WriteLine("Signature");
break;
}
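The switch above distinguishes four base types, while the text speaks of seven field types. One plausible reading, following the PDF Reference, is that the base type combined with the field flags yields the seven familiar widget kinds. The plain C++ sketch below shows that mapping. `BaseType` and `WidgetKind` are hypothetical names for this illustration and do not mirror PDFNet's enums; the flag bit positions are those given in the PDF Reference.

```cpp
#include <string>

// Hypothetical mirror of the four base field types (not PDFNet's enum).
enum class BaseType { Button, Text, Choice, Signature };

// Field flag bits from the PDF Reference.
const unsigned kRadio      = 1u << 15;  // bit 16: radio button
const unsigned kPushButton = 1u << 16;  // bit 17: push-button
const unsigned kCombo      = 1u << 17;  // bit 18: combo box

// Map a base type plus field flags to one of seven widget kinds.
std::string WidgetKind(BaseType type, unsigned flags) {
    switch (type) {
        case BaseType::Button:
            if (flags & kPushButton) return "push-button";
            if (flags & kRadio)      return "radio-button";
            return "checkbox";
        case BaseType::Choice:
            return (flags & kCombo) ? "combo-box" : "list-box";
        case BaseType::Text:
            return "text-box";
        case BaseType::Signature:
            return "signature";
    }
    return "unknown";
}
```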
Regardless of which field type you create, you need to provide a Field name:
Field myfiled = doc.InteractiveFieldCreate("address",
Field.FieldType.e_text);
Under most circumstances, all field names must be unique. If you name one field "address" and then create a second field that is also called "address", you cannot supply different data in the two fields.
Field names can use alpha characters, numbers, or both to identify the field. All field names are case-sensitive. For example, you can use names such as empFirstName, empSecondName, empNumber, and so on for a group of fields that are related to the same concept (in our sample, an employee entity).
Another means of naming fields is to use a parent and child name. For example, you can name the above fields as follows: employee.name.first, employee.name.second, employee.number. The period (i.e. a decimal point) separates the parent from the child.
This naming convention is not only useful for organizational purposes but is also better suited to automated operations on Fields.
In PDFNet, Field.GetName() returns a string representing the fully qualified name of the field (e.g. "employee.name.first"). To get the child name (i.e. "first"), use the Field.GetPartialName() method.
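The relationship between a fully qualified name and its partial name can be sketched in plain C++. The helper functions below are hypothetical and only model the naming convention on strings; they are not part of PDFNet and say nothing about how the library derives these names internally.

```cpp
#include <string>

// Returns the part of a fully qualified field name after the last period,
// e.g. "employee.name.first" -> "first". A name without a period is
// returned unchanged.
std::string PartialName(const std::string& full) {
    const std::string::size_type dot = full.rfind('.');
    return dot == std::string::npos ? full : full.substr(dot + 1);
}

// Returns the parent's fully qualified name, e.g.
// "employee.name.first" -> "employee.name". Top-level names have no
// parent, so the result is empty.
std::string ParentName(const std::string& full) {
    const std::string::size_type dot = full.rfind('.');
    return dot == std::string::npos ? std::string() : full.substr(0, dot);
}
```

Grouping by `ParentName` is what makes the hierarchical convention convenient for automated operations, such as processing every field under "employee.name".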
After the new Field is created, you can add it to one or more Pages using the Page.CreateWidget(Field, Rect) method:
page.CreateWidget(myfiled, new Rect(50, 550, 350, 600));
The second argument represents the rectangle where the widget annotation should be placed on the page.
Form Fields can be populated using the Field.SetValue() method:
field.SetValue(Obj.CreateString("New Value"));
// Regenerate appearance stream.
field.RefreshAppearance();
Note that after modifying the Field's value we refreshed its appearance stream. In PDF, a Field's value and its appearance are two different entities, so if you don't call RefreshAppearance() the content shown on the PDF page will be unchanged (e.g. it may show the old value or be blank).
Another approach, used in low-end PDF libraries, is to let the PDF viewer automatically generate appearance streams by setting the 'NeedAppearances' flag in the AcroForm dictionary:
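The separation between a field's value and its appearance can be pictured with a small plain C++ model. `TextFieldModel` is a hypothetical type invented for this illustration; it only mimics the behavior described above and does not reflect how PDFNet stores either entity.

```cpp
#include <string>

// Hypothetical model of a text field in which the stored value and the
// drawn appearance are kept separately.
struct TextFieldModel {
    std::string value;       // what a GetValue() call would report
    std::string appearance;  // what a viewer would draw on the page

    // Changes the value only; the appearance is left untouched.
    void SetValue(const std::string& v) { value = v; }

    // Rebuilds the appearance from the current value.
    void RefreshAppearance() { appearance = value; }
};
```

In this model, calling `SetValue("New Value")` alone leaves `appearance` stale until `RefreshAppearance()` runs.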
doc.GetAcroForm()->Put("NeedAppearances",
new Bool(true));
This will force the viewer application to auto-generate appearance streams every time the document is opened. This method is not reliable because Acrobat does not always generate appearance streams correctly. Another disadvantage is that the user will always be prompted to save the document even if the document was not modified.
In addition to form appearance auto-generation and refresh, PDFNet provides several other means to deal with appearance streams. Some of these techniques are discussed in the section on Advanced Forms.
Field.GetValue() returns the field's value (an SDF::Obj), whose type varies depending on the field type. For example, a text field might be associated with string values (SDF::Str strings) or null:
if (type == Field.FieldType.e_text
&& field.GetValue() != null)
{
Console.WriteLine("Field value: {0}",
field.GetValue().GetStr());
}
else
{
Console.WriteLine("Field is blank");
}
Form 'flattening' refers to the operation that changes active form fields into a static area that is part of the PDF document, just like the other text and images in the document. A completely flattened PDF form does not have any widget annotations or interactive fields.
Using the Field.Flatten() or Page.FlattenField() method, it is possible to merge individual field appearances with the page content. PDFNet also allows you to flatten all forms in the document in a single function call (PDFDoc.FlattenFields()).
Note that it is not possible to undo a Field.Flatten() operation. An alternative that can be reversed programmatically is to mark the field as read-only using the Field.SetFlag(Field::e_read_only, true) method.
The security mechanism for the high-level document works in the same way as for the SDF document. To secure a document, use the PDFDoc::SetSecurityHandler() method; to open a secured document, call PDFDoc::InitializeSecurityHandler() after opening the document. To provide GUI feedback, you can optionally derive a class from the StdSecurityHandler class. For details on how to secure and read encrypted PDF documents, refer to the section on SDF security.
The following table lists the security permissions available through StdSecurityHandler (the standard security handler for PDF documents):
| Permission | Description |
| e_all | All permissions are granted. |
| e_doc_open | A permission to open a document. |
| e_doc_secure | A permission to change security settings on a document. |
| e_doc_modify | Modify the contents of the document. |
| e_print | Print the document. |
| e_print_high | Print the document to a representation from which a faithful digital copy of the PDF content could be generated. When this permission is not set, printing is limited to a low level representation of the appearance, possibly of degraded quality. |
| e_extract_content | Copy or otherwise extract text and graphics from the document. |
| e_mod_annot | Add or modify text annotations, fill in interactive form fields. |
| e_fill_forms | Fill in existing interactive form fields (including signature fields). |
| e_access_support | Extract text and graphics (in support of accessibility to disabled users or for other purposes). |
| e_assemble_doc | Assemble the document (insert, rotate, or delete pages and create bookmarks or thumbnail images), even if e_doc_modify is not set. |
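Several rows in the table overlap: form filling is covered by both e_mod_annot and e_fill_forms, and assembly is allowed "even if e_doc_modify is not set". The plain C++ sketch below shows how such checks combine, using the permission bit positions that the PDF Reference defines for the standard security handler. The constant names and helper functions are illustrative assumptions; they are not the values of PDFNet's e_* enumerators, and how PDFNet maps its enumerators to these bits is not stated in this document.

```cpp
// Permission bits of the standard security handler, as numbered in the
// PDF Reference. Names are illustrative, not PDFNet identifiers.
const unsigned kPrint         = 1u << 2;   // bit 3
const unsigned kModify        = 1u << 3;   // bit 4
const unsigned kExtract       = 1u << 4;   // bit 5
const unsigned kModAnnot      = 1u << 5;   // bit 6
const unsigned kFillForms     = 1u << 8;   // bit 9
const unsigned kAccessSupport = 1u << 9;   // bit 10
const unsigned kAssemble      = 1u << 10;  // bit 11
const unsigned kPrintHigh     = 1u << 11;  // bit 12

// Form filling is allowed by either the annotation or the fill-forms bit.
bool CanFillForms(unsigned p) { return (p & (kFillForms | kModAnnot)) != 0; }

// Assembly is allowed by either the assemble or the general modify bit.
bool CanAssemble(unsigned p) { return (p & (kAssemble | kModify)) != 0; }

// High-quality printing needs the print bit and the high-quality bit.
bool CanPrintHighQuality(unsigned p) {
    return (p & kPrint) != 0 && (p & kPrintHigh) != 0;
}
```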
PDFNet uses the standard exception mechanism of C++ and of .Net languages (C#, VB, J#) to report illegal program states and to provide a transparent and clean way to handle errors.
C++ Example:
try
{
PDFDoc doc("file.pdf");
doc.PageBegin();
// ...
}
catch (Exception& e)
{
cout << e << endl;
}
catch (...)
{
cout <<"Unknown Exception" << endl;
}
C# Example:
try
{
PDFDoc doc = new PDFDoc("file.pdf");
doc.PageBegin();
// ...
}
catch (PDFNetException e)
{
Console.WriteLine(e.Message);
}