SDF / COS object model

Contents

SDF.Obj
SDFDoc

Real-life PDF documents are much more complex than the "Hello World" PDF sample from the previous section. Streams in a PDF document can be compressed and encrypted, objects can form complex networks, and, starting with PDF 1.5, parts of the object graph can be compressed and embedded in so-called "object streams". All this makes manual editing of PDF documents extremely difficult, or even impossible. The good news is that PDFTron Systems released CosEdit, a graphical utility for browsing and editing PDF documents at the object level. PDFTron SDK also provides a full SDF/COS-level API for reading, writing, and editing PDF and FDF at the atomic level, as well as a high-level API for reading, writing, and editing PDF documents at the level of pages, bookmarks, graphical primitives, and so on.

SDF (Structured Document Format) and COS (Carousel Object System; Carousel was a codename for Acrobat 1.0) are synonyms for the PDF low-level object model. SDF is the acronym used in PDFTron SDK, whereas COS is a legacy term used in the Acrobat SDK.

In many ways, SDF is to PDF what XML and DOM are to SVG (Scalable Vector Graphics). The SDF/COS object system provides the low-level object types and file structure used in PDF documents. PDF documents are graphs of SDF objects. SDF objects can represent document components such as bookmarks, pages, fonts, annotations, and so on.

PDF is not the only document format built on top of SDF/COS. FDF (Form Data Format) and PJTF (Portable Job Ticket Format) are also built on top of SDF/COS.

SDF.Obj

The SDF layer deals directly with the data that is in a PDF document. The data types are referred to as SDF objects. There are eight data types found in PDF documents. They are arrays, dictionaries, numbers, boolean values, names, strings, streams, and the null object. PDFTron SDK implements these objects as shown in the following graph:

In C#, all objects ultimately derive from the Object class. Similarly, all SDF objects ultimately derive from the Obj class. Following the Composite design pattern, Obj implements each method found in its derived classes. Thus you can invoke a member function of any derived object through the base Obj interface. This is illustrated in the following code sample:

SDFDoc doc = new SDFDoc("in.pdf");

// Get the trailer
Obj trailer = doc.GetTrailer();

// Get the info dictionary.
Obj info = trailer.Get("Info").Value();

// Replace the Producer entry
info.PutString("Producer", "PDFNet");

// Create a custom inline dictionary within
// Info dictionary
Obj custom_dict = info.PutDict("My Direct Dict");

// Add some key/value pairs
custom_dict.PutNumber("My Number", 100);
Obj my_array = custom_dict.PutArray("My Array");

// Create a custom indirect array within Info dictionary
Obj custom_array = doc.CreateIndirectArray();    
info.Put("My Indirect Array", custom_array);

// Create indirect link to root
custom_array.PushBack(trailer.Get("Root").Value());

doc.Save("out.pdf", 0, "%PDF-1.4");  // Save PDF

If a member function is not supported on a given object (e.g. if you are invoking obj.GetName() on a Bool object), an Exception will be thrown.

To find out type information at run time, use the obj.GetType() or obj.Is_**type**_() methods (where type could be Array, Number, Bool, Str, Dict, or Stream). Usually, an object's type can be inferred from the PDF/FDF specification. For example, when you call doc.GetTrailer(), you can assume that the returned object is a dictionary because this is mandated by the PDF specification. If an object is not a dictionary, calling a dictionary method on it throws an exception. Because type casts and type checks are not required in the common case, your code can stay compact and readable. Where the PDF/FDF specification leaves the type ambiguous, use the GetType() or Is_**type**_() methods.
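The behavior described above (a base class that exposes every accessor and throws when the concrete type does not support it) can be sketched with a small self-contained model. This is only an illustration under stated assumptions: ToyObj, ToyBool, ToyName, and NameOrDefault are hypothetical names invented here, and the exception type is chosen for the sketch; none of this is the PDFTron SDK implementation.

```csharp
using System;

// Toy stand-ins for illustration only; these are NOT the PDFTron SDK classes.
class ToyObj
{
    // Type queries default to false; derived types override their own.
    public virtual bool IsBool() => false;
    public virtual bool IsName() => false;

    // Accessors default to throwing: the operation is unsupported
    // unless a derived type overrides it.
    public virtual bool GetBool() =>
        throw new InvalidOperationException("Object is not a Bool");
    public virtual string GetName() =>
        throw new InvalidOperationException("Object is not a Name");
}

class ToyBool : ToyObj
{
    private readonly bool value;
    public ToyBool(bool value) { this.value = value; }
    public override bool IsBool() => true;
    public override bool GetBool() => value;
}

class ToyName : ToyObj
{
    private readonly string name;
    public ToyName(string name) { this.name = name; }
    public override bool IsName() => true;
    public override string GetName() => name;
}

static class Demo
{
    // Explicit type check, for cases where the object type is ambiguous.
    public static string NameOrDefault(ToyObj obj, string fallback) =>
        obj.IsName() ? obj.GetName() : fallback;

    public static void Main()
    {
        ToyObj flag = new ToyBool(true);
        Console.WriteLine(flag.GetBool());      // supported on a Bool

        try
        {
            flag.GetName();                     // unsupported on a Bool
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine("Caught: " + e.Message);
        }

        Console.WriteLine(NameOrDefault(new ToyName("Catalog"), "none"));
        Console.WriteLine(NameOrDefault(flag, "none"));
    }
}
```

The design trade-off is that callers never need a cast, but a wrong assumption about an object's type surfaces at run time rather than at compile time, which is why an explicit Is-style check is useful wherever the type is not guaranteed.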

As mentioned in the previous section, SDF objects can be either direct or indirect. Direct objects can be created using Obj.Create_**type**_() methods. The following example illustrates how to create direct number and direct name objects inside Dict objects. Note that the same approach will work for Array objects.

// you can create direct objects inside container objects.
doc.GetRoot().PutNumber("My number key", 100);
doc.GetRoot().PutDict("My dict key");
doc.GetRoot().PutName("My name key", "My name value");

New indirect objects can be created using doc.CreateIndirect_**type**_() methods on an SDF document. The following code shows how to create new Number and Dictionary indirect objects:

Obj mynumber = doc.CreateIndirectNumber(100);
Obj mydict = doc.CreateIndirectDict();

The PDFTron SDK SDF layer provides many utility methods that can be used to efficiently traverse an SDF object graph. Here is an example of how to get to a document's page root:

Obj pages = doc.GetTrailer()
               .Get("Root").Value()
               .Get("Pages").Value();

Note that because the PDF specification mandates that "Root" is always a dictionary, we can directly reference the "Pages" object by calling Get("key"). Note also that some so-called "PDF" documents are corrupt, meaning that they do not comply with the PDF specification. In some corrupt PDF documents, the "Root" may be missing or may not be a dictionary object. In these and similar cases, the PDFTron SDK throws an exception.

In order to retrieve an object that may or may not be present in a dictionary, use the dict.FindObj("key") method. For example:

Obj my_value = dict.FindObj("my_key");
if (my_value != null)
{
    // ...use my_value...
}
else
{
    // "my_key" does not exist in dict
}

You can use DictIterator in order to traverse key-value pairs within a dictionary:

for (DictIterator itr = dict.GetDictIterator();
                  itr.HasNext();
                  itr.Next())
{
  // itr.Key();
  // itr.Value();
}

To retrieve objects from an Array object, use the array.GetAt(idx) method:

for (int i = 0; i < array.Size(); ++i)
{
  Obj obj = array.GetAt(i);
  // ...
}

In the previous section, we learned how to create indirect objects by calling the SDFDoc.CreateIndirect_**type**_() methods. Now, let's look at how to create references to those indirect objects. The following code shows how:

Obj shared_dict = doc.CreateIndirectDict();
shared_dict.PutName("My key", "My value");

Obj trailer_dict = doc.GetTrailer();
if (trailer_dict != null)
{
    Obj info_dict = trailer_dict.Get("Info").Value();
    if (info_dict != null)
    {
        // Add indirect reference to 'shared_dict'.
        info_dict.Put("MyDict", shared_dict);

        Obj root_dict = trailer_dict.Get("Root").Value();
        if (root_dict != null)
        {
            // Add a second indirect reference to 'shared_dict'.
            root_dict.Put("MyDict", shared_dict);
        }
    }
}

It is therefore possible for multiple objects to refer to the same object. Such objects are called shared objects, and they must always be indirect objects. If you want to share an object, either create it using SDFDoc.CreateIndirect_**type**_(), or test Obj.IsIndirect() to make sure it is an indirect object.

Because the PDF document format disallows creating multiple links to direct objects, PDFTron SDK will throw an exception should you try to create multiple links/references to a direct object. This is shown below:

Obj trailer_dict = mydoc.GetTrailer();
if (trailer_dict != null)
{
    Obj info_dict = trailer_dict.Get("Info").Value();
    if (info_dict != null)
    {
        Obj direct_obj = info_dict.PutDict("Link1");

        Obj root_dict = trailer_dict.Get("Root").Value();
        if (root_dict != null)
        {
            // Attempt to create a second link to direct_obj.
            // This will copy the object. If you want to
            // share objects, create them using the
            // PDFDoc.CreateIndirect() methods.
            root_dict.Put("Link2", direct_obj);
        }
    }
}

In addition to the basic types of objects mentioned so far, PDF also supports stream objects. A stream object is essentially a dictionary with an attached binary stream. In PDFTron SDK, all methods that apply to dictionaries apply to streams as well.

In addition to the methods provided by Dict, streams provide an interface used to access the associated data stream. Given a stream Obj, you can use GetDecodedStream() to get decoded data or GetRawStream() to get raw, undecoded data. GetRawStreamLength() returns the length of the raw data stream. This number is the same as the one stored under the "Length" key in the stream dictionary.
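The difference between raw and decoded stream data can be sketched without the SDK. The model below is an assumption-laden illustration: ToyStream, Encode, and Decode are hypothetical names, and .NET's DeflateStream merely stands in for a PDF stream filter (it is not the exact encoding a PDF file uses). It shows that the length recorded for a stream describes the raw, encoded bytes, while the decoded data is what an application normally wants to read.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Toy model for illustration only; not the PDFTron SDK stream API.
static class ToyStream
{
    // Produces the "raw" bytes, i.e. what would be stored in the file.
    public static byte[] Encode(byte[] decoded)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(decoded, 0, decoded.Length);
            } // disposing the DeflateStream flushes all compressed bytes
            return output.ToArray();
        }
    }

    // Recovers the decoded bytes from the raw bytes.
    public static byte[] Decode(byte[] raw)
    {
        using (var input = new MemoryStream(raw))
        using (var inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflate.CopyTo(output);
            return output.ToArray();
        }
    }

    public static void Main()
    {
        byte[] content = Encoding.ASCII.GetBytes("BT /F1 12 Tf (Hello World) Tj ET");

        byte[] raw = Encode(content);
        int lengthEntry = raw.Length;      // plays the role of the "Length" key
        byte[] decoded = Decode(raw);

        Console.WriteLine("Length entry (raw bytes): " + lengthEntry);
        Console.WriteLine("Decoded bytes: " + decoded.Length);
        Console.WriteLine(Encoding.ASCII.GetString(decoded));
    }
}
```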

PDFTron SDK supports all compression and encryption schemes used in the PDF format. It provides transparent access to decoded stream data. The following code decodes and extracts the contents of a given stream to an external file:

Obj stream = ...
Filter dec_stm = stream.GetDecodedStream();
dec_stm.WriteToFile("out.bin", false);

For a more complete discussion of PDFTron SDK filters, see PDFTron SDK Filters and Streams.

SDFDoc

Our overview of the SDF object model would not be complete without mentioning SDFDoc. SDFDoc brings together document security, document utility methods, and all SDF objects.

An SDF document can be created from scratch using a default constructor:

SDFDoc sdfdoc = new SDFDoc();
sdfdoc.InitSecurityHandler();
Obj trailer = sdfdoc.GetTrailer();

An SDF document can also be created from an existing file, such as an external PDF document:

SDFDoc sdfdoc = new SDFDoc("in.pdf");
sdfdoc.InitSecurityHandler();
Obj trailer = sdfdoc.GetTrailer();

Or it can be created from a memory buffer or some other Filter/Stream:

MemoryFilter memory = ....
SDFDoc sdfdoc = new SDFDoc(memory);
sdfdoc.InitSecurityHandler();
Obj trailer = sdfdoc.GetTrailer();

Finally, an SDF document can be accessed from a high-level PDF document as follows:

PDFDoc pdfdoc = new PDFDoc("in.pdf");
pdfdoc.InitSecurityHandler();
SDFDoc sdfdoc = pdfdoc.GetSDFDoc();
sdfdoc.InitSecurityHandler();
Obj trailer = sdfdoc.GetTrailer();

Note that the examples above use sdfdoc.GetTrailer() in order to access the document trailer, which is the starting SDF object (root node) in every document. Following the trailer links, we can visit all low-level objects in a document (e.g. all pages, outlines, fonts, and so on).
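Visiting every object reachable from the trailer is a graph traversal, and because objects can refer back to one another (for example, a page pointing at its parent), the walk has to remember what it has already seen. The following self-contained sketch shows the idea on a toy graph. ToyNode and GraphWalk are hypothetical names used only for this illustration; a real traversal would follow dictionary and array entries of SDF objects instead.

```csharp
using System;
using System.Collections.Generic;

// Toy node type for illustration only; not a PDFTron SDK class.
class ToyNode
{
    public string Label;
    public List<ToyNode> Children = new List<ToyNode>();
    public ToyNode(string label) { Label = label; }
}

static class GraphWalk
{
    // Depth-first walk that records each node once, even when the
    // graph contains cycles or shared nodes.
    public static List<string> Visit(ToyNode start)
    {
        var seen = new HashSet<ToyNode>();   // reference equality by default
        var order = new List<string>();
        var stack = new Stack<ToyNode>();
        stack.Push(start);

        while (stack.Count > 0)
        {
            ToyNode node = stack.Pop();
            if (!seen.Add(node)) continue;   // already visited

            order.Add(node.Label);

            // Push children in reverse so the first child is visited first.
            for (int i = node.Children.Count - 1; i >= 0; --i)
                stack.Push(node.Children[i]);
        }
        return order;
    }

    public static void Main()
    {
        var trailer = new ToyNode("Trailer");
        var root = new ToyNode("Root");
        var info = new ToyNode("Info");
        var pages = new ToyNode("Pages");
        var page = new ToyNode("Page");

        trailer.Children.Add(root);
        trailer.Children.Add(info);
        root.Children.Add(pages);
        pages.Children.Add(page);
        page.Children.Add(pages);            // back-link creates a cycle

        Console.WriteLine(string.Join(" -> ", Visit(trailer)));
    }
}
```

An explicit stack is used instead of recursion so that deeply nested graphs do not exhaust the call stack.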

SDFDoc also provides utility methods used to import objects and object collections from one document to another. These methods can be useful for copy operations between documents such as a high-level page merge and document assembly.
