Working with pages

Document's page sequence

A high-level PDF document contains a sequence of Page objects, as illustrated in the following figure:

Figure 4. PDFDoc Page sequence.

To find the number of pages in a PDF document, call PDFDoc.GetPageCount().

To retrieve a specific page of a document, use PDFDoc.GetPage(page_num). Page numbers in the document's page sequence are indexed from 1. If the given page number doesn't index a page in the current document, GetPage(page_num) returns null. For example:

Page page = doc.GetPage(page_num);
if (page != null) 
{
  Console.WriteLine(
    "Document does contain page#: {0}", page_num);
}
else 
{
  Console.WriteLine(
    "Document does not contain page#: {0}", page_num);
}

While GetPage(i) is convenient for retrieving an individual page, it's an inefficient way to enumerate every page of a document. It's better to traverse the pages with a PageIterator.

To do so, call PDFDoc.GetPageIterator(). This returns a PageIterator object, which provides HasNext(), Next() and Current() methods. The following code snippet shows how to print the page size for every page in the document's page sequence:

for (PageIterator itr=doc.GetPageIterator(); itr.HasNext(); itr.Next())
{
  Rect mediabox = itr.Current().GetMediaBox();
  Console.WriteLine("Media box: {0}, {1}, {2}, {3}",
      mediabox.x1, mediabox.y1, 
      mediabox.x2, mediabox.y2);
}

(This code finds the page size using the page's media box, which we'll talk more about in the following sections.)

To jump to a specific page with a PageIterator, call PDFDoc.GetPageIterator(page_num). If no such page exists, PageIterator.GetPageNumber() returns 0. For example:

PageIterator itr = doc.GetPageIterator(page_num);
if (itr.GetPageNumber() > 0) 
{
  Console.WriteLine(
    "Document does contain page#: {0}", page_num);
}
else 
{
  Console.WriteLine(
    "Document does not contain page#: {0}", page_num);
}

Creating a new, blank page

To create a new page, use the PDFDoc.PageCreate(media_box) method. PageCreate() takes an optional Rect argument that can be used to specify page size. This Rect is called a media box.

A media box is a rectangle, expressed in default user space units, defining the boundaries of the physical medium on which the page is intended to be displayed or printed. A user space unit is 1/72 of an inch. If media_box is unspecified, the default dimensions of the page are 8.5 x 11 inches (8.5*72 = 612 by 11*72 = 792 units).

Page x = doc.PageCreate();
doc.PagePushBack(x);

The above code snippet creates a new 8.5 x 11 inch page and adds it at the end of the document's page sequence.

Note that a newly created page does not yet belong to the document's page sequence. The page must be placed within the page sequence in order to become "visible". PagePushBack() appends page x to the end of the sequence, making it the document's last page.
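
The unit arithmetic above can be checked with a few lines of plain C#. This is a minimal sketch that does not use the library; the PageUnits class and its FromInches method are hypothetical names introduced only for illustration.

```csharp
using System;

// Hypothetical helper, not part of the library.
static class PageUnits
{
    // One user space unit is 1/72 of an inch, as stated above.
    public const double UnitsPerInch = 72.0;

    // Convert a length in inches to default user space units.
    public static double FromInches(double inches) => inches * UnitsPerInch;

    static void Main()
    {
        // Default page size: 8.5 x 11 inches.
        Console.WriteLine("{0} x {1}", FromInches(8.5), FromInches(11)); // 612 x 792
    }
}
```

A helper of this kind is convenient when building a custom media box from dimensions given in inches.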

Page Copying/Merging

The recommended way to copy pages from one document to another is with PDFDoc.InsertPages(). Its arguments are:

  • insertBeforeThisPage: An integer specifying where the pages should be inserted
  • sourceDoc: A PDFDoc from which the pages should be read
  • startPage: An integer specifying the first page number to insert
  • endPage: An integer specifying the last page number to insert
  • flag: A PDFDoc.InsertFlag value (either e_insert_bookmark, meaning bookmarks should be inserted, or e_none)

For example, suppose we want to insert the third page of one document after the first page of a second document. The following code snippet performs this with an insertBeforeThisPage value of 2 and startPage and endPage values of 3.

Console.WriteLine(
    "dest_doc has {0} pages prior to calling InsertPages. ", 
    dest_doc.GetPageCount());

dest_doc.InsertPages(2, source_doc, 3, 3, PDFDoc.InsertFlag.e_none);

Console.WriteLine(
    "dest_doc has {0} pages following its call to InsertPages. ", 
    dest_doc.GetPageCount());

We can also insert a range of pages. For example, the following code inserts the second, third, and fourth pages of one document at the end of a second document. We specify that we're inserting at the end of the document by using an insertBeforeThisPage value higher than the number of pages in the document:

Console.WriteLine(
    "dest_doc has {0} pages prior to calling InsertPages. ", 
    dest_doc.GetPageCount());

dest_doc.InsertPages(dest_doc.GetPageCount() + 1, 
    source_doc, 2, 4, PDFDoc.InsertFlag.e_none);

Console.WriteLine(
    "dest_doc has {0} pages following its call to InsertPages. ", 
    dest_doc.GetPageCount());
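
The 1-based page numbering used by the two calls above can be modeled on plain lists. The following is only a sketch of the semantics described in this section, not the library's implementation: InsertPagesModel and Insert are hypothetical names, pages are represented by strings, and the clamping of out-of-range positions is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the page-number semantics; not a library type.
static class InsertPagesModel
{
    // Copies src pages startPage..endPage (1-based, inclusive) in front of
    // the dest page numbered insertBefore. A value larger than the page
    // count appends at the end. Clamping values below 1 is an assumption.
    public static void Insert(List<string> dest, int insertBefore,
                              List<string> src, int startPage, int endPage)
    {
        int index = Math.Min(Math.Max(insertBefore, 1), dest.Count + 1) - 1;
        dest.InsertRange(index, src.GetRange(startPage - 1, endPage - startPage + 1));
    }

    static void Main()
    {
        var dest = new List<string> { "D1", "D2" };
        var src = new List<string> { "S1", "S2", "S3", "S4" };

        Insert(dest, 2, src, 3, 3);               // third source page after D1
        Insert(dest, dest.Count + 1, src, 2, 4);  // source pages 2..4 at the end

        Console.WriteLine(string.Join(" ", dest)); // D1 S3 D2 S2 S3 S4
    }
}
```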

Advanced Page Manipulation

A Page can also be copied from one document to another (or replicated within an existing document) using the PDFDoc.PageInsert(where, pg), PDFDoc.PagePushFront(pg), PDFDoc.PagePushBack(pg) and PDFDoc.ImportPages(list) methods.

PagePushBack(page) appends the given Page at the end of the page sequence, whereas PagePushFront(page) inserts the Page at the front of the sequence. PageInsert(where, page) inserts the page in front of the page currently pointed to by the where PageIterator.

// Add three copies of the page to the document: two at the end and one at the front.
doc.PagePushBack(x);
doc.PagePushBack(x);
doc.PagePushFront(x);

// Create a new page and insert it just before 
// the second page
doc.PageInsert(doc.GetPageIterator(2), doc.PageCreate());

Note that it is possible to replicate a given page within a document by repeatedly adding the same page.

The same methods can also be used to merge documents or copy pages from one document to another.

In a PDF document, every page object contains references to images, fonts, color spaces, and other objects required to render the page. In order to accurately copy a page from one document to another, these PageInsert / PagePushFront / PagePushBack methods must copy all referenced resources.

If you are copying several pages between two documents, it's better to use PDFDoc.ImportPages(page_list), because the resulting document will be much smaller and the copy operation will be faster.

ImportPages() is better than the other methods for multi-page copies because it preserves resource sharing in the target document. This is illustrated in the following figures.

Figure 5. Copying pages between two documents using PageInsert/PagePushFront/PagePushBack

In a PDF document, page resources (such as fonts, images, color-spaces, or forms) can be shared across several pages. Sharing these resources reduces file size and speeds up page processing. In Figure 5 above, all three pages of 'Document 1' share the same font and color space object. 'Document 2' was created by direct page copy using PageInsert, PagePushFront or PagePushBack methods. Note that each page now refers to its own separate instances of resource objects.

On the other hand, the result of page copy using ImportPages() is identical to the original document. Note that in 'Document 2', in Figure 6 below, resource objects are shared across pages.

Figure 6. Copying pages between two documents using ImportPages()

Also note that, if pages are copied/replicated within the same document (not between two different documents), all methods behave the same and resources are always shared.

The following code copies pages individually, as in Figure 5:

using (PDFDoc in_doc = new PDFDoc("in.pdf")) 
{
  in_doc.InitSecurityHandler();
  using (PDFDoc new_doc = new PDFDoc()) 
  {
    for (PageIterator itr=in_doc.GetPageIterator(); 
            itr.HasNext(); itr.Next())
    {
      new_doc.PagePushBack(itr.Current());
    }

    // save new_doc...
  }
}

But, as explained above, it's better to import multiple pages with PDFDoc.ImportPages(), as shown in Figure 6.

ImportPages(page_list) creates a copy of pages given in the argument list, while preserving shared resources. Note that the pages in the returned list are ordered in the same way as pages in the argument list and that, although pages are copied, they are not inserted into the document's page sequence. Therefore, in order to be visible, imported or copied pages should be appended or inserted at a specific location within the document's page sequence. For example:

using (PDFDoc in_doc = new PDFDoc("in.pdf")) 
{
  in_doc.InitSecurityHandler();
  using (PDFDoc new_doc = new PDFDoc()) 
  {
    // Create a list of pages to copy.
    ArrayList copy_pages = new ArrayList(); 
    for (PageIterator itr=in_doc.GetPageIterator();
            itr.HasNext(); itr.Next()) 
    {
      copy_pages.Add(itr.Current());
    }

    // Import all the pages in 'copy_pages' list
    ArrayList imported_pages = new_doc.ImportPages(copy_pages);

    // Note that pages in 'imported_pages' list are not yet placed in
    // document's page sequence. This is done in the following step:
    for (int i=0; i!=imported_pages.Count; ++i) 
    {
      new_doc.PagePushBack((Page)imported_pages[i]);
    }

    // save new_doc...
  }
}
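
The difference between Figure 5 and Figure 6 can be modeled without the library. The sketch below is an illustration under stated assumptions, not how the library is implemented: Resource, ModelPage and CopyModel are hypothetical stand-ins for PDF objects. It counts how many distinct resource objects the copied pages end up referencing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for PDF objects; these are not library types.
class Resource { public string Name; }
class ModelPage { public List<Resource> Resources = new List<Resource>(); }

static class CopyModel
{
    // Page-by-page copy: every page gets fresh clones of its resources.
    public static List<ModelPage> CopyIndividually(List<ModelPage> pages) =>
        pages.Select(p => new ModelPage {
            Resources = p.Resources.Select(r => new Resource { Name = r.Name }).ToList()
        }).ToList();

    // Batch import: one clone per distinct source resource, reused by all pages.
    public static List<ModelPage> ImportTogether(List<ModelPage> pages)
    {
        var clones = new Dictionary<Resource, Resource>();
        var result = new List<ModelPage>();
        foreach (var p in pages)
        {
            var copy = new ModelPage();
            foreach (var r in p.Resources)
            {
                if (!clones.TryGetValue(r, out var c))
                    clones[r] = c = new Resource { Name = r.Name };
                copy.Resources.Add(c);
            }
            result.Add(copy);
        }
        return result;
    }

    // Number of distinct resource objects referenced by a set of pages.
    public static int DistinctResources(List<ModelPage> pages) =>
        pages.SelectMany(p => p.Resources).Distinct().Count();

    static void Main()
    {
        var font = new Resource { Name = "Font" };
        var colorSpace = new Resource { Name = "ColorSpace" };
        // Three pages sharing one font and one color space, as in 'Document 1'.
        var doc1 = Enumerable.Range(0, 3)
            .Select(_ => new ModelPage { Resources = { font, colorSpace } }).ToList();

        Console.WriteLine(DistinctResources(CopyIndividually(doc1))); // 6
        Console.WriteLine(DistinctResources(ImportTogether(doc1)));   // 2
    }
}
```

In this model the individual copy triples the number of resource objects, while the batch import keeps the original two, which is the size difference the figures describe.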

Removing/Deleting Pages

Given a PageIterator itr pointing to a page, that page can be deleted using PDFDoc.PageRemove(itr). For example:

// Remove the fifth page from the page sequence.
doc.PageRemove(doc.GetPageIterator(5));

// Remove the third page. 
PageIterator i = doc.GetPageIterator();
i.Next();
i.Next();
doc.PageRemove(i);

PDFDoc.PageRemove(itr) only removes the page from the document's page sequence. The page and its resources are still available until the document is saved in 'full save mode' with the 'remove unused objects' flag. If you are saving the file in 'incremental mode', the serialized document may still contain the content of the removed page.

Reordering Pages

Given the copy and delete page operations described in the previous sections, it is easy to rearrange and sort pages. For example, the order of pages in the document can be reversed as follows:

int page_num = doc.GetPageCount();
for (int i=1; i<=page_num; ++i)
{
  PageIterator itr = doc.GetPageIterator(i);
  Page page = itr.Current();
  doc.PageRemove(itr);
  doc.PagePushFront(page);
}
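
To see why this loop reverses the sequence, it can help to run the same steps on a plain list. The following sketch assumes nothing about the library: ReverseModel is a hypothetical name, and RemoveAt and Insert stand in for PageRemove and PagePushFront.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the reversal loop; not a library type.
static class ReverseModel
{
    // Take the page at 1-based position i out of the sequence and push it
    // to the front. Pages before position i have already been reversed.
    public static void Reverse(List<string> pages)
    {
        int pageCount = pages.Count;
        for (int i = 1; i <= pageCount; ++i)
        {
            string page = pages[i - 1];
            pages.RemoveAt(i - 1);   // stands in for PageRemove
            pages.Insert(0, page);   // stands in for PagePushFront
        }
    }

    static void Main()
    {
        var pages = new List<string> { "A", "B", "C", "D" };
        Reverse(pages);
        Console.WriteLine(string.Join(" ", pages)); // D C B A
    }
}
```

After step i, the first i pages are in reverse order and the rest are untouched, so the whole sequence is reversed once i reaches the page count.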

Rotating the Page

A page can be rotated clockwise, by multiples of 90 degrees, when displayed or printed. The Page.GetRotation() method returns the Page.Rotate enum specifying the current rotation. Similarly, Page.SetRotation() sets the current rotation. For example:

// Rotate the first page 90 degrees clockwise.
Page.Rotate originalRotation = doc.GetPage(1).GetRotation();
Page.Rotate rotation;
switch (originalRotation)
{
  case Page.Rotate.e_0:   rotation = Page.Rotate.e_90;  break;
  case Page.Rotate.e_90:  rotation = Page.Rotate.e_180; break;
  case Page.Rotate.e_180: rotation = Page.Rotate.e_270; break;
  case Page.Rotate.e_270: rotation = Page.Rotate.e_0;   break;
  default:                rotation = Page.Rotate.e_0;   break;
}
doc.GetPage(1).SetRotation(rotation);
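
The switch above is a wrap-around addition of 90 degrees. As a sketch of that arithmetic only, the following works on plain degree values rather than on the Page.Rotate enum; RotationModel and RotateClockwise are hypothetical names.

```csharp
using System;

// Hypothetical helper working on degrees; not a library type.
static class RotationModel
{
    // Rotations are multiples of 90 degrees; adding quarter turns wraps at 360.
    public static int RotateClockwise(int currentDegrees, int quarterTurns = 1)
    {
        int degrees = (currentDegrees + 90 * quarterTurns) % 360;
        // C# keeps the sign of the dividend, so fold negative results back into range.
        return degrees < 0 ? degrees + 360 : degrees;
    }

    static void Main()
    {
        Console.WriteLine(RotateClockwise(0));       // 90
        Console.WriteLine(RotateClockwise(270));     // 0
        Console.WriteLine(RotateClockwise(90, -1));  // 0
        Console.WriteLine(RotateClockwise(0, -1));   // 270
    }
}
```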

Cropping the Page

The crop box defines the region to which the contents of the page are to be clipped (cropped) when displayed or printed. Unlike the other boxes, the crop box has no defined meaning in terms of physical page geometry or intended use; it merely imposes clipping on the page contents. The default value is the page's media box. A new crop box can be imposed on a page with Page.SetCropBox(), as follows:

page.SetCropBox(Rect.CreateSDFRect(0, 0, 500, 600));

The existing crop box of a page can be discovered with Page.GetCropBox():

Rect crop_box = page.GetCropBox();
// Crop box is: 
// crop_box.x1, crop_box.y1, 
// crop_box.x2, crop_box.y2

Media Box Adjustments

The media box defines the boundaries of the physical medium on which the page is to be printed. It may include any extended area surrounding the finished page for bleed, printing marks, or other such purposes. It may also include areas close to the edges of the medium that cannot be marked because of physical limitations of the output device. Content falling outside this boundary can safely be discarded without affecting the visible output of the PDF document. A new value for a page's media box can be specified as follows:

page.SetMediaBox(Rect.CreateSDFRect(0, 0, 500, 600));

Shifting Page Content

Page content can be translated horizontally and vertically by adjusting the media box. For example, the following code translates all page content 2 inches (2 inches * 72 units per inch = 144 units) to the left:

Rect media_box = page.GetMediaBox(); 
// Shift the media box 144 units (2 inches) to the right,
// which moves the page content 2 inches to the left.
media_box.x1 += 144;
media_box.x2 += 144;
page.SetMediaBox(media_box);
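
The same shift can be written as a small function that takes the offset in inches. This is a sketch on plain coordinate tuples rather than on the library's Rect type; ShiftModel and Shift are hypothetical names.

```csharp
using System;

// Hypothetical helper on plain tuples; not a library type.
static class ShiftModel
{
    const double UnitsPerInch = 72.0;

    // Returns the box moved by (dxInches, dyInches); width and height are unchanged.
    public static (double x1, double y1, double x2, double y2) Shift(
        (double x1, double y1, double x2, double y2) box,
        double dxInches, double dyInches)
    {
        double dx = dxInches * UnitsPerInch;
        double dy = dyInches * UnitsPerInch;
        return (box.x1 + dx, box.y1 + dy, box.x2 + dx, box.y2 + dy);
    }

    static void Main()
    {
        // A 612 x 792 unit media box shifted 2 inches to the right.
        var shifted = Shift((0, 0, 612, 792), 2, 0);
        Console.WriteLine(shifted); // (144, 0, 756, 792)
    }
}
```

Because both x coordinates move by the same amount, the page keeps its size and only the window onto the content moves.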