General Questions
API Questions
- Common Problems / Errors / Exceptions
- PDF
- Bookmarks
- PDF Split and Merge
- Forms
- Printing
- SDF
- Security
Didn't find an answer to your question? Try searching our Knowledge
Base forum or simply email your question to support at pdftron.com.
PDFNet can be used to access a wide range of PDF functionality.
The most commonly used include:
- PDF Creation. Using PDFNet, PDF documents can be created
dynamically and delivered on the fly (e.g. on web and database
servers). The library can also be integrated with client-based
applications in order to provide enhanced PDF output that can’t
be produced using generic PDF writers (such as PDF print drivers
or PostScript converters).
- Editing. PDFNet provides a comprehensive API
that can be used to edit all aspects of a PDF document. Using PDFNet
it is easy to:
- Append and assemble PDF documents
- Merge specific PDF pages from multiple documents
- Delete or rearrange pages
- Edit page contents
- Add/remove/edit images, text, and vector graphics.
- Edit fonts, color spaces, line styles, and other attributes.
- Edit document metadata
- Crop and rotate pages
- Edit bookmarks and page annotations.
- Edit security settings
- Edit every aspect of a document using the powerful SDF API.
- Content extraction. PDFNet can be used to extract
text, images, fonts, ICC color profiles, embedded files, etc. Complete
content extraction features make PDFNet a solid foundation for
PDF viewers, editors, document converters, and software RIPs (Raster
Image Processors).
- PDF rasterization, viewing, and printing.
Using PDFNet, client applications can take advantage of interactive
PDF display, while server-based applications can generate images
and thumbnails on the fly.
- Fonts. PDFNet provides a unified, easy-to-use
API that can be used to embed, extract, and process all
font formats supported by PDF (i.e. Type1, TrueType, Type3, Multiple
Master, CFF, CIDType 0, and CID Type1).
- Forms. PDFNet can be used to read, write, and
edit PDF forms.
- Prepress workflows. PDFNet supports prepress
workflows by providing solid infrastructure and utilities for
color conversion, separation, preflighting, and imposition operations.
- Optimization and linearization
- Compression
- Security and encryption. Set new security handlers
and edit or remove existing security.
For a more comprehensive feature listing, please take
a look at the PDFNet Feature Chart.
[ Back to top ]
All PDFTron software applications and components are stand-alone
products. Therefore, there are no dependencies on third-party components
or software.
[ Back to top ]
Yes. PDFNet is an ideal solution for integrating PDF capabilities
into server applications. Using PDFNet, your applications can dynamically
generate, manipulate, and print PDF documents in server environments.
[ Back to top ]
PDFNet is thread-safe and can be used in applications
that run multiple concurrent PDFNet threads.
Support for multithreading is document based. You can spawn a separate
thread for each document; however, only one thread can operate on
a given document at a time.
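If several threads in your application may need the same document, one way to respect the rule above is to pair each document with its own lock. The following is a minimal sketch in plain C++ that does not call PDFNet; GuardedDoc and FakeDoc are hypothetical names used only for illustration.

```cpp
#include <mutex>
#include <utility>

// Hypothetical helper: pairs a document object with a mutex so that
// only one thread at a time can operate on that document.
template <typename Doc>
class GuardedDoc {
public:
    explicit GuardedDoc(Doc doc) : doc_(std::move(doc)) {}

    // Runs `fn` with exclusive access to the wrapped document and
    // returns whatever `fn` returns.
    template <typename Fn>
    auto WithLock(Fn&& fn) -> decltype(fn(std::declval<Doc&>())) {
        std::lock_guard<std::mutex> lock(mutex_);
        return fn(doc_);
    }

private:
    Doc doc_;
    std::mutex mutex_;
};

// Stand-in for a real document type (illustration only).
struct FakeDoc {
    int page_count = 0;
};
```

Different documents get different GuardedDoc instances and therefore never block each other; only access to the same document is serialized.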
[ Back to top ]
PDFNet works on hyper-threaded and multi-processor machines. The
only difference is that if you plan to run PDFNet on a multi-processor
machine you need to purchase a separate license for each physical
CPU.
[ Back to top ]
Yes. PDFNet SDK Java/C/C++ library is available on Linux, Mac OSX,
Solaris, and Windows. On Windows PDFNet is also available as a .Net
component.
[ Back to top ]
PDFNet SDK is available for Java on all supported platforms. As
a starting point, you may want to browse the online Javadoc
or the Java sample projects.
[ Back to top ]
PDFNet is available on Windows (.NET /Java/C/C++), Mac OSX (Java/C/C++),
Solaris (Java/C/C++), and Linux (Java/C/C++).
[ Back to top ]
Using PDFNet you can save any existing or newly created PDF document
in linearized (fast web view) format.
To provide good performance over relatively slow communication
links, PDFNet can generate PDF documents with linearized objects
and hint tables that allow a PDF viewer application to download
and view one page of a PDF file at a time, rather than requiring
the entire file (including fonts and images) to be downloaded before
any of it can be viewed.
The only thing required to save a document in linearized (fast
web view) format is to pass the 'Doc.SaveOptions.e_linearized' flag
to the save method.
[ Back to top ]
Standards such as PDF/X and PDF/E define a subset of the PDF specification
designed for a specific industry (e.g. publishing, engineering,
etc.). PDFNet allows processing and generation of any PDF document,
and it does not prevent the user from creating a valid document
that includes features that are not allowed by one of the PDF subsets.
To make your application PDF/X (or PDF/E) compliant, you need to
make sure that you are using only PDF features allowed by a given
subset.
[ Back to top ]
PDFNet is not a single SDK, but a family of SDKs which are available
on different platforms and programming environments. PDFNet for
.Net is a true .NET component written in managed C++ that can be
used from any .NET language such as C#, VB.Net, and managed C++.
PDFNet for .Net is not a wrapper around a COM component.
Because PDFNet for .Net is written in managed C++, applications
can take advantage of significant performance gains.
[ Back to top ]
If you have layout and data for output pages, you can output PDF
files using PDFNet. As a starting point, you may want to take
a look at the ElementBuilder sample project.
[ Back to top ]
See the PDFNet licensing page for details.
[ Back to top ]
PDFNet is fully supported by PDFTron Systems and we have a dedicated
support team. In addition, we have a regular maintenance release
cycle. We ship new maintenance releases to licensees every four
to six weeks. We also have major feature releases that are synchronized
with major revisions of the PDF specification. As with license fees,
support and maintenance fees are also based on how much functionality
you use.
[ Back to top ]
PDF PageMaster and PDF Secure are implemented using PDFNet. PDFNet
has a more complex API that can be used to read, write, and edit
every aspect of a PDF document.
[ Back to top ]
The API for the C++ and .Net versions is identical, so the same reference
is used for both C++ and .Net languages. Because the API is identical,
it is very easy to port PDFNet code between managed and unmanaged
languages. If you are a developer with a Java background, you may also
want to refer to the Javadoc version of the API Reference.
[ Back to top ]
Common Problems / Errors / Exceptions
'pdfnet.res' includes standard PDF resources (such as standard
PDF CMaps, color profiles, fonts, etc). If you would like to support
reading or editing of any existing PDF document, you will need to
distribute 'pdfnet.res' as part of your application. If your application
is only creating PDF documents, you typically don't need to distribute
the resource file.
To indicate the location of PDFNet resources relative to your application
module, you can use the PDFNet.SetResourcesPath() function. If you are
developing a .NET application you can set the resource path as follows:
PDFNet.SetResourcesPath(Path.Combine(Application.StartupPath, "MyResources"));
or as follows:
PDFNet.SetResourcesPath(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location));
If you are building a C++ application, you could use the first
parameter passed to the main function. For example,
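#include <string>
using namespace std;

// Returns the directory portion of a path (everything before the
// last path separator), or an empty string if there is none.
string GetPathName(const string& s) {
    char sep = '/';
#ifdef _WIN32
    sep = '\\';
#endif
    size_t i = s.rfind(sep, s.length());
    if (i != string::npos) {
        return s.substr(0, i);
    }
    return "";
}

int main(int argc, char *argv[]) {
    string path = GetPathName(argv[0]);
    PDFNet::Initialize();
    PDFNet::SetResourcesPath(path.c_str());
    ...
}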
On Windows, you can also use GetModuleFileName() Win32 function
to obtain the path to the current module (i.e. an executable or
a DLL).
[ Back to top ]
This exception indicates that PDFNet can't find the resource file
required for processing of some PDF files (e.g. documents using
predefined CJKV CMaps).
By default, PDFNet searches the application/DLL folder in order
to find the resource file (i.e. 'pdfnet.res'). You can either copy
'pdfnet.res' into the same folder as PDFNETxx.DLL or use PDFNet.SetResourcesPath
to point to a different location.
[ Back to top ]
Make sure that you initialized PDFNet using PDFNet.Initialize().
PDFNet.Initialize()/Terminate() should be called only once per process
session. You should not call PDFNet.Initialize()/Terminate() for
each PDFNet thread.
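One common way to guarantee that initialization runs exactly once, no matter how many threads ask for it, is a function-local static. The sketch below is generic C++ and does not call PDFNet; InitLibraryOnce and g_init_calls are hypothetical stand-ins for the real initialization call and exist only so the behaviour can be observed.

```cpp
// Counts how often the hypothetical initializer ran (illustration only).
int g_init_calls = 0;

// Hypothetical stand-in for the library's real one-time initialization.
void InitLibraryOnce() { ++g_init_calls; }

// Safe to call from any thread, any number of times: a function-local
// static is initialized exactly once, even under concurrent calls
// (guaranteed since C++11).
bool EnsureInitialized() {
    static const bool initialized = [] {
        InitLibraryOnce();
        return true;
    }();
    return initialized;
}
```

Each worker thread can call EnsureInitialized() before touching the library; only the first call performs the initialization.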
[ Back to top ]
Although GDIPLUS.dll is a standard part of the .NET Framework, on some
W2K systems it is not located in the DLL search path. You can use Dependency
Walker (www.dependencywalker.com) to check whether PDFNET.DLL can
locate GDIPLUS.dll on your machine. If there are any missing dependencies,
they will be highlighted in red. To solve the problem you can, for example, add
the folder containing GDIPLUS.dll to the 'path' environment variable
or copy GDIPLUS.dll to the 'Windows/System32' folder.
[ Back to top ]
The most likely cause of an error that occurs at the start of content
extraction is an uninitialized security handler. To initialize the
security handler, call doc.InitSecurityHandler() just after
opening a document. This method has no side effects on documents
that are not encrypted, so you can make it a convention to always invoke
doc.InitSecurityHandler() after constructing a document. You
can use the doc.IsEncrypted() method if you would like to verify
that the document is encrypted. If the error occurs even after the security
handler is initialized, please contact support for further assistance.
[ Back to top ]
This error may occur if you run PDFNet for Microsoft.Net 1.1x or
higher on a machine running Microsoft.Net version 1.0x. If this
is the case you need to download PDFNet for Microsoft.Net 1.0x or
you need to upgrade the virtual machine.
In Microsoft.NET version 1.0x there were some critical bugs that
were fixed in versions 1.1x and higher. We highly recommend upgrading
to versions 1.1x or higher, but in cases where this is not an option,
use PDFNet for Microsoft.NET versions 1.0x.
[ Back to top ]
The most likely cause of an unexpected token or syntax error occurring
in PDFNet headers is a macro defined in a header file that is included
before the PDFNet headers.
For example, windows.h defines a macro called GetMessage which
conflicts with the PDFNet Exception::GetMessage() method. To avoid this
error, either #include the affected PDFNet headers before the header(s)
introducing the offending macro(s) or use an #undef directive before the
PDFNet headers.
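The #undef approach can be sketched in isolation as follows. This example does not include windows.h or PDFNet; the #define line only imitates what a platform header does, and DemoException is a hypothetical class standing in for any class that has a GetMessage method.

```cpp
#include <string>
#include <utility>

// Imitates a platform header (assumption for illustration): every later
// use of the identifier GetMessage would be rewritten to GetMessageA.
#define GetMessage GetMessageA

// Remove the macro before declaring or including anything that uses the
// name. Without this line the method below would silently be renamed.
#undef GetMessage

// Hypothetical exception class with a method named GetMessage.
class DemoException {
public:
    explicit DemoException(std::string msg) : msg_(std::move(msg)) {}
    const std::string& GetMessage() const { return msg_; }

private:
    std::string msg_;
};
```

Note that after the #undef the platform function is no longer reachable through the macro name, so code that needs it has to call the suffixed name directly.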
[ Back to top ]
Because there is no standard for binary distribution in C++ there
are some common issues when it comes to library interoperability.
PDFTron addressed the problem in part by offering PDFNet SDK for
different platforms, compilers, and common code generation modes.
Still, this does not remove the C++ interoperability problem entirely.
For example, you may want to move to an exotic C++ compiler, or
you may be working on a large-scale project with many library
modules coming from different sources.
One way to overcome these interoperability problems is to package
PDF-related functionality in the form of a module or a DLL. In this
case you have full control over the build process of your own
module. For extra compatibility you can expose the high-level functionality
(such as a conversion function) as a plain "C" API, which
is standard across different compilers. Another, though less portable,
option is COM.
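A plain "C" entry point for such a module might look like the sketch below. The function name my_convert_document, its error codes, and ConvertImpl are hypothetical; the point is only that C types cross the module boundary and that no C++ exception escapes it.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical C++ implementation living inside your own module. In a
// real project this is where the PDF library calls would go.
static void ConvertImpl(const std::string& in_path,
                        const std::string& out_path) {
    if (in_path.empty() || out_path.empty())
        throw std::invalid_argument("empty path");
    // ... call into the PDF library here ...
}

// Plain C entry point: only C types in the signature, C linkage, and
// every exception translated into an error code.
// Returns 0 on success and a non-zero code on failure.
extern "C" int my_convert_document(const char* in_path,
                                   const char* out_path) {
    if (!in_path || !out_path) return 1;  // invalid arguments
    try {
        ConvertImpl(in_path, out_path);
        return 0;
    } catch (const std::exception&) {
        return 2;  // known failure
    } catch (...) {
        return 3;  // unknown failure
    }
}
```

Callers built with a different compiler (or a different language) then only need the C declaration of my_convert_document.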
[ Back to top ]
The simplest way to access a document's metadata (e.g. author, title,
keywords, etc.) is to use the PDFDocInfo class. For example,
PDFDocInfo info = mydoc.GetDocInfo();
String title = info.GetTitle();
info.SetTitle("My Title");
// etc.
Alternatively you can access document's metadata using SDF/Cos
API:
PDFDoc doc = new PDFDoc(...);
doc.InitSecurityHandler();
Obj trailer = doc.GetTrailer(); // Get the trailer
Obj info = trailer.FindObj("Info");
if (info != null) {
    // Get 'Title'/'Author'/'Keywords'/'Subject'...
    // entry, if available
    Obj title_obj = info.FindObj("Title");
    if (title_obj != null) {
        // Note: In some documents these strings are encoded
        // using PDF text encoding.
        String title = title_obj.GetString();
        title_obj.SetString("My Title...");
    }
    else {
        info.PutString("Title", "My Title...");
    }
}
[ Back to top ]
You can use Page.GetMediaBox() to obtain the dimensions of the
media box for the page. For example,
Rect bbox = page.GetMediaBox();
bbox.Normalize();
// the width and height of the page in page units
double width = bbox.Width();
double height = bbox.Height();
One page unit is 1/72 of an inch. For a 'letter' size page (8.5 x 11 inches) the dimensions
will be:
width = 612 units = 612 * 1/72 = 8.5 inches
height = 792 units = 792 * 1/72 = 11 inches
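The unit arithmetic above can be wrapped in two small helpers. This is a plain C++ sketch that does not depend on PDFNet; the function names are made up for illustration.

```cpp
// Converts a length in PDF page units (1/72 inch) to inches.
double UnitsToInches(double units) { return units / 72.0; }

// Converts a length in inches to PDF page units (1/72 inch).
double InchesToUnits(double inches) { return inches * 72.0; }
```

For the 'letter' page above, UnitsToInches(612) gives 8.5 and UnitsToInches(792) gives 11.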
[ Back to top ]
You can create a new page that is the same size as an existing
page as follows:
Rect media_box = existing_page.GetMediaBox();
Page new_page = doc.PageCreate(media_box);
A slightly harder but more powerful technique to accomplish the
same task is using Cos/SDF API:
Page new_page = doc.PageCreate();
Obj pg = existing_page.GetSDFObj();
new_page.GetSDFObj().Put("MediaBox", pg.Get("MediaBox").Value());
This example can be trivially extended to copy arbitrary
entries from an existing page (e.g. the crop/bleed/art
box, page resources, annotations, etc.).
[ Back to top ]
To get the page number from a given page, use the Page.GetIndex() method.
[ Back to top ]
Q: I've encountered a problem retrieving GetFontSize().
It seems to always return a double value 1.0. How do I determine the
real size of the font?
A: GetFontSize() returns the correct value. You
can use CosEdit to check this;
somewhere in the page content stream you will find a '/Fn 1 Tf' operator.
If you want to get the font size as it appears on the page, you need
to scale GetFontSize() by the text matrix (Element.GetTextMatrix())
as well as the current transformation matrix (CTM). In the following
snippet, mtx is the CTM concatenated with the text matrix and gs is the
element's graphics state (Element.GetGState()):
double scale_factor = Math.sqrt(mtx.m_b*mtx.m_b + mtx.m_d*mtx.m_d);
double page_font_sz = gs.GetFontSize() * scale_factor;
For a complete example on how to get font size in the user space
please refer to ElementReaderAdv test
project in Samples folder.
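As a worked instance of the scale-factor formula, the sketch below takes the two relevant entries of the combined matrix as plain numbers. It is self-contained C++ and does not use PDFNet; FontSizeOnPage is a hypothetical helper name.

```cpp
#include <cmath>

// Scales a nominal font size (the operand of the Tf operator) by the
// vertical scale of a combined matrix. m_b and m_d are the second and
// fourth entries of the CTM concatenated with the text matrix.
double FontSizeOnPage(double nominal_size, double m_b, double m_d) {
    return nominal_size * std::sqrt(m_b * m_b + m_d * m_d);
}
```

For a content stream that sets a nominal size of 1 and then scales the text by 12 (m_b = 0, m_d = 12), the result is 12 units. The same holds when the text is also rotated by 90 degrees (m_b = 12, m_d = 0), which is why both entries take part in the formula.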
[ Back to top ]
Relative text positioning coordinates can be accessed using CharIterator.
Absolute text positioning is a function of Element.GetCTM(),
Element.GetTextMatrix(), and the relative character positioning information
(i.e. char_itr.Current().x, char_itr.Current().y).
The simplest approach to obtain the bounding box (in the absolute, or
PDF user, coordinate system) for a given text run is to use the
element.GetBBox(rect) method.
To obtain absolute positioning information for each character
in the text run, concatenate the current transformation
matrix (Element.GetCTM()) with the current text matrix (Element.GetTextMatrix()),
then multiply the resulting matrix
with the relative character position (char_itr.Current().x, char_itr.Current().y)
from CharIterator.
Please refer to section '5.3.3 Text Space Details' in the PDF Reference
Manual for more details on how text coordinates are transformed
into PDF user space.
// CTM (current transformation matrix).
Matrix2D ctm = element.GetCTM();
Matrix2D text_mtx = element.GetTextMatrix();
double x, y;
for (CharIterator itr = element.CharBegin(); itr.HasNext(); itr.Next())
{
    x = itr.Current().x; // relative character positioning information
    y = itr.Current().y;
    // To get the absolute character coordinate, concatenate the
    // current transformation matrix (CTM) with the current text matrix
    // and then transform the relative character position.
    // Matrix2D mtx = ctm * text_mtx;
    // mtx.Mult(x, y); // (x, y) is now the absolute coordinate.
}
For a complete example on how to get text and character positioning
information please refer to ElementReaderAdv
test project in Samples folder.
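The matrix arithmetic behind this can be written out without any library. The sketch below is plain C++ and does not use PDFNet's Matrix2D; the Affine struct, Concat, and ToUserSpace are hypothetical names, and the entry order [a b c d h v] follows the usual PDF convention.

```cpp
// Minimal 2D affine matrix using the PDF entry order [a b c d h v]:
//   x' = a*x + c*y + h
//   y' = b*x + d*y + v
struct Affine {
    double a, b, c, d, h, v;

    // Transforms the point (x, y) in place.
    void Apply(double& x, double& y) const {
        const double nx = a * x + c * y + h;
        const double ny = b * x + d * y + v;
        x = nx;
        y = ny;
    }
};

// Returns the matrix that applies `inner` first and `outer` second.
Affine Concat(const Affine& outer, const Affine& inner) {
    return Affine{
        outer.a * inner.a + outer.c * inner.b,
        outer.b * inner.a + outer.d * inner.b,
        outer.a * inner.c + outer.c * inner.d,
        outer.b * inner.c + outer.d * inner.d,
        outer.a * inner.h + outer.c * inner.v + outer.h,
        outer.b * inner.h + outer.d * inner.v + outer.v};
}

// Maps a character position given relative to the text run into user
// space: the text matrix is applied first, then the CTM.
void ToUserSpace(const Affine& ctm, const Affine& text_mtx,
                 double& x, double& y) {
    Concat(ctm, text_mtx).Apply(x, y);
}
```

For example, with a CTM that translates by (100, 200) and a text matrix that scales by 2 and translates by (10, 20), the relative position (3, 0) ends up at (116, 220) in user space.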
[ Back to top ]
Given a font object you can remove embedded font streams as follows:
// Using C#:
Obj fd = myfont.GetDescriptor();
if (fd == null) return; // If null, the font is not embedded
fd.Erase("FontFile");
fd.Erase("FontFile2");
fd.Erase("FontFile3");
...
doc.Save(..., Doc.SaveOptions.e_linearized);
To find all fonts in the document, you can either traverse all
page resources (i.e. 'Font' entry in the page resource dictionary),
or iterate over all document objects. For example:
// ... Init PDFNet ...
PDFDoc doc = new PDFDoc("in.pdf");
doc.InitSecurityHandler();
SDFDoc cos_doc = doc.GetSDFDoc();
int num_objs = cos_doc.XRefSize();
for (int i=1; i<num_objs; ++i) {
    Obj obj = cos_doc.GetObj(i);
    if (obj!=null && !obj.IsFree() && obj.IsDict()) {
        // Process only Fonts
        DictIterator itr = obj.Find("Type");
        if (itr.HasNext() == false || itr.Value().GetName() != "Font") continue;
        itr = obj.Find("FontDescriptor");
        if (itr.HasNext() == false) continue;
        if (!itr.Value().IsDict()) continue;
        Obj fd = itr.Value();
        fd.Erase("FontFile");
        fd.Erase("FontFile2");
        fd.Erase("FontFile3");
    }
}
doc.Save(...);
doc.Close();
[ Back to top ]
Given an element with 'e_text' type, you can obtain its high-level
font object as follows:
Font font = element.GetGState().GetFont();
To check if the font is italic, you could use font.IsItalic().
You can also obtain all other properties from font descriptor dictionary
(see section 5.7 'Font Descriptors' in PDF Reference Manual).
For example (using C#),
Obj fd = font.GetDescriptor();
if (fd != null) {
    double italic_angle = 0, weight = 400;
    Obj obj = fd.FindObj("ItalicAngle");
    if (obj != null) {
        italic_angle = obj.GetNumber();
    }
    obj = fd.FindObj("FontWeight");
    if (obj != null) {
        // A value of 400 indicates a normal weight; 700 indicates bold.
        weight = obj.GetNumber();
    }
}
[ Back to top ]
Text runs (e.g. elements of type e_text) represent a stream of
text, but text-runs do not directly correspond to words. For example,
you may have a single word that consists of letters in various fonts
and styles. In this case each letter would correspond to a separate
text-run. Also you may encounter text-runs that contain multiple
words separated by spaces.
The most straightforward approach to extract words from text-runs
is to use the pdftron.PDF.TextExtractor class (as shown
in the TextExtract sample project).
In case TextExtractor does not meet all of your requirements you
can also implement your own word recognizer using the low-level
text APIs.
[ Back to top ]
Q: While extracting content from a PDF document,
the sequence represents the painting order for content and not the
order as it is seen on the screen. How do I extract text in the
reading order, and not in the sequence given in PDF?
A: Unfortunately, most PDF documents do not include
enough logical structure to extract the reading order. As a result,
it is usually necessary to reconstruct the reading order based on
the content positioning on the page. To obtain the positioning information
for every graphical element on the page, you could use element.GetBBox(rect)
method. Using this information it is possible to build a structure
that can be used to extract the content in a specific reading order.
Also, starting with version 4, PDFNet SDK includes high-level APIs
(TextExtractor, SElement, STree,
etc.) that can be used to automatically reconstruct the logical structure
of any PDF document.
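The position-based approach described in this answer can be sketched as follows for simple single-column pages. The code is plain C++ and does not use PDFNet; TextBox, ReadingOrder, and the line tolerance are hypothetical, and the boxes are assumed to have been collected beforehand (for example from element bounding boxes).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// A piece of text with its bounding box in PDF user space
// (x1, y1) = lower-left corner, (x2, y2) = upper-right corner.
struct TextBox {
    std::string text;
    double x1, y1, x2, y2;
};

// Orders boxes top-to-bottom, then left-to-right within a line. Boxes
// whose bottom edge lies within `line_tolerance` units of the first
// box of a line are treated as part of that line.
std::vector<TextBox> ReadingOrder(std::vector<TextBox> boxes,
                                  double line_tolerance) {
    // 1. Top of the page first: in PDF user space y grows upward.
    std::sort(boxes.begin(), boxes.end(),
              [](const TextBox& a, const TextBox& b) { return a.y1 > b.y1; });

    // 2. Cut the sorted list into lines and order each line by x.
    std::size_t line_start = 0;
    for (std::size_t i = 1; i <= boxes.size(); ++i) {
        const bool new_line =
            i == boxes.size() ||
            boxes[line_start].y1 - boxes[i].y1 > line_tolerance;
        if (new_line) {
            std::sort(boxes.begin() + line_start, boxes.begin() + i,
                      [](const TextBox& a, const TextBox& b) {
                          return a.x1 < b.x1;
                      });
            line_start = i;
        }
    }
    return boxes;
}
```

Multi-column layouts, rotated text, and tables need more elaborate grouping than this; the sketch only shows the basic idea of deriving order from position.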
[ Back to top ]
You can use PDFNet to generate 'PDF Searchable Images'.
PDF Searchable Images are created using invisible text drawn on
top of scanned images. To make invisible text that can
be highlighted or searched, set the text rendering mode
in the graphics state of the text element (i.e.
Element.GetGState().SetTextRenderMode(GState.TextRenderingMode.e_invisible_text)).
[ Back to top ]
Using PDFNet you can place watermarks or append new content (such
as text, logos, or images) using ElementWriter and ElementBuilder,
as illustrated in the following snippet:
PDFNet.Initialize();
try
{
    PDFDoc doc = new PDFDoc("my.pdf");
    doc.InitSecurityHandler();
    ElementBuilder eb = new ElementBuilder();
    ElementWriter writer = new ElementWriter();
    // Get the first page
    Page page = doc.GetPage(1);
    // Begin writing to the page
    writer.Begin(page);
    // Begin writing a block of text
    Element element = eb.CreateTextBegin(
        Font.Create(doc,
            Font.StandardType1Font.e_times_roman), 12);
    writer.WriteElement(element);
    string txt = "Hello World!";
    element = eb.CreateTextRun(txt);
    // Scale-up text 5 times and shift it by (0,600)
    element.SetTextMatrix(5, 0, 0, 5, 0, 600);
    writer.WriteElement(element);
    // Set the spacing between lines
    element.GetGState().SetLeading(15);
    writer.WriteElement(eb.CreateTextNewLine());
    // Draw the same text string; this time stroked.
    element = eb.CreateTextRun(txt);
    GState gstate = element.GetGState();
    gstate.SetTextRenderMode(
        GState.TextRenderingMode.e_stroke_text);
    gstate.SetCharSpacing(-1.25);
    gstate.SetWordSpacing(-1.25);
    writer.WriteElement(element);
    // Finish the block of text
    writer.WriteElement(eb.CreateTextEnd());
    writer.End();
    doc.Save("out.pdf", 0);
    doc.Close();
}
catch (PDFNetException e) {
    Console.WriteLine(e.Message);
}
The following code snippet illustrates how to stamp all pages in
the document with a "Hello World!" string.
PDFDoc doc = new PDFDoc("in.pdf");
doc.InitSecurityHandler();
ElementBuilder eb = new ElementBuilder();
ElementWriter writer = new ElementWriter();
PageIterator itr = doc.GetPageIterator();
for (; itr.HasNext(); itr.Next())
{
    writer.Begin(itr.Current());
    Element element = eb.CreateTextBegin(
        Font.Create(doc,
            Font.StandardType1Font.e_times_roman), 64);
    writer.WriteElement(element);
    element = eb.CreateTextRun("Hello World!");
    // Position the text run
    element.SetTextMatrix(1, 0, 0, 1, 20, 20);
    writer.WriteElement(element);
    writer.WriteElement(eb.CreateTextEnd());
    writer.End(); // Save the changes
}
doc.Save("out.pdf", 0);
doc.Close();
For a longer code example, illustrating the use of ElementBuilder
and ElementWriter, please take a look at ElementBuilder
sample project.
Using PDFNet it is also possible to create watermark annotations
using a procedure similar to the one outlined above. You would use
ElementBuilder/ElementWriter to create a new appearance stream and the
Annot class to create the annotation object.
[ Back to top ]
PDFNet allows direct embedding of various raster image formats as
well as GDI+ Bitmaps. For concrete sample code, please take a look
at the AddImage sample project.
[ Back to top ]
You can get the image dimensions in pixels using the Element/Image.GetImageWidth()
and Element/Image.GetImageHeight() methods. If you want to calculate
the DPI of the image as it appears on the target medium (e.g. paper),
you need to take into account the current transformation matrix
(CTM), which you can obtain using the Element.GetCTM() method.
If the CTM does not include rotation or skew, the image will be
positioned at (GetCTM().m_h, GetCTM().m_v) and will be GetCTM().m_a
units wide and GetCTM().m_d units high. Note that one unit in
user space is equal to 1/72 of an inch.
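Putting these facts together, the effective DPI is the pixel count divided by the placed size in inches. The sketch below is plain C++ that takes the pixel dimensions and the two CTM entries as numbers; it does not call PDFNet, the names are hypothetical, and it only applies when the CTM has no rotation or skew.

```cpp
#include <cmath>

// Horizontal and vertical resolution of a placed image.
struct Dpi {
    double x;
    double y;
};

// pixel_w, pixel_h: image size in pixels.
// ctm_a, ctm_d: CTM scale entries, i.e. the placed width and height in
// user space units (1/72 inch). The absolute value is used so that a
// mirrored placement (negative scale) still yields a positive DPI.
Dpi ImageDpi(int pixel_w, int pixel_h, double ctm_a, double ctm_d) {
    const double width_in = std::fabs(ctm_a) / 72.0;
    const double height_in = std::fabs(ctm_d) / 72.0;
    return Dpi{pixel_w / width_in, pixel_h / height_in};
}
```

For example, a 600 x 300 pixel image placed 144 units wide and 72 units high covers 2 x 1 inches, which works out to 300 DPI in both directions.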
[ Back to top ]
In PDF the image can be rotated by any degree. The image can also
be stretched, skewed, etc. The transformation is specified using
the Current Transformation Matrix (CTM) which can be accessed using
the Element.GetCTM() method.
Use the following code snippet (pseudocode) to calculate the image
rotation angle (in radians):
double GetRotation(Matrix2D& mtx) {
    double x1=0, y1=0, x2=1, y2=0;
    mtx.Mult(x1, y1);
    mtx.Mult(x2, y2);
    return atan2(y2-y1, x2-x1);
}
The position of the image on the page is given
by the translation component of the matrix (i.e. mtx.m_h, mtx.m_v).
[ Back to top ]
You can place one half of the image on one PDF page and the other
half of the image on a second PDF page as follows (using C# pseudo-code):
Image img = Image.Create(doc.GetSDFDoc(), data, width,
    height, 8, ColorSpace.CreateDeviceRGB(),
    Image.InputFilter.e_jpeg);
ElementBuilder eb = new ElementBuilder();
ElementWriter writer = new ElementWriter();
// Create page #1 -------------
Page page = doc.PageCreate();
writer.Begin(page);
// Use a clipping path in order to show only a
// portion of the image. Save the graphics state
// so that the clipping path does not affect
// other graphics on the page.
writer.WriteElement(eb.CreateGroupBegin());
// Create a clipping path.
eb.PathBegin();
eb.CreateRect (0, 0, 200, 100);
Element element = eb.PathEnd();
// this is a clipping path
element.SetPathClip(true);
element.SetPathStroke(false);
element.SetPathFill(false);
// Write clip path
writer.WriteElement(element);
// Place the first half of the image behind the clip path.
element = eb.CreateImage(img,
new Matrix2D(200, 0, 0, 200, 0, 0));
writer.WritePlacedElement(element);
// Restore the graphics state.
writer.WriteElement(eb.CreateGroupEnd());
writer.End();
doc.PagePushBack(page);
// Create page #2 -------------
page = doc.PageCreate();
writer.Begin(page);
writer.WriteElement(eb.CreateGroupBegin());
// Create a clipping path.
eb.PathBegin();
eb.CreateRect (0, 100, 200, 100);
element = eb.PathEnd();
element.SetPathClip(true);
element.SetPathStroke(false);
element.SetPathFill(false);
writer.WriteElement(element);
// Place the second half of the image behind
// the clip path.
element = eb.CreateImage(img,
new Matrix2D(200, 0, 0, 200, 0, 0));
writer.WritePlacedElement(element);
writer.WriteElement(eb.CreateGroupEnd());
writer.End();
doc.PagePushBack(page);
[ Back to top ]
Q: I have a PDF which we need to brand with a client
logo at run-time. The PDF has a dummy image logo which should be
replaced with the client logo. How can I do this?
A: Using PDFNet you can replace (swap) an image
in an existing PDF document as follows:
- Find the image that should be replaced (source image). You can
do this by enumerating page contents using ElementReader and looking
for Elements with type e_image. Another option is to enumerate
page image resources directly using SDF/Cos API (e.g. page.GetResourceDict().
Get("XObject").Value() ...).
- Create a replacement image using one of the Image.Create()
methods, as illustrated in the AddImage sample project.
- Swap the two images as follows:
SDFDoc doc = pdfdoc.GetSDFDoc();
int img1_objnum = img1.GetSDFObj().GetObjNum();
int img2_objnum = img2.GetSDFObj().GetObjNum();
doc.Swap(img1_objnum, img2_objnum);
[ Back to top ]
The following sample code illustrates how to set a transformation
matrix on an Image element:
Element* element = eb.CreateImage(Image(...));
double deg2rad = 3.1415926535 / 180.0;
// Translate
Matrix2D mtx = Matrix2D(1, 0, 0, 1, 0, 200);
// Scale
mtx *= Matrix2D(300, 0, 0, 200, 0, 0);
// Rotate
mtx *= Matrix2D::RotationMatrix( 90 * deg2rad );
element->GetGState()->SetTransform(mtx);
writer.WritePlacedElement(element);
The RotationMatrix accepts an angle in radians.
Please note that the order of transformations (i.e. matrix multiplications)
is stack based. The same convention is used in PostScript, PDF,
and OpenGL.
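To make the stack-based order concrete, the following sketch builds the same translate, scale, rotate sequence with a tiny stand-alone matrix type. It is plain C++ and does not use PDFNet's Matrix2D; the Affine struct and helper functions are hypothetical, and the sketch assumes that the matrix multiplied in last is the first one applied to a point.

```cpp
#include <cmath>

// Minimal 2D affine matrix in PDF entry order [a b c d h v]:
//   x' = a*x + c*y + h,  y' = b*x + d*y + v
struct Affine {
    double a, b, c, d, h, v;

    // Transforms the point (x, y) in place.
    void Apply(double& x, double& y) const {
        const double nx = a * x + c * y + h;
        const double ny = b * x + d * y + v;
        x = nx;
        y = ny;
    }
};

// lhs * rhs applies rhs to a point first and lhs second.
Affine operator*(const Affine& lhs, const Affine& rhs) {
    return Affine{lhs.a * rhs.a + lhs.c * rhs.b,
                  lhs.b * rhs.a + lhs.d * rhs.b,
                  lhs.a * rhs.c + lhs.c * rhs.d,
                  lhs.b * rhs.c + lhs.d * rhs.d,
                  lhs.a * rhs.h + lhs.c * rhs.v + lhs.h,
                  lhs.b * rhs.h + lhs.d * rhs.v + lhs.v};
}

Affine Translation(double tx, double ty) {
    return Affine{1.0, 0.0, 0.0, 1.0, tx, ty};
}

Affine Scaling(double sx, double sy) {
    return Affine{sx, 0.0, 0.0, sy, 0.0, 0.0};
}

// Counter-clockwise rotation; the angle is in radians.
Affine Rotation(double radians) {
    const double c = std::cos(radians);
    const double s = std::sin(radians);
    return Affine{c, s, -s, c, 0.0, 0.0};
}
```

With m = Translation(0, 200) * Scaling(300, 200) * Rotation(90 degrees), the unit-square corner (1, 0) is first rotated to (0, 1), then scaled to (0, 200), and finally translated to (0, 400), up to floating-point rounding.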
[ Back to top ]
PDFNet supports extraction of all content available in a PDF document.
On the other hand, the PDF standard does not directly support abstract constructs
such as paragraphs, columns, tables, etc. Because the logical structure
is missing in a PDF document, the target application would need to
analyze and generate logical structure based on the underlying content
that is available through PDFNet.
Note that the PDF standard supports marked content and so-called 'tagged
PDF'. PDFNet can be used to extract marked content and any existing
logical structure. Unfortunately, many PDF files are missing tags
and logical structure.
[ Back to top ]
The most likely cause for this behavior is that the missing Elements
are annotation objects. Annotations are not part of the content stream.
Although it is a bad practice, some PDF generators produce PDF content
in the form of annotations.
With PDFNet library it is possible to read the appearances of existing
annotations in the same way as reading the page content. To process
annotation appearances, first obtain annotation array from the Page
and initialize ElementReader with annotation's appearance stream
(/AP dictionary entry). You can then extract annotation's Elements
in the same way as when reading page content.
[ Back to top ]
You can embed/associate JavaScript with any type of PDF annotation
or with the PDF document using SDF/Cos API. For example, the following
code snippet creates additional action dictionary and associates
it with an existing annotation:
// Create a JavaScript 'Additional Action'
// (see section 8.4.1 'Annotation Dictionaries', 8.5 'Actions',
// and 'JavaScript Actions' on page 668 in PDF Reference Manual
// for details).
Obj js_action = doc.CreateIndirectDict();
js_action.PutName("S", "JavaScript");
js_action.PutString("JS", "alert('Hello World');");
Obj aa_dict = my_annot.GetSDFObj().PutDict("AA");
aa_dict.Put("F", js_action);
The CosEdit utility can be very
useful while you work with the SDF/Cos API.
Here is another example of adding JavaScript as a document level
action:
Obj root = pdfdoc.GetRoot();
Obj aa = root.PutDict("AA");
Obj ds = pdfdoc.CreateIndirectDict();
Obj ws = pdfdoc.CreateIndirectDict();
Obj dc = pdfdoc.CreateIndirectDict();
aa.Put("DS", ds); // Did Save Action
aa.Put("WS", ws); // Will Save Action
aa.Put("DC", dc); // Document Close Action
ds.PutName("S", "JavaScript");
ds.PutString("JS", "... DidSave JavaScript ....");
ws.PutName("S", "JavaScript");
ws.PutString("JS", "... OnSave JavaScript ....");
dc.PutName("S", "JavaScript");
dc.PutString("JS", "...OnClose JavaScript ....");
For lengthy JavaScript code segments you can also embed JavaScript
as SDF streams objects instead of strings. For example:
StdFile embed_file = new StdFile("code.js", StdFile.OpenMode.e_read_mode);
FilterReader mystm = new FilterReader(embed_file);
Obj js_stream = doc.CreateIndirectStream(mystm);
...
js_action.Put("JS", js_stream);
[ Back to top ]
You can embed PostScript stream in PDF as follows (C# sample):
// Embed a custom stream (file postscript.ps).
StdFile embed_file = new StdFile("postscript.ps",
StdFile.OpenMode.e_read_mode);
FilterReader mystm = new FilterReader(embed_file);
Obj ps_stm = doc.CreateIndirectStream(mystm);
ps_stm.PutName("Subtype", "PS");
// ...
// Then use ElementBuilder and ElementWriter to
// reference the PostScript stream from a given page:
Element element = builder.CreateForm(ps_stm);
writer.WriteElement(element);
[ Back to top ]
To create a root bookmark in documents that don't have any bookmarks/outlines,
use PDFDoc.AddRootBookmark(mybookmark). To insert a new root bookmark
before the existing bookmark, use mybookmark.AddPrev("Upper Sibling").
To insert a new root bookmark after the existing
bookmark, use mybookmark.AddNext("Lower Sibling").
[ Back to top ]
The following short sample illustrates how to split a
document based on bookmarks: PDFBookmarkSplit.cs.
You may want to use PDFBookmarkSplit as a starting point for your
project or for further customizations to the splitting process.
[ Back to top ]
Customers using the .NET version of PDFNet and working with large documents
can dramatically increase performance by saving to a
temporary file instead of a memory buffer. The real performance
bottleneck is related to .NET data marshaling and not PDF merging.
Merging performance can also be increased by merging original documents
instead of copying all pages to a new document. Instead of copying
all pages to a new document you can simply append or delete pages
in the source document. Note that PDFDoc.Save(...) is not altering
the original document unless the filename matches the original filename.
Another optimization tip is to use PDFDoc.ImportPages() to efficiently
copy a page set from one document to another. See Copying/Merging
Pages for details.
[ Back to top ]
Q: I am using the PDFDoc.PagePushBack() (or PagePushFront/PageInsert)
method to combine two or more PDFs into one. The problem is that
the file size of the resulting PDF is too big compared to the size of
the input PDF documents.
A: If you encounter this problem please refer
to Copying/Merging
Pages section in PDFNet User Manual. The file size can be dramatically
reduced by importing page set in the target document using PDFDoc.ImportPages()
and then using PDFDoc.PagePushBack() (or PagePushBack/PageInsert)
to position the page within document's page sequence.
[ Back to top ]
Q: Is it possible to merge two pages stored in
two separate PDF files, i.e. a data file and a background file, into
one file with the text overlaid on the image?
A: Using the PDFNet toolkit it is very simple to merge
content from several pages into one.
The first step is to import the overlay page into the background
document. You can then merge page content in two ways.
A) You can read Elements from the overlay page using ElementReader
and write them using ElementWriter on the background page.
B) You can create a Form XObject Element out of the overlay page using
ElementBuilder and write it on the background page using ElementWriter.
Technique A is illustrated in the following pseudo-code:
PDFDoc over = new PDFDoc("overlay.pdf");
PDFDoc back = new PDFDoc("background.pdf");
// Import the overlay page into the background doc
PageIterator op_itr = over.PageFind(1);
back.PagePushBack(op_itr.Current());
// Background page
Page bp = back.GetPage(1);
// Overlay page (now the second page of the background doc)
Page op = back.GetPage(2);
// Copy Elements from the overlay page to the background page
ElementReader reader = new ElementReader();
reader.Begin(op);
ElementWriter writer = new ElementWriter();
writer.Begin(bp);
Element element;
while ((element = reader.Next()) != null)
    writer.WriteElement(element);
writer.End();
reader.End();
// You can now optionally remove the overlay page
// back.PageRemove(back.PageFind(2));
back.Save("merged.pdf", 0);
The above code snippet
assumes that the overlay is the first page in "overlay.pdf"
and that "background.pdf" has a single page. The sample is
straightforward to extend to other cases.
[ Back to top ]
Page imposition is the process of combining pages onto larger sheets
to make books, booklets, pamphlets, etc.
Page imposition can be used to arrange/order pages prior to printing
or to assemble a 'master' page from several 'source' pages. Using
the PDFNet API it is possible to write applications that re-order
the pages so that they appear in the correct order once
the printed sheets are folded and assembled.
For an example of how multiple pages can be combined/imposed using
PDFNet, please take a look at the ImpositionTest
sample project.
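The re-ordering step can be sketched independently of PDFNet. The following minimal C# sketch computes the page order for a saddle-stitched booklet (two pages per side of a sheet, four per sheet). The names BookletImposition and BookletOrder are hypothetical and not part of the PDFNet API, and the ImpositionTest sample may use a different layout; the result is only the sequence in which source pages would be placed on the sheets.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of PDFNet.
public static class BookletImposition
{
    // Returns 1-based source page numbers in placement order:
    // front-left, front-right, back-left, back-right for each sheet.
    // A value of 0 marks a blank filler page.
    public static List<int> BookletOrder(int pageCount)
    {
        if (pageCount < 1)
            throw new ArgumentOutOfRangeException("pageCount");

        // Each folded sheet carries four pages, so round up to a multiple of 4.
        int padded = (pageCount + 3) / 4 * 4;
        var order = new List<int>(padded);

        for (int sheet = 0; sheet < padded / 4; ++sheet)
        {
            int lo = 2 * sheet + 1;       // lowest unplaced page
            int hi = padded - 2 * sheet;  // highest unplaced page
            // Front side pairs hi with lo; back side pairs lo+1 with hi-1.
            foreach (int p in new[] { hi, lo, lo + 1, hi - 1 })
                order.Add(p <= pageCount ? p : 0);
        }
        return order;
    }
}
```

For an 8-page document, BookletOrder(8) yields 8, 1, 2, 7, 6, 3, 4, 5: the first sheet carries pages 8 and 1 on the front and 2 and 7 on the back, and the second sheet carries 6 and 3, then 4 and 5.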
[ Back to top ]
In PDF, a field's value is separate from its annotation (i.e. how
the field appears on the page). After you modify a field's value you
need to refresh the field's appearance as follows:
field.SetValue("My value");
// Regenerate appearance stream.
field.RefreshAppearance();
Alternatively, you can delete the "AP"
entry from the Widget annotation and set the "NeedAppearances"
flag in the AcroForm dictionary:
doc.GetAcroForm()
.PutBool("NeedAppearances", true);
This will force the viewer application to auto-generate
new field appearances every time the document is opened.
Yet another option is to generate a custom annotation appearance
using ElementBuilder and ElementWriter and then set the "AP"
entry in the widget dictionary to the new appearance stream. This
functionality is useful in applications that need fine-grained
control over the 'look and feel' of the document.
[ Back to top ]
Form 'flattening' refers to the operation that changes active
form fields into a static area that is part of the PDF document,
just like the other text and images in the document. A completely
flattened PDF form does not have any widget annotations or interactive
fields.
Using the Field.Flatten() or Page.FlattenField() method it is possible
to merge individual field appearances with the page content. PDFNet
also allows you to flatten all forms in the document in a single
function call (PDFDoc.FlattenFields()).
Note that it is not possible to undo the Field.Flatten() operation.
An alternative that can be reversed programmatically is to mark the
field as read-only using the Field.SetFlag(Field::e_read_only,
true) method.
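The read-only approach is reversible because it only toggles a bit, whereas flattening removes the field itself. In the PDF file format, field flags are stored in the integer "Ff" entry of the field dictionary, with ReadOnly at bit position 1. The sketch below shows the bit arithmetic only. FieldFlagSketch is a hypothetical helper, and it is an assumption that Field.SetFlag() performs an equivalent update internally.

```csharp
using System;

// Hypothetical helper illustrating the "Ff" bit arithmetic; not part of PDFNet.
public static class FieldFlagSketch
{
    // Flag values from the PDF Reference (flags common to all field types).
    public const int ReadOnly = 1 << 0;  // bit position 1
    public const int Required = 1 << 1;  // bit position 2
    public const int NoExport = 1 << 2;  // bit position 3

    // Returns the "Ff" value with the given flag switched on or off,
    // leaving every other flag untouched.
    public static int SetFlag(int ff, int flag, bool value)
    {
        return value ? (ff | flag) : (ff & ~flag);
    }
}
```

Setting and then clearing ReadOnly returns the original flag value, which is why this approach can be undone while Field.Flatten() cannot.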
[ Back to top ]
You can use the following code snippet to remove all actions,
including JavaScript, attached to form fields in the document:
FieldIterator itr = doc.GetFieldIterator();
for( ; itr.HasNext(); itr.Next()) {
    Obj dict = itr.Current().GetSDFObj();
    dict.Erase("A");
    dict.Erase("AA");
}
[ Back to top ]
SDF (Structured Document Format) and COS (Carousel Object System;
Carousel was a codename for Acrobat 1.0) are synonyms for the PDF low-level
object model. SDF is the acronym used in PDFNet, whereas COS is
used in the Acrobat SDK.
In many ways, SDF is to PDF what XML is to SVG (Scalable Vector
Graphics). The Cos object system provides the low-level object types
and file structure used in PDF files. PDF documents are graphs of
Cos objects. Cos objects can represent document components such
as bookmarks, pages, fonts, and annotations.
PDF is not the only document format built on top of SDF/Cos. FDF
(Form Data Format) and PJTF (Portable Job Ticket Format) are also
built on top of Cos.
The SDF/Cos layer deals directly with the data that is in a PDF
(or Cos based) file. The data types are referred to as SDF/Cos Objects.
There are eight data types found in PDF files. They are arrays,
dictionaries, numbers, Boolean values, names, strings, streams,
and a null object. In order to retrieve or modify PDF (or other
Cos based) content, you need to understand these objects. You can
create new objects and delete or modify existing objects.
For a detailed description of the Cos layer, refer to Chapter 3
(Syntax) of the PDF
Reference Manual.
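To make the eight object types concrete, the following C# sketch models them as small classes and prints a page-like dictionary in PDF syntax. All class names (CosObj, CosDict, CosDemo, and so on) are hypothetical stand-ins and not PDFNet's SDF classes, and the serialization is deliberately simplified (no string escaping, no indirect objects, no "Length" entry on streams).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical stand-ins for the eight Cos types; not PDFNet's SDF API.
public abstract class CosObj { public abstract string ToPdf(); }

public sealed class CosNull : CosObj { public override string ToPdf() { return "null"; } }

public sealed class CosBool : CosObj
{
    private readonly bool v;
    public CosBool(bool value) { v = value; }
    public override string ToPdf() { return v ? "true" : "false"; }
}

public sealed class CosNumber : CosObj
{
    private readonly double v;
    public CosNumber(double value) { v = value; }
    public override string ToPdf() { return v.ToString(CultureInfo.InvariantCulture); }
}

public sealed class CosName : CosObj
{
    private readonly string v;
    public CosName(string value) { v = value; }
    public override string ToPdf() { return "/" + v; }
}

public sealed class CosString : CosObj
{
    private readonly string v;
    public CosString(string value) { v = value; }
    // Simplification: real PDF strings must escape '(', ')' and '\'.
    public override string ToPdf() { return "(" + v + ")"; }
}

public sealed class CosArray : CosObj
{
    public readonly List<CosObj> Items = new List<CosObj>();
    public override string ToPdf()
    {
        return "[" + string.Join(" ", Items.Select(i => i.ToPdf())) + "]";
    }
}

public class CosDict : CosObj
{
    // Sorted keys keep the output deterministic; PDF does not order entries.
    public readonly SortedDictionary<string, CosObj> Entries =
        new SortedDictionary<string, CosObj>(StringComparer.Ordinal);
    public override string ToPdf()
    {
        return "<< " + string.Join(" ",
            Entries.Select(e => "/" + e.Key + " " + e.Value.ToPdf())) + " >>";
    }
}

// A stream is a dictionary followed by a sequence of bytes.
public sealed class CosStream : CosDict
{
    public byte[] Data = new byte[0];
    public override string ToPdf()
    {
        return base.ToPdf() + "\nstream\n" + Encoding.ASCII.GetString(Data) + "\nendstream";
    }
}

public static class CosDemo
{
    // Builds a small graph: a dictionary holding a name, a number, and an array.
    public static CosDict BuildPage()
    {
        var box = new CosArray();
        box.Items.AddRange(new CosObj[] {
            new CosNumber(0), new CosNumber(0), new CosNumber(612), new CosNumber(792) });
        var page = new CosDict();
        page.Entries["Type"] = new CosName("Page");
        page.Entries["Rotate"] = new CosNumber(90);
        page.Entries["MediaBox"] = box;
        return page;
    }
}
```

Calling CosDemo.BuildPage().ToPdf() produces << /MediaBox [0 0 612 792] /Rotate 90 /Type /Page >>, which shows how a document component such as a page is a dictionary whose values are other Cos objects.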
[ Back to top ]
It is possible to use PDFNet printing functionality in both client
and server applications.
For an example of client integration, please take a look at the PDFView
sample project (PrintPage method).
If you are interested in server-side printing, please take a look
at the PDFPrint sample project.
The PDFPrint sample does not require any user intervention and can automatically
print on the default printer.
[ Back to top ]
Simply use pdfdoc.RemoveSecurity(), then save the document using
pdfdoc.Save(...).
[ Back to top ]