How to extract XML from an XFA PDF form?

Suppose the XML data you want to extract is held in the XFA array inside the AcroForm dictionary. To extract all of the XFA data, you need to iterate through this array and read the contents of each stream it holds.

The following example shows how to extract the XML data at one specific index in the Array:

// Example code for extracting an xml string from the XFA form,
// and putting it back after an update.

PDFDoc doc = new PDFDoc(filename); 

//get the acroform dictionary
Obj acro_form = doc.GetAcroForm(); 

// This PDF document contains XFA forms... 
Obj obj = acro_form.FindObj("XFA"); 

//We will store the XML string in this byte array.
//The buffer is fixed at 4000 bytes; enlarge it if the stream may be longer.
byte[] buff = new byte[4000];

pdftron.Filters.Filter filter; 
pdftron.Filters.FilterReader fr; 

//The XFA entry in the PDF is an Array of name/stream pairs, so in this case,
//we want to read the XML string stored at index 5 of the Array
filter = obj.GetAt(5).GetDecodedStream(); 
fr = new pdftron.Filters.FilterReader(filter); 
//Read the decoded stream into buff; Read returns the number of bytes copied
int bytes_read = fr.Read(buff);
//at this point, the XML string should be stored inside buff,
//and you can make whatever modifications you want

//Modify XML String HERE

//We create an indirect stream object, which will contain our 
//  newly modified XML string
Obj new_xmp_stm = doc.CreateIndirectStream(buff);

//The swap method allows us to switch all indirect references to the old stream,
//  to point to our newly created stream.
doc.GetSDFDoc().Swap(obj.GetAt(5).GetObjNum(), new_xmp_stm.GetObjNum());

doc.Save(output_filename, SDFDoc.SaveOptions.e_linearized); 
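
The example above reads a single index. To extract all of the XFA data, the same calls can be applied to every stream in the array. The following is a minimal sketch rather than an official sample. It assumes that the XFA entry is either a single stream or an array of alternating packet names and streams (as the PDF specification describes), that the packets are UTF-8 encoded, and that FilterReader.Read returns 0 once the stream is exhausted. The class XfaDump and the methods ReadAll and ExtractAll are hypothetical names introduced for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using pdftron.Filters;
using pdftron.PDF;
using pdftron.SDF;

// Hypothetical helper class; not part of the PDFTron SDK.
static class XfaDump
{
    // Reads a decoded stream to the end in chunks, so the content
    // is not limited by a fixed-size buffer.
    static byte[] ReadAll(Obj stream)
    {
        Filter filter = stream.GetDecodedStream();
        FilterReader reader = new FilterReader(filter);
        MemoryStream collected = new MemoryStream();
        byte[] chunk = new byte[4096];
        int n;
        // Assumption: Read returns the number of bytes copied, 0 at the end.
        while ((n = reader.Read(chunk)) > 0)
        {
            collected.Write(chunk, 0, n);
        }
        return collected.ToArray();
    }

    // Returns a map from packet name to XML text for every stream
    // found in the XFA entry of the document.
    public static Dictionary<string, string> ExtractAll(PDFDoc doc)
    {
        var packets = new Dictionary<string, string>();

        // Assumption: the wrapper returns null when an entry is missing.
        Obj acro_form = doc.GetAcroForm();
        if (acro_form == null) return packets;
        Obj xfa = acro_form.FindObj("XFA");
        if (xfa == null) return packets;

        // Case 1: the whole XFA resource is stored in one stream.
        if (xfa.IsStream())
        {
            packets["xdp"] = Encoding.UTF8.GetString(ReadAll(xfa));
            return packets;
        }
        if (!xfa.IsArray()) return packets;

        // Case 2: an array of (name, stream) pairs, so the streams
        // sit at the odd indices 1, 3, 5, ...
        for (int i = 0; i + 1 < xfa.Size(); i += 2)
        {
            Obj name = xfa.GetAt(i);
            Obj stm = xfa.GetAt(i + 1);
            if (!stm.IsStream()) continue;

            // Fall back to the index if the name is not a string object.
            string key = name.IsString() ? name.GetAsPDFText() : "packet" + i;
            packets[key] = Encoding.UTF8.GetString(ReadAll(stm));
        }
        return packets;
    }
}
```

Calling XfaDump.ExtractAll(doc) on an opened PDFDoc (after the SDK has been initialized) should give one entry per packet. In a typical XFA form the form field values are in the packet named "datasets", but that depends on the document.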
