PDF.js: Build vs Buy

By Adam Pez | 2019 Aug 30

Survey: Why Organizations Switch from PDF.js

To test this hypothesis, we surveyed 57 unique organizations that came to us after trying PDF.js and finding it could not meet their needs. Many were OEMs and enterprises in industries such as construction and engineering, publishing, finance, legal, education, and life sciences.

These organizations ultimately deemed PDF.js unacceptable for one of the following reasons:

Graph of reasons why developer teams switched from PDF.js

Notably, of the 42 organizations who wanted more functionality, 71.4% tried to implement that functionality themselves first with PDF.js — and found it too difficult or time-intensive.

Graph of functionality teams attempt on PDF.js

The Costs and Risks of Customization

If you are thinking about a PDF.js demo or customization, you may wish to consider the following:

The time it takes to learn and build new functionality
The ongoing costs of supporting and maintaining custom features
Bugs and other open issues that may require your attention
Your requirements for accuracy, reliability, and speed
The risk of delays or abandoning a project

Why Adding Features is Time-intensive

The challenge of a PDF.js-based customization is that PDF.js was built to be Mozilla's PDF reader and Firefox’s integrated PDF viewer, not a component for other products. Dropbox wrote as much after abandoning a PDF.js-based project:

Integrating PDF.js with Dropbox was quite difficult, if not downright hacky. PDF.js was designed to be Firefox’s integrated PDF viewer, rather than a component of another product.
– Senior Developer, Dropbox

Because PDF.js is an open-source project not intended for use in other products, it lacks the conveniences of a commercial PDF SDK that would streamline development.

Out-of-box functionality is limited to viewing capabilities. You will have to build additional functionality in-house or adopt other open-source projects of varying quality and completeness.

PDF.js does not have an API for adding functionality to the UI. Adding certain features such as annotations (attempted by 42% of the 57 surveyed organizations) will therefore prove challenging and time-intensive. You will need to familiarize yourself with the PDF.js code base, which is complex and assumes familiarity with the PDF specification.

The PDF specification is very complex:

PDF is an incredibly complex file format — the specification is more than a thousand pages long, not including the extensions and supplements.
– Senior Developer, Dropbox

PDFs are an incredibly complex file format; this is especially so given that a PDF can be generated in a hundred different ways, all of which a renderer needs to handle gracefully.
– Developer, LinkedIn

Becoming familiar with the PDF specification means acquiring specialized knowledge, which takes time. When adding annotations to PDF.js, for example, you may need to work through a PDF.js demo to learn how to handle basic rendering instructions, including how to convert PDF annotation coordinates to <canvas> coordinates:

All of this translation is required every time the Annotation moves, whether the movement is caused by the user drawing the annotation, scrolling/resizing the document, etc.
– Senior Developer, Dropbox
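To give a sense of the translation involved, here is a minimal sketch of the coordinate conversion in its simplest form. It assumes an unrotated page and a uniform zoom factor. The names `PageGeometry`, `pdfToCanvasPoint`, and `pdfRectToCanvasRect` are hypothetical and written for illustration only; they are not part of PDF.js. PDF user space places the origin at the bottom-left with y growing upward, while a canvas places it at the top-left with y growing downward, so the y axis must be flipped as well as scaled.

```typescript
// Hypothetical types for illustration; these are not PDF.js types.
interface PageGeometry {
  // Page box in PDF user-space units: [x0, y0, x1, y1], origin at bottom-left.
  pageBox: [number, number, number, number];
  // Zoom factor: canvas pixels per PDF unit.
  scale: number;
}

interface Point2D {
  x: number;
  y: number;
}

interface CanvasRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Map a point from PDF user space (y grows upward) to canvas pixels (y grows downward).
// Assumption: the page is not rotated.
function pdfToCanvasPoint(p: Point2D, geometry: PageGeometry): Point2D {
  const x0 = geometry.pageBox[0];
  const y1 = geometry.pageBox[3];
  return {
    x: (p.x - x0) * geometry.scale,
    y: (y1 - p.y) * geometry.scale, // flip the y axis against the top edge of the page
  };
}

// Map an annotation rectangle [x0, y0, x1, y1] in PDF units to a canvas rectangle.
function pdfRectToCanvasRect(
  rect: [number, number, number, number],
  geometry: PageGeometry
): CanvasRect {
  const a = pdfToCanvasPoint({ x: rect[0], y: rect[1] }, geometry);
  const b = pdfToCanvasPoint({ x: rect[2], y: rect[3] }, geometry);
  return {
    left: Math.min(a.x, b.x),
    top: Math.min(a.y, b.y),
    width: Math.abs(b.x - a.x),
    height: Math.abs(b.y - a.y),
  };
}

// Example: a US Letter page (612 x 792 PDF units) shown at 150% zoom.
const letterPage: PageGeometry = { pageBox: [0, 0, 612, 792], scale: 1.5 };
const canvasRect = pdfRectToCanvasRect([72, 720, 172, 756], letterPage);
console.log(canvasRect); // { left: 108, top: 54, width: 150, height: 54 }
```

Because the result depends on `scale`, the rectangle has to be recomputed whenever the user zooms or the page is re-rendered. A real implementation would also have to account for page rotation, which this sketch deliberately leaves out.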

Documentation is often absent, stale (many broken links) or incomplete because PDF.js is maintained by volunteers working without consistent oversight or quality control.

Support may be unreliable. You will have to rely on voluntary forum responses, and depending on the complexity of your request, answers may be slow or inadequate. If your request falls outside the scope of the project, you are largely on your own.

Certain features may need to be rebuilt — like PDF.js text-select and text-search, which may not deliver the desired UX out-of-the-box.

We are developing a document viewer app that provides a secure container and syncs the documents for offline reading. We evaluated PDF.js, but the UX was not the best.
– Senior UX Consultant, Fortune 50 Software Company

All told, your team may spend months learning, building, calibrating, and optimizing new and existing features.

Why Maintaining and Supporting Features is Time-intensive

Additionally, you will have to invest time into feature support and maintenance. Since PDF.js is an open-source project, it cannot guarantee code stability and backward compatibility.

It is important to bear in mind that PDF.js, unlike a commercial SDK, is licensed under the Apache License 2.0, which provides no warranty and disclaims liability for defects or regressions that a community contributor may introduce.

With over 6,000 forks of PDF.js, commits happen on average several times a week, and these changes are not necessarily performed with your project in mind:

PDF.js community member requesting support

You may find that community fixes lead to undesired rendering behavior or removal of certain features:

In some cases, PDF.js updates would break any custom-built functionality on top of PDF.js. Some of our customers had to dedicate additional staff to monitoring and testing changes. This made it harder for them to implement changes later on and reduced their capacity to build new features.
– Andrey Safonov, Apryse Solutions Engineer

Additionally, the PDF.js GitHub repository currently has 600+ open issues and has seen a noticeable decline in the issue resolution rate.

Many open issues stem from difficult-to-fix aspects of PDF.js, such as the core rendering engine and the text parsing engine, which defines the text overlay used for text selection, text extraction, and text search. (PDF.js text-select alone has 90+ open issues, more than any other single issue category.)

You may be required to own these issues as well to satisfy your users’ performance, rendering accuracy, and feature requirements.

Other Challenges with Using a JavaScript PDF Editor Library

After investing months or years into a highly customized JavaScript PDF viewer, you may ultimately find that PDF.js is unable to meet your document performance, reliability, and rendering accuracy requirements.

After difficulty building functionality, the most common complaint was quality: almost half (45.6%) of the 57 surveyed organizations cited performance, reliability, or rendering accuracy as their primary reason for switching from PDF.js.

Here are just a few of these customers’ testimonials:

Performance

We also tried PDF.js to render pdf using a blob object. It is working on iPad and iPhones with a few limitations like it is not able to open PDFs bigger than 100MB, and it doesn’t support pinch zoom.
– Developer, Fortune 50 Company

...we have a custom PDF viewer, which decrypts the PDFs on the client side and renders them as SVGs using Mozilla’s PDF.js library. But the library is slow, inefficient, and requires the client to handle the rendering.
– Developer, eLearning Software

Customers are complaining about performance (mainly time to first page render). We want to have the same experience across all platforms for our two main use cases...
– Solution Architect, Life Sciences Software

While the document viewer works well and provides zoom, pan, annotation, outline and thumbnail navigation, it is slow since it requires the entire document to be downloaded before it can be viewed. We are looking for something better.
– Technical Director, Document Management Software

Reliability

We are using PDF.js now as an embedded viewer for PDF documents in a single page application, and we are having some issues with crashing browsers and suspect issues with the viewer.
– CTO, Training & Compliance Software

At present, we’re working with open-source PDF.js which is great for the 95% of PDFs, but the other 5% is critical. Larger PDFs are tricky.
– Co-founder, eDiscovery Software

Rendering Accuracy

We are currently using [PDF.js] to view construction plans related to a project being bid on. We have a small percentage of plans that don’t render correctly. In these cases we have a work around for the user to download the plan to Acrobat.
– VP, Software Consulting Firm

We have about 1000 paid users now. PDF.js has some problems: 1) Some weird formatting, such as with really old PDFs in a school database. 2) When the PDF is huge or full of images, for example, textbooks, it loads really slow. Also, it consumes a lot of RAM.
– Developer, eLearning Software

Our drawback with PDF.js is the loss of quality on some large plans on 100% zoom level and beyond. This loss of quality can sometimes block the user’s ability to make correct measurements in the file.
– Developer, 3D Mapping Software

When to Use PDF.js

PDF.js affords a few advantages as a simple, short-term solution:

The project layers (core PDF parsing and rendering, the display API, and the example PDF viewer) are nicely separated.
Installation is a breeze if one wants to use the example viewer layer or implement a custom JavaScript PDF viewer with limited functionality.
Most of PDF.js’s dependencies rely on universal web standards.
Basic UI elements, such as buttons, can be restyled quickly via the project CSS and HTML files.

PDF.js may, therefore, prove cost-effective in the following situations:

Users are willing to tolerate some rendering, performance, and reliability issues
The project scope is limited to basic viewing
The project is a short-term solution
The PDFs are small and simple

Case Study: Slack — Where PDF.js Worked

Three years ago, Slack embedded a PDF.js viewer using only the resources of a single recent hire.

The organization was able to trim the viewer UI to an easy-to-maintain minimum of features and achieved basic viewing primarily for small PDFs (e.g., invoices, contracts, and sales reports).

They then blogged about their success:

PDFs are complex documents — structured into different layers of information, data, and objects, and containing different languages, images, and graphics… PDF.js provided basic capabilities, including security and reliability, and helped us abstract away the complexities of the project. For our first pass at inline PDF viewing, we intentionally kept our scope narrow: display and text selection support for small PDF files.
– Senior Developer, Slack

However, despite initially hinting at further iterations, Slack has not added features such as annotations, form filling, or signatures to its PDF.js viewer in the three years since. Such features would let users do more with their PDFs in Slack.

Instead, third-party tools in the Slack App Directory now offer a few PDF capabilities, such as form fill, and these tools require a separate purchase or subscription.

Where PDF.js Fails

Some of our customers testify that adding more features to PDF.js such as annotations, form filling, and signatures may prove very challenging and time-intensive.

Currently I'm evaluating possible solutions to replace PDF.js in a DMS application. We would like to move away from PDF.js because it’s limited in its functionality and we need some advanced stuff like annotations which can’t be easily done with it.
– Senior Developer, DMS Software

PDF.js is great for getting a proof of concept out there. It does 95% of the things we want, but that 5% is crucial to us.
– Developer, Legal Software

Therefore, a PDF.js-based project may not prove cost-effective if any of the following are true:

The web viewer will be heavily relied upon in an organizational setting or commercial product
Feature requirements are more advanced
Users will expect functionality to grow over time
The UX needs to be a competitive differentiator
Users expect rendering accuracy
Documents are large and complex

The Bottom Line

...you shouldn’t build anything that’s available off the shelf because it’s not a source of competitive advantage if everybody else can avail themselves of it. The only scenario where you should build is if it’s your core technology — the core source of your competitive differentiation and competitive advantage.
– Mark Holst-Knudsen, President, ThomasNet, at MIT’s 2014 CIO Symposium

In most cases, the question of build vs. buy comes down to whether the total cost of an in-house build (time spent learning, building, maintaining, and supporting custom features) has a lesser impact on your bottom line than the price of a commercial SDK license.
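One way to make this comparison concrete is a rough back-of-the-envelope calculation like the sketch below. The cost model, the field names, and every number in it are hypothetical placeholders chosen only to show the arithmetic; none of them come from the survey or from any real price list, and the result is meaningless until you substitute your own estimates.

```typescript
// Hypothetical cost model for illustration; all inputs are placeholders.
interface InHouseEstimate {
  learnDevMonths: number; // developer-months spent learning PDF.js and the PDF spec
  buildDevMonths: number; // developer-months spent building custom features
  maintainDevMonthsPerYear: number; // developer-months per year on maintenance and support
  costPerDevMonth: number; // fully loaded cost of one developer-month
}

interface LicenseEstimate {
  annualFee: number; // yearly license price
  integrationDevMonths: number; // developer-months spent integrating the SDK
  costPerDevMonth: number; // fully loaded cost of one developer-month
}

// Total cost of the in-house build over the given number of years.
function inHouseCost(e: InHouseEstimate, years: number): number {
  const devMonths =
    e.learnDevMonths + e.buildDevMonths + e.maintainDevMonthsPerYear * years;
  return devMonths * e.costPerDevMonth;
}

// Total cost of the licensed option over the same period.
function licenseCost(e: LicenseEstimate, years: number): number {
  return e.annualFee * years + e.integrationDevMonths * e.costPerDevMonth;
}

// Placeholder inputs; replace every value with your own estimate.
const horizonYears = 3;
const buildEstimate: InHouseEstimate = {
  learnDevMonths: 3,
  buildDevMonths: 9,
  maintainDevMonthsPerYear: 4,
  costPerDevMonth: 10000,
};
const buyEstimate: LicenseEstimate = {
  annualFee: 30000,
  integrationDevMonths: 2,
  costPerDevMonth: 10000,
};

// A positive difference means the build costs more than the license over the horizon.
const difference =
  inHouseCost(buildEstimate, horizonYears) - licenseCost(buyEstimate, horizonYears);
console.log(`Build minus buy over ${horizonYears} years: ${difference}`);
```

The sketch counts only direct costs. Harder-to-quantify factors discussed above, such as the risk of delays or of abandoning the project, would have to be weighed separately.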

With these considerations in mind, PDF.js may prove good enough where you need a fast, short-term solution for web viewing small and simple PDFs. In contrast, PDF.js may not be as dependable, flexible, or scalable as required if your web viewer or PDF.js editor will be heavily relied upon in a commercial product or organizational setting; where your feature requirements are more advanced; and where performance, reliability, and rendering accuracy are important.

However, if you require faster performance, reliability, and near-flawless rendering, as well as easy access to hundreds of unique features cross-platform, and accelerated time to market — then you may wish to consider a commercial solution such as Apryse SDK.

We’d love to hear any feedback you may have about this article or our PDF SDK. Don’t hesitate to contact us directly.