Part 4 in a series for product managers looking to accelerate their time to market when adding new document & digital content capabilities

Governments around the world are speaking out about cybersecurity in the wake of recent high-profile incidents, and they are urging big companies to 'raise the bar.' In response, companies in the spotlight have ramped up cybersecurity spending, but shortages of the key resources needed for due diligence have only been exacerbated by the pandemic and its aftereffects.

Security and data privacy present a significant challenge if you intend to capture or add features to mission-critical workflows in products serving large enterprises or highly regulated industries. Whether you’re a small company or a large organization, the seemingly low-cost option isn’t always the quickest; for example, you may have to find scarce talent to ensure that added components aren’t your weakest link.

Pre-built document and digital content components are no exception when it comes to security.

So, in this chapter, the fourth in a series for product teams, we explore today’s biggest security & compliance risks with and within your documents. We also look at ways you can protect your organization, customers, and their sensitive data in your documents when adding viewing, editing, and collaboration capabilities.

Security Considerations in 2021

As a commercial digital content & document processing SDK provider, we work with hundreds of companies and governments globally that operate in some of the most demanding, high-pressure environments you can imagine. These include hospital emergency rooms, law enforcement, virtual data rooms (VDRs) for negotiating billion-dollar deals, finance, clinical trial processes, and more.

This chapter was inspired by their experiences implementing the highest levels of security and negotiating reams of red tape.

We prioritized some of the most common risks we see these companies facing when adding pre-built components for viewing, editing, and collaborating on content.

Today, some of their top security concerns include:

  1. Low native security in PDFs
  2. Server-side caching of sensitive data
  3. Malicious payloads in documents, and attacks on the document processing server
  4. Data privacy, data processing, and geographical compliance
  5. The so-called "analog loophole"

Here are some strategies for squaring your security needs with those of user experience and cost-efficiency. We cover these in more depth at the end of this chapter.

  • Do not let users download their PDFs if document retention and control over what users do with their documents are important
  • Move processes client-side into the browser where security layers are already in place and data isn’t being transferred back and forth to a server
  • Select battle-hardened components that are secure-by-design and quality assured, so you’re not having to maintain components external to your main product offering

Low Native Security in PDF Files

There is a lot to love about PDF. For one, it’s in our company’s name. For another, it serves as a simple and cost-effective medium for visual exchange, and thus as a staple of many business and collaboration workflows, a sort of 'digital paper'.

At the same time, PDF is sophisticated under the hood, offering a rich array of graphics, objects, and features with which to develop solutions. However, its built-in features for security are not so strong. These include master password protection and PDF document permissions, which teams may be tempted to rely on for security.

From a technical standpoint, they’re more like a bike lock: sure, it’ll stop casual theft, but a motivated person with the right tools will have little problem cracking the lock and riding off with the bike.

Master PDF password protection, for example, is vulnerable to brute-forcing. Likewise, PDF document permissions (on text selection, copy/paste, etc.) are voluntary according to the PDF standard; i.e., developers building applications are free to ignore them. So if the user decides to open your "protected" PDF document in another application, they could perform actions you want blocked, such as manipulating the content, copy/paste, etc.
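To make the "voluntary" point concrete, here is a minimal Python sketch of how a viewer might read the permission flags stored in an encrypted PDF. It assumes the bit positions defined for the /P entry of the standard security handler in the PDF specification (bit 3 for printing, bit 4 for modifying, bit 5 for copying, bit 6 for annotating). The function and constant names are our own and purely illustrative. Nothing in the file enforces these bits: a viewer that skips the check can still offer every action.

```python
from __future__ import annotations

# Illustrative only: decode the permission flags (/P entry) of an encrypted PDF.
# Bit positions follow the standard security handler in the PDF specification
# (bits are numbered from 1, lowest-order bit first). All names are hypothetical.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,
    "annotate": 6,
}


def decode_permissions(p_value: int) -> dict[str, bool]:
    """Return which actions the document author has *asked* viewers to allow."""
    # /P is stored as a signed 32-bit integer; mask it to read the raw bits.
    raw = p_value & 0xFFFFFFFF
    return {
        name: bool(raw & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


def compliant_viewer_can(action: str, p_value: int) -> bool:
    """A well-behaved viewer consults the flags before offering an action."""
    return decode_permissions(p_value)[action]


def non_compliant_viewer_can(action: str, p_value: int) -> bool:
    """A viewer that ignores the flags: the file format cannot stop it."""
    return True


if __name__ == "__main__":
    p = -3904  # a /P value with the four permission bits above cleared
    print(decode_permissions(p))
    print(compliant_viewer_can("copy", p), non_compliant_viewer_can("copy", p))
```

The two viewer functions differ only in whether they look at the flags, which is why permissions alone cannot be treated as access control.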

PDF digital signing features used for ensuring the authenticity of documents like contracts are more robust. However, they could still be vulnerable if the software for managing and verifying signatures is not designed and maintained the right way, leading to potential breaches of authenticity and trust.

Several of these weaknesses in PDF digital signatures were uncovered recently by researchers at Germany’s Ruhr University Bochum and the Münster University of Applied Sciences.

For a deeper dive into these and other recently discovered PDF exploits, you can check out their website. (But grab a cup of coffee first.)

We’ve also written about their research before.

Cited in their report is a previous 2019 analysis of 22 commercial applications. It found 21 of them vulnerable to one or more of three severe weaknesses in PDF-based digital signature validation, as published at https://pdf-insecurity.org/.

Server-side Caching of Sensitive Data

Server-side caching of document information is an area of concern for data privacy.

Many web-based viewers, especially those on mobile, and some mobile viewers actually render server-side because the client viewer engine is weaker. They may also cache rendered images of sensitive data on the server so that pages can be fetched instantly if requested again later.

If pre-rendered pages are stored, even on an image server you control, you will need extra precautions and DevOps involvement, because admins or developers could otherwise get eyes on private or personally identifiable information they should not see.

For example:

Canopy Tax is a tax software platform, offering the tools of practice management, tax resolution, tax preparation, and more -- all in one place. SOC-2 compliant, Canopy Tax requires all private tax information on its platform to be viewed only by its owners and their authorized trustees. In 2019, they wanted to upgrade their document engine, and were evaluating different components. They evaluated one component but did not go with it, as it could not meet their compliance requirements cost-effectively due to how it cached rendered information:

"Every time a document opens, it’s also saved in their server-side cache. It becomes a document ID. And if you log into the admin area, you can see all of those documents and access them."

~ Canopy Tax's Sr. Product Manager Malcolm Felt

Malicious Payloads in Documents and Attacks on the Document Processing Server

Another consideration is the security of document software components, especially when users are expected to upload arbitrary files. Documents serve as a well-known attack vector: a PDF file, for example, can be used to host a malicious payload, to exfiltrate other data out of your system, or to hijack aspects of it. (Risks include running a malicious script from a PDF file; script execution is disabled by default in our PDF SDKs for that reason.)
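As a rough illustration of screening uploads before they reach a document processing server, the Python sketch below flags PDF name tokens commonly associated with active content. The token list and function name are our own assumptions, not a feature of any particular SDK. A byte scan like this is only a first-pass heuristic: it will miss tokens hidden inside compressed object streams or written with hex escapes, so it complements sandboxing and patched libraries rather than replacing them.

```python
from __future__ import annotations

import re

# Hypothetical watch list of PDF name tokens tied to scripts, auto-run
# actions, external launches, and embedded files.
RISKY_TOKENS = (
    b"/JavaScript",
    b"/JS",
    b"/OpenAction",
    b"/AA",
    b"/Launch",
    b"/EmbeddedFile",
)


def find_risky_tokens(pdf_bytes: bytes) -> list[str]:
    """Return the watch-list tokens that appear in the raw file bytes."""
    found = []
    for token in RISKY_TOKENS:
        # Require that the token is not followed by another letter or digit,
        # so "/JS" does not match a longer, unrelated name.
        pattern = re.escape(token) + rb"(?![A-Za-z0-9])"
        if re.search(pattern, pdf_bytes):
            found.append(token.decode("ascii"))
    return found


if __name__ == "__main__":
    sample = (
        b"<< /Type /Catalog /OpenAction 2 0 R >> "
        b"<< /S /JavaScript /JS (app.alert(1)) >>"
    )
    print(find_risky_tokens(sample))
```

A non-empty result could be used to route the file to stricter handling, such as rejecting it or processing it in an isolated environment.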

For document-based attacks, hackers will typically target weaknesses in document libraries run in a server environment and not protected within the browser sandbox by process isolation. Usually, vulnerabilities stem from memory issues. This is the most commonly reported type of vulnerability in PDFium, for example, an open-source C++ PDF library often used to power an image server to serve content as images down to client applications.

In addition, the record-breaking Equifax breach and Heartbleed (a memory buffer over-read bug) both involved open-source libraries. Open-source components offer hackers particularly rich targets because, with relatively little investment, they can target the same component used across many applications at the same time.

They will also use public vulnerability reporting databases to identify weaknesses. These can then be leveraged against yet-to-be-patched applications using outdated open-source code (which are not hard to find).

...since vulnerability detection and exploitation has become a professional business, it is and always will be likely that attacks will occur even before we fully disclose the attack vectors, by reverse engineering the code that fixes the vulnerability in question or by scanning for yet unknown vulnerabilities.

~ Apache Struts Sept. 2017 Statement on Equifax breach
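One way to act on this is to compare the versions of the third-party components you ship against the first release that fixes a known issue. The Python sketch below does so for simple dotted version numbers. The component names and version numbers are made up for illustration, and a real pipeline would more likely pull advisories from a vulnerability database or a dependency-scanning tool.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '2.3.10' into (2, 3, 10) for comparison."""
    return tuple(int(part) for part in text.split("."))


def find_unpatched(shipped: dict[str, str], first_fixed: dict[str, str]) -> list[str]:
    """List components whose shipped version predates the first fixed release."""
    outdated = []
    for name, version in shipped.items():
        fixed = first_fixed.get(name)
        # Components with no known advisory are skipped.
        if fixed is not None and parse_version(version) < parse_version(fixed):
            outdated.append(name)
    return sorted(outdated)


if __name__ == "__main__":
    # Hypothetical inventory and advisory data.
    shipped = {"pdf-renderer": "2.3.9", "web-framework": "2.5.13", "image-codec": "1.0.0"}
    first_fixed = {"pdf-renderer": "2.3.10", "web-framework": "2.5.13"}
    print(find_unpatched(shipped, first_fixed))
```

Comparing versions as integer tuples avoids the string-comparison trap where "2.3.9" would sort after "2.3.10".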

Reliance on SaaS/Server Tech for Document Processing

Next, those for whom data privacy is a concern also need to consider in whose hands documents with sensitive data will be processed, and possibly, through what jurisdictions that data will pass.

A consideration here is that many vital document processing actions have been difficult to do directly in a web application. These actions include:

  • redaction
  • document generation (from Office templates)
  • page manipulation
  • conversion
  • watermarking
  • and so on

As a result, organizations typically rely on servers for document processing as well as data processing. This adds complexity from a compliance standpoint when documents are processed by a third-party service (possibly in another country).

An alternative is to bring document processing onto your own servers or AWS-hosted applications. This has a clear advantage in terms of control over your data and time saved on due diligence: for example, you no longer have to vet a third-party data processor under GDPR or complete extra paperwork for sharing ePHI.

If you can perform such actions locally in the client web application -- all the better. You gain additional benefits, especially when processing data such as personally identifiable information:

  1. Simplified geographical compliance: You no longer need to consider where data processing is performed, since it’s done locally.

  2. Removal of redundant server infrastructure: You also don’t need to build out the security layers and monitor different services for each jurisdiction, thus simplifying setup for new sites in different countries and eliminating redundant server costs.

  3. No server-side caching of sensitive information: Next, since documents are processed and rendered client-side for interaction, no personal, private data is left around in a non-encrypted format for admins to see.

  4. Fewer steps in the document workflow: With data processing performed in the same component as for viewing, users will not have to exit the application, or open extra tabs, to complete a task.

  5. More secure: Also, by moving processes into the security layers provided by the browser, you remove server-based attack surfaces for hackers to exploit.

  6. Shorter time to market: Lastly, with fewer moving parts, and less data moving to the server and back, you’ll get simplified integration of new features and therefore a shorter time to market!

The Analog Loophole and How to Cover It

Lastly, we discuss the so-called analog loophole (also known as the analog hole), a concern for those with the most demanding requirements to control documents. The term originally comes from the introduction of DRM in the 1990s: if someone wanted to steal a song on disc, they merely had to put a microphone to the speaker. Want to bootleg that new Blu-ray? Film the screen with a camcorder.

Likewise, in the case of documents rasterized on screen, there’s little to stop users from taking a screenshot of highly sensitive or private information -- or just snapping a photo with their smartphone.

One document feature used to cover the analog loophole, deployed by customers of ours such as human layer security service provider Egress, is page watermarking. Watermarking lets you burn an image of your company logo into rasterized content, along with, perhaps, additional information to identify the user and session (their IP address, a timestamp, etc.).

This method, while not bulletproof, makes stealing sensitive content a bit harder, and monitoring documents easier, say, if users need to share content and you want to help safeguard intellectual property.
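As a rough sketch of the idea, the Python snippet below builds the per-session text that could be stamped on each page and computes a simple grid of positions for tiling it. The field names, the text format, and the spacing values are illustrative assumptions. Actually drawing the watermark is left to whatever rendering component you use.

```python
from __future__ import annotations

from datetime import datetime, timezone


def build_watermark_text(user: str, ip_address: str, viewed_at: datetime) -> str:
    """Combine user and session details into one line of watermark text."""
    # Normalize to UTC so stamps are comparable across sessions.
    # (A naive datetime would be interpreted as local time here.)
    stamp = viewed_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return f"{user} | {ip_address} | {stamp}"


def tile_positions(
    page_width: float, page_height: float, step_x: float, step_y: float
) -> list[tuple[float, float]]:
    """Return (x, y) anchor points for repeating the text across a page."""
    positions = []
    y = step_y / 2
    while y < page_height:
        x = step_x / 2
        while x < page_width:
            positions.append((x, y))
            x += step_x
        y += step_y
    return positions


if __name__ == "__main__":
    when = datetime(2021, 11, 5, 14, 30, tzinfo=timezone.utc)
    print(build_watermark_text("a.user@example.com", "203.0.113.7", when))
    print(tile_positions(612, 792, 306, 396))
```

Repeating the text across the page, rather than stamping it once in a corner, makes it harder to crop out of a screenshot or photo.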

You may also want to perform watermarking client-side to simplify security and compliance, reduce server costs, make users more productive, and leverage the numerous other benefits of client-side document processing.

Summary of Key Takeaways

In summary, companies adding document and digital content capabilities to their apps face several challenges tied to their document security and compliance needs. And rather than relying on just a bike lock, they prefer to have a bike room and locker for securing their sensitive data and mission-critical applications.

There are a few proven tactics that we support and have seen work in combination to square competing requirements.

Strategy 1 - Do Not Download Your PDFs

Because native PDF security is not strong, you should not rely on it if:

  • You want to enforce document retention
  • You want to control what users do with the file

Therefore, do not download your PDFs. Or more precisely, do not let your users download their PDFs.

Instead, you will want components that support options for streaming content as encrypted binary, and that support local storage in a secure cache (useful for mobile and offline use).

You will also obviously want components with a customizable UI that lets you remove the download button, as well as customizable user permissions to control what actions each user performs in that UI.
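A minimal Python sketch of such per-user permissions might look like the following. The role names and action names are hypothetical and not taken from any particular SDK. The point is simply that the UI only offers the actions a role has been granted, with download withheld from every role.

```python
from __future__ import annotations

# Hypothetical roles mapped to the actions the viewer UI may offer them.
# "download" is deliberately absent from every role.
ROLE_ACTIONS = {
    "viewer": frozenset({"view"}),
    "reviewer": frozenset({"view", "annotate"}),
    "records_admin": frozenset({"view", "annotate", "redact", "print"}),
}


def allowed_actions(role: str) -> frozenset[str]:
    """Unknown roles get no actions at all (deny by default)."""
    return ROLE_ACTIONS.get(role, frozenset())


def visible_buttons(role: str) -> list[str]:
    """Buttons the UI should render for this role, in a stable order."""
    return sorted(allowed_actions(role))


if __name__ == "__main__":
    for role in ("viewer", "reviewer", "records_admin", "guest"):
        print(role, visible_buttons(role))
```

Hiding a button only shapes the interface, so the same permission check would typically also be applied wherever the content itself is served.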

Strategy 2 - Minimize the Server for Data Processing

Another strategy touched on repeatedly in this piece is to minimize reliance on the server -- which offers double benefits from a compliance & security point of view.

Running document processing client-side was previously difficult to do without sacrifices in the user experience due to the limitations of the browser.

In the last few years, however, innovations in web technology have changed the game. For example, WebAssembly is allowing more and more developers to port native code, and thus native capabilities, into the browser safely, where features can leverage a revamped security model and run in a memory-safe virtual sandbox.

The security model of WebAssembly has two important goals: (1) protect users from buggy or malicious modules, and (2) provide developers with useful primitives and mitigations for developing safe applications, within the constraints of (1).

~ webassembly.org

WebAssembly is the technology enabling our flagship product WebViewer (now in its 8th iteration). WebViewer lets you convert MS Office and image files to PDF, redact, manipulate pages, watermark, and a lot more -- all in the browser client, without any third-party server dependencies or calls out, and with native performance, stability, and accuracy.

You can check out all of these document processing capabilities, once thought exclusive to web services and desktop tools, in your browser today via our WebViewer showcase demo.

Here is what some of our customers say about leveraging this client-side technology:

All the clinical trial data processing now occurs in the user’s browser and in the client itself. So all the security layers are guaranteed; you don’t have to develop all-new ones and monitor those separately in a hosted service.

~ Anonymous at a Life Sciences Industry Software Provider

When we decided to reverse course and go with PDFTron, our DevOps team was like, 'Okay, well, I guess we don't have to do anything.' They could just wash their hands of it and move on to other tasks as our front-end team implemented the client-side rendering.

~ Canopy Tax’s Sr. Product Manager Malcolm Felt

Other products support redaction. But they require going to another website and re-uploading the PDF or leaving the browser to open a different app on the desktop. The advantage of using PDFTron is that the user doesn't have to leave our application, making the customer workflow and integration of the SDK much simpler.

~ Egress’s Chief Product Officer Sudeep Venkatesh

Strategy 3 - Select Secure-by-Design and Battle-Hardened Components

Simply delegating processes client-side will not ensure security if the components themselves are not designed with security in mind and well-maintained (i.e., tested and updated regularly). Moreover, some processes you simply can’t delegate wholly to the client (e.g., CAD-to-PDF conversion, real-time collaboration, etc.) either because the client application lacks the required muscle to do everything smoothly, or because you need a server for coordination.

As a result, you will want components that are robust and battle-hardened, subject to best practices in engineering and quality assurance from the ground up, across all platforms.

For example, some strategies we use to bring security to our SDK components across all platforms include:

  • Regular code reviews by certified, reputable third-party code-auditing firms
  • Use of static code analysis tools (Veracode, etc.)
  • A bug bounty program and rewards
  • Proactive notification of impacted customers should a weakness ever be discovered
  • A more rigorous, secure-by-design approach to all existing & new features internally -- for example, reflected in our new secure-by-default digital signature validation. It was found not to be vulnerable to any of the latest exploits published at www.pdf-insecurity.org.

Wrap up

In this chapter, we discussed some of the biggest security and compliance considerations you may face. We also covered strategies that our customers around the world use to square the triangle of time to market, security, and user experience in mission-critical workflows and highly regulated industries that demand the highest levels of security.

In the next chapter in our time-to-market series, we tackle questions related to licensing a solution, when consolidating vendors is right for you, and how to negotiate a win-win technology partnership. Stay tuned to this space for more!