Validating PDF signatures

Basic use

Validating signatures in a PDF file is done through the validate subcommand of pyhanko sign.

A simple use case might look like this:

pyhanko sign validate --pretty-print document.pdf

This will print a human-readable overview of the validity status of the signatures in document.pdf. The trust setup can be configured using the same command-line parameters and configuration options as for creating LTV signatures.

Factors in play when validating a signature

In this subsection, we go over the various factors considered by pyHanko when evaluating the validity of a PDF signature.

Cryptographic integrity

This is the most fundamental aspect of any digital signature: we verify that the bytes of the file covered by the signature produce the correct hash value, and that the signature object is a valid signature of that hash. By ‘valid’, we mean that the cryptographic signature should be verifiable using the public key in the certificate that is marked as the signer’s in the signature object. In other words, we need to check that the signature was actually produced by the private key that corresponds to the purported signer’s certificate.
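To make the first half of this check concrete, here is a minimal sketch of hashing only the parts of a file that a signature covers. It assumes the covered region is given as a flat list of offset/length pairs, which is how the /ByteRange entry of a PDF signature dictionary describes it. The function name digest_covered_bytes and the sample data are hypothetical and not part of pyHanko. Verifying the signature over the resulting digest with the signer's public key is a separate step that is not shown.

```python
import hashlib


def digest_covered_bytes(data: bytes, byte_range: list[int]) -> bytes:
    """Hash the segments of `data` described by offset/length pairs.

    Hypothetical helper for illustration only; SHA-256 is an assumption,
    a real signature names its own digest algorithm.
    """
    if len(byte_range) % 2 != 0:
        raise ValueError("byte_range must contain offset/length pairs")
    md = hashlib.sha256()
    for i in range(0, len(byte_range), 2):
        offset, length = byte_range[i], byte_range[i + 1]
        if offset < 0 or length < 0 or offset + length > len(data):
            raise ValueError("segment falls outside the file")
        md.update(data[offset:offset + length])
    return md.digest()


# Toy "file": the placeholder in the middle stands in for the signature
# contents, which are excluded from the hash.
sample = b"HEADER<placeholder>TRAILER"
covered = [0, 6, 19, 7]  # "HEADER" and "TRAILER"
digest = digest_covered_bytes(sample, covered)
```

Changing any covered byte changes the digest, while changing the excluded placeholder does not. That is why the signature value can be written into the file after the hash has been computed.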

Authenticity: trust settings

Having verified that the signature was produced by the (claimed) signer’s certificate, we next have to validate the binding between the certificate and its owner. That is to say, we have to convince ourselves that the entity whose name is on the certificate is in control of the private key, i.e. that the signer is who they claim to be.

Technically, this is done by establishing a chain of trust to a trust anchor, which we rely on to judge the validity of cryptographic identity claims. This is where the trust settings mentioned above come into play.
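The idea of a chain of trust can be sketched with a deliberately simplified model. The example below assumes certificates are reduced to a subject name and an issuer name, and only follows issuer names until it reaches a trust anchor. The ToyCert class and build_path function are hypothetical and not part of pyHanko; real path validation also verifies each signature in the chain, validity periods, key usage and revocation status.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCert:
    """Toy stand-in for a certificate: names only, no keys or signatures."""
    subject: str
    issuer: str


def build_path(leaf, intermediates, trust_anchors, max_length=10):
    """Follow issuer names from `leaf` up to a trust anchor.

    Returns the list of certificates from leaf to anchor, or None if no
    anchor can be reached. Illustration only: nothing is verified
    cryptographically.
    """
    anchors = {c.subject: c for c in trust_anchors}
    others = {c.subject: c for c in intermediates}
    path = [leaf]
    current = leaf
    while len(path) <= max_length:
        if current in trust_anchors:
            return path
        issuer = anchors.get(current.issuer) or others.get(current.issuer)
        if issuer is None or issuer in path:
            # Unknown issuer, or a loop that never reaches an anchor.
            return None
        path.append(issuer)
        current = issuer
    return None


root = ToyCert(subject="Root CA", issuer="Root CA")
intermediate = ToyCert(subject="Intermediate CA", issuer="Root CA")
leaf = ToyCert(subject="Alice", issuer="Intermediate CA")

path = build_path(leaf, [intermediate], [root])
```

In this model, a signer whose issuer cannot be traced back to one of the configured trust anchors yields no path at all, which corresponds to a signature that is cryptographically sound but not trusted.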

Incremental updates: difference analysis

PDF files can be modified, even when signed, by appending data to the end of the previous revision. Such additions are called incremental updates. In particular, this is how forms with multiple signatures are implemented in PDF. Incremental updates can essentially modify the original document in arbitrary ways, which is a problem, since they are (by definition) not covered by any earlier signatures.

In short, validators have two options: either reject all incremental updates (and decline to support multiple-signer scenarios of any kind), or police incremental updates themselves. The exact way in which this is supposed to be done is not specified precisely in the PDF standard.
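Before any policing can happen, a validator has to notice that something was appended after the signed revision. The sketch below shows one way to detect that, assuming the signed region is given as offset/length pairs (as in a PDF signature's /ByteRange entry). The function name bytes_after_signed_revision and the numbers are hypothetical and not taken from pyHanko. The result says nothing about whether the appended changes are acceptable; that judgment is the job of the difference analysis.

```python
def bytes_after_signed_revision(file_size: int, byte_range: list[int]) -> int:
    """Return how many bytes follow the last segment covered by a signature.

    Hypothetical helper for illustration: a nonzero result means the file
    was extended after signing, for example by an incremental update.
    """
    if not byte_range or len(byte_range) % 2 != 0:
        raise ValueError("byte_range must contain offset/length pairs")
    covered_end = max(
        byte_range[i] + byte_range[i + 1]
        for i in range(0, len(byte_range), 2)
    )
    if covered_end > file_size:
        raise ValueError("byte range extends past the end of the file")
    return file_size - covered_end


# A signature covering a 1000-byte file, with a 200-byte gap for the
# signature contents themselves.
signed_range = [0, 400, 600, 400]

untouched = bytes_after_signed_revision(1000, signed_range)
extended = bytes_after_signed_revision(1250, signed_range)
```

A validator taking the first option would reject the file as soon as this count is nonzero. One taking the second option would go on to inspect what the extra bytes actually change.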

Warning

PyHanko attempts to run a difference analysis on incremental updates, and processes modifications on a reject-by-default basis (i.e. all updates that can’t be vetted as OK are considered suspect). However, this feature is (very) experimental, and shouldn’t be relied on too much.

Establishing the time of signing

There are a number of ways to indicate when a signature was made. These broadly fall into two categories:

  • Self-reported timestamps: these are based on the signer’s word, and shouldn’t necessarily be trusted as accurate.

  • Trusted timestamps: these derive from timestamp tokens issued by a trusted timestamping authority at the time of signing.

Especially in the context of long-term verifiability of signatures and preventing things like backdating of documents, having an accurate measure of when the signature was made can be of crucial importance. PyHanko will tell you when a signature includes a timestamp token, and validate it along with the signature.

Evaluating seed value constraints

Finally, the document author can put certain restrictions on future signatures when setting up the form fields. These are known as seed values in the PDF standard. Not all seed values represent constraints (some are intended as suggestions), but one especially useful use of them is to earmark signature fields for use by specific signers. When validating signatures, pyHanko will also report on whether (mandatory) seed value constraints were respected.

Warning

Not all digital signing software is capable of processing seed values, so some false positives are to be expected.

Obviously, seed value constraints are only truly reliable if the document author secures the document with a certification signature before sending it for signing. Otherwise, later signers can modify the seed values before putting their signatures in place. See here for other concerns to keep in mind when relying on seed values.

Warning

PyHanko currently does not offer validation of structural PAdES profile requirements, in the sense that it can’t tell you if a signature complies with all the provisions required by a particular PAdES profile. Note that these are requirements on the signature itself, and have no bearing on possible later modifications to the document.