pyhanko.pdf_utils.incremental_writer module

Utility for writing incremental updates to existing PDF files.

class pyhanko.pdf_utils.incremental_writer.IncrementalPdfFileWriter(input_stream, prev: Optional[pyhanko.pdf_utils.reader.PdfFileReader] = None, strict=True)

Bases: pyhanko.pdf_utils.writer.BasePdfFileWriter

Class to incrementally update existing files.

This BasePdfFileWriter subclass encapsulates a PdfFileReader instance in addition to exposing an interface to add and modify PDF objects.

Incremental updates to a PDF file append modifications to the end of the file. This is critical when the original file contents are not to be modified directly (e.g. when it contains digital signatures). It has the additional advantage of providing an automatic audit trail of sorts.
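The append-only property can be illustrated without any PDF machinery. The sketch below is not pyHanko code: `original` and `update` are made-up byte strings standing in for a signed revision and an update section, and the only point is that a digest over the earlier bytes survives an append.

```python
import hashlib

# Toy stand-ins for a signed revision and an appended update section
# (assumption: real PDF revisions are far more structured than this).
original = b"%PDF-1.7\n...first revision...\n%%EOF\n"
update = b"...changed objects, new xref data, new trailer...\n%%EOF\n"

# A digest over the original revision, similar in spirit to what a
# signature would cover.
digest_before = hashlib.sha256(original).hexdigest()

# An incremental update only appends; earlier bytes are never rewritten.
updated = original + update

# The original byte range is still present verbatim at the start of the
# file, so a digest over that range is unchanged.
digest_after = hashlib.sha256(updated[: len(original)]).hexdigest()
assert digest_before == digest_after

# The earlier revision can be recovered by truncating the file, which is
# where the "audit trail" comes from.
previous_revision = updated[: len(original)]
```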

Parameters
  • input_stream – Input stream to read current revision from.

  • strict – Ingest the source file in strict mode. The default is True.

  • prev – Explicitly pass in a PDF reader. This parameter is internal API.
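A minimal usage sketch, assuming the constructor and methods behave as described in this section. The helper name `append_revision` is hypothetical, and calling `update_root()` is only there to give the update section something to contain.

```python
from io import BytesIO


def append_revision(pdf_bytes: bytes, strict: bool = True) -> bytes:
    """Return pdf_bytes with one incremental update appended (sketch)."""
    # Imported inside the function so that defining this helper does not
    # require pyHanko to be importable.
    from pyhanko.pdf_utils.incremental_writer import IncrementalPdfFileWriter

    # The input stream holds the current revision of the document.
    writer = IncrementalPdfFileWriter(BytesIO(pdf_bytes), strict=strict)

    # Mark the document catalog as changed so that the update is not empty.
    writer.update_root()

    # write() emits the original content followed by the update.
    out = BytesIO()
    writer.write(out)
    return out.getvalue()
```

The `prev` argument is left out on purpose, since it is documented as internal API.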

IO_CHUNK_SIZE = 4096

classmethod from_reader(reader: pyhanko.pdf_utils.reader.PdfFileReader) pyhanko.pdf_utils.incremental_writer.IncrementalPdfFileWriter

Instantiate an incremental writer from a PDF file reader.

Parameters

reader – A PdfFileReader object with a PDF to extend.

ensure_output_version(version)

get_object(ido)

Retrieve the object associated with the provided reference from this PDF handler.

Parameters

ido – An instance of generic.Reference.

Returns

A PDF object.

mark_update(obj_ref: Union[pyhanko.pdf_utils.generic.Reference, pyhanko.pdf_utils.generic.IndirectObject])

Mark an object reference to be updated. This only matters for incremental updates; writers that do not produce incremental updates include it as a no-op for interoperability reasons.

Parameters

obj_ref – An indirect object instance or a reference.

update_container(obj: pyhanko.pdf_utils.generic.PdfObject)

Mark the container of an object (as indicated by the container_ref attribute on PdfObject) for an update.

As with mark_update(), this only applies to incremental updates; writers that do not produce incremental updates treat it as a no-op.

Parameters

obj – The object whose top-level container needs to be rewritten.

update_root()

Signal that the document catalog should be written to the output. Equivalent to calling mark_update() with root_ref.
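The following sketch shows one way the marking methods might be combined with an in-place edit. `set_catalog_entry` is a hypothetical helper, not part of pyHanko; it assumes that `root_ref` exposes `get_object()` and that the catalog behaves like a dictionary.

```python
def set_catalog_entry(writer, key, value):
    """Set one entry in the document catalog and flag it for rewriting."""
    # Assumption: root_ref resolves to the catalog dictionary.
    catalog = writer.root_ref.get_object()
    catalog[key] = value

    # Signal that the catalog should be written to the output. Per the
    # description above, this is equivalent to
    # writer.mark_update(writer.root_ref).
    writer.update_root()
```

With a real writer, `key` would typically be a PDF name such as `/Lang` and `value` a pyHanko PDF object. For objects nested inside another object, `update_container()` plays the same role as `update_root()` does here.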

set_info(info: Optional[Union[pyhanko.pdf_utils.generic.IndirectObject, pyhanko.pdf_utils.generic.DictionaryObject]])

Set the /Info entry of the document trailer.

Parameters

info – The new /Info dictionary, either as an indirect reference or as a DictionaryObject.

set_custom_trailer_entry(key: pyhanko.pdf_utils.generic.NameObject, value: pyhanko.pdf_utils.generic.PdfObject)

Set a custom, unmanaged entry in the document trailer or cross-reference stream dictionary.

Warning

Calling this method to set an entry that is managed by pyHanko internally (info dictionary, document catalog, etc.) has undefined results.

Parameters
  • key – Dictionary key to use in the trailer.

  • value – Value to set.

write(stream)

Write the contents of this PDF writer to a stream.

Parameters

stream – A writable output stream.

write_updated_section(stream)

Only write the updated and new objects to the designated output stream.

The new PDF file can then be put together by concatenating the original input with the generated output.

Parameters

stream – Output stream to write to.
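The concatenation step described above can be sketched as follows. `assemble_revision` is a hypothetical helper; `writer` is assumed to be an IncrementalPdfFileWriter that was created from a stream containing exactly the bytes in `original`.

```python
from io import BytesIO


def assemble_revision(original: bytes, writer) -> bytes:
    """Concatenate the original file with the writer's update section."""
    delta = BytesIO()
    # Only the updated and new objects are written here, not the original.
    writer.write_updated_section(delta)
    # The complete new file is the untouched original followed by the delta.
    return original + delta.getvalue()
```

Keeping the update section separate can be handy when the original file is large or stored elsewhere, since only the delta needs to be produced and transferred.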

write_in_place()

Write the updated file contents in-place to the same stream as the input stream. This requires a stream that supports both reading and writing.
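A small pre-flight check along these lines may help catch unsuitable streams early. `supports_in_place_update` is a hypothetical helper, not part of pyHanko, and the seekability requirement is an assumption on top of the read/write requirement stated above.

```python
import io


def supports_in_place_update(stream) -> bool:
    """Hypothetical check that a stream can be both read and extended."""
    # Assumption: in-place writing also needs random access to the stream.
    return stream.readable() and stream.writable() and stream.seekable()


# An in-memory buffer supports reading, writing and seeking.
buffer = io.BytesIO(b"%PDF-1.7\n")
assert supports_in_place_update(buffer)
```

For a file on disk, opening it with mode `"r+b"` rather than `"rb"` gives a stream that passes this check.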

encrypt(user_pwd)

Method to handle updates to encrypted files.

This method decrypts the original file and ensures that the resulting updated file is encrypted in a compatible way. The standard mandates that updates to encrypted files use the same encryption settings. In particular, incremental updates cannot remove file encryption.

Parameters

user_pwd – The original file’s user password.

Raises

PdfReadError – Raised when there is a problem decrypting the file.
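A sketch of how an encrypted file might be opened for updating. The helper name `open_encrypted_for_update` is hypothetical, and calling `encrypt()` before touching any objects is an assumption about the sensible ordering rather than something stated above.

```python
def open_encrypted_for_update(stream, user_pwd):
    """Create an incremental writer for an encrypted file (sketch)."""
    # Imported inside the function so that defining this helper does not
    # require pyHanko to be importable.
    from pyhanko.pdf_utils.incremental_writer import IncrementalPdfFileWriter

    writer = IncrementalPdfFileWriter(stream)
    # Authenticate with the original file's user password; a PdfReadError
    # propagates to the caller if decryption fails.
    writer.encrypt(user_pwd)
    return writer
```

The returned writer can then be modified and written out as usual; the output stays encrypted with the same settings as the input.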

encrypt_pubkey(credential: pyhanko.pdf_utils.crypt.pubkey.EnvelopeKeyDecrypter)

Method to handle updates to files encrypted using public-key encryption.

The same caveats as encrypt() apply here.

Parameters

credential – The EnvelopeKeyDecrypter handling the recipient’s private key.

Raises

PdfReadError – Raised when there is a problem decrypting the file.

stream_xrefs: bool

Boolean controlling whether the output file stores its cross-references in a cross-reference stream or in a classical XRef table.

The default for new files is True. For incremental updates, the writer adapts to the system used in the previous revision of the document (as mandated by the standard).