pyhanko.pdf_utils.incremental_writer module
Utility for writing incremental updates to existing PDF files.
- class pyhanko.pdf_utils.incremental_writer.IncrementalPdfFileWriter(input_stream, prev: Optional[PdfFileReader] = None, strict=True)
Bases: BasePdfFileWriter
Class to incrementally update existing files.
This BasePdfFileWriter subclass encapsulates a PdfFileReader instance in addition to exposing an interface to add and modify PDF objects.
Incremental updates to a PDF file append modifications to the end of the file. This is critical when the original file contents are not to be modified directly (e.g. when it contains digital signatures). It has the additional advantage of providing an automatic audit trail of sorts.
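The append-only property described above can be checked mechanically: every byte of the earlier revision must survive as a prefix of the updated file. The following is a minimal sketch using only the standard library; `is_append_only_update` is a hypothetical helper name and not part of pyHanko.

```python
def is_append_only_update(original: bytes, updated: bytes) -> bool:
    """Return True if `updated` consists of `original` plus appended data."""
    # An incremental update never rewrites earlier bytes, so the previous
    # revision must appear unchanged at the start of the new file.
    return len(updated) > len(original) and updated.startswith(original)


# Toy byte strings standing in for two revisions of a file.
rev1 = b"%PDF-1.7\n...first revision...\n%%EOF\n"
rev2 = rev1 + b"...appended objects, xref and trailer...\n%%EOF\n"
print(is_append_only_update(rev1, rev2))                       # True
print(is_append_only_update(rev1, b"X" + rev1[1:] + b"more"))  # False
```

A check of this kind is a necessary condition only: it says nothing about whether the appended data forms a valid PDF revision.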
- Parameters
input_stream – Input stream to read current revision from.
strict – Ingest the source file in strict mode. The default is True.
prev – Explicitly pass in a PDF reader. This parameter is internal API.
- IO_CHUNK_SIZE = 4096
- classmethod from_reader(reader: PdfFileReader) IncrementalPdfFileWriter
Instantiate an incremental writer from a PDF file reader.
- Parameters
reader – A PdfFileReader object with a PDF to extend.
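As an illustration, the sketch below wraps an existing reader in an incremental writer and writes the result to a new file. It assumes pyHanko is installed and that PdfFileReader is importable from pyhanko.pdf_utils.reader; the function name and both paths are placeholders. The catalog is flagged via update_root() only so that the new revision is not empty.

```python
def append_revision(src_path: str, dst_path: str) -> None:
    """Append one incremental revision to the PDF at src_path."""
    # Imports are kept inside the function; pyHanko must be installed to call it.
    from pyhanko.pdf_utils.incremental_writer import IncrementalPdfFileWriter
    from pyhanko.pdf_utils.reader import PdfFileReader

    with open(src_path, "rb") as src:
        reader = PdfFileReader(src)
        # Alternative to calling IncrementalPdfFileWriter(src) directly.
        writer = IncrementalPdfFileWriter.from_reader(reader)
        writer.update_root()  # mark the document catalog for rewriting
        with open(dst_path, "wb") as dst:
            writer.write(dst)
```

Going through from_reader() is mainly useful when a reader has already been created for other purposes, for example to inspect the document before deciding whether to update it.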
- ensure_output_version(version)
- get_object(ido, as_metadata_stream: bool = False)
Retrieve the object associated with the provided reference from this PDF handler.
- Parameters
ido – An instance of generic.Reference.
as_metadata_stream – Whether to dereference the object as an XMP metadata stream.
- Returns
A PDF object.
- mark_update(obj_ref: Union[Reference, IndirectObject])
Mark an object reference to be updated. This is only relevant for incremental updates, but is included as a no-op by default for interoperability reasons.
- Parameters
obj_ref – An indirect object instance or a reference.
- update_container(obj: PdfObject)
Mark the container of an object (as indicated by the container_ref attribute on PdfObject) for an update.
As with mark_update(), this only applies to incremental updates, but defaults to a no-op.
- Parameters
obj – The object whose top-level container needs to be rewritten.
- update_root()
Signal that the document catalog should be written to the output. Equivalent to calling mark_update() with root_ref.
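A typical reason to call update_root() is that an entry of the document catalog was changed in place. The sketch below sets /PageMode on the catalog and then flags the catalog for rewriting. It assumes that the writer exposes the catalog dictionary as `root` and that `NameObject` is available from `pyhanko.pdf_utils.generic`; `show_outlines_on_open` is a hypothetical helper name.

```python
def show_outlines_on_open(writer) -> None:
    """Ask viewers to open the outline panel.

    `writer` is expected to be an IncrementalPdfFileWriter.
    """
    from pyhanko.pdf_utils import generic  # requires pyHanko

    # Assumption: `writer.root` is the document catalog dictionary.
    key = generic.NameObject("/PageMode")
    writer.root[key] = generic.NameObject("/UseOutlines")
    # Flag the catalog so that the change becomes part of the next revision.
    writer.update_root()
```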
- set_info(info: Optional[Union[IndirectObject, DictionaryObject]])
Set the /Info entry of the document trailer.
- Parameters
info – The new /Info dictionary, either as an indirect reference or as a DictionaryObject.
- set_custom_trailer_entry(key: NameObject, value: PdfObject)
Set a custom, unmanaged entry in the document trailer or cross-reference stream dictionary.
Warning
Calling this method to set an entry that is managed by pyHanko internally (info dictionary, document catalog, etc.) has undefined results.
- Parameters
key – Dictionary key to use in the trailer.
value – Value to set for that key.
- write(stream)
Write the contents of this PDF writer to a stream.
- Parameters
stream – A writable output stream.
- property document_meta_view: DocumentMetadata
- write_in_place()
Write the updated file contents in-place to the same stream as the input stream. This requires the input stream to support both reading and writing.
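Whether a given stream qualifies can be probed with the standard io interface before attempting an in-place write. The helper below is a hypothetical, standard-library-only sketch; it is an assumption that such a check approximates what an in-place update needs, and pyHanko's own validation may differ.

```python
import io


def supports_in_place_update(stream) -> bool:
    """Heuristic: can this stream be read, repositioned and written to?"""
    return all(
        callable(getattr(stream, name, None)) and getattr(stream, name)()
        for name in ("readable", "writable", "seekable")
    )


print(supports_in_place_update(io.BytesIO(b"%PDF-1.7\n")))  # True
```

For files on disk, this typically means opening the file in "rb+" mode rather than "rb" before handing it to the writer.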
- encrypt(user_pwd)
Method to handle updates to encrypted files.
This method handles decrypting of the original file, and makes sure the resulting updated file is encrypted in a compatible way. The standard mandates that updates to encrypted files be effected using the same encryption settings. In particular, incremental updates cannot remove file encryption.
- Parameters
user_pwd – The original file’s user password.
- Raises
PdfReadError – Raised when there is a problem decrypting the file.
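The call order matters when updating an encrypted file: encrypt() comes first, then any modifications, then the write. The sketch below is written against any object that exposes encrypt(), update_root() and write() as documented here, so it contains no pyHanko import; `write_update_to_encrypted` is a hypothetical helper name.

```python
from io import BytesIO


def write_update_to_encrypted(writer, user_pwd) -> bytes:
    """Authenticate, mark the catalog for update and return the new file."""
    # Authenticate first, before any objects are read or modified.
    writer.encrypt(user_pwd)
    # Flag the document catalog so the new revision has something to write.
    writer.update_root()
    out = BytesIO()
    writer.write(out)
    return out.getvalue()
```

With pyHanko, `writer` would be an IncrementalPdfFileWriter created from the encrypted input stream. A failed decryption surfaces as PdfReadError from the encrypt() call.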
- encrypt_pubkey(credential: EnvelopeKeyDecrypter)
Method to handle updates to files encrypted using public-key encryption.
The same caveats as encrypt() apply here.
- Parameters
credential – The EnvelopeKeyDecrypter handling the recipient’s private key.
- Raises
PdfReadError – Raised when there is a problem decrypting the file.
- stream_xrefs: bool
Boolean controlling whether or not the output file will contain its cross-references in stream format, or as a classical XRef table.
The default for new files is True. For incremental updates, the writer adapts to the system used in the previous iteration of the document (as mandated by the standard).
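Which system the previous revision used can be read off the file itself: the number after the last startxref marker is the byte offset of either the `xref` keyword (classical table) or a cross-reference stream object. The following is a simplified, standard-library-only sketch with a hypothetical helper name; it ignores hybrid-reference files and malformed input, and it is not how pyHanko makes this decision internally.

```python
import re


def last_xref_kind(pdf_bytes: bytes) -> str:
    """Return 'table' or 'stream' for the most recent cross-reference section."""
    matches = list(re.finditer(rb"startxref\s+(\d+)", pdf_bytes))
    if not matches:
        raise ValueError("no startxref marker found")
    # The last marker belongs to the most recent revision.
    offset = int(matches[-1].group(1))
    # A classical table starts with the keyword 'xref'; anything else is
    # treated here as an indirect object holding a cross-reference stream.
    if pdf_bytes[offset:offset + 4] == b"xref":
        return "table"
    return "stream"


header = b"%PDF-1.7\n"
toy = (header + b"xref\n0 1\n0000000000 65535 f \ntrailer\n<<>>\n"
       b"startxref\n" + str(len(header)).encode() + b"\n%%EOF\n")
print(last_xref_kind(toy))  # table
```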