pyhanko.pdf_utils.incremental_writer module
Utility for writing incremental updates to existing PDF files.
-
class pyhanko.pdf_utils.incremental_writer.IncrementalPdfFileWriter(input_stream)
Bases: pyhanko.pdf_utils.writer.BasePdfFileWriter
Class to incrementally update existing files.
This BasePdfFileWriter subclass encapsulates a PdfFileReader instance in addition to exposing an interface to add and modify PDF objects.
Incremental updates to a PDF file append modifications to the end of the file. This is critical when the original file contents are not to be modified directly (e.g. when the file contains digital signatures). It has the additional advantage of providing an automatic audit trail of sorts.
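As a rough illustration of the workflow described above, the following sketch wraps an existing file in an IncrementalPdfFileWriter and writes the result to a second stream. The helper name `write_incremental_update` and its `modify` callback are hypothetical and only serve the example; the constructor and `write()` are used as documented on this page. pyHanko is imported inside the function so the snippet can be loaded without the library installed.

```python
from typing import BinaryIO, Callable, Optional


def write_incremental_update(
    input_stream: BinaryIO,
    output_stream: BinaryIO,
    modify: Optional[Callable] = None,
) -> None:
    """Apply `modify` to an incremental writer and write the result.

    `modify` is a hypothetical callback that receives the writer and
    registers whatever new or updated objects the caller needs.
    """
    # Lazy import: pyHanko is only required when the helper is called.
    from pyhanko.pdf_utils.incremental_writer import IncrementalPdfFileWriter

    writer = IncrementalPdfFileWriter(input_stream)
    if modify is not None:
        modify(writer)
    # Modifications are appended after the original file contents.
    writer.write(output_stream)
```

A caller would typically open the source file in binary read mode and the destination in binary write mode, then pass both streams to the helper.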
-
get_object(ido)
Retrieve the object associated with the provided reference from this PDF handler.
- Parameters
ido – An instance of generic.Reference.
- Returns
A PDF object.
-
mark_update(obj_ref: Union[pyhanko.pdf_utils.generic.Reference, pyhanko.pdf_utils.generic.IndirectObject])
Mark an object reference to be updated. This is only relevant for incremental updates, but is included as a no-op by default for interoperability reasons.
- Parameters
obj_ref – An indirect object instance or a reference.
-
update_container(obj: pyhanko.pdf_utils.generic.PdfObject)
Mark the container of an object (as indicated by the container_ref attribute on PdfObject) for an update.
As with mark_update(), this only applies to incremental updates, but defaults to a no-op.
- Parameters
obj – The object whose top-level container needs to be rewritten.
-
update_root()
-
set_info(info: Optional[Union[pyhanko.pdf_utils.generic.IndirectObject, pyhanko.pdf_utils.generic.DictionaryObject]])
Set the /Info entry of the document trailer.
- Parameters
info – The new /Info dictionary, either as an indirect reference or as a DictionaryObject.
-
write(stream)
Write the contents of this PDF writer to a stream.
- Parameters
stream – A writable output stream.
-
write_updated_section(stream)
Only write the updated and new objects to the designated output stream.
The new PDF file can then be put together by concatenating the original input with the generated output.
- Parameters
stream – Output stream to write to.
-
write_in_place()
Write the updated file contents in-place to the same stream as the input stream. This requires a stream that supports both reading and writing operations.
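To make the stream requirement concrete, here is a small sketch that checks whether a stream could plausibly be used for an in-place update. The helper `supports_in_place` is hypothetical and not part of pyHanko; requiring the stream to be seekable as well is an assumption of this example. A temporary file stands in for a PDF.

```python
import io
import os
import tempfile


def supports_in_place(stream) -> bool:
    """Return True if the stream can be read, written and repositioned."""
    return stream.readable() and stream.writable() and stream.seekable()


# A temporary file stands in for an existing PDF.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"placeholder")
    path = tmp.name

with open(path, "rb") as read_only:
    read_only_ok = supports_in_place(read_only)    # opened for reading only
with open(path, "rb+") as read_write:
    read_write_ok = supports_in_place(read_write)  # reading and writing
os.remove(path)
```

In practice this suggests opening the file with mode `"rb+"` (or using an in-memory `io.BytesIO`) before constructing the writer and calling `write_in_place()`.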
-
encrypt(user_pwd)
Method to handle updates to encrypted files.
This method handles decrypting of the original file, and makes sure the resulting updated file is encrypted in a compatible way. The standard mandates that updates to encrypted files be effected using the same encryption settings. In particular, incremental updates cannot remove file encryption.
- Parameters
user_pwd – The original file’s user password.
- Raises
PdfReadError – Raised when there is a problem decrypting the file.
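A minimal sketch of how an update to a password-protected file might be structured, assuming that `encrypt()` is called before any modifications are made. The helper name `update_encrypted_file` and the `modify` callback are hypothetical; only the constructor, `encrypt()` and `write()` come from this page. Any PdfReadError raised during decryption is left to propagate to the caller.

```python
def update_encrypted_file(input_stream, output_stream, user_pwd, modify=None):
    """Write an incremental update to a file protected by a user password.

    `modify` is a hypothetical callback that registers changes on the writer.
    """
    # Lazy import: pyHanko is only required when the helper is called.
    from pyhanko.pdf_utils.incremental_writer import IncrementalPdfFileWriter

    writer = IncrementalPdfFileWriter(input_stream)
    # Decrypt the original; the update is encrypted with the same settings.
    writer.encrypt(user_pwd)
    if modify is not None:
        modify(writer)
    writer.write(output_stream)
```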
-
encrypt_pubkey(credential: pyhanko.pdf_utils.crypt.EnvelopeKeyDecrypter)
Method to handle updates to files encrypted using public-key encryption.
The same caveats as encrypt() apply here.
- Parameters
credential – The EnvelopeKeyDecrypter handling the recipient’s private key.
- Raises
PdfReadError – Raised when there is a problem decrypting the file.
-
add_stream_to_page(page_ix, stream_ref, resources=None, prepend=False)
Append an indirect stream object to a page in a PDF as a content stream.
- Parameters
page_ix – Index of the page to modify. The first page has index 0.
stream_ref – IndirectObject reference to the stream object to add.
resources – Resource dictionary containing resources to add to the page’s existing resource dictionary.
prepend – Prepend the content stream to the list of content streams, as opposed to appending it to the end. This has the effect of causing the stream to be rendered underneath the already existing content on the page.
- Returns
An IndirectObject reference to the page object that was modified.
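The effect of the prepend flag can be pictured with a toy model in which a page's content streams are a plain Python list, painted in order. This is not pyHanko's implementation, and `place_content_stream` is a hypothetical name; it only illustrates why a prepended stream ends up underneath existing content.

```python
def place_content_stream(contents, new_stream, prepend=False):
    """Return a new list of content streams with `new_stream` added.

    Streams earlier in the list are painted first, so they end up
    underneath the streams that follow them.
    """
    if prepend:
        return [new_stream] + list(contents)
    return list(contents) + [new_stream]


background = place_content_stream(["body"], "watermark", prepend=True)
overlay = place_content_stream(["body"], "stamp")
```

Here `background` places the watermark before the page body, while `overlay` places the stamp after it and therefore on top.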
-
add_content_to_page(page_ix, pdf_content: pyhanko.pdf_utils.content.PdfContent, prepend=False)
Convenience wrapper around add_stream_to_page() to turn a PdfContent instance into a page content stream.
- Parameters
page_ix – Index of the page to modify. The first page has index 0.
pdf_content – An instance of PdfContent.
prepend – Prepend the content stream to the list of content streams, as opposed to appending it to the end. This has the effect of causing the stream to be rendered underneath the already existing content on the page.
- Returns
An IndirectObject reference to the page object that was modified.
-
merge_resources(orig_dict, new_dict) → bool
Update an existing resource dictionary object with data from another one. Returns True if the original dict object was modified directly.
The caller is responsible for avoiding name conflicts with existing resources.
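Since avoiding name conflicts is left to the caller, one possible approach is to pick a resource name that is not yet in use before merging. The helper below is a hypothetical sketch, not a pyHanko function; it works on plain strings and assumes the caller has already collected the names present in the relevant resource category. The `/Im` and `/F` prefixes are illustrative only.

```python
def unique_resource_name(existing_names, base="/Im"):
    """Return a name built from `base` that is not in `existing_names`."""
    taken = set(existing_names)
    index = 1
    # Count upwards until an unused name is found.
    while f"{base}{index}" in taken:
        index += 1
    return f"{base}{index}"


image_name = unique_resource_name(["/Im1", "/Im2"])
font_name = unique_resource_name([], base="/F")
```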
-
stream_xrefs: bool
Boolean controlling whether the output file will contain its cross-references in stream format or as a classical XRef table.
The default for new files is True. For incremental updates, the writer adapts to the system used in the previous iteration of the document (as mandated by the standard).
-