pyhanko.pdf_utils.incremental_writer module
Utility for writing incremental updates to existing PDF files.
-
class pyhanko.pdf_utils.incremental_writer.IncrementalPdfFileWriter(input_stream)
Bases: pyhanko.pdf_utils.writer.BasePdfFileWriter
Class to incrementally update existing files.
This BasePdfFileWriter subclass encapsulates a PdfFileReader instance in addition to exposing an interface to add and modify PDF objects.
Incremental updates to a PDF file append modifications to the end of the file. This is critical when the original file contents are not to be modified directly (e.g. when it contains digital signatures). It has the additional advantage of providing an automatic audit trail of sorts.
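To illustrate why appending keeps the original contents intact, here is a toy model using only the standard library. It is not pyHanko's implementation: the byte strings are stand-ins rather than valid PDF data, the `append_update` helper is hypothetical, and the SHA-256 digest merely stands in for a signature computed over the original bytes.

```python
import hashlib


def append_update(original: bytes, update: bytes) -> bytes:
    """Hypothetical helper: model an incremental update as a pure append."""
    return original + update


# Stand-in for the first revision of a file (not a valid PDF).
original = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n%%EOF\n"
# Digest over the original bytes, standing in for a signature's covered range.
signed_digest = hashlib.sha256(original).hexdigest()

# Stand-in for the changed objects and trailer of the new revision.
update = b"1 0 obj\n<< /Type /Catalog /Lang (en) >>\nendobj\n%%EOF\n"
revised = append_update(original, update)

# The first len(original) bytes are untouched, so the digest still matches.
prefix_digest = hashlib.sha256(revised[: len(original)]).hexdigest()
print(prefix_digest == signed_digest)
```

Because earlier revisions remain byte-for-byte recoverable as prefixes of the file, the same property also yields the audit trail mentioned above.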
-
get_object(ido)
Retrieve the object associated with the provided reference from this PDF handler.
- Parameters
ido – An instance of generic.Reference.
- Returns
A PDF object.
-
mark_update(obj_ref: Union[pyhanko.pdf_utils.generic.Reference, pyhanko.pdf_utils.generic.IndirectObject])
Mark an object reference to be updated. This is only relevant for incremental updates, but is included as a no-op by default for interoperability reasons.
- Parameters
obj_ref – An indirect object instance or a reference.
-
update_container(obj: pyhanko.pdf_utils.generic.PdfObject)
Mark the container of an object (as indicated by the container_ref attribute on PdfObject) for an update.
As with mark_update(), this only applies to incremental updates, but defaults to a no-op.
- Parameters
obj – The object whose top-level container needs to be rewritten.
-
update_root()
-
write(stream)
Write the contents of this PDF writer to a stream.
- Parameters
stream – A writable output stream.
-
write_in_place()
Write the updated file contents in place to the same stream as the input stream. This requires a stream that supports both reading and writing.
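The read/write requirement can be checked before choosing between write() and write_in_place(). The sketch below uses only the standard library; `supports_in_place` is a hypothetical helper, and whether pyHanko performs a similar check internally is not stated here. It shows that a file opened with mode "rb" does not qualify, while one opened with "r+b" does.

```python
import os
import tempfile


def supports_in_place(stream) -> bool:
    """Hypothetical pre-check: can the stream be read, written and repositioned?"""
    return stream.readable() and stream.writable() and stream.seekable()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.pdf")
    # Create a placeholder file (not a valid PDF) to open in different modes.
    with open(path, "wb") as f:
        f.write(b"%PDF-1.7\n")

    # Read-only binary mode: cannot be used for an in-place update.
    with open(path, "rb") as read_only:
        read_only_ok = supports_in_place(read_only)

    # Read/write binary mode without truncation: suitable for in-place updates.
    with open(path, "r+b") as read_write:
        read_write_ok = supports_in_place(read_write)

print(read_only_ok, read_write_ok)  # False True
```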
-
encrypt(user_pwd)
Method to handle updates to RC4-encrypted files.
This method handles decrypting of the original file, and makes sure the resulting updated file is encrypted in a compatible way. The standard mandates that updates to encrypted files be effected using the same encryption settings. In particular, incremental updates cannot remove file encryption.
Danger
One should also be aware that the encryption scheme implemented here is (very) weak, and we only support it for compatibility reasons. Under no circumstances should it still be used to encrypt new files.
- Parameters
user_pwd – The original file’s user password.
- Raises
PdfReadError – Raised when there is a problem decrypting the file.
-
add_stream_to_page(page_ix, stream_ref, resources=None, prepend=False)
Append an indirect stream object to a page in a PDF as a content stream.
- Parameters
page_ix – Index of the page to modify. The first page has index 0.
stream_ref – IndirectObject reference to the stream object to add.
resources – Resource dictionary containing resources to add to the page’s existing resource dictionary.
prepend – Prepend the content stream to the list of content streams, as opposed to appending it to the end. This has the effect of causing the stream to be rendered underneath the already existing content on the page.
- Returns
An IndirectObject reference to the page object that was modified.
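The effect of prepend on rendering order can be pictured with a plain list standing in for a page's content streams. This is a toy model, not pyHanko code: `add_stream` is a hypothetical helper and the strings are placeholders for stream references. Content streams are drawn in list order, so later entries paint over earlier ones.

```python
from typing import List


def add_stream(contents: List[str], new_stream: str, prepend: bool = False) -> List[str]:
    """Hypothetical helper: return a new list with the stream added at either end."""
    if prepend:
        return [new_stream] + contents
    return contents + [new_stream]


existing = ["page_text"]

# Appended streams are drawn last, so they appear on top of existing content.
appended = add_stream(existing, "stamp")
print(appended)  # ['page_text', 'stamp']

# Prepended streams are drawn first, so existing content covers them.
prepended = add_stream(existing, "watermark", prepend=True)
print(prepended)  # ['watermark', 'page_text']
```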
-
add_content_to_page(page_ix, pdf_content: pyhanko.pdf_utils.content.PdfContent, prepend=False)
Convenience wrapper around add_stream_to_page() to turn a PdfContent instance into a page content stream.
- Parameters
page_ix – Index of the page to modify. The first page has index 0.
pdf_content – An instance of PdfContent.
prepend – Prepend the content stream to the list of content streams, as opposed to appending it to the end. This has the effect of causing the stream to be rendered underneath the already existing content on the page.
- Returns
An IndirectObject reference to the page object that was modified.
-
merge_resources(orig_dict, new_dict) → bool
Update an existing resource dictionary object with data from another one. Returns True if the original dict object was modified directly.
The caller is responsible for avoiding name conflicts with existing resources.
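As a rough picture of what merging resource dictionaries involves, the sketch below merges two nested Python dicts category by category. It is a simplified stand-in, not pyHanko's implementation: `merge_resources_sketch` is hypothetical, plain dicts and strings replace PDF objects, the return value is simply True whenever the original changed, and raising on a name conflict is a choice made for this sketch only.

```python
def merge_resources_sketch(orig: dict, new: dict) -> bool:
    """Toy merge of two resource dictionaries, one category at a time.

    Returns True if `orig` was changed. Conflicting names raise ValueError,
    which is an assumption of this sketch.
    """
    changed = False
    for category, entries in new.items():
        if not entries:
            continue  # nothing to add for this category
        target = orig.setdefault(category, {})
        for name, value in entries.items():
            if name in target:
                raise ValueError(f"resource name conflict: {category} {name}")
            target[name] = value
            changed = True
    return changed


page_resources = {"/Font": {"/F1": "font-ref-1"}}
stamp_resources = {"/Font": {"/F2": "font-ref-2"}, "/XObject": {"/Im1": "image-ref-1"}}

was_changed = merge_resources_sketch(page_resources, stamp_resources)
print(was_changed)                      # True
print(sorted(page_resources["/Font"]))  # ['/F1', '/F2']
```

Choosing fresh names such as `/F2` before merging is what keeps the new content from clashing with resources the page already uses.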
-
stream_xrefs: bool
Boolean controlling whether the output file will contain its cross-references in stream format or as a classical XRef table.
The default for new files is True. For incremental updates, the writer adapts to the system used in the previous iteration of the document (as mandated by the standard).
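One way to tell which system a file's latest revision uses is to look at what the final startxref offset points to: a classical table starts with the `xref` keyword, whereas a cross-reference stream is an ordinary indirect object. The sketch below applies that idea to hand-built byte strings. It is an illustration only: `last_xref_is_stream` is a hypothetical helper, the samples are fragments rather than valid PDF files, and pyHanko's own detection logic may work differently.

```python
import re


def last_xref_is_stream(pdf_bytes: bytes) -> bool:
    """Hypothetical check: does the last startxref offset point at an xref stream?"""
    matches = re.findall(rb"startxref\s+(\d+)", pdf_bytes)
    if not matches:
        raise ValueError("no startxref marker found")
    offset = int(matches[-1])
    # A classical table begins with the keyword "xref" at that offset.
    return pdf_bytes[offset:offset + 4] != b"xref"


# Shared body; the cross-reference section starts right after it.
body = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
offset = str(len(body)).encode("ascii")

# Fragment ending in a classical cross-reference table.
classic = (body + b"xref\n0 1\n0000000000 65535 f \ntrailer\n<< /Size 1 >>\n"
           b"startxref\n" + offset + b"\n%%EOF\n")

# Fragment ending in a cross-reference stream object (contents left empty).
streamed = (body + b"2 0 obj\n<< /Type /XRef >>\nstream\nendstream\nendobj\n"
            b"startxref\n" + offset + b"\n%%EOF\n")

print(last_xref_is_stream(classic), last_xref_is_stream(streamed))  # False True
```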
-