pyhanko.pdf_utils.rw_common module

Utilities common to reading and writing PDF files.

class pyhanko.pdf_utils.rw_common.PdfHandler

Bases: object

Abstract class providing a general interface for querying objects in PDF readers and writers alike.

get_object(ref: Reference, as_metadata_stream: bool = False)

Retrieve the object associated with the provided reference from this PDF handler.

Parameters
  • ref – An instance of generic.Reference.

  • as_metadata_stream – Whether to dereference the object as an XMP metadata stream.

Returns

A PDF object.

property trailer_view: DictionaryObject

Returns a view of the trailer of the document represented by this PdfHandler instance.

The view is effectively read-only, in the sense that any writes will not be reflected in the actual trailer (if the handler supports writing, that is).

Returns

A generic.DictionaryObject representing the current state of the document trailer.
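The read-only behaviour described above can be pictured with a toy model. This is a sketch of the semantics only, not pyHanko's implementation: ToyHandler and its _trailer attribute are hypothetical names, a plain dict stands in for generic.DictionaryObject, and producing the view by copying is an assumption made for illustration.

```python
class ToyHandler:
    """Hypothetical stand-in for a PDF handler; not part of pyHanko."""

    def __init__(self, trailer):
        # Internal trailer state (assumption: a plain dict in this toy model).
        self._trailer = dict(trailer)

    @property
    def trailer_view(self):
        # Hand out a shallow copy, so top-level writes to the returned
        # dictionary never reach the handler's own trailer.
        return dict(self._trailer)


handler = ToyHandler({"/Size": 12, "/Root": "1 0 R"})
view = handler.trailer_view
view["/Size"] = 99  # modifies the view only

# A freshly requested view still shows the handler's own value.
fresh_size = handler.trailer_view["/Size"]
print(fresh_size)
```

In this model, the practical consequence is that changes meant for the trailer have to go through whatever write interface the handler offers, not through the view.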

property document_meta_view: DocumentMetadata

property root_ref: Reference

Returns

A reference to the document catalog of this PDF handler.

property root: DictionaryObject

Returns

The document catalog of this PDF handler.

property document_id: Tuple[bytes, bytes]

find_page_container(page_ix)

Retrieve the node in the page tree containing the page with index page_ix, along with the necessary objects to modify it in an incremental update scenario.

Parameters

page_ix – The (zero-indexed) number of the page for which we want to retrieve the parent. A negative number counts pages from the back of the document, with index -1 referring to the last page.

Returns

A triple with the /Pages object (or a reference to it), the index of the target page in said /Pages object, and a (possibly inherited) resource dictionary.
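To make the shape of the returned triple concrete, here is a minimal sketch of a page tree lookup over plain dictionaries. It is a toy model under stated assumptions, not pyHanko's code: the free function find_page_container below is a hypothetical re-implementation, nested dicts stand in for PDF objects and references, and the rule of taking the nearest /Resources entry on the path from the root is an assumption about how inheritance is resolved.

```python
def find_page_container(root_pages, page_ix):
    """Toy lookup: return (pages_node, index_in_kids, resources)."""
    total = root_pages["/Count"]
    if page_ix < 0:
        page_ix += total  # -1 becomes the last page
    if not 0 <= page_ix < total:
        raise ValueError("page index out of range")

    node = root_pages
    resources = node.get("/Resources")
    first_ix = 0  # document-wide index of the next page to be visited
    while True:
        for kid_ix, kid in enumerate(node["/Kids"]):
            if kid["/Type"] == "/Page":
                if first_ix == page_ix:
                    return node, kid_ix, resources
                first_ix += 1
            elif page_ix < first_ix + kid["/Count"]:
                # The target lies in this subtree: descend, keeping the
                # inherited resources unless the subtree overrides them.
                resources = kid.get("/Resources", resources)
                node = kid
                break
            else:
                first_ix += kid["/Count"]  # skip the whole subtree
        else:
            raise ValueError("page tree is inconsistent with /Count")


# Three pages: one directly under the root, two in a nested /Pages node.
inner = {"/Type": "/Pages", "/Count": 2,
         "/Kids": [{"/Type": "/Page"}, {"/Type": "/Page"}]}
root_pages = {"/Type": "/Pages", "/Count": 3,
              "/Resources": {"/Font": {}},
              "/Kids": [{"/Type": "/Page"}, inner]}

container, kid_ix, resources = find_page_container(root_pages, -1)
```

Here the last page is the second kid of the nested /Pages node, and the resource dictionary comes from the root because the nested node defines none of its own.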

find_page_for_modification(page_ix)

Retrieve the page with index page_ix from the page tree, along with the necessary objects to modify it in an incremental update scenario.

Parameters

page_ix – The (zero-indexed) number of the page to retrieve. A negative number counts pages from the back of the document, with index -1 referring to the last page.

Returns

A tuple with a reference to the page object and a (possibly inherited) resource dictionary.
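As a usage sketch, the helper below combines find_page_for_modification with get_object, relying only on the two methods as documented in this section. The names page_summary and StubHandler are hypothetical; the stub exists only so the example runs without a PDF file, and it uses strings and plain dicts where a real handler would return generic.Reference and generic.DictionaryObject instances. Treating a missing resource dictionary as empty is an assumption.

```python
def page_summary(handler, page_ix):
    """Summarise one page via the handler methods documented above."""
    page_ref, resources = handler.find_page_for_modification(page_ix)
    page = handler.get_object(page_ref)
    return {
        "page_ref": page_ref,
        "page_keys": sorted(page.keys()),
        # Assumption: fall back to an empty mapping if nothing was found.
        "resource_categories": sorted(resources or {}),
    }


class StubHandler:
    """Hypothetical stand-in for a PdfHandler; not part of pyHanko."""

    def __init__(self):
        self._objects = {
            "page-ref-0": {"/Type": "/Page", "/MediaBox": [0, 0, 595, 842]},
        }

    def find_page_for_modification(self, page_ix):
        # A single-page "document": every valid index maps to the same page.
        return "page-ref-0", {"/Font": {"/F1": "font-ref"}}

    def get_object(self, ref, as_metadata_stream=False):
        return self._objects[ref]


summary = page_summary(StubHandler(), -1)
print(summary)
```

With a real reader or writer in place of the stub, the same helper should work unchanged as long as the handler follows the interface described here.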