pyhanko.pdf_utils.writer module

Utilities for writing PDF files. Contains code from the PyPDF2 project; see here for the original license.

class pyhanko.pdf_utils.writer.ObjectStream(compress=True)

Bases: object

Utility class to collect objects into a PDF object stream.

Object streams are mainly useful for space efficiency reasons. They allow related objects to be grouped and compressed together in a more flexible manner.

Warning

Object streams can only be used in files with a cross-reference stream, as opposed to a classical XRef table. In particular, this means that incremental updates to files with a legacy XRef table cannot contain object streams either. See § 7.5.7 in ISO 32000-1 for further details.

Warning

The usefulness of object streams is somewhat stymied by the fact that PDF stream objects cannot be embedded into object streams for syntactical reasons.

add_object(idnum: int, obj: pyhanko.pdf_utils.generic.PdfObject)

Add an object to an object stream. Note that objects in object streams always have their generation number set to 0 by definition.

Parameters
  • idnum – The object’s ID number.

  • obj – The object to embed into the object stream.

Raises

TypeError – Raised if obj is an instance of StreamObject or IndirectObject.

as_pdf_object() → pyhanko.pdf_utils.generic.StreamObject

Render the object stream to a PDF stream object.

Returns

An instance of StreamObject.

class pyhanko.pdf_utils.writer.BasePdfFileWriter(root, info, document_id, obj_id_start=0, stream_xrefs=True)

Bases: pyhanko.pdf_utils.rw_common.PdfHandler

Base class for PDF writers.

output_version = (1, 7)

Output version to be declared in the output file.

stream_xrefs: bool

Boolean controlling whether or not the output file will contain its cross-references in stream format, or as a classical XRef table.

The default for new files is True. For incremental updates, the writer adapts to the system used in the previous iteration of the document (as mandated by the standard).

set_info(info: Optional[Union[pyhanko.pdf_utils.generic.IndirectObject, pyhanko.pdf_utils.generic.DictionaryObject]])

Set the /Info entry of the document trailer.

Parameters

info – The new /Info dictionary, either as an indirect reference or as a DictionaryObject

mark_update(obj_ref: Union[pyhanko.pdf_utils.generic.Reference, pyhanko.pdf_utils.generic.IndirectObject])

Mark an object reference to be updated. This is only relevant for incremental updates, but is included as a no-op by default for interoperability reasons.

Parameters

obj_ref – An indirect object instance or a reference.

update_container(obj: pyhanko.pdf_utils.generic.PdfObject)

Mark the container of an object (as indicated by the container_ref attribute on PdfObject) for an update.

As with mark_update(), this only applies to incremental updates, but defaults to a no-op.

Parameters

obj – The object whose top-level container needs to be rewritten.

property root_ref

Returns

A reference to the document catalog of this PDF handler.

get_object(ido)

Retrieve the object associated with the provided reference from this PDF handler.

Parameters

ido – An instance of generic.Reference.

Returns

A PDF object.

allocate_placeholder() → pyhanko.pdf_utils.generic.IndirectObject

Allocate an object reference to populate later. Calls to get_object() for this reference will return NullObject until it is populated using add_object().

This method is only relevant in certain advanced contexts where an object ID needs to be known before the object it refers to can be built; chances are you’ll never need it.

Returns

An IndirectObject instance referring to the object just allocated.

add_object(obj, obj_stream: Optional[pyhanko.pdf_utils.writer.ObjectStream] = None, idnum=None) → pyhanko.pdf_utils.generic.IndirectObject

Add a new object to this writer.

Parameters
  • obj – The object to add.

  • obj_stream – An object stream to add the object to.

  • idnum – Manually specify the object ID of the object to be added. This is only allowed for object IDs that have previously been allocated using allocate_placeholder().

Returns

An IndirectObject instance referring to the object just added.

prepare_object_stream(compress=True)

Prepare and return a new ObjectStream object.

Parameters

compress – Indicates whether the resulting object stream should be compressed.

Returns

An ObjectStream object.

property trailer_view

Returns a view of the document trailer of the document represented by this PdfHandler instance.

The view is effectively read-only, in the sense that any writes will not be reflected in the actual trailer (if the handler supports writing, that is).

Returns

A generic.DictionaryObject representing the current state of the document trailer.

write(stream)

Write the contents of this PDF writer to a stream.

Parameters

stream – A writable output stream.

register_annotation(page_ref, annot_ref)

Register an annotation to be added to a page. This convenience function takes care of calling mark_update() where necessary.

Parameters
  • page_ref – Reference to the page object involved.

  • annot_ref – Reference to the annotation object to be added.

insert_page(new_page, after=None)

Insert a page object into the tree.

Parameters
  • new_page – Page object to insert.

  • after – Page number (zero-indexed) after which to insert the page.

Returns

A reference to the newly inserted page.

import_object(obj: pyhanko.pdf_utils.generic.PdfObject, obj_stream: Optional[pyhanko.pdf_utils.writer.ObjectStream] = None) → pyhanko.pdf_utils.generic.PdfObject

Deep-copy an object into this writer, dealing with resolving indirect references in the process.

Parameters
  • obj – The object to import.

  • obj_stream – The object stream to import objects into.

    Note

    Stream objects and bare references will not be put into the object stream; the standard forbids this.

Returns

The object as associated with this writer. If the input object was an indirect reference, a dictionary (incl. streams) or an array, the returned value will always be a new instance.

import_page_as_xobject(other: pyhanko.pdf_utils.rw_common.PdfHandler, page_ix=0, content_stream=0, inherit_filters=True)

Import a page content stream from some other PdfHandler into the current one as a form XObject.

Parameters
  • other – A PdfHandler

  • page_ix – Index of the page to copy (default: 0)

  • content_stream – Index of the page’s content stream to copy, if multiple are present (default: 0)

  • inherit_filters – Inherit the content stream’s filters, if present.

Returns

An IndirectObject referring to the form XObject as added to the current writer.

class pyhanko.pdf_utils.writer.PageObject(contents, media_box, resources=None)

Bases: pyhanko.pdf_utils.generic.DictionaryObject

Subclass of DictionaryObject that handles some of the initialisation boilerplate for page objects.

class pyhanko.pdf_utils.writer.PdfFileWriter(stream_xrefs=True)

Bases: pyhanko.pdf_utils.writer.BasePdfFileWriter

Class to write new PDF files.

stream_xrefs: bool

Boolean controlling whether or not the output file will contain its cross-references in stream format, or as a classical XRef table.

The default for new files is True. For incremental updates, the writer adapts to the system used in the previous iteration of the document (as mandated by the standard).

object_streams: List[pyhanko.pdf_utils.writer.ObjectStream]

pyhanko.pdf_utils.writer.init_xobject_dictionary(command_stream: bytes, box_width, box_height, resources: Optional[pyhanko.pdf_utils.generic.DictionaryObject] = None) → pyhanko.pdf_utils.generic.StreamObject

Helper function to initialise form XObject dictionaries.

Note

For utilities to handle image XObjects, see the pyhanko.pdf_utils.images module.

Parameters
  • command_stream – The XObject’s raw appearance stream.

  • box_width – The width of the XObject’s bounding box.

  • box_height – The height of the XObject’s bounding box.

  • resources – A resource dictionary to include with the form object.

Returns

A StreamObject representation of the form XObject.