pyhanko.pdf_utils.writer module
Utilities for writing PDF files. Contains code from the PyPDF2 project; see here for the original license.
-
class
pyhanko.pdf_utils.writer.
ObjectStream
(compress=True)
Bases:
object
Utility class to collect objects into a PDF object stream.
Object streams are mainly useful for space efficiency reasons. They allow related objects to be grouped & compressed together in a more flexible manner.
Warning
Object streams can only be used in files with a cross-reference stream, as opposed to a classical XRef table. In particular, this means that incremental updates to files with a legacy XRef table cannot contain object streams either. See § 7.5.7 in ISO 32000-1 for further details.
Warning
The usefulness of object streams is somewhat stymied by the fact that PDF stream objects cannot be embedded into object streams for syntactical reasons.
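To make the layout concrete, here is a minimal sketch of what an object stream contains according to § 7.5.7 of ISO 32000-1: a header of "object number, offset" pairs, followed by the serialised objects, with /N and /First recorded in the stream dictionary. The function name build_object_stream and the plain-dict representation are illustrative assumptions, not part of pyHanko's API or implementation.

```python
import zlib


def build_object_stream(objects, compress=True):
    """Pack {object number: already-serialised bytes} into an object stream.

    Hypothetical helper for illustration only.
    Returns (stream dictionary entries, stream payload).
    """
    header_parts = []
    body = b""
    for idnum, data in objects.items():
        # Offsets are relative to the first object, i.e. to the /First position.
        header_parts.append(b"%d %d" % (idnum, len(body)))
        body += data + b"\n"
    header = b" ".join(header_parts) + b"\n"
    payload = header + body
    # /N is the number of objects, /First the byte offset of the first object.
    entries = {"/Type": "/ObjStm", "/N": len(objects), "/First": len(header)}
    if compress:
        payload = zlib.compress(payload)
        entries["/Filter"] = "/FlateDecode"
    entries["/Length"] = len(payload)
    return entries, payload


entries, payload = build_object_stream(
    {4: b"<< /Type /Font >>", 7: b"[1 2 3]"}, compress=False
)
```

Note that the header carries no generation numbers, which is why objects in object streams always have generation 0.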
-
add_object
(idnum: int, obj: pyhanko.pdf_utils.generic.PdfObject)
Add an object to an object stream. Note that objects in object streams always have their generation number set to 0 by definition.
- Parameters
idnum – The object’s ID number.
obj – The object to embed into the object stream.
- Raises
TypeError – Raised if
obj
is an instance of StreamObject
or IndirectObject
.
-
as_pdf_object
() → pyhanko.pdf_utils.generic.StreamObject
Render the object stream to a PDF stream object.
- Returns
An instance of
StreamObject
.
-
-
class
pyhanko.pdf_utils.writer.
BasePdfFileWriter
(root, info, document_id, obj_id_start=0, stream_xrefs=True)
Bases:
pyhanko.pdf_utils.rw_common.PdfHandler
Base class for PDF writers.
-
output_version
= (1, 7)
Output version to be declared in the output file.
-
stream_xrefs
: bool
Boolean controlling whether the output file will contain its cross-references in stream format or as a classical XRef table.
The default for new files is
True
. For incremental updates, the writer adapts to the system used in the previous iteration of the document (as mandated by the standard).
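For comparison, the classical XRef table mentioned above is a plain-text listing with one fixed-width, 20-byte entry per object. The sketch below renders such a table for a contiguous run of in-use objects; the function name xref_table_section is a hypothetical helper for illustration and does not come from pyHanko. A cross-reference stream stores the same information in a binary stream object, and only that format can describe objects stored inside object streams.

```python
def xref_table_section(offsets):
    """Render a classical cross-reference table for objects 1..len(offsets).

    `offsets` holds the byte offsets of objects 1, 2, ... in the file.
    All objects are assumed to be in use with generation 0.
    """
    # Subsection header: first object number, then the number of entries.
    lines = [b"xref\n", b"0 %d\n" % (len(offsets) + 1)]
    # Object 0 is always free and carries generation 65535.
    lines.append(b"0000000000 65535 f \n")
    for offset in offsets:
        # 10-digit offset, 5-digit generation, keyword, 2-byte end-of-line.
        lines.append(b"%010d %05d n \n" % (offset, 0))
    return b"".join(lines)


table = xref_table_section([15, 120])
```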
-
set_info
(info: Optional[Union[pyhanko.pdf_utils.generic.IndirectObject, pyhanko.pdf_utils.generic.DictionaryObject]])
Set the
/Info
entry of the document trailer.
- Parameters
info – The new
/Info
dictionary, either as an indirect reference or as a DictionaryObject
.
-
mark_update
(obj_ref: Union[pyhanko.pdf_utils.generic.Reference, pyhanko.pdf_utils.generic.IndirectObject])
Mark an object reference to be updated. This is only relevant for incremental updates, but is included as a no-op by default for interoperability reasons.
- Parameters
obj_ref – An indirect object instance or a reference.
-
update_container
(obj: pyhanko.pdf_utils.generic.PdfObject)
Mark the container of an object (as indicated by the
container_ref
attribute on PdfObject
) for an update.
As with
mark_update()
, this only applies to incremental updates, but defaults to a no-op.
- Parameters
obj – The object whose top-level container needs to be rewritten.
-
property
root_ref
- Returns
A reference to the document catalog of this PDF handler.
-
get_object
(ido)
Retrieve the object associated with the provided reference from this PDF handler.
- Parameters
ido – An instance of
generic.Reference
.
- Returns
A PDF object.
-
allocate_placeholder
() → pyhanko.pdf_utils.generic.IndirectObject
Allocate an object reference to populate later. Calls to
get_object()
for this reference will return NullObject
until it is populated using add_object()
.
This method is only relevant in certain advanced contexts where an object ID needs to be known before the object it refers to can be built; chances are you’ll never need it.
- Returns
An
IndirectObject
instance referring to the object just allocated.
-
add_object
(obj, obj_stream: Optional[pyhanko.pdf_utils.writer.ObjectStream] = None, idnum=None) → pyhanko.pdf_utils.generic.IndirectObject
Add a new object to this writer.
- Parameters
obj – The object to add.
obj_stream – An object stream to add the object to.
idnum – Manually specify the object ID of the object to be added. This is only allowed for object IDs that have previously been allocated using
allocate_placeholder()
.
- Returns
An
IndirectObject
instance referring to the object just added.
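The interplay between allocate_placeholder() and the idnum argument of add_object() is easiest to see with two objects that must refer to each other. The sketch below uses a toy stand-in class (ToyWriter, with plain integers as references and None as the unpopulated value) to show the pattern; it is an assumption-laden illustration of the calling sequence, not pyHanko's implementation.

```python
class ToyWriter:
    """Toy stand-in for a PDF writer; illustration only."""

    def __init__(self):
        self.objects = {}  # object ID -> object (None while unpopulated)
        self._last_id = 0

    def allocate_placeholder(self):
        # Reserve an ID now; the object itself is supplied later.
        self._last_id += 1
        self.objects[self._last_id] = None
        return self._last_id

    def add_object(self, obj, idnum=None):
        if idnum is None:
            idnum = self.allocate_placeholder()
        elif idnum not in self.objects:
            # Only previously allocated IDs may be specified manually.
            raise ValueError("object ID %d was never allocated" % idnum)
        self.objects[idnum] = obj
        return idnum


w = ToyWriter()
# The child needs the parent's ID before the parent can be built.
parent_id = w.allocate_placeholder()
child_id = w.add_object({"/Parent": parent_id})
# Now that the child exists, populate the reserved slot.
w.add_object({"/Kids": [child_id]}, idnum=parent_id)
```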
-
prepare_object_stream
(compress=True)
Prepare and return a new
ObjectStream
object.
- Parameters
compress – Indicates whether the resulting object stream should be compressed.
- Returns
An
ObjectStream
object.
-
property
trailer_view
Returns a view of the document trailer of the document represented by this
PdfHandler
instance.
The view is effectively read-only, in the sense that any writes will not be reflected in the actual trailer (if the handler supports writing, that is).
- Returns
A
generic.DictionaryObject
representing the current state of the document trailer.
-
write
(stream)
Write the contents of this PDF writer to a stream.
- Parameters
stream – A writable output stream.
-
register_annotation
(page_ref, annot_ref)
Register an annotation to be added to a page. This convenience function takes care of calling
mark_update()
where necessary.
- Parameters
page_ref – Reference to the page object involved.
annot_ref – Reference to the annotation object to be added.
-
insert_page
(new_page, after=None)
Insert a page object into the page tree.
- Parameters
new_page – Page object to insert.
after – Page number (zero-indexed) after which to insert the page.
- Returns
A reference to the newly inserted page.
-
import_object
(obj: pyhanko.pdf_utils.generic.PdfObject, obj_stream: Optional[pyhanko.pdf_utils.writer.ObjectStream] = None) → pyhanko.pdf_utils.generic.PdfObject
Deep-copy an object into this writer, dealing with resolving indirect references in the process.
- Parameters
obj – The object to import.
obj_stream –
The object stream to import objects into.
Note
Stream objects and bare references will not be put into the object stream; the standard forbids this.
- Returns
The object as associated with this writer. If the input object was an indirect reference, a dictionary (incl. streams) or an array, the returned value will always be a new instance.
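The general technique behind this kind of import, a deep copy that renumbers indirect references as it goes, can be sketched in a few lines. The Ref class, the dict-based object tables and the memoisation via seen are all assumptions made for this illustration; they show the idea generically and are not taken from pyHanko's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for an indirect reference."""
    idnum: int


def import_object(obj, source, target, seen=None):
    """Deep-copy `obj` from the `source` table into the `target` table.

    Both tables map object IDs to objects. `seen` maps source IDs to the
    IDs assigned in the target, so shared and cyclic references are
    copied once only.
    """
    seen = {} if seen is None else seen
    if isinstance(obj, Ref):
        if obj.idnum not in seen:
            new_id = max(target, default=0) + 1
            seen[obj.idnum] = new_id
            target[new_id] = None  # reserve the slot before recursing
            target[new_id] = import_object(
                source[obj.idnum], source, target, seen
            )
        return Ref(seen[obj.idnum])
    if isinstance(obj, dict):
        return {k: import_object(v, source, target, seen) for k, v in obj.items()}
    if isinstance(obj, list):
        return [import_object(v, source, target, seen) for v in obj]
    return obj  # scalars are returned as-is


source = {1: {"/Kids": [Ref(2)]}, 2: {"/Parent": Ref(1), "/Value": 5}}
target = {10: "existing"}
result = import_object(Ref(1), source, target)
```

Reserving the target slot before recursing is what keeps the mutual /Kids and /Parent references from recursing forever.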
-
import_page_as_xobject
(other: pyhanko.pdf_utils.rw_common.PdfHandler, page_ix=0, content_stream=0, inherit_filters=True)
Import a page content stream from some other
PdfHandler
into the current one as a form XObject.
- Parameters
other – A
PdfHandler
.
page_ix – Index of the page to copy (default: 0)
content_stream – Index of the page’s content stream to copy, if multiple are present (default: 0)
inherit_filters – Inherit the content stream’s filters, if present.
- Returns
An
IndirectObject
referring to the form XObject as added to the current writer.
-
-
class
pyhanko.pdf_utils.writer.
PageObject
(contents, media_box, resources=None)
Bases:
pyhanko.pdf_utils.generic.DictionaryObject
Subclass of
DictionaryObject
that handles some of the initialisation boilerplate for page objects.
-
class
pyhanko.pdf_utils.writer.
PdfFileWriter
(stream_xrefs=True)
Bases:
pyhanko.pdf_utils.writer.BasePdfFileWriter
Class to write new PDF files.
-
stream_xrefs
: bool
Boolean controlling whether the output file will contain its cross-references in stream format or as a classical XRef table.
The default for new files is
True
. For incremental updates, the writer adapts to the system used in the previous iteration of the document (as mandated by the standard).
-
object_streams
: List[pyhanko.pdf_utils.writer.ObjectStream]
-
-
pyhanko.pdf_utils.writer.
init_xobject_dictionary
(command_stream: bytes, box_width, box_height, resources: Optional[pyhanko.pdf_utils.generic.DictionaryObject] = None) → pyhanko.pdf_utils.generic.StreamObject
Helper function to initialise form XObject dictionaries.
Note
For utilities to handle image XObjects, see
images
.
- Parameters
command_stream – The XObject’s raw appearance stream.
box_width – The width of the XObject’s bounding box.
box_height – The height of the XObject’s bounding box.
resources – A resource dictionary to include with the form object.
- Returns
A
StreamObject
representation of the form XObject.
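As a rough picture of what a form XObject looks like once serialised, the sketch below writes one out by hand: a stream whose dictionary declares /Type /XObject, /Subtype /Form and a /BBox, followed by the raw drawing commands. The helper name form_xobject_bytes, the integer box dimensions and the bounding box origin at (0, 0) are assumptions made for this illustration; the sketch omits the resource dictionary and is not pyHanko's implementation.

```python
def form_xobject_bytes(command_stream, box_width, box_height):
    """Serialise a bare-bones form XObject (illustration only).

    Assumes integer box dimensions and a bounding box anchored at (0, 0).
    """
    header = (
        b"<< /Type /XObject /Subtype /Form "
        b"/BBox [0 0 %d %d] /Length %d >>\n"
        % (box_width, box_height, len(command_stream))
    )
    # The stream payload holds the raw appearance commands.
    return header + b"stream\n" + command_stream + b"\nendstream"


# Draw a filled 10 x 10 rectangle inside a 10 x 10 bounding box.
data = form_xobject_bytes(b"0 0 10 10 re f", 10, 10)
```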