pyhanko.pdf_utils.metadata package

Submodules

module

DocumentMetadata, info: DictionaryObject, only_update_existing: bool = False) bool

DictionaryObject, strict: bool = True) DocumentMetadata

pyhanko.pdf_utils.metadata.model module

Added in version 0.14.0.

This module contains the XMP data model classes and namespace registry, in addition to a simplified document metadata model used for automated metadata management.

class pyhanko.pdf_utils.metadata.model.DocumentMetadata(title: ~pyhanko.pdf_utils.misc.StringWithLanguage | str | None = None, author: ~pyhanko.pdf_utils.misc.StringWithLanguage | str | None = None, subject: ~pyhanko.pdf_utils.misc.StringWithLanguage | str | None = None, keywords: ~typing.List[str] = <factory>, creator: ~pyhanko.pdf_utils.misc.StringWithLanguage | str | None = None, created: str | ~datetime.datetime | None = None, last_modified: str | ~datetime.datetime | None = 'now', xmp_extra: ~typing.List[~pyhanko.pdf_utils.metadata.model.XmpStructure] = <factory>, xmp_unmanaged: bool = False)

Bases: object

Simple representation of document metadata. All entries are optional.

title: StringWithLanguage | str | None = None

The document’s title.

author: StringWithLanguage | str | None = None

The document’s author.

subject: StringWithLanguage | str | None = None

The document’s subject.

keywords: List[str]

Keywords associated with the document.

creator: StringWithLanguage | str | None = None

The software that was used to author the document.


This is distinct from the producer, which is typically used to indicate which PDF processor(s) interacted with the file.

created: str | datetime | None = None

The time when the document was created. To set it to the current time, specify the string 'now'.

last_modified: str | datetime | None = 'now'

The time when the document was last modified. Defaults to the current time upon serialisation if not specified.
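The 'now' sentinel accepted by created and last_modified can be pictured with a minimal sketch. The helper resolve_timestamp below is hypothetical and is not part of pyHanko. It assumes that 'now' is the only meaningful string value and that UTC is an acceptable time zone, neither of which this page confirms.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def resolve_timestamp(value: Union[str, datetime, None]) -> Optional[datetime]:
    """Hypothetical helper: turn the 'now' sentinel into a concrete timestamp."""
    if value is None:
        # Entry not set: nothing to serialise.
        return None
    if isinstance(value, datetime):
        # Explicit timestamps are passed through unchanged.
        return value
    if value == 'now':
        # Resolved only at this point, i.e. when the value is actually needed.
        return datetime.now(tz=timezone.utc)
    # Assumption of this sketch: any other string is rejected.
    raise ValueError(f"unsupported timestamp value: {value!r}")


fixed = datetime(2020, 1, 1, tzinfo=timezone.utc)
print(resolve_timestamp(fixed))
print(resolve_timestamp('now'))
```

Deferring the lookup of the current time until serialisation means that a metadata object created early still records the moment the document was actually written.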

xmp_extra: List[XmpStructure]

Extra XMP metadata.

xmp_unmanaged: bool = False

Flag metadata as XMP-only. This means that the info dictionary will be cleared out as much as possible, and that all attributes other than xmp_extra will be ignored when updating XMP metadata.


The last-modified date and producer entries in the info dictionary will still be updated.


DocumentMetadata represents a data model that is much simpler than what XMP is actually capable of. You can use this flag if you need more fine-grained control.

view_over(base: DocumentMetadata)

pyhanko.pdf_utils.metadata.model.VENDOR = 'pyHanko 0.25.1.dev1'

pyHanko version identifier in textual form.


A regular string, a string with a language code, or nothing at all.

alias of StringWithLanguage | str | None

class pyhanko.pdf_utils.metadata.model.ExpandedName(ns: str, local_name: str)

Bases: object

An expanded XML name.

ns: str

The URI of the namespace in which the name resides.

local_name: str

The local part of the name.
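As an illustration of what an expanded name is, the sketch below pairs a namespace URI with a local name and renders the pair in the "{uri}local" notation used by Python's xml.etree.ElementTree. The Name class is a hypothetical stand-in written for this example. It does not import pyHanko, and ExpandedName may be implemented differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """Hypothetical stand-in for an expanded XML name."""

    ns: str          # URI of the namespace
    local_name: str  # local part of the name

    def clark(self) -> str:
        # "{namespace-uri}local-name", as understood by xml.etree.ElementTree.
        return f"{{{self.ns}}}{self.local_name}"


DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core namespace URI
title = Name(ns=DC_NS, local_name="title")

print(title.clark())  # {http://purl.org/dc/elements/1.1/}title

# Frozen dataclasses are hashable, so names can be used as dictionary keys.
fields = {title: "Example document"}
print(fields[Name(DC_NS, "title")])  # Example document
```

Two names compare equal exactly when both the namespace URI and the local part match, regardless of which prefix a given XML serialisation happens to use.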

class pyhanko.pdf_utils.metadata.model.Qualifiers(quals: Dict[ExpandedName, XmpValue])

Bases: object

XMP value qualifiers wrapper. Implements __getitem__. Note that xml:lang gets special treatment.


quals – The qualifiers to model.

classmethod of(*lst: Tuple[ExpandedName, XmpValue]) Qualifiers

Construct a Qualifiers object from a list of name-value pairs.


lst – A list of name-value pairs.


A Qualifiers object.

classmethod lang_as_qual(lang: str | None) Qualifiers

Construct a Qualifiers object that only wraps a language qualifier.


lang – A language code.


A Qualifiers object.

iter_quals(with_lang: bool = True) Iterable[Tuple[ExpandedName, XmpValue]]

Iterate over all qualifiers.


with_lang – Include the language qualifier.


property lang: str | None

Retrieve the language qualifier, if any.

property has_non_lang_quals: bool

Check if there are any non-language qualifiers.
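The special treatment of xml:lang described above could look roughly like the following. QualifierMap is a hypothetical stand-in that models names as (namespace URI, local name) tuples and values as plain strings. It only mirrors the documented interface (item access, iter_quals, lang, has_non_lang_quals) and is not pyHanko's implementation.

```python
from typing import Dict, Iterator, Optional, Tuple

# Names are modelled as (namespace URI, local name) pairs in this sketch.
Name = Tuple[str, str]
XML_LANG: Name = ("http://www.w3.org/XML/1998/namespace", "lang")


class QualifierMap:
    """Hypothetical qualifier wrapper that keeps xml:lang separate."""

    def __init__(self, quals: Dict[Name, str]):
        quals = dict(quals)
        # The language qualifier is stored apart from the other qualifiers.
        self._lang: Optional[str] = quals.pop(XML_LANG, None)
        self._quals = quals

    def __getitem__(self, name: Name) -> str:
        if name == XML_LANG and self._lang is not None:
            return self._lang
        return self._quals[name]

    def iter_quals(self, with_lang: bool = True) -> Iterator[Tuple[Name, str]]:
        yield from self._quals.items()
        if with_lang and self._lang is not None:
            yield XML_LANG, self._lang

    @property
    def lang(self) -> Optional[str]:
        return self._lang

    @property
    def has_non_lang_quals(self) -> bool:
        return bool(self._quals)


ROLE: Name = ("urn:example:ns#", "role")  # made-up qualifier name
quals = QualifierMap({XML_LANG: "en", ROLE: "main"})
print(quals.lang, quals.has_non_lang_quals)  # en True
```

Keeping the language apart lets a serialiser emit it as an xml:lang attribute while handling all other qualifiers through the general qualifier mechanism.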

class pyhanko.pdf_utils.metadata.model.XmpValue(value: ~pyhanko.pdf_utils.metadata.model.XmpStructure | ~pyhanko.pdf_utils.metadata.model.XmpArray | ~pyhanko.pdf_utils.metadata.model.XmpUri | str, qualifiers: ~pyhanko.pdf_utils.metadata.model.Qualifiers = <factory>)

Bases: object

A general XMP value, potentially with qualifiers.

value: XmpStructure | XmpArray | XmpUri | str

The value.

qualifiers: Qualifiers

Qualifiers that apply to the value.

class pyhanko.pdf_utils.metadata.model.XmpStructure(fields: Dict[ExpandedName, XmpValue])

Bases: object

A generic XMP structure value. Implements __getitem__ for field access.


fields – The structure’s fields.

classmethod of(*lst: Tuple[ExpandedName, XmpValue]) XmpStructure

Construct an XmpStructure from a list of name-value pairs.


lst – A list of name-value pairs.


An XmpStructure.
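The of constructor and field access through __getitem__ can be sketched as follows. Structure is a hypothetical stand-in that uses (namespace URI, local name) tuples as field names and plain strings as values. It does not import pyHanko, and the overwrite behaviour for duplicate names is an assumption of this sketch.

```python
from typing import Dict, Tuple

Name = Tuple[str, str]  # (namespace URI, local name); a simplification


class Structure:
    """Hypothetical stand-in for a generic structure value."""

    def __init__(self, fields: Dict[Name, str]):
        self._fields = dict(fields)

    @classmethod
    def of(cls, *lst: Tuple[Name, str]) -> "Structure":
        # Assumption: later pairs overwrite earlier ones that share a name.
        return cls(dict(lst))

    def __getitem__(self, name: Name) -> str:
        return self._fields[name]


DC = "http://purl.org/dc/elements/1.1/"
PDF = "http://ns.adobe.com/pdf/1.3/"

root = Structure.of(
    ((DC, "format"), "application/pdf"),
    ((PDF, "Producer"), "ExampleTool 1.0"),  # made-up producer string
)
print(root[(DC, "format")])  # application/pdf
```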

class pyhanko.pdf_utils.metadata.model.XmpArrayType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

XMP array types.


Ordered array.


Unordered array.


Alternative array.

as_rdf() ExpandedName

Render the type as an XML name.

class pyhanko.pdf_utils.metadata.model.XmpArray(array_type: XmpArrayType, entries: List[XmpValue])

Bases: object

An XMP array.

array_type: XmpArrayType

The type of the array.

entries: List[XmpValue]

The entries in the array.

classmethod ordered(lst: Iterable[XmpValue]) XmpArray

Convert a list to an ordered XMP array.


lst – An iterable of XMP values.


An ordered XmpArray.

classmethod unordered(lst: Iterable[XmpValue]) XmpArray

Convert a list to an unordered XMP array.


lst – An iterable of XMP values.


An unordered XmpArray.

classmethod alternative(lst: Iterable[XmpValue]) XmpArray

Convert a list to an alternative XMP array.


lst – An iterable of XMP values.


An alternative XmpArray.
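In RDF/XML, the three array types correspond to the rdf:Seq (ordered), rdf:Bag (unordered) and rdf:Alt (alternative) containers, with one rdf:li element per entry. The sketch below shows that mapping using only the standard library. The render_array function is hypothetical, takes simplified (text, language) pairs instead of XmpValue objects, and does not reflect how pyHanko serialises arrays internally.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML = "http://www.w3.org/XML/1998/namespace"

# RDF container element for each array type.
CONTAINERS = {"ordered": "Seq", "unordered": "Bag", "alternative": "Alt"}


def render_array(
    kind: str, entries: Iterable[Tuple[str, Optional[str]]]
) -> ET.Element:
    """Render (text, language) pairs as an RDF container element."""
    container = ET.Element(f"{{{RDF}}}{CONTAINERS[kind]}")
    for text, lang in entries:
        li = ET.SubElement(container, f"{{{RDF}}}li")
        li.text = text
        if lang is not None:
            # The language qualifier becomes an xml:lang attribute.
            li.set(f"{{{XML}}}lang", lang)
    return container


ET.register_namespace("rdf", RDF)
alt = render_array("alternative", [("Report", "en"), ("Rapport", "fr")])
print(ET.tostring(alt, encoding="unicode"))
```

An alternative array with per-entry language qualifiers is the usual way XMP represents a localisable text value such as a title.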

pyhanko.pdf_utils.metadata.model.NS = {'dc': '', 'pdf': '', 'pdfaExtension': '', 'pdfaProperty': '', 'pdfaSchema': '', 'pdfaid': '', 'pdfuaid': '', 'rdf': '', 'x': 'adobe:ns:meta/', 'xml': '', 'xmp': ''}

Known namespaces and their customary prefixes.

pyhanko.pdf_utils.metadata.model.XML_LANG =

lang in the xml namespace.

pyhanko.pdf_utils.metadata.model.RDF_RDF =

RDF in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_SEQ =

Seq in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_BAG =

Bag in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_ALT =

Alt in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_LI =

li in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_VALUE =

value in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_RESOURCE =

resource in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_PARSE_TYPE =

parseType in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_ABOUT =

about in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_DESCRIPTION =

Description in the rdf namespace.

pyhanko.pdf_utils.metadata.model.DC_TITLE =

title in the dc namespace.

pyhanko.pdf_utils.metadata.model.DC_CREATOR =

creator in the dc namespace.

pyhanko.pdf_utils.metadata.model.DC_DESCRIPTION =

description in the dc namespace.

pyhanko.pdf_utils.metadata.model.PDF_PRODUCER =

Producer in the pdf namespace.

pyhanko.pdf_utils.metadata.model.PDF_KEYWORDS =

keywords in the pdf namespace.

pyhanko.pdf_utils.metadata.model.X_XMPMETA = adobe:ns:meta/xmpmeta

xmpmeta in the x namespace.

pyhanko.pdf_utils.metadata.model.X_XMPTK = adobe:ns:meta/xmptk

xmptk in the x namespace.

pyhanko.pdf_utils.metadata.model.XMP_CREATORTOOL =

CreatorTool in the xmp namespace.

pyhanko.pdf_utils.metadata.model.XMP_CREATEDATE =

CreateDate in the xmp namespace.

pyhanko.pdf_utils.metadata.model.XMP_MODDATE =

ModifyDate in the xmp namespace.

pyhanko.pdf_utils.metadata.xmp_xml module

pyhanko.pdf_utils.metadata.xmp_xml.iter_attrs(elem: Element) Iterator[Tuple[ExpandedName, str]]

pyhanko.pdf_utils.metadata.xmp_xml.add_xmp_value(container: Element, value: XmpValue)

pyhanko.pdf_utils.metadata.xmp_xml.serialise_xmp(roots: List[XmpStructure], out: BinaryIO)

class pyhanko.pdf_utils.metadata.xmp_xml.MetadataStream(dict_data: dict | None = None, stream_data: bytes | None = None, encoded_data: bytes | None = None, handler: SecurityHandler | None = None)

Bases: StreamObject

classmethod from_xmp(xmp: List[XmpStructure]) MetadataStream

property xmp: List[XmpStructure]

update_xmp_with_meta(meta: DocumentMetadata)

pyhanko.pdf_utils.metadata.xmp_xml.update_xmp_with_meta(meta: DocumentMetadata, roots: Iterable[XmpStructure] = ())

pyhanko.pdf_utils.metadata.xmp_xml.meta_from_xmp(roots: List[XmpStructure])

exception pyhanko.pdf_utils.metadata.xmp_xml.XmpXmlProcessingError

Bases: ValueError

pyhanko.pdf_utils.metadata.xmp_xml.parse_xmp(inp: BinaryIO) List[XmpStructure]

Module contents