pyhanko.pdf_utils.metadata.model module

New in version 0.14.0.

This module contains the XMP data model classes and namespace registry, in addition to a simplified document metadata model used for automated metadata management.

class pyhanko.pdf_utils.metadata.model.DocumentMetadata(title: Optional[Union[StringWithLanguage, str]] = None, author: Optional[Union[StringWithLanguage, str]] = None, subject: Optional[Union[StringWithLanguage, str]] = None, keywords: List[str] = <factory>, creator: Optional[Union[StringWithLanguage, str]] = None, created: Optional[Union[str, datetime]] = None, last_modified: Optional[Union[str, datetime]] = 'now', xmp_extra: List[XmpStructure] = <factory>, xmp_unmanaged: bool = False)

Bases: object

Simple representation of document metadata. All entries are optional.

title: Optional[Union[StringWithLanguage, str]] = None

The document’s title.

author: Optional[Union[StringWithLanguage, str]] = None

The document’s author.

subject: Optional[Union[StringWithLanguage, str]] = None

The document’s subject.

keywords: List[str]

Keywords associated with the document.

creator: Optional[Union[StringWithLanguage, str]] = None

The software that was used to author the document.

Note: This is distinct from the producer, which is typically used to indicate which PDF processor(s) interacted with the file.

created: Optional[Union[str, datetime]] = None

The time when the document was created. To set it to the current time, specify the string 'now'.

last_modified: Optional[Union[str, datetime]] = 'now'

The time when the document was last modified. Defaults to the current time upon serialisation if not specified.

xmp_extra: List[XmpStructure]

Extra XMP metadata.

xmp_unmanaged: bool = False

Flag metadata as XMP-only. This means that the info dictionary will be cleared out as much as possible, and that all attributes other than xmp_extra will be ignored when updating XMP metadata.

Note: The last-modified date and producer entries in the info dictionary will still be updated.

Note: DocumentMetadata represents a data model that is much simpler than what XMP is actually capable of. You can use this flag if you need more fine-grained control.
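
The 'now' convention used by created and last_modified can be pictured as a late-bound sentinel: the literal string is stored on the model and only becomes a concrete timestamp when the metadata is written out. The sketch below is a stand-alone illustration that does not import pyHanko. The helper resolve_timestamp and its at parameter are hypothetical, and rejecting strings other than 'now' is an assumption made for the sketch rather than documented behaviour.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def resolve_timestamp(
    value: Optional[Union[str, datetime]],
    at: Optional[datetime] = None,
) -> Optional[datetime]:
    """Turn a stored timestamp setting into a concrete datetime (or None)."""
    if value is None:
        # Entry not set: there is nothing to write.
        return None
    if isinstance(value, datetime):
        # An explicit timestamp is used unchanged.
        return value
    if value == 'now':
        # Sentinel: resolved only here, i.e. at serialisation time.
        return at if at is not None else datetime.now(timezone.utc)
    # Assumption for this sketch: no other string is meaningful.
    raise ValueError(f"unsupported timestamp setting: {value!r}")


fixed = datetime(2022, 11, 1, 12, 0, tzinfo=timezone.utc)
print(resolve_timestamp(None))             # unset entry
print(resolve_timestamp(fixed))            # explicit timestamp, kept as-is
print(resolve_timestamp('now', at=fixed))  # sentinel resolved at write time
```

Deferring the lookup of the current time means that a metadata object created early in a workflow still records the moment the file was actually saved.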

view_over(base: DocumentMetadata)

pyhanko.pdf_utils.metadata.model.VENDOR = 'pyHanko 0.14.0'

pyHanko version identifier in textual form


A regular string, a string with a language code, or nothing at all.

alias of Optional[Union[StringWithLanguage, str]]

class pyhanko.pdf_utils.metadata.model.ExpandedName(ns: str, local_name: str)

Bases: object

An expanded XML name.

ns: str

The URI of the namespace in which the name resides.

local_name: str

The local part of the name.
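
An expanded name pairs a namespace URI with a local name, so it does not depend on whichever prefix a particular XML document happens to use. The sketch below is a stand-in written for illustration, not pyHanko's class. It assumes a frozen dataclass so that names are hashable (expanded names serve as dictionary keys elsewhere in this module), and it assumes that the string form is the URI followed directly by the local name, which matches the value listed for X_XMPMETA on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpandedNameSketch:
    """Stand-in for an expanded XML name (hypothetical, for illustration)."""

    ns: str          # URI of the namespace
    local_name: str  # local part of the name

    def __str__(self) -> str:
        # Assumed rendering: namespace URI immediately followed by the local name.
        return f"{self.ns}{self.local_name}"


# 'adobe:ns:meta/' is the URI listed for the 'x' prefix in NS.
xmpmeta = ExpandedNameSketch(ns='adobe:ns:meta/', local_name='xmpmeta')
same = ExpandedNameSketch(ns='adobe:ns:meta/', local_name='xmpmeta')

print(str(xmpmeta))     # adobe:ns:meta/xmpmeta
print(xmpmeta == same)  # True: equality is by value
lookup = {xmpmeta: 'root element'}
print(lookup[same])     # root element: equal names hash alike
```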

class pyhanko.pdf_utils.metadata.model.Qualifiers(quals: Dict[ExpandedName, XmpValue])

Bases: object

XMP value qualifiers wrapper. Implements __getitem__. Note that xml:lang gets special treatment.

Parameters: quals – The qualifiers to model.

classmethod of(*lst: Tuple[ExpandedName, XmpValue]) Qualifiers

Construct a Qualifiers object from a list of name-value pairs.

Parameters: lst – A list of name-value pairs.

Returns: A Qualifiers object.

classmethod lang_as_qual(lang: Optional[str]) Qualifiers

Construct a Qualifiers object that only wraps a language qualifier.

Parameters: lang – A language code.

Returns: A Qualifiers object.

iter_quals(with_lang: bool = True) Iterable[Tuple[ExpandedName, XmpValue]]

Iterate over all qualifiers.

Parameters: with_lang – Include the language qualifier.

property lang: Optional[str]

Retrieve the language qualifier, if any.

property has_non_lang_quals: bool

Check if there are any non-language qualifiers.
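
One way to picture the special treatment of xml:lang is a wrapper that keeps the language code apart from all other qualifiers. The following stand-alone sketch does not use pyHanko: names are plain (namespace URI, local name) tuples, values are plain strings, and storing the language separately is an assumption made for the sketch. The XML namespace URI is the standard W3C one, and the role qualifier is made up.

```python
from typing import Dict, Iterator, Optional, Tuple

Name = Tuple[str, str]  # (namespace URI, local name), a stand-in for ExpandedName

# The standard W3C XML namespace; 'lang' in it is the xml:lang attribute.
XML_LANG_NAME: Name = ('http://www.w3.org/XML/1998/namespace', 'lang')


class QualifiersSketch:
    """Hypothetical qualifier wrapper that keeps xml:lang apart from the rest."""

    def __init__(self, quals: Dict[Name, str]):
        quals = dict(quals)  # copy, so the caller's dict is left untouched
        self._lang: Optional[str] = quals.pop(XML_LANG_NAME, None)
        self._quals = quals

    def __getitem__(self, name: Name) -> str:
        if name == XML_LANG_NAME:
            if self._lang is None:
                raise KeyError(name)
            return self._lang
        return self._quals[name]

    @property
    def lang(self) -> Optional[str]:
        return self._lang

    @property
    def has_non_lang_quals(self) -> bool:
        return bool(self._quals)

    def iter_quals(self, with_lang: bool = True) -> Iterator[Tuple[Name, str]]:
        # Ordinary qualifiers first, then the language qualifier if requested.
        yield from self._quals.items()
        if with_lang and self._lang is not None:
            yield XML_LANG_NAME, self._lang


role: Name = ('urn:example:ns/', 'role')  # made-up qualifier name
q = QualifiersSketch({XML_LANG_NAME: 'en', role: 'main'})
print(q.lang)                               # en
print(q.has_non_lang_quals)                 # True
print(list(q.iter_quals(with_lang=False)))  # [(('urn:example:ns/', 'role'), 'main')]
```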

class pyhanko.pdf_utils.metadata.model.XmpValue(value: Union[XmpStructure, XmpArray, XmpUri, str], qualifiers: Qualifiers = <factory>)

Bases: object

A general XMP value, potentially with qualifiers.

value: Union[XmpStructure, XmpArray, XmpUri, str]

The value.

qualifiers: Qualifiers

Qualifiers that apply to the value.

class pyhanko.pdf_utils.metadata.model.XmpStructure(fields: Dict[ExpandedName, XmpValue])

Bases: object

A generic XMP structure value. Implements __getitem__ for field access.

Parameters: fields – The structure’s fields.

classmethod of(*lst: Tuple[ExpandedName, XmpValue]) XmpStructure

Construct an XmpStructure from a list of name-value pairs.

Parameters: lst – A list of name-value pairs.

Returns: An XmpStructure.
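
The of constructors take their name-value pairs as positional arguments and build the underlying mapping from them. A minimal stand-alone sketch of that pattern follows. It does not use pyHanko: StructureSketch is a hypothetical stand-in, names are plain tuples, values are plain strings, and the example namespace and field names are made up.

```python
from typing import Dict, Tuple

Name = Tuple[str, str]  # (namespace URI, local name), a stand-in for ExpandedName


class StructureSketch:
    """Hypothetical structure value with field access through __getitem__."""

    def __init__(self, fields: Dict[Name, str]):
        self._fields = fields

    @classmethod
    def of(cls, *lst: Tuple[Name, str]) -> 'StructureSketch':
        # dict() consumes the pairs in order; a repeated name keeps the last value.
        return cls(dict(lst))

    def __getitem__(self, name: Name) -> str:
        return self._fields[name]


# Made-up namespace and field names, used only for this example.
EX_NS = 'urn:example:ns/'
width: Name = (EX_NS, 'width')
height: Name = (EX_NS, 'height')

dims = StructureSketch.of((width, '210'), (height, '297'))
print(dims[width], dims[height])  # 210 297
```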

class pyhanko.pdf_utils.metadata.model.XmpArrayType(value)

Bases: Enum

XMP array types.


Ordered array.


Unordered array.


Alternative array.

as_rdf() ExpandedName

Render the type as an XML name.

class pyhanko.pdf_utils.metadata.model.XmpArray(array_type: XmpArrayType, entries: List[XmpValue])

Bases: object

An XMP array.

array_type: XmpArrayType

The type of the array.

entries: List[XmpValue]

The entries in the array.

classmethod ordered(lst: Iterable[XmpValue]) XmpArray

Convert a list to an ordered XMP array.

Parameters: lst – An iterable of XMP values.

Returns: An ordered XmpArray.

classmethod unordered(lst: Iterable[XmpValue]) XmpArray

Convert a list to an unordered XMP array.

Parameters: lst – An iterable of XMP values.

Returns: An unordered XmpArray.

classmethod alternative(lst: Iterable[XmpValue]) XmpArray

Convert a list to an alternative XMP array.

Parameters: lst – An iterable of XMP values.

Returns: An alternative XmpArray.
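
In RDF terms, ordered, unordered and alternative arrays correspond to the Seq, Bag and Alt container elements, for which this module exports RDF_SEQ, RDF_BAG and RDF_ALT. The stand-alone sketch below illustrates how an array type might be rendered as an XML name and how the three constructors relate to one another. It does not use pyHanko: the class names, the enum member names and values, and the tuple representation of names are assumptions made for the sketch, and entries are plain strings rather than XMP values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Tuple

# The W3C RDF syntax namespace, in which Seq, Bag and Alt are defined.
RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'


class ArrayTypeSketch(Enum):
    """Hypothetical stand-in for the array type enum."""

    ORDERED = 'Seq'
    UNORDERED = 'Bag'
    ALTERNATIVE = 'Alt'

    def as_rdf(self) -> Tuple[str, str]:
        # Render the type as a (namespace URI, local name) pair.
        return (RDF_NS, self.value)


@dataclass
class ArraySketch:
    """Hypothetical stand-in for an XMP array."""

    array_type: ArrayTypeSketch
    entries: List[str]

    @classmethod
    def ordered(cls, lst: Iterable[str]) -> 'ArraySketch':
        # list() materialises any iterable, including generators.
        return cls(ArrayTypeSketch.ORDERED, list(lst))

    @classmethod
    def unordered(cls, lst: Iterable[str]) -> 'ArraySketch':
        return cls(ArrayTypeSketch.UNORDERED, list(lst))

    @classmethod
    def alternative(cls, lst: Iterable[str]) -> 'ArraySketch':
        return cls(ArrayTypeSketch.ALTERNATIVE, list(lst))


authors = ArraySketch.ordered(name for name in ('Alice', 'Bob'))
print(authors.entries)                 # ['Alice', 'Bob']
print(authors.array_type.as_rdf()[1])  # Seq
```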

pyhanko.pdf_utils.metadata.model.NS = {'dc': '', 'pdf': '', 'pdfaExtension': '', 'pdfaProperty': '', 'pdfaSchema': '', 'pdfaid': '', 'pdfuaid': '', 'rdf': '', 'x': 'adobe:ns:meta/', 'xml': '', 'xmp': ''}

Known namespaces and their customary prefixes.

pyhanko.pdf_utils.metadata.model.XML_LANG =

lang in the xml namespace.

pyhanko.pdf_utils.metadata.model.RDF_RDF =

RDF in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_SEQ =

Seq in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_BAG =

Bag in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_ALT =

Alt in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_LI =

li in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_VALUE =

value in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_RESOURCE =

resource in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_PARSE_TYPE =

parseType in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_ABOUT =

about in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_DESCRIPTION =

Description in the rdf namespace.

pyhanko.pdf_utils.metadata.model.DC_TITLE =

title in the dc namespace.

pyhanko.pdf_utils.metadata.model.DC_CREATOR =

creator in the dc namespace.

pyhanko.pdf_utils.metadata.model.DC_DESCRIPTION =

description in the dc namespace.

pyhanko.pdf_utils.metadata.model.PDF_PRODUCER =

Producer in the pdf namespace.

pyhanko.pdf_utils.metadata.model.PDF_KEYWORDS =

keywords in the pdf namespace.

pyhanko.pdf_utils.metadata.model.X_XMPMETA = adobe:ns:meta/xmpmeta

xmpmeta in the x namespace.

pyhanko.pdf_utils.metadata.model.X_XMPTK = adobe:ns:meta/xmptk

xmptk in the x namespace.

pyhanko.pdf_utils.metadata.model.XMP_CREATORTOOL =

CreatorTool in the xmp namespace.

pyhanko.pdf_utils.metadata.model.XMP_CREATEDATE =

CreateDate in the xmp namespace.

pyhanko.pdf_utils.metadata.model.XMP_MODDATE =

ModifyDate in the xmp namespace.