pyhanko.pdf_utils.metadata.model module

New in version 0.14.0.

This module contains the XMP data model classes and namespace registry, in addition to a simplified document metadata model used for automated metadata management.

class pyhanko.pdf_utils.metadata.model.DocumentMetadata(title: Optional[Union[StringWithLanguage, str]] = None, author: Optional[Union[StringWithLanguage, str]] = None, subject: Optional[Union[StringWithLanguage, str]] = None, keywords: List[str] = <factory>, creator: Optional[Union[StringWithLanguage, str]] = None, created: Optional[Union[str, datetime]] = None, last_modified: Optional[Union[str, datetime]] = 'now', xmp_extra: List[XmpStructure] = <factory>, xmp_unmanaged: bool = False)

Bases: object

Simple representation of document metadata. All entries are optional.

title: Optional[Union[StringWithLanguage, str]] = None

The document’s title.

author: Optional[Union[StringWithLanguage, str]] = None

The document’s author.

subject: Optional[Union[StringWithLanguage, str]] = None

The document’s subject.

keywords: List[str]

Keywords associated with the document.

creator: Optional[Union[StringWithLanguage, str]] = None

The software that was used to author the document.

Note

This is distinct from the producer, which is typically used to indicate which PDF processor(s) interacted with the file.

created: Optional[Union[str, datetime]] = None

The time when the document was created. To set it to the current time, specify the string 'now'.

last_modified: Optional[Union[str, datetime]] = 'now'

The time when the document was last modified. Defaults to the current time upon serialisation if not specified.
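As an illustration of the 'now' convention used by created and last_modified, here is a minimal sketch of how such a field could be resolved when the metadata is written out. The helper name resolve_timestamp is hypothetical, and the choice of UTC is an assumption; this is not pyHanko's own implementation.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def resolve_timestamp(
    value: Optional[Union[str, datetime]]
) -> Optional[datetime]:
    """Hypothetical helper: resolve a timestamp field at serialisation time."""
    if value is None:
        return None  # field not set, nothing to write
    if isinstance(value, datetime):
        return value  # explicit timestamps pass through unchanged
    if value == 'now':
        # The sentinel only becomes a concrete time when it is resolved.
        # Using UTC here is an assumption made for this sketch.
        return datetime.now(tz=timezone.utc)
    raise ValueError(f"unsupported timestamp value: {value!r}")
```

Deferring the lookup this way means a metadata object created early in a workflow still records the time at which the file is actually written.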

xmp_extra: List[XmpStructure]

Extra XMP metadata.

xmp_unmanaged: bool = False

Flag metadata as XMP-only. This means that the info dictionary will be cleared out as much as possible, and that all attributes other than xmp_extra will be ignored when updating XMP metadata.

Note

The last-modified date and producer entries in the info dictionary will still be updated.

Note

DocumentMetadata represents a data model that is much simpler than what XMP is actually capable of. Use this flag if you need more fine-grained control.

view_over(base: DocumentMetadata)

pyhanko.pdf_utils.metadata.model.VENDOR = 'pyHanko 0.14.0'

The pyHanko version identifier in textual form.

pyhanko.pdf_utils.metadata.model.MetaString

A regular string, a string with a language code, or nothing at all.

alias of Optional[Union[StringWithLanguage, str]]

class pyhanko.pdf_utils.metadata.model.ExpandedName(ns: str, local_name: str)

Bases: object

An expanded XML name.

ns: str

The URI of the namespace in which the name resides.

local_name: str

The local part of the name.
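To make the idea of an expanded name concrete, the following stand-in mirrors the two documented fields without importing pyHanko. The string rendering is an assumption inferred from the constant values listed further down this page (for example, the xml namespace URI has no trailing separator, yet XML_LANG is shown with a '/' before lang); the real class may behave differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpandedName:
    """Stand-in for the documented class, not pyHanko's own code."""
    ns: str
    local_name: str

    def __str__(self) -> str:
        # Assumption: insert a '/' only when the namespace URI does not
        # already end in a separator character.
        sep = '' if self.ns.endswith(('/', '#')) else '/'
        return f"{self.ns}{sep}{self.local_name}"


dc_title = ExpandedName('http://purl.org/dc/elements/1.1/', 'title')
xml_lang = ExpandedName('http://www.w3.org/XML/1998/namespace', 'lang')

# A frozen dataclass is hashable, so names can serve as dictionary keys.
lookup = {dc_title: 'Example title'}
```

Keeping the namespace URI and the local part separate avoids depending on whichever prefix (dc, xmp, ...) a particular XML serialisation happens to use.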

class pyhanko.pdf_utils.metadata.model.Qualifiers(quals: Dict[ExpandedName, XmpValue])

Bases: object

XMP value qualifiers wrapper. Implements __getitem__. Note that xml:lang gets special treatment.

Parameters

quals – The qualifiers to model.

classmethod of(*lst: Tuple[ExpandedName, XmpValue]) Qualifiers

Construct a Qualifiers object from a list of name-value pairs.

Parameters

lst – A list of name-value pairs.

Returns

A Qualifiers object.

classmethod lang_as_qual(lang: Optional[str]) Qualifiers

Construct a Qualifiers object that only wraps a language qualifier.

Parameters

lang – A language code.

Returns

A Qualifiers object.

iter_quals(with_lang: bool = True) Iterable[Tuple[ExpandedName, XmpValue]]

Iterate over all qualifiers.

Parameters

with_lang – Include the language qualifier.

Returns

An iterable of name-value pairs.

property lang: Optional[str]

Retrieve the language qualifier, if any.

property has_non_lang_quals: bool

Check if there are any non-language qualifiers.
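The sketch below shows one way a qualifier wrapper with the documented interface could treat xml:lang separately from other qualifiers. It is a simplified stand-in: plain tuples replace ExpandedName, plain strings replace XmpValue, and the internal layout is an assumption rather than a description of pyHanko's code.

```python
from typing import Dict, Iterator, Optional, Tuple

# A (namespace, local name) tuple stands in for ExpandedName here.
Name = Tuple[str, str]
XML_LANG: Name = ('http://www.w3.org/XML/1998/namespace', 'lang')


class Qualifiers:
    """Stand-in wrapper; values are plain strings instead of XmpValue."""

    def __init__(self, quals: Dict[Name, str]):
        quals = dict(quals)  # copy so the caller's dict is left untouched
        # Assumption: the language qualifier is stored apart from the rest.
        self._lang: Optional[str] = quals.pop(XML_LANG, None)
        self._quals = quals

    @classmethod
    def of(cls, *lst: Tuple[Name, str]) -> 'Qualifiers':
        return cls(dict(lst))

    @classmethod
    def lang_as_qual(cls, lang: Optional[str]) -> 'Qualifiers':
        return cls({XML_LANG: lang} if lang else {})

    def __getitem__(self, key: Name) -> str:
        if key == XML_LANG and self._lang is not None:
            return self._lang
        return self._quals[key]

    def iter_quals(self, with_lang: bool = True) -> Iterator[Tuple[Name, str]]:
        yield from self._quals.items()
        if with_lang and self._lang is not None:
            yield XML_LANG, self._lang

    @property
    def lang(self) -> Optional[str]:
        return self._lang

    @property
    def has_non_lang_quals(self) -> bool:
        return bool(self._quals)
```

With this layout, Qualifiers.lang_as_qual('en') reports 'en' through lang while has_non_lang_quals stays False, which is the distinction the two properties above are meant to expose.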

class pyhanko.pdf_utils.metadata.model.XmpValue(value: Union[XmpStructure, XmpArray, XmpUri, str], qualifiers: Qualifiers = <factory>)

Bases: object

A general XMP value, potentially with qualifiers.

value: Union[XmpStructure, XmpArray, XmpUri, str]

The value.

qualifiers: Qualifiers

Qualifiers that apply to the value.

class pyhanko.pdf_utils.metadata.model.XmpStructure(fields: Dict[ExpandedName, XmpValue])

Bases: object

A generic XMP structure value. Implements __getitem__ for field access.

Parameters

fields – The structure’s fields.

classmethod of(*lst: Tuple[ExpandedName, XmpValue]) XmpStructure

Construct an XmpStructure from a list of name-value pairs.

Parameters

lst – A list of name-value pairs.

Returns

An XmpStructure.

class pyhanko.pdf_utils.metadata.model.XmpArrayType(value)

Bases: Enum

XMP array types.

ORDERED = 'Seq'

Ordered array.

UNORDERED = 'Bag'

Unordered array.

ALTERNATIVE = 'Alt'

Alternative array.

as_rdf() ExpandedName

Render the type as an XML name.
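The relationship between the enum values and the rdf namespace can be sketched as follows. This stand-in copies the documented member values and returns a plain (namespace, local name) tuple where the real method returns an ExpandedName; the constant name RDF_NS is introduced for this example only.

```python
from enum import Enum
from typing import Tuple

# Namespace URI taken from the 'rdf' entry of the NS mapping on this page.
RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'


class XmpArrayType(Enum):
    """Stand-in enum mirroring the documented members."""
    ORDERED = 'Seq'
    UNORDERED = 'Bag'
    ALTERNATIVE = 'Alt'

    def as_rdf(self) -> Tuple[str, str]:
        # A (namespace, local name) tuple stands in for ExpandedName.
        return RDF_NS, self.value
```

Joining the two parts of XmpArrayType.ORDERED.as_rdf() gives the same URI that this page lists for RDF_SEQ.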

class pyhanko.pdf_utils.metadata.model.XmpArray(array_type: XmpArrayType, entries: List[XmpValue])

Bases: object

An XMP array.

array_type: XmpArrayType

The type of the array.

entries: List[XmpValue]

The entries in the array.

classmethod ordered(lst: Iterable[XmpValue]) XmpArray

Convert a list to an ordered XMP array.

Parameters

lst – An iterable of XMP values.

Returns

An ordered XmpArray.

classmethod unordered(lst: Iterable[XmpValue]) XmpArray

Convert a list to an unordered XMP array.

Parameters

lst – An iterable of XMP values.

Returns

An unordered XmpArray.

classmethod alternative(lst: Iterable[XmpValue]) XmpArray

Convert a list to an alternative XMP array.

Parameters

lst – An iterable of XMP values.

Returns

An alternative XmpArray.
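The three constructors differ only in the array type they attach to the entries. A minimal stand-in, assuming plain strings for both the array type and the entries (the documented class uses XmpArrayType and XmpValue), could look like this:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class XmpArray:
    """Stand-in; the array type and the entries are plain strings here."""
    array_type: str
    entries: List[str]

    @classmethod
    def ordered(cls, lst: Iterable[str]) -> 'XmpArray':
        # list() materialises any iterable, including generators.
        return cls('Seq', list(lst))

    @classmethod
    def unordered(cls, lst: Iterable[str]) -> 'XmpArray':
        return cls('Bag', list(lst))

    @classmethod
    def alternative(cls, lst: Iterable[str]) -> 'XmpArray':
        return cls('Alt', list(lst))


# Author order usually matters, keyword order usually does not, and
# translations of a title are alternatives to one another.
authors = XmpArray.ordered(name for name in ('Alice', 'Bob'))
keywords = XmpArray.unordered(['pdf', 'metadata'])
titles = XmpArray.alternative(['Report', 'Rapport'])
```

Which constructor fits a given property depends on the schema that defines it; the comments in the example describe common usage, not a rule enforced by this sketch.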

pyhanko.pdf_utils.metadata.model.NS = {'dc': 'http://purl.org/dc/elements/1.1/', 'pdf': 'http://ns.adobe.com/pdf/1.3/', 'pdfaExtension': 'http://www.aiim.org/pdfa/ns/extension/', 'pdfaProperty': 'http://www.aiim.org/pdfa/ns/property#', 'pdfaSchema': 'http://www.aiim.org/pdfa/ns/schema#', 'pdfaid': 'http://www.aiim.org/pdfa/ns/id/', 'pdfuaid': 'http://www.aiim.org/pdfua/ns/id/', 'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'x': 'adobe:ns:meta/', 'xml': 'http://www.w3.org/XML/1998/namespace', 'xmp': 'http://ns.adobe.com/xap/1.0/'}

Known namespaces and their customary prefixes.
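A prefix registry like this is typically used to translate a prefixed name such as dc:title into a full URI. The sketch below copies a few entries from the mapping above; the expand function is a hypothetical helper written for this example and is not part of the module.

```python
# Subset of the documented NS mapping, copied here so the example
# runs without importing pyHanko.
NS = {
    'dc': 'http://purl.org/dc/elements/1.1/',
    'pdf': 'http://ns.adobe.com/pdf/1.3/',
    'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'xmp': 'http://ns.adobe.com/xap/1.0/',
}


def expand(qname: str) -> str:
    """Hypothetical helper: turn 'prefix:local' into a full URI."""
    prefix, sep, local_name = qname.partition(':')
    if not sep:
        raise ValueError(f"not a prefixed name: {qname!r}")
    # An unknown prefix raises KeyError, which surfaces typos early.
    return NS[prefix] + local_name
```

For the entries used here, the results line up with the constants documented below: expand('dc:title') yields the URI shown for DC_TITLE, and expand('xmp:ModifyDate') yields the one shown for XMP_MODDATE.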

pyhanko.pdf_utils.metadata.model.XML_LANG = http://www.w3.org/XML/1998/namespace/lang

lang in the xml namespace.

pyhanko.pdf_utils.metadata.model.RDF_RDF = http://www.w3.org/1999/02/22-rdf-syntax-ns#RDF

RDF in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_SEQ = http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq

Seq in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_BAG = http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag

Bag in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_ALT = http://www.w3.org/1999/02/22-rdf-syntax-ns#Alt

Alt in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_LI = http://www.w3.org/1999/02/22-rdf-syntax-ns#li

li in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_VALUE = http://www.w3.org/1999/02/22-rdf-syntax-ns#value

value in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_RESOURCE = http://www.w3.org/1999/02/22-rdf-syntax-ns#resource

resource in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_PARSE_TYPE = http://www.w3.org/1999/02/22-rdf-syntax-ns#parseType

parseType in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_ABOUT = http://www.w3.org/1999/02/22-rdf-syntax-ns#about

about in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_DESCRIPTION = http://www.w3.org/1999/02/22-rdf-syntax-ns#Description

Description in the rdf namespace.

pyhanko.pdf_utils.metadata.model.DC_TITLE = http://purl.org/dc/elements/1.1/title

title in the dc namespace.

pyhanko.pdf_utils.metadata.model.DC_CREATOR = http://purl.org/dc/elements/1.1/creator

creator in the dc namespace.

pyhanko.pdf_utils.metadata.model.DC_DESCRIPTION = http://purl.org/dc/elements/1.1/description

description in the dc namespace.

pyhanko.pdf_utils.metadata.model.PDF_PRODUCER = http://ns.adobe.com/pdf/1.3/Producer

Producer in the pdf namespace.

pyhanko.pdf_utils.metadata.model.PDF_KEYWORDS = http://ns.adobe.com/pdf/1.3/keywords

keywords in the pdf namespace.

pyhanko.pdf_utils.metadata.model.X_XMPMETA = adobe:ns:meta/xmpmeta

xmpmeta in the x namespace.

pyhanko.pdf_utils.metadata.model.X_XMPTK = adobe:ns:meta/xmptk

xmptk in the x namespace.

pyhanko.pdf_utils.metadata.model.XMP_CREATORTOOL = http://ns.adobe.com/xap/1.0/CreatorTool

CreatorTool in the xmp namespace.

pyhanko.pdf_utils.metadata.model.XMP_CREATEDATE = http://ns.adobe.com/xap/1.0/CreateDate

CreateDate in the xmp namespace.

pyhanko.pdf_utils.metadata.model.XMP_MODDATE = http://ns.adobe.com/xap/1.0/ModifyDate

ModifyDate in the xmp namespace.