pyhanko.pdf_utils.metadata package

Submodules

pyhanko.pdf_utils.metadata.info module

pyhanko.pdf_utils.metadata.info.update_info_dict(meta: DocumentMetadata, info: DictionaryObject, only_update_existing: bool = False) → bool
pyhanko.pdf_utils.metadata.info.view_from_info_dict(info_dict: DictionaryObject, strict: bool = True) → DocumentMetadata

pyhanko.pdf_utils.metadata.model module

Added in version 0.14.0.

This module contains the XMP data model classes and namespace registry, in addition to a simplified document metadata model used for automated metadata management.

class pyhanko.pdf_utils.metadata.model.DocumentMetadata(title: StringWithLanguage | str | None = None, author: StringWithLanguage | str | None = None, subject: StringWithLanguage | str | None = None, keywords: List[str] = <factory>, creator: StringWithLanguage | str | None = None, created: str | datetime | None = None, last_modified: str | datetime | None = 'now', xmp_extra: List[XmpStructure] = <factory>, xmp_unmanaged: bool = False)

Bases: object

Simple representation of document metadata. All entries are optional.

title: StringWithLanguage | str | None = None

The document’s title.

author: StringWithLanguage | str | None = None

The document’s author.

subject: StringWithLanguage | str | None = None

The document’s subject.

keywords: List[str]

Keywords associated with the document.

creator: StringWithLanguage | str | None = None

The software that was used to author the document.

Note

This is distinct from the producer, which is typically used to indicate which PDF processor(s) interacted with the file.

created: str | datetime | None = None

The time when the document was created. To set it to the current time, specify the string 'now'.

last_modified: str | datetime | None = 'now'

The time when the document was last modified. Defaults to the current time upon serialisation if not specified.
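
The following stdlib-only sketch shows one way a created or last_modified value (None, the string 'now', or a datetime) could be resolved and rendered as a PDF date string (D:YYYYMMDDHHmmSS followed by a time zone offset, written here in the PDF 1.7 style +HH'mm'). The helpers resolve_timestamp and pdf_date are hypothetical, naive datetimes are assumed to be UTC, and pyHanko's own handling may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Union


def resolve_timestamp(value: Union[str, datetime, None]) -> Optional[datetime]:
    """Hypothetical helper: turn a metadata timestamp value into a datetime."""
    if value is None:
        return None
    if isinstance(value, datetime):
        return value
    if value == "now":
        # The current time is looked up only when this function runs
        return datetime.now(timezone.utc)
    # Other strings are simply rejected in this sketch
    raise ValueError(f"unsupported timestamp value: {value!r}")


def pdf_date(dt: datetime) -> str:
    """Render a datetime as a PDF date string such as D:20240102030405+01'00'."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    offset_min = int(dt.utcoffset().total_seconds()) // 60
    if offset_min == 0:
        suffix = "Z"
    else:
        sign = "+" if offset_min > 0 else "-"
        hours, minutes = divmod(abs(offset_min), 60)
        suffix = f"{sign}{hours:02d}'{minutes:02d}'"
    return dt.strftime("D:%Y%m%d%H%M%S") + suffix


cet = timezone(timedelta(hours=1))
print(pdf_date(resolve_timestamp(datetime(2024, 1, 2, 3, 4, 5, tzinfo=cet))))
# D:20240102030405+01'00'
```

Deferring the 'now' lookup until resolve_timestamp is called is meant to echo the "upon serialisation" wording for last_modified.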

xmp_extra: List[XmpStructure]

Extra XMP metadata.

xmp_unmanaged: bool = False

Flag metadata as XMP-only. This means that the info dictionary will be cleared out as much as possible, and that all attributes other than xmp_extra will be ignored when updating XMP metadata.

Note

The last-modified date and producer entries in the info dictionary will still be updated.

Note

DocumentMetadata represents a data model that is much simpler than what XMP is capable of expressing. Use this flag if you need more fine-grained control.

view_over(base: DocumentMetadata)
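
To illustrate how the simple fields relate to a PDF document information dictionary, the sketch below maps them onto the standard Info keys /Title, /Author, /Subject, /Keywords, /Creator, /CreationDate and /ModDate. SimpleMeta is a hypothetical plain-string stand-in for DocumentMetadata (no language tags, no XMP), the output is an ordinary Python dict rather than a DictionaryObject, and joining keywords with commas is an assumed convention; this is not the implementation of update_info_dict.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class SimpleMeta:
    """Hypothetical plain-string stand-in for DocumentMetadata."""
    title: Optional[str] = None
    author: Optional[str] = None
    subject: Optional[str] = None
    keywords: List[str] = field(default_factory=list)  # the <factory> default
    creator: Optional[str] = None
    created: Optional[datetime] = None
    last_modified: Optional[datetime] = None


TEXT_KEYS = (("title", "/Title"), ("author", "/Author"),
             ("subject", "/Subject"), ("creator", "/Creator"))
DATE_KEYS = (("created", "/CreationDate"), ("last_modified", "/ModDate"))


def info_entries(meta: SimpleMeta) -> Dict[str, str]:
    """Map the populated fields of meta onto Info dictionary keys."""
    entries: Dict[str, str] = {}
    for attr, key in TEXT_KEYS:
        value = getattr(meta, attr)
        if value is not None:
            entries[key] = value
    if meta.keywords:
        # Assumption: keywords are joined into a single comma-separated string
        entries["/Keywords"] = ", ".join(meta.keywords)
    for attr, key in DATE_KEYS:
        value = getattr(meta, attr)
        if value is not None:
            # Simplified: any tzinfo is ignored; the offset part of a PDF date is optional
            entries[key] = value.strftime("D:%Y%m%d%H%M%S")
    return entries


meta = SimpleMeta(title="Annual report", keywords=["pdf", "xmp"],
                  created=datetime(2024, 1, 2, 3, 4, 5))
print(info_entries(meta))
# {'/Title': 'Annual report', '/Keywords': 'pdf, xmp', '/CreationDate': 'D:20240102030405'}
```

Using field(default_factory=list) gives every instance its own keyword list, which is what the <factory> default in the signature indicates.
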
pyhanko.pdf_utils.metadata.model.VENDOR = 'pyHanko 0.25.1.dev1'

pyHanko version identifier in textual form.

pyhanko.pdf_utils.metadata.model.MetaString

A regular string, a string with a language code, or nothing at all.

alias of StringWithLanguage | str | None

class pyhanko.pdf_utils.metadata.model.ExpandedName(ns: str, local_name: str)

Bases: object

An expanded XML name.

ns: str

The URI of the namespace in which the name resides.

local_name: str

The local part of the name.
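
Because expanded names serve as dictionary keys in Qualifiers and XmpStructure, they need value-based equality and hashing. The sketch below uses a hypothetical frozen dataclass, Name, in place of ExpandedName, and shows two common renderings: ElementTree's Clark notation {uri}local and the prefix:local form, looked up in a subset of the NS registry. It is an illustration only; ExpandedName's own methods are not documented here.

```python
from dataclasses import dataclass
from typing import Dict

# Subset of the NS registry documented in this module
NS: Dict[str, str] = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xml": "http://www.w3.org/XML/1998/namespace",
}


@dataclass(frozen=True)
class Name:
    """Hypothetical mirror of ExpandedName: a namespace URI plus a local name."""
    ns: str
    local_name: str

    def clark(self) -> str:
        # ElementTree's '{uri}local' (Clark) notation
        return f"{{{self.ns}}}{self.local_name}"

    def prefixed(self, registry: Dict[str, str] = NS) -> str:
        # Reverse lookup of the customary prefix; unknown namespaces raise KeyError
        for prefix, uri in registry.items():
            if uri == self.ns:
                return f"{prefix}:{self.local_name}"
        raise KeyError(self.ns)


title = Name(NS["dc"], "title")
print(title.clark())     # {http://purl.org/dc/elements/1.1/}title
print(title.prefixed())  # dc:title
```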

class pyhanko.pdf_utils.metadata.model.Qualifiers(quals: Dict[ExpandedName, XmpValue])

Bases: object

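XMP value qualifiers wrapper. Implements __getitem__. The xml:lang qualifier gets special treatment: it is exposed through the lang property and can be excluded from iteration with iter_quals(with_lang=False).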

Parameters:

quals – The qualifiers to model.

classmethod of(*lst: Tuple[ExpandedName, XmpValue]) → Qualifiers

Construct a Qualifiers object from a list of name-value pairs.

Parameters:

lst – A list of name-value pairs.

Returns:

A Qualifiers object.

classmethod lang_as_qual(lang: str | None) → Qualifiers

Construct a Qualifiers object that only wraps a language qualifier.

Parameters:

lang – A language code.

Returns:

A Qualifiers object.

iter_quals(with_lang: bool = True) → Iterable[Tuple[ExpandedName, XmpValue]]

Iterate over all qualifiers.

Parameters:

with_lang – Include the language qualifier.

Returns:

An iterable of name-value pairs, one for each qualifier.

property lang: str | None

Retrieve the language qualifier, if any.

property has_non_lang_quals: bool

Check if there are any non-language qualifiers.
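
In RDF/XML, XMP qualifiers are usually written in one of two ways: a language tag becomes an xml:lang attribute on the element holding the value, while other qualifiers use the rdf:parseType="Resource" form, with the value in rdf:value and each qualifier as a sibling element (the RDF_VALUE and RDF_PARSE_TYPE constants in this module name these pieces). The stdlib sketch below shows both forms with xml.etree.ElementTree. The helper add_qualified_value and the http://example.com/ns/ namespace are hypothetical, and this is not pyHanko's add_xmp_value.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def add_qualified_value(parent: ET.Element, tag: str, text: str,
                        lang: Optional[str] = None,
                        quals: Optional[Dict[str, str]] = None) -> ET.Element:
    """Append text as child tag of parent, attaching any qualifiers."""
    elem = ET.SubElement(parent, tag)
    if lang is not None:
        # The language qualifier is special: it becomes an xml:lang attribute
        elem.set(XML_LANG, lang)
    if not quals:
        elem.text = text
        return elem
    # Any other qualifier switches to the rdf:parseType="Resource" form
    elem.set(f"{{{RDF}}}parseType", "Resource")
    ET.SubElement(elem, f"{{{RDF}}}value").text = text
    for qual_tag, qual_text in quals.items():
        ET.SubElement(elem, qual_tag).text = qual_text
    return elem


# Language-tagged entry, e.g. inside a dc:title alternative array
alt = ET.Element(f"{{{RDF}}}Alt")
add_qualified_value(alt, f"{{{RDF}}}li", "Annual report", lang="en")

# Entry with a non-language qualifier (hypothetical role qualifier)
seq = ET.Element(f"{{{RDF}}}Seq")
add_qualified_value(seq, f"{{{RDF}}}li", "Fred",
                    quals={"{http://example.com/ns/}role": "author"})
print(ET.tostring(alt, encoding="unicode"))
```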

class pyhanko.pdf_utils.metadata.model.XmpValue(value: XmpStructure | XmpArray | XmpUri | str, qualifiers: Qualifiers = <factory>)

Bases: object

A general XMP value, potentially with qualifiers.

value: XmpStructure | XmpArray | XmpUri | str

The value.

qualifiers: Qualifiers

Qualifiers that apply to the value.

class pyhanko.pdf_utils.metadata.model.XmpStructure(fields: Dict[ExpandedName, XmpValue])

Bases: object

A generic XMP structure value. Implements __getitem__ for field access.

Parameters:

fields – The structure’s fields.

classmethod of(*lst: Tuple[ExpandedName, XmpValue]) → XmpStructure

Construct an XmpStructure from a list of name-value pairs.

Parameters:

lst – A list of name-value pairs.

Returns:

An XmpStructure.

class pyhanko.pdf_utils.metadata.model.XmpArrayType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

XMP array types.

ORDERED = 'Seq'

Ordered array.

UNORDERED = 'Bag'

Unordered array.

ALTERNATIVE = 'Alt'

Alternative array.

as_rdf() → ExpandedName

Render the type as an XML name.
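
Since each member's value is the local name of an RDF container (compare RDF_SEQ, RDF_BAG and RDF_ALT), as_rdf can be read as combining that value with the rdf namespace. The sketch below mirrors this with a hypothetical ArrayType enum whose as_rdf returns a Clark-notation string rather than an ExpandedName; it is not the library's class.

```python
from enum import Enum

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class ArrayType(Enum):
    """Hypothetical mirror of XmpArrayType; values are RDF container names."""
    ORDERED = "Seq"
    UNORDERED = "Bag"
    ALTERNATIVE = "Alt"

    def as_rdf(self) -> str:
        # Clark-notation name of the matching RDF container element
        return f"{{{RDF_NS}}}{self.value}"


for array_type in ArrayType:
    print(array_type.name, "->", array_type.as_rdf())
```

Storing the container name as the enum value also allows lookup in the other direction, e.g. ArrayType("Bag") when reading RDF back in.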

class pyhanko.pdf_utils.metadata.model.XmpArray(array_type: XmpArrayType, entries: List[XmpValue])

Bases: object

An XMP array.

array_type: XmpArrayType

The type of the array.

entries: List[XmpValue]

The entries in the array.

classmethod ordered(lst: Iterable[XmpValue]) → XmpArray

Convert a list to an ordered XMP array.

Parameters:

lst – An iterable of XMP values.

Returns:

An ordered XmpArray.

classmethod unordered(lst: Iterable[XmpValue]) → XmpArray

Convert a list to an unordered XMP array.

Parameters:

lst – An iterable of XMP values.

Returns:

An unordered XmpArray.

classmethod alternative(lst: Iterable[XmpValue]) → XmpArray

Convert a list to an alternative XMP array.

Parameters:

lst – An iterable of XMP values.

Returns:

An alternative XmpArray.
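
In RDF/XML, an XMP array property is a property element wrapping a single rdf:Seq, rdf:Bag or rdf:Alt container, with one rdf:li per entry; for example, dc:subject (keywords) is conventionally a Bag. The stdlib sketch below builds that shape for plain string entries using a hypothetical helper, array_property. Qualifiers and nested structures are left out, and it is not pyHanko's serialiser.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
CONTAINERS = {"Seq", "Bag", "Alt"}  # ordered, unordered, alternative


def array_property(parent: ET.Element, prop_tag: str, container: str,
                   items: Iterable[str]) -> ET.Element:
    """Append prop_tag holding an RDF container with one rdf:li per item."""
    if container not in CONTAINERS:
        raise ValueError(f"unknown RDF container: {container}")
    prop = ET.SubElement(parent, prop_tag)
    array = ET.SubElement(prop, f"{{{RDF}}}{container}")
    for item in items:
        ET.SubElement(array, f"{{{RDF}}}li").text = item
    return prop


desc = ET.Element(f"{{{RDF}}}Description")
array_property(desc, f"{{{DC}}}subject", "Bag", ["pdf", "metadata"])
print(ET.tostring(desc, encoding="unicode"))
```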

pyhanko.pdf_utils.metadata.model.NS = {'dc': 'http://purl.org/dc/elements/1.1/', 'pdf': 'http://ns.adobe.com/pdf/1.3/', 'pdfaExtension': 'http://www.aiim.org/pdfa/ns/extension/', 'pdfaProperty': 'http://www.aiim.org/pdfa/ns/property#', 'pdfaSchema': 'http://www.aiim.org/pdfa/ns/schema#', 'pdfaid': 'http://www.aiim.org/pdfa/ns/id/', 'pdfuaid': 'http://www.aiim.org/pdfua/ns/id/', 'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'x': 'adobe:ns:meta/', 'xml': 'http://www.w3.org/XML/1998/namespace', 'xmp': 'http://ns.adobe.com/xap/1.0/'}

Known namespaces and their customary prefixes.
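
One practical use of a prefix registry like NS is to make xml.etree.ElementTree emit the customary prefixes instead of generated ones such as ns0. The stdlib sketch below registers a copy of the mapping with ET.register_namespace; note that this changes ElementTree's process-wide prefix table. The helper name register_customary_prefixes is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Copy of the NS registry documented above
NS: Dict[str, str] = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
    "pdfaExtension": "http://www.aiim.org/pdfa/ns/extension/",
    "pdfaProperty": "http://www.aiim.org/pdfa/ns/property#",
    "pdfaSchema": "http://www.aiim.org/pdfa/ns/schema#",
    "pdfaid": "http://www.aiim.org/pdfa/ns/id/",
    "pdfuaid": "http://www.aiim.org/pdfua/ns/id/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "x": "adobe:ns:meta/",
    "xml": "http://www.w3.org/XML/1998/namespace",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}


def register_customary_prefixes(ns_map: Dict[str, str] = NS) -> None:
    """Tell ElementTree to serialise each namespace with its customary prefix."""
    for prefix, uri in ns_map.items():
        ET.register_namespace(prefix, uri)


register_customary_prefixes()
part = ET.Element(f"{{{NS['pdfaid']}}}part")
part.text = "2"
print(ET.tostring(part, encoding="unicode"))
# <pdfaid:part xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">2</pdfaid:part>
```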

pyhanko.pdf_utils.metadata.model.XML_LANG = http://www.w3.org/XML/1998/namespace/lang

lang in the xml namespace.

pyhanko.pdf_utils.metadata.model.RDF_RDF = http://www.w3.org/1999/02/22-rdf-syntax-ns#RDF

RDF in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_SEQ = http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq

Seq in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_BAG = http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag

Bag in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_ALT = http://www.w3.org/1999/02/22-rdf-syntax-ns#Alt

Alt in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_LI = http://www.w3.org/1999/02/22-rdf-syntax-ns#li

li in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_VALUE = http://www.w3.org/1999/02/22-rdf-syntax-ns#value

value in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_RESOURCE = http://www.w3.org/1999/02/22-rdf-syntax-ns#resource

resource in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_PARSE_TYPE = http://www.w3.org/1999/02/22-rdf-syntax-ns#parseType

parseType in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_ABOUT = http://www.w3.org/1999/02/22-rdf-syntax-ns#about

about in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_DESCRIPTION = http://www.w3.org/1999/02/22-rdf-syntax-ns#Description

Description in the rdf namespace.

pyhanko.pdf_utils.metadata.model.DC_TITLE = http://purl.org/dc/elements/1.1/title

title in the dc namespace.

pyhanko.pdf_utils.metadata.model.DC_CREATOR = http://purl.org/dc/elements/1.1/creator

creator in the dc namespace.

pyhanko.pdf_utils.metadata.model.DC_DESCRIPTION = http://purl.org/dc/elements/1.1/description

description in the dc namespace.

pyhanko.pdf_utils.metadata.model.PDF_PRODUCER = http://ns.adobe.com/pdf/1.3/Producer

Producer in the pdf namespace.

pyhanko.pdf_utils.metadata.model.PDF_KEYWORDS = http://ns.adobe.com/pdf/1.3/keywords

keywords in the pdf namespace.

pyhanko.pdf_utils.metadata.model.X_XMPMETA = adobe:ns:meta/xmpmeta

xmpmeta in the x namespace.

pyhanko.pdf_utils.metadata.model.X_XMPTK = adobe:ns:meta/xmptk

xmptk in the x namespace.

pyhanko.pdf_utils.metadata.model.XMP_CREATORTOOL = http://ns.adobe.com/xap/1.0/CreatorTool

CreatorTool in the xmp namespace.

pyhanko.pdf_utils.metadata.model.XMP_CREATEDATE = http://ns.adobe.com/xap/1.0/CreateDate

CreateDate in the xmp namespace.

pyhanko.pdf_utils.metadata.model.XMP_MODDATE = http://ns.adobe.com/xap/1.0/ModifyDate

ModifyDate in the xmp namespace.

pyhanko.pdf_utils.metadata.xmp_xml module

pyhanko.pdf_utils.metadata.xmp_xml.iter_attrs(elem: Element) → Iterator[Tuple[ExpandedName, str]]
pyhanko.pdf_utils.metadata.xmp_xml.add_xmp_value(container: Element, value: XmpValue)
pyhanko.pdf_utils.metadata.xmp_xml.serialise_xmp(roots: List[XmpStructure], out: BinaryIO)
class pyhanko.pdf_utils.metadata.xmp_xml.MetadataStream(dict_data: dict | None = None, stream_data: bytes | None = None, encoded_data: bytes | None = None, handler: SecurityHandler | None = None)

Bases: StreamObject

classmethod from_xmp(xmp: List[XmpStructure]) → MetadataStream
property xmp: List[XmpStructure]
update_xmp_with_meta(meta: DocumentMetadata)
pyhanko.pdf_utils.metadata.xmp_xml.update_xmp_with_meta(meta: DocumentMetadata, roots: Iterable[XmpStructure] = ())
pyhanko.pdf_utils.metadata.xmp_xml.meta_from_xmp(roots: List[XmpStructure])
exception pyhanko.pdf_utils.metadata.xmp_xml.XmpXmlProcessingError

Bases: ValueError

pyhanko.pdf_utils.metadata.xmp_xml.parse_xmp(inp: BinaryIO) → List[XmpStructure]
pyhanko.pdf_utils.metadata.xmp_xml.register_namespaces()
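
For reference, an XMP packet wraps one or more rdf:Description elements in x:xmpmeta and rdf:RDF, and a language alternative such as dc:title is an rdf:Alt whose items carry xml:lang, with x-default marking the default item. The stdlib sketch below parses such a packet and collects the title alternatives. The names titles_by_lang, PacketError and SAMPLE are hypothetical; PacketError only imitates the fact that XmpXmlProcessingError is a ValueError subclass, and the sketch is not a substitute for parse_xmp.

```python
import xml.etree.ElementTree as ET
from typing import Dict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = b"""<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Annual report</rdf:li>
          <rdf:li xml:lang="de">Jahresbericht</rdf:li>
        </rdf:Alt>
      </dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


class PacketError(ValueError):
    """Hypothetical error type; like XmpXmlProcessingError, a ValueError."""


def titles_by_lang(packet: bytes) -> Dict[str, str]:
    """Collect dc:title alternatives from an XMP packet, keyed by xml:lang."""
    try:
        root = ET.fromstring(packet)
    except ET.ParseError as exc:
        raise PacketError(f"malformed XMP: {exc}") from exc
    titles: Dict[str, str] = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        for li in desc.findall(f"{{{DC}}}title/{{{RDF}}}Alt/{{{RDF}}}li"):
            lang = li.get(XML_LANG)
            if lang is not None:  # items without a language tag are skipped
                titles[lang] = li.text or ""
    return titles


print(titles_by_lang(SAMPLE))
# {'x-default': 'Annual report', 'de': 'Jahresbericht'}
```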

Module contents