pyhanko.pdf_utils.metadata package

Submodules

pyhanko.pdf_utils.metadata.info module

pyhanko.pdf_utils.metadata.info.update_info_dict(meta: DocumentMetadata, info: DictionaryObject, only_update_existing: bool = False) → bool

pyhanko.pdf_utils.metadata.info.view_from_info_dict(info_dict: DictionaryObject, strict: bool = True) → DocumentMetadata

pyhanko.pdf_utils.metadata.model module

New in version 0.14.0.

This module contains the XMP data model classes and namespace registry, in addition to a simplified document metadata model used for automated metadata management.

class pyhanko.pdf_utils.metadata.model.DocumentMetadata(title: ~pyhanko.pdf_utils.misc.StringWithLanguage | str | None = None, author: ~pyhanko.pdf_utils.misc.StringWithLanguage | str | None = None, subject: ~pyhanko.pdf_utils.misc.StringWithLanguage | str | None = None, keywords: ~typing.List[str] = <factory>, creator: ~pyhanko.pdf_utils.misc.StringWithLanguage | str | None = None, created: str | ~datetime.datetime | None = None, last_modified: str | ~datetime.datetime | None = 'now', xmp_extra: ~typing.List[~pyhanko.pdf_utils.metadata.model.XmpStructure] = <factory>, xmp_unmanaged: bool = False)

Bases: object

Simple representation of document metadata. All entries are optional.

title: StringWithLanguage | str | None = None: The document’s title.

author: StringWithLanguage | str | None = None: The document’s author.

subject: StringWithLanguage | str | None = None: The document’s subject.

keywords: List[str]: Keywords associated with the document.

creator: StringWithLanguage | str | None = None: The software that was used to author the document.

Note

This is distinct from the producer, which is typically used to indicate which PDF processor(s) interacted with the file.

created: str | datetime | None = None: The time when the document was created. To set it to the current time, specify the string 'now'.

last_modified: str | datetime | None = 'now': The time when the document was last modified. Defaults to the current time upon serialisation if not specified.
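The 'now' convention for created and last_modified can be pictured with a small standard-library sketch. The helper resolve_timestamp and its clock parameter are hypothetical and not part of pyHanko; the sketch also assumes that 'now' is the only meaningful string value, which the text above does not state explicitly.

```python
from datetime import datetime, timezone
from typing import Callable, Optional, Union


def resolve_timestamp(
    value: Union[str, datetime, None],
    clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> Optional[datetime]:
    """Hypothetical helper: turn a 'now' sentinel into a concrete datetime."""
    if value is None:
        # Nothing specified: leave the entry out.
        return None
    if isinstance(value, datetime):
        # An explicit timestamp is passed through unchanged.
        return value
    if value == "now":
        # The sentinel is only resolved when the value is actually needed.
        return clock()
    # Assumption of this sketch: no other string is meaningful.
    raise ValueError(f"unsupported timestamp value: {value!r}")


fixed = datetime(2024, 1, 1, tzinfo=timezone.utc)
resolved_now = resolve_timestamp("now", clock=lambda: fixed)
resolved_explicit = resolve_timestamp(fixed)
resolved_missing = resolve_timestamp(None)
```

Deferring the clock lookup until serialisation is what lets a single metadata object be reused across several writes while still recording the actual write time.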

xmp_extra: List[XmpStructure]: Extra XMP metadata.

xmp_unmanaged: bool = False: Flag metadata as XMP-only. This means that the info dictionary will be cleared out as much as possible, and that all attributes other than xmp_extra will be ignored when updating XMP metadata.

Note

The last-modified date and producer entries in the info dictionary will still be updated.

Note

DocumentMetadata represents a data model that is much simpler than what XMP is actually capable of. You can use this flag if you need more fine-grained control.

view_over(base: DocumentMetadata)

pyhanko.pdf_utils.metadata.model.VENDOR = 'pyHanko 0.23.3.dev1': pyHanko version identifier in textual form

pyhanko.pdf_utils.metadata.model.MetaString

A regular string, a string with a language code, or nothing at all.

alias of Optional[Union[StringWithLanguage, str]]

class pyhanko.pdf_utils.metadata.model.ExpandedName(ns: str, local_name: str)

Bases: object

An expanded XML name.

ns: str: The URI of the namespace in which the name resides.

local_name: str: The local part of the name.
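As a rough illustration of what an expanded name is, the sketch below uses only the standard library. Name and to_clark are hypothetical stand-ins, not pyHanko's own class or API; they pair a namespace URI with a local name and render the pair in the {uri}local form that xml.etree.ElementTree uses for tags.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Name:
    """Hypothetical stand-in for an expanded XML name."""

    ns: str
    local_name: str


def to_clark(name: Name) -> str:
    """Render the name in ElementTree's '{namespace}local' notation."""
    return f"{{{name.ns}}}{name.local_name}"


DC = "http://purl.org/dc/elements/1.1/"
title = Name(ns=DC, local_name="title")

# The expanded name identifies the element regardless of which prefix
# a particular XML document happens to bind to the namespace.
elem = ET.Element(to_clark(title))
elem.text = "Annual report"
clark_tag = elem.tag

# A frozen dataclass is hashable, so names can serve as dictionary keys.
fields = {title: "Annual report"}
```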

class pyhanko.pdf_utils.metadata.model.Qualifiers(quals: Dict[ExpandedName, XmpValue])

Bases: object

XMP value qualifiers wrapper. Implements __getitem__. Note that xml:lang gets special treatment.

Parameters:: quals – The qualifiers to model.

classmethod of(*lst: Tuple[ExpandedName, XmpValue]) → Qualifiers

Construct a Qualifiers object from a list of name-value pairs.

Parameters:: lst – A list of name-value pairs.
Returns:: A Qualifiers object.

classmethod lang_as_qual(lang: str | None) → Qualifiers

Construct a Qualifiers object that only wraps a language qualifier.

Parameters:: lang – A language code.
Returns:: A Qualifiers object.

iter_quals(with_lang: bool = True) → Iterable[Tuple[ExpandedName, XmpValue]]

Iterate over all qualifiers.

Parameters:: with_lang – Include the language qualifier.
Returns:: An iterable of name-value pairs, one per qualifier.

property lang: str | None: Retrieve the language qualifier, if any.

property has_non_lang_quals: bool: Check if there are any non-language qualifiers.
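To show why xml:lang is singled out among qualifiers, the following sketch looks at the XML side only, using the standard library rather than pyHanko. The sample element and variable names are made up for illustration. The language tag arrives as an ordinary attribute in the xml namespace, and a reader has to separate it from any other attributes.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LANG_KEY = f"{{{XML_NS}}}lang"

# A single language alternative, as it might appear inside an rdf:Alt.
snippet = f'<rdf:li xmlns:rdf="{RDF_NS}" xml:lang="en">Annual report</rdf:li>'
item = ET.fromstring(snippet)

# The language qualifier is read from the xml:lang attribute...
lang = item.get(LANG_KEY)

# ...and everything else would count as a non-language attribute.
other_attrs = {k: v for k, v in item.attrib.items() if k != LANG_KEY}
```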

class pyhanko.pdf_utils.metadata.model.XmpValue(value: ~pyhanko.pdf_utils.metadata.model.XmpStructure | ~pyhanko.pdf_utils.metadata.model.XmpArray | ~pyhanko.pdf_utils.metadata.model.XmpUri | str, qualifiers: ~pyhanko.pdf_utils.metadata.model.Qualifiers = <factory>)

Bases: object

A general XMP value, potentially with qualifiers.

value: XmpStructure | XmpArray | XmpUri | str: The value.

qualifiers: Qualifiers: Qualifiers that apply to the value.

class pyhanko.pdf_utils.metadata.model.XmpStructure(fields: Dict[ExpandedName, XmpValue])

Bases: object

A generic XMP structure value. Implements __getitem__ for field access.

Parameters:: fields – The structure’s fields.

classmethod of(*lst: Tuple[ExpandedName, XmpValue]) → XmpStructure

Construct an XmpStructure from a list of name-value pairs.

Parameters:: lst – A list of name-value pairs.
Returns:: An XmpStructure.

class pyhanko.pdf_utils.metadata.model.XmpArrayType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

XMP array types.

ORDERED = 'Seq': Ordered array.

UNORDERED = 'Bag': Unordered array.

ALTERNATIVE = 'Alt': Alternative array.

as_rdf() → ExpandedName: Render the type as an XML name.

class pyhanko.pdf_utils.metadata.model.XmpArray(array_type: XmpArrayType, entries: List[XmpValue])

Bases: object

An XMP array.

array_type: XmpArrayType: The type of the array.

entries: List[XmpValue]: The entries in the array.

classmethod ordered(lst: Iterable[XmpValue]) → XmpArray

Convert a list to an ordered XMP array.

Parameters:: lst – An iterable of XMP values.
Returns:: An ordered XmpArray.

classmethod unordered(lst: Iterable[XmpValue]) → XmpArray

Convert a list to an unordered XMP array.

Parameters:: lst – An iterable of XMP values.
Returns:: An unordered XmpArray.

classmethod alternative(lst: Iterable[XmpValue]) → XmpArray

Convert a list to an alternative XMP array.

Parameters:: lst – An iterable of XMP values.
Returns:: An alternative XmpArray.
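The three array types correspond to the RDF containers rdf:Seq, rdf:Bag and rdf:Alt. The sketch below builds such a container with the standard library only; rdf_array is a hypothetical helper, not a pyHanko function, and the keyword values are invented for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"


def rdf_array(kind: str, items):
    """Build an rdf:Seq, rdf:Bag or rdf:Alt element with one rdf:li per item."""
    if kind not in ("Seq", "Bag", "Alt"):
        raise ValueError(f"unknown RDF array type: {kind}")
    container = ET.Element(f"{{{RDF}}}{kind}")
    for text in items:
        li = ET.SubElement(container, f"{{{RDF}}}li")
        li.text = text
    return container


# dc:subject holds an unordered list (Bag) of keywords in XMP.
subject = ET.Element(f"{{{DC}}}subject")
subject.append(rdf_array("Bag", ["pdf", "metadata"]))
entries = [li.text for li in subject.iter(f"{{{RDF}}}li")]
```

An ordered array (Seq) suits values where position matters, such as a list of authors, while an alternative array (Alt) is typically used for the same text in several languages.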

pyhanko.pdf_utils.metadata.model.NS = {'dc': 'http://purl.org/dc/elements/1.1/', 'pdf': 'http://ns.adobe.com/pdf/1.3/', 'pdfaExtension': 'http://www.aiim.org/pdfa/ns/extension/', 'pdfaProperty': 'http://www.aiim.org/pdfa/ns/property#', 'pdfaSchema': 'http://www.aiim.org/pdfa/ns/schema#', 'pdfaid': 'http://www.aiim.org/pdfa/ns/id/', 'pdfuaid': 'http://www.aiim.org/pdfua/ns/id/', 'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'x': 'adobe:ns:meta/', 'xml': 'http://www.w3.org/XML/1998/namespace', 'xmp': 'http://ns.adobe.com/xap/1.0/'}: Known namespaces and their customary prefixes.
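Customary prefixes matter mostly for readable output. As a sketch with the standard library only (this does not call pyHanko, and the element chosen is just an example), registering a prefix with xml.etree.ElementTree makes the serialiser emit it instead of a generated one such as ns0. The dictionary below copies two entries from the mapping above.

```python
import xml.etree.ElementTree as ET

# Two entries copied from the namespace table above.
PREFIXES = {
    "pdf": "http://ns.adobe.com/pdf/1.3/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# Registration is global to ElementTree and affects later serialisation.
for prefix, uri in PREFIXES.items():
    ET.register_namespace(prefix, uri)

elem = ET.Element(f"{{{PREFIXES['pdf']}}}Keywords")
elem.text = "pdf, metadata"
xml_bytes = ET.tostring(elem)
```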

pyhanko.pdf_utils.metadata.model.XML_LANG = http://www.w3.org/XML/1998/namespace/lang: lang in the xml namespace.

pyhanko.pdf_utils.metadata.model.RDF_RDF = http://www.w3.org/1999/02/22-rdf-syntax-ns#RDF: RDF in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_SEQ = http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq: Seq in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_BAG = http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag: Bag in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_ALT = http://www.w3.org/1999/02/22-rdf-syntax-ns#Alt: Alt in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_LI = http://www.w3.org/1999/02/22-rdf-syntax-ns#li: li in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_VALUE = http://www.w3.org/1999/02/22-rdf-syntax-ns#value: value in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_RESOURCE = http://www.w3.org/1999/02/22-rdf-syntax-ns#resource: resource in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_PARSE_TYPE = http://www.w3.org/1999/02/22-rdf-syntax-ns#parseType: parseType in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_ABOUT = http://www.w3.org/1999/02/22-rdf-syntax-ns#about: about in the rdf namespace.

pyhanko.pdf_utils.metadata.model.RDF_DESCRIPTION = http://www.w3.org/1999/02/22-rdf-syntax-ns#Description: Description in the rdf namespace.

pyhanko.pdf_utils.metadata.model.DC_TITLE = http://purl.org/dc/elements/1.1/title: title in the dc namespace.

pyhanko.pdf_utils.metadata.model.DC_CREATOR = http://purl.org/dc/elements/1.1/creator: creator in the dc namespace.

pyhanko.pdf_utils.metadata.model.DC_DESCRIPTION = http://purl.org/dc/elements/1.1/description: description in the dc namespace.

pyhanko.pdf_utils.metadata.model.PDF_PRODUCER = http://ns.adobe.com/pdf/1.3/Producer: Producer in the pdf namespace.

pyhanko.pdf_utils.metadata.model.PDF_KEYWORDS = http://ns.adobe.com/pdf/1.3/keywords: keywords in the pdf namespace.

pyhanko.pdf_utils.metadata.model.X_XMPMETA = adobe:ns:meta/xmpmeta: xmpmeta in the x namespace.

pyhanko.pdf_utils.metadata.model.X_XMPTK = adobe:ns:meta/xmptk: xmptk in the x namespace.

pyhanko.pdf_utils.metadata.model.XMP_CREATORTOOL = http://ns.adobe.com/xap/1.0/CreatorTool: CreatorTool in the xmp namespace.

pyhanko.pdf_utils.metadata.model.XMP_CREATEDATE = http://ns.adobe.com/xap/1.0/CreateDate: CreateDate in the xmp namespace.

pyhanko.pdf_utils.metadata.model.XMP_MODDATE = http://ns.adobe.com/xap/1.0/ModifyDate: ModifyDate in the xmp namespace.
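The textual forms listed above appear to be the namespace URI followed by the local name, with a slash inserted when the namespace does not already end in "/" or "#" (compare XML_LANG with RDF_SEQ). The function below is a guess at that rule, written to reproduce the listed values; render is a hypothetical name and this is not taken from pyHanko's source.

```python
def render(ns: str, local_name: str) -> str:
    """Join a namespace URI and a local name into one display string."""
    # Namespaces ending in '/' or '#' already carry their own separator.
    sep = "" if ns.endswith(("/", "#")) else "/"
    return f"{ns}{sep}{local_name}"


# Namespace URIs copied from the table of known namespaces.
xml_lang = render("http://www.w3.org/XML/1998/namespace", "lang")
rdf_seq = render("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "Seq")
x_xmpmeta = render("adobe:ns:meta/", "xmpmeta")
```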

pyhanko.pdf_utils.metadata.xmp_xml module

pyhanko.pdf_utils.metadata.xmp_xml.iter_attrs(elem: Element) → Iterator[Tuple[ExpandedName, str]]

pyhanko.pdf_utils.metadata.xmp_xml.add_xmp_value(container: Element, value: XmpValue)

pyhanko.pdf_utils.metadata.xmp_xml.serialise_xmp(roots: List[XmpStructure], out: BinaryIO)

class pyhanko.pdf_utils.metadata.xmp_xml.MetadataStream(dict_data: dict | None = None, stream_data: bytes | None = None, encoded_data: bytes | None = None, handler: SecurityHandler | None = None)

Bases: StreamObject

classmethod from_xmp(xmp: List[XmpStructure]) → MetadataStream

property xmp: List[XmpStructure]

update_xmp_with_meta(meta: DocumentMetadata)

pyhanko.pdf_utils.metadata.xmp_xml.update_xmp_with_meta(meta: DocumentMetadata, roots: Iterable[XmpStructure] = ())

pyhanko.pdf_utils.metadata.xmp_xml.meta_from_xmp(roots: List[XmpStructure])

exception pyhanko.pdf_utils.metadata.xmp_xml.XmpXmlProcessingError: Bases: ValueError

pyhanko.pdf_utils.metadata.xmp_xml.parse_xmp(inp: BinaryIO) → List[XmpStructure]

pyhanko.pdf_utils.metadata.xmp_xml.register_namespaces()
