pyhanko.pdf_utils.metadata package
Submodules
pyhanko.pdf_utils.metadata.info module
- pyhanko.pdf_utils.metadata.info.update_info_dict(meta: DocumentMetadata, info: DictionaryObject, only_update_existing: bool = False) bool
- pyhanko.pdf_utils.metadata.info.view_from_info_dict(info_dict: DictionaryObject, strict: bool = True) DocumentMetadata
pyhanko.pdf_utils.metadata.model module
New in version 0.14.0.
This module contains the XMP data model classes and namespace registry, in addition to a simplified document metadata model used for automated metadata management.
- class pyhanko.pdf_utils.metadata.model.DocumentMetadata(title: ~pyhanko.pdf_utils.misc.StringWithLanguage | str | None = None, author: ~pyhanko.pdf_utils.misc.StringWithLanguage | str | None = None, subject: ~pyhanko.pdf_utils.misc.StringWithLanguage | str | None = None, keywords: ~typing.List[str] = <factory>, creator: ~pyhanko.pdf_utils.misc.StringWithLanguage | str | None = None, created: str | ~datetime.datetime | None = None, last_modified: str | ~datetime.datetime | None = 'now', xmp_extra: ~typing.List[~pyhanko.pdf_utils.metadata.model.XmpStructure] = <factory>, xmp_unmanaged: bool = False)
Bases:
object
Simple representation of document metadata. All entries are optional.
- title: StringWithLanguage | str | None = None
The document’s title.
- author: StringWithLanguage | str | None = None
The document’s author.
- subject: StringWithLanguage | str | None = None
The document’s subject.
- keywords: List[str]
Keywords associated with the document.
- creator: StringWithLanguage | str | None = None
The software that was used to author the document.
Note
This is distinct from the producer, which is typically used to indicate which PDF processor(s) interacted with the file.
- created: str | datetime | None = None
The time when the document was created. To set it to the current time, specify 'now'.
- last_modified: str | datetime | None = 'now'
The time when the document was last modified. Defaults to the current time upon serialisation if not specified.
- xmp_extra: List[XmpStructure]
Extra XMP metadata.
- xmp_unmanaged: bool = False
Flag metadata as XMP-only. This means that the info dictionary will be cleared out as much as possible, and that all attributes other than xmp_extra will be ignored when updating XMP metadata.
Note
The last-modified date and producer entries in the info dictionary will still be updated.
Note
DocumentMetadata represents a data model that is much simpler than what XMP is actually capable of. You can use this flag if you need more fine-grained control.
- view_over(base: DocumentMetadata)
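The 'now' sentinel accepted by created and last_modified can be pictured with a small stand-alone sketch. This is not pyHanko's implementation: resolve_timestamp is a hypothetical helper, and rejecting strings other than 'now' is an assumption made purely for the illustration.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def resolve_timestamp(
    value: Union[str, datetime, None],
    now: Optional[datetime] = None,
) -> Optional[datetime]:
    """Hypothetical helper: turn a 'now'-style field into a concrete time."""
    if value is None:
        # No timestamp requested at all.
        return None
    if isinstance(value, datetime):
        # An explicit timestamp is passed through unchanged.
        return value
    if value == 'now':
        # The sentinel is only resolved when this function is called,
        # mirroring "current time upon serialisation".
        return now if now is not None else datetime.now(timezone.utc)
    # Assumption of this sketch: no other string values are meaningful.
    raise ValueError(f"unsupported timestamp value: {value!r}")


fixed = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
created = resolve_timestamp(None)                      # field left unset
last_modified = resolve_timestamp('now', now=fixed)    # sentinel resolved late
```

Deferring the lookup until the value is needed is what lets a default of 'now' mean "the moment of writing" rather than "the moment the object was constructed".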
- pyhanko.pdf_utils.metadata.model.VENDOR = 'pyHanko 0.21.1-dev1'
pyHanko version identifier in textual form
- pyhanko.pdf_utils.metadata.model.MetaString
A regular string, a string with a language code, or nothing at all.
alias of Optional[Union[StringWithLanguage, str]]
- class pyhanko.pdf_utils.metadata.model.ExpandedName(ns: str, local_name: str)
Bases:
object
An expanded XML name.
- ns: str
The URI of the namespace in which the name resides.
- local_name: str
The local part of the name.
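An expanded name is simply a (namespace URI, local name) pair. Python's standard xml.etree.ElementTree encodes the same pair as a single '{uri}local' string, so the following sketch shows how the two views relate. It uses only the standard library; split_clark is a hypothetical helper and is not part of pyHanko.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

DC_NS = 'http://purl.org/dc/elements/1.1/'


def split_clark(tag: str) -> Tuple[str, str]:
    """Hypothetical helper: split '{uri}local' into (uri, local)."""
    if not tag.startswith('{'):
        # Names without a namespace have an empty URI part.
        return '', tag
    uri, _, local = tag[1:].partition('}')
    return uri, local


# The prefix 'dc' is only a spelling; the parsed tag carries the full URI.
elem = ET.fromstring(f'<dc:title xmlns:dc="{DC_NS}">Example</dc:title>')
ns, local_name = split_clark(elem.tag)
```

Here ns and local_name hold the two components that the ns and local_name fields describe.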
- class pyhanko.pdf_utils.metadata.model.Qualifiers(quals: Dict[ExpandedName, XmpValue])
Bases:
object
XMP value qualifiers wrapper. Implements __getitem__. Note that xml:lang gets special treatment.
- Parameters:
quals – The qualifiers to model.
- classmethod of(*lst: Tuple[ExpandedName, XmpValue]) Qualifiers
Construct a Qualifiers object from a list of name-value pairs.
- Parameters:
lst – A list of name-value pairs.
- Returns:
A Qualifiers object.
- classmethod lang_as_qual(lang: str | None) Qualifiers
Construct a Qualifiers object that only wraps a language qualifier.
- Parameters:
lang – A language code.
- Returns:
A Qualifiers object.
- iter_quals(with_lang: bool = True) Iterable[Tuple[ExpandedName, XmpValue]]
Iterate over all qualifiers.
- Parameters:
with_lang – Include the language qualifier.
- Returns:
An iterable of name-value pairs.
- property lang: str | None
Retrieve the language qualifier, if any.
- property has_non_lang_quals: bool
Check if there are any non-language qualifiers.
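One reason a language qualifier stands apart is visible at the XML level: in RDF/XML it is written as an xml:lang attribute in the reserved xml namespace. The sketch below reads such an attribute with the standard library only. It does not use pyHanko, and the XMP fragment is made up for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
XML_NS = 'http://www.w3.org/XML/1998/namespace'

# A made-up language alternative entry, as it might appear inside dc:title.
fragment = (
    f'<rdf:li xmlns:rdf="{RDF_NS}" xml:lang="x-default">'
    'Example title</rdf:li>'
)
item = ET.fromstring(fragment)

# ElementTree exposes xml:lang under the reserved XML namespace URI.
lang = item.get(f'{{{XML_NS}}}lang')
```

The xml prefix needs no declaration in the fragment because it is bound implicitly by the XML namespaces specification.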
- class pyhanko.pdf_utils.metadata.model.XmpValue(value: ~pyhanko.pdf_utils.metadata.model.XmpStructure | ~pyhanko.pdf_utils.metadata.model.XmpArray | ~pyhanko.pdf_utils.metadata.model.XmpUri | str, qualifiers: ~pyhanko.pdf_utils.metadata.model.Qualifiers = <factory>)
Bases:
object
A general XMP value, potentially with qualifiers.
- value: XmpStructure | XmpArray | XmpUri | str
The value.
- qualifiers: Qualifiers
Qualifiers that apply to the value.
- class pyhanko.pdf_utils.metadata.model.XmpStructure(fields: Dict[ExpandedName, XmpValue])
Bases:
object
A generic XMP structure value. Implements __getitem__ for field access.
- Parameters:
fields – The structure’s fields.
- classmethod of(*lst: Tuple[ExpandedName, XmpValue]) XmpStructure
Construct an XmpStructure from a list of name-value pairs.
- Parameters:
lst – A list of name-value pairs.
- Returns:
An XmpStructure.
- class pyhanko.pdf_utils.metadata.model.XmpArrayType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
XMP array types.
- ORDERED = 'Seq'
Ordered array.
- UNORDERED = 'Bag'
Unordered array.
- ALTERNATIVE = 'Alt'
Alternative array.
- as_rdf() ExpandedName
Render the type as an XML name.
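The three values correspond to the RDF container elements Seq, Bag and Alt. As a rough illustration of how an enum value can be turned into a namespaced element name, the sketch below uses a stand-in enum and a hypothetical rdf_tag helper. Neither is pyHanko's XmpArrayType or its as_rdf() method, and the '{uri}local' output format is ElementTree's convention, chosen here as an assumption.

```python
from enum import Enum

RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'


class ArrayKind(Enum):
    """Stand-in for the documented array types (not pyHanko's class)."""
    ORDERED = 'Seq'
    UNORDERED = 'Bag'
    ALTERNATIVE = 'Alt'


def rdf_tag(kind: ArrayKind) -> str:
    """Hypothetical helper: element name in ElementTree's '{uri}local' form."""
    return f'{{{RDF_NS}}}{kind.value}'


# Map each member name to the element name of its RDF container.
tags = {kind.name: rdf_tag(kind) for kind in ArrayKind}
```

In XMP, dc:creator is an ordered array, dc:subject is an unordered one, and dc:title is a language alternative, which gives a feel for when each kind applies.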
- class pyhanko.pdf_utils.metadata.model.XmpArray(array_type: XmpArrayType, entries: List[XmpValue])
Bases:
object
An XMP array.
- array_type: XmpArrayType
The type of the array.
- classmethod ordered(lst: Iterable[XmpValue]) XmpArray
Convert a list to an ordered XMP array.
- Parameters:
lst – An iterable of XMP values.
- Returns:
An ordered XmpArray.
- pyhanko.pdf_utils.metadata.model.NS = {'dc': 'http://purl.org/dc/elements/1.1/', 'pdf': 'http://ns.adobe.com/pdf/1.3/', 'pdfaExtension': 'http://www.aiim.org/pdfa/ns/extension/', 'pdfaProperty': 'http://www.aiim.org/pdfa/ns/property#', 'pdfaSchema': 'http://www.aiim.org/pdfa/ns/schema#', 'pdfaid': 'http://www.aiim.org/pdfa/ns/id/', 'pdfuaid': 'http://www.aiim.org/pdfua/ns/id/', 'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'x': 'adobe:ns:meta/', 'xml': 'http://www.w3.org/XML/1998/namespace', 'xmp': 'http://ns.adobe.com/xap/1.0/'}
Known namespaces and their customary prefixes.
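To see why customary prefixes matter, the following sketch registers a few of the prefix/URI pairs listed above with the standard library's ElementTree and serialises one element. It is a stdlib-only illustration, not pyHanko code; PREFIXES is a hand-copied subset of the table and the element text is invented.

```python
import xml.etree.ElementTree as ET

# A subset of the prefix table documented above.
PREFIXES = {
    'dc': 'http://purl.org/dc/elements/1.1/',
    'pdf': 'http://ns.adobe.com/pdf/1.3/',
    'xmp': 'http://ns.adobe.com/xap/1.0/',
}

for prefix, uri in PREFIXES.items():
    # Process-wide registration; it affects all later serialisation.
    ET.register_namespace(prefix, uri)

# Build an element by its namespace URI and local name.
elem = ET.Element(f"{{{PREFIXES['xmp']}}}CreatorTool")
elem.text = 'Example Writer'
xml_text = ET.tostring(elem, encoding='unicode')
```

Without the registration step, ElementTree would emit an auto-generated prefix such as ns0 instead of xmp. The output would still be valid XML, but harder to read and less conventional.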
- pyhanko.pdf_utils.metadata.model.XML_LANG = http://www.w3.org/XML/1998/namespace/lang
lang in the xml namespace.
- pyhanko.pdf_utils.metadata.model.RDF_RDF = http://www.w3.org/1999/02/22-rdf-syntax-ns#RDF
RDF in the rdf namespace.
- pyhanko.pdf_utils.metadata.model.RDF_SEQ = http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq
Seq in the rdf namespace.
- pyhanko.pdf_utils.metadata.model.RDF_BAG = http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag
Bag in the rdf namespace.
- pyhanko.pdf_utils.metadata.model.RDF_ALT = http://www.w3.org/1999/02/22-rdf-syntax-ns#Alt
Alt in the rdf namespace.
- pyhanko.pdf_utils.metadata.model.RDF_LI = http://www.w3.org/1999/02/22-rdf-syntax-ns#li
li in the rdf namespace.
- pyhanko.pdf_utils.metadata.model.RDF_VALUE = http://www.w3.org/1999/02/22-rdf-syntax-ns#value
value in the rdf namespace.
- pyhanko.pdf_utils.metadata.model.RDF_RESOURCE = http://www.w3.org/1999/02/22-rdf-syntax-ns#resource
resource in the rdf namespace.
- pyhanko.pdf_utils.metadata.model.RDF_PARSE_TYPE = http://www.w3.org/1999/02/22-rdf-syntax-ns#parseType
parseType in the rdf namespace.
- pyhanko.pdf_utils.metadata.model.RDF_ABOUT = http://www.w3.org/1999/02/22-rdf-syntax-ns#about
about in the rdf namespace.
- pyhanko.pdf_utils.metadata.model.RDF_DESCRIPTION = http://www.w3.org/1999/02/22-rdf-syntax-ns#Description
Description in the rdf namespace.
- pyhanko.pdf_utils.metadata.model.DC_TITLE = http://purl.org/dc/elements/1.1/title
title in the dc namespace.
- pyhanko.pdf_utils.metadata.model.DC_CREATOR = http://purl.org/dc/elements/1.1/creator
creator in the dc namespace.
- pyhanko.pdf_utils.metadata.model.DC_DESCRIPTION = http://purl.org/dc/elements/1.1/description
description in the dc namespace.
- pyhanko.pdf_utils.metadata.model.PDF_PRODUCER = http://ns.adobe.com/pdf/1.3/Producer
Producer in the pdf namespace.
- pyhanko.pdf_utils.metadata.model.PDF_KEYWORDS = http://ns.adobe.com/pdf/1.3/keywords
keywords in the pdf namespace.
- pyhanko.pdf_utils.metadata.model.X_XMPMETA = adobe:ns:meta/xmpmeta
xmpmeta in the x namespace.
- pyhanko.pdf_utils.metadata.model.X_XMPTK = adobe:ns:meta/xmptk
xmptk in the x namespace.
- pyhanko.pdf_utils.metadata.model.XMP_CREATORTOOL = http://ns.adobe.com/xap/1.0/CreatorTool
CreatorTool in the xmp namespace.
- pyhanko.pdf_utils.metadata.model.XMP_CREATEDATE = http://ns.adobe.com/xap/1.0/CreateDate
CreateDate in the xmp namespace.
- pyhanko.pdf_utils.metadata.model.XMP_MODDATE = http://ns.adobe.com/xap/1.0/ModifyDate
ModifyDate in the xmp namespace.
pyhanko.pdf_utils.metadata.xmp_xml module
- pyhanko.pdf_utils.metadata.xmp_xml.iter_attrs(elem: Element) Iterator[Tuple[ExpandedName, str]]
- pyhanko.pdf_utils.metadata.xmp_xml.serialise_xmp(roots: List[XmpStructure], out: BinaryIO)
- class pyhanko.pdf_utils.metadata.xmp_xml.MetadataStream(dict_data: dict | None = None, stream_data: bytes | None = None, encoded_data: bytes | None = None, handler: SecurityHandler | None = None)
Bases:
StreamObject
- classmethod from_xmp(xmp: List[XmpStructure]) MetadataStream
- property xmp: List[XmpStructure]
- update_xmp_with_meta(meta: DocumentMetadata)
- pyhanko.pdf_utils.metadata.xmp_xml.update_xmp_with_meta(meta: DocumentMetadata, roots: Iterable[XmpStructure] = ())
- pyhanko.pdf_utils.metadata.xmp_xml.meta_from_xmp(roots: List[XmpStructure])
- exception pyhanko.pdf_utils.metadata.xmp_xml.XmpXmlProcessingError
Bases:
ValueError
- pyhanko.pdf_utils.metadata.xmp_xml.parse_xmp(inp: BinaryIO) List[XmpStructure]
- pyhanko.pdf_utils.metadata.xmp_xml.register_namespaces()