pyhanko.pdf_utils.embed module
Utility classes for handling embedded files in PDFs.
New in version 0.7.0.
- pyhanko.pdf_utils.embed.embed_file(pdf_writer: BasePdfFileWriter, spec: FileSpec)
Embed a file in the document-wide embedded file registry of a PDF writer.
- Parameters
pdf_writer – PDF writer to house the embedded file.
spec – File spec describing the embedded file.
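In the PDF specification, the document-wide registry that embed_file() writes to is the /EmbeddedFiles name tree, which the /Names dictionary of the document catalog points to. To illustrate that structure only, the sketch below models a single-level name tree with plain Python dicts and lists. add_to_name_tree() is a hypothetical helper, not part of pyhanko. The sketch keeps keys sorted because the specification requires it, and it makes no claim about how embed_file() itself orders or stores entries.

```python
from bisect import bisect_left
from typing import Any, Dict, List


def add_to_name_tree(catalog: Dict[str, Any], key: str, file_spec: Dict[str, Any]) -> None:
    """Hypothetical helper: register a file spec under `key` in /EmbeddedFiles.

    Models a single-level name tree whose /Names array alternates
    key, value, key, value, ... with keys kept in sorted order.
    """
    names_dict = catalog.setdefault("/Names", {})
    ef_tree = names_dict.setdefault("/EmbeddedFiles", {})
    if "/Kids" in ef_tree:
        # Multi-level trees (intermediate /Kids nodes) are out of scope here
        raise NotImplementedError("only single-level name trees are modelled")
    flat: List[Any] = ef_tree.setdefault("/Names", [])
    keys = flat[0::2]  # every other element, starting at index 0, is a key
    # Python compares str by code point, which matches byte order for ASCII keys
    idx = bisect_left(keys, key)
    if idx < len(keys) and keys[idx] == key:
        flat[2 * idx + 1] = file_spec  # key already present: replace its value
    else:
        flat[2 * idx:2 * idx] = [key, file_spec]  # insert the pair at its sorted position


catalog: Dict[str, Any] = {"/Type": "/Catalog"}
add_to_name_tree(catalog, "report.pdf", {"/Type": "/Filespec", "/F": "report.pdf"})
add_to_name_tree(catalog, "data.csv", {"/Type": "/Filespec", "/F": "data.csv"})
print(catalog["/Names"]["/EmbeddedFiles"]["/Names"][0::2])  # ['data.csv', 'report.pdf']
```

With pyhanko itself, you would instead build a FileSpec (with embedded_data set) and pass it to embed_file() together with the writer.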
- class pyhanko.pdf_utils.embed.EmbeddedFileObject(pdf_writer: BasePdfFileWriter, dict_data=None, stream_data=None, encoded_data=None, params: Optional[EmbeddedFileParams] = None, mime_type: Optional[str] = None)
Bases: StreamObject
- classmethod from_file_data(pdf_writer: BasePdfFileWriter, data: bytes, compress=True, params: Optional[EmbeddedFileParams] = None, mime_type: Optional[str] = None) → EmbeddedFileObject
Construct an embedded file object from file data.
This is a very thin wrapper around the constructor, with a slightly less intimidating API.
Note
This method does not register the embedded file in the document’s embedded file namespace; see embed_file().
- Parameters
pdf_writer – PDF writer to use.
data – File contents, as a bytes object.
compress – Whether to compress the embedded file’s contents.
params – Optional embedded file parameters.
mime_type – Optional MIME type string.
- Returns
An embedded file object.
- write_to_stream(stream, handler=None, container_ref=None)
Render this object to an output stream.
- Parameters
stream – An output stream.
handler – Security handler.
container_ref – Local encryption key.
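from_file_data() takes mime_type as an ordinary string such as "application/pdf". In the PDF file, the MIME type of an embedded file stream goes in its /Subtype entry, which the PDF specification defines as a name object. Because "/" is a delimiter in PDF syntax, it has to be escaped as #2F inside a name. The sketch below shows that escaping rule with a hypothetical helper, pdf_name_token(). It is not pyhanko's code and only illustrates what the serialized name looks like.

```python
def pdf_name_token(value: str) -> str:
    """Hypothetical helper: serialize `value` as a PDF name token.

    Bytes outside the printable ASCII range '!'..'~', PDF delimiters
    and '#' itself are written as '#' followed by two hex digits.
    """
    delimiters = set("()<>[]{}/%#")
    out = ["/"]
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        if ch in delimiters or not ("!" <= ch <= "~"):
            out.append(f"#{byte:02X}")
        else:
            out.append(ch)
    return "".join(out)


print(pdf_name_token("application/pdf"))  # /application#2Fpdf
print(pdf_name_token("text/plain"))       # /text#2Fplain
```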
- class pyhanko.pdf_utils.embed.EmbeddedFileParams(embed_size: bool = True, embed_checksum: bool = True, creation_date: Optional[datetime.datetime] = None, modification_date: Optional[datetime.datetime] = None)
Bases: object
- embed_size: bool = True
If true, record the file size of the embedded file.
Note
This value is computed over the file content before PDF filters are applied. This may have performance implications in cases where the file stream contents are presented in pre-encoded form.
- embed_checksum: bool = True
If true, add an MD5 checksum of the file contents.
Note
This value is computed over the file content before PDF filters are applied. This may have performance implications in cases where the file stream contents are presented in pre-encoded form.
- creation_date: Optional[datetime] = None
Record the creation date of the embedded file.
- modification_date: Optional[datetime] = None
Record the modification date of the embedded file.
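Both notes above say that the size and checksum describe the file content before PDF filters are applied. The standard-library sketch below shows what that means: the values are computed over the raw bytes, not over the (typically Flate-compressed) stream body. The helpers are hypothetical. They only mirror the /Size and /CheckSum entries that the PDF specification defines for an embedded file's parameter dictionary and are not pyhanko's implementation.

```python
import hashlib
import zlib
from typing import Dict, Union


def params_entries(raw: bytes) -> Dict[str, Union[int, bytes]]:
    """Hypothetical helper: /Size and /CheckSum over the unfiltered file bytes."""
    return {
        "/Size": len(raw),                       # length of the original file content
        "/CheckSum": hashlib.md5(raw).digest(),  # 16-byte MD5 digest of the same bytes
    }


def params_from_encoded(encoded: bytes) -> Dict[str, Union[int, bytes]]:
    """Pre-encoded (here: Flate-compressed) input must be decoded before hashing."""
    return params_entries(zlib.decompress(encoded))


raw = b"hello world\n" * 100
encoded = zlib.compress(raw)          # roughly what a /FlateDecode stream body holds
print(params_entries(raw)["/Size"])  # 1200
print(len(encoded) < len(raw))       # True
```

The extra decompression step in params_from_encoded() is presumably the performance cost the notes refer to when file contents are supplied in pre-encoded form.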
- class pyhanko.pdf_utils.embed.FileSpec(file_spec_string: str, file_name: Optional[str] = None, embedded_data: Optional[EmbeddedFileObject] = None, description: Optional[str] = None, af_relationship: Optional[NameObject] = None, f_related_files: Optional[List[RelatedFileSpec]] = None, uf_related_files: Optional[List[RelatedFileSpec]] = None)
Bases: object
Dataclass modelling an embedded file description in a PDF.
- file_spec_string: str
A path-like file specification string, or URL.
Note
For backwards compatibility, this string should be encodable in PDFDocEncoding. For names that require general Unicode support, refer to file_name.
- file_name: Optional[str] = None
A path-like Unicode file name.
- embedded_data: Optional[EmbeddedFileObject] = None
Reference to a stream object containing the file’s data, as embedded in the PDF file.
- description: Optional[str] = None
Textual description of the file.
- af_relationship: Optional[NameObject] = None
Associated file relationship specifier.
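- f_related_files: Optional[List[RelatedFileSpec]] = None
Related files with PDFDocEncoded names.
- uf_related_files: Optional[List[RelatedFileSpec]] = None
Related files with Unicode-encoded names.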
- as_pdf_object() → DictionaryObject
Represent the file spec as a PDF dictionary.
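To make the mapping between FileSpec's fields and a PDF file specification dictionary concrete, the sketch below builds a plain-dict approximation of what as_pdf_object() might produce. The key names (/F, /UF, /EF, /Desc, /AFRelationship, /RF) come from the PDF specification. FileSpecSketch is a hypothetical stand-in that uses strings in place of real stream objects, and the exact output of as_pdf_object() may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


def _flatten(pairs: List[Tuple[str, Any]]) -> List[Any]:
    # Related files arrays alternate name and stream: [name1, stream1, name2, stream2, ...]
    return [item for name, data in pairs for item in (name, data)]


@dataclass
class FileSpecSketch:
    """Hypothetical stand-in for FileSpec; 'streams' are placeholder values."""

    file_spec_string: str
    file_name: Optional[str] = None
    embedded_data: Optional[Any] = None
    description: Optional[str] = None
    af_relationship: Optional[str] = None
    f_related_files: List[Tuple[str, Any]] = field(default_factory=list)
    uf_related_files: List[Tuple[str, Any]] = field(default_factory=list)

    def as_dict(self) -> Dict[str, Any]:
        result: Dict[str, Any] = {"/Type": "/Filespec", "/F": self.file_spec_string}
        if self.file_name is not None:
            result["/UF"] = self.file_name
        if self.embedded_data is not None:
            # /EF maps the same keys (/F, and /UF if present) to the embedded file stream
            ef: Dict[str, Any] = {"/F": self.embedded_data}
            if self.file_name is not None:
                ef["/UF"] = self.embedded_data
            result["/EF"] = ef
        if self.description is not None:
            result["/Desc"] = self.description
        if self.af_relationship is not None:
            result["/AFRelationship"] = self.af_relationship
        related: Dict[str, List[Any]] = {}
        if self.f_related_files:
            related["/F"] = _flatten(self.f_related_files)
        if self.uf_related_files:
            related["/UF"] = _flatten(self.uf_related_files)
        if related:
            result["/RF"] = related
        return result


spec = FileSpecSketch(
    "report.pdf",
    file_name="rapport-été.pdf",
    embedded_data="<embedded file stream>",
    description="Quarterly report",
    af_relationship="/Data",
)
print(spec.as_dict()["/EF"])
```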
- class pyhanko.pdf_utils.embed.RelatedFileSpec(name: str, embedded_data: EmbeddedFileObject)
Bases: object
Dataclass modelling a RelatedFile construct in PDF.
- name: str
Name of the related file.
Note
The encoding requirements of this field depend on whether the related file is included via the /F or /UF key.
- embedded_data: EmbeddedFileObject
Reference to a stream object containing the file’s data, as embedded in the PDF file.
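The notes on FileSpec.file_spec_string and RelatedFileSpec.name both raise the same practical question: what to put in the PDFDocEncoded slot when a file name needs full Unicode. One common approach, sketched below, keeps the real name for the Unicode slot and derives an ASCII approximation for the legacy slot. Printable ASCII is a safe subset of PDFDocEncoding. spec_names() and its "attachment" fallback are hypothetical choices for illustration and are not something pyhanko does for you.

```python
import unicodedata
from typing import Optional, Tuple


def _to_ascii(text: str) -> str:
    # Decompose accented characters (é -> e + combining accent), then drop non-ASCII
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def spec_names(name: str, default_stem: str = "attachment") -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (file_spec_string, file_name) for a file name."""
    if all(" " <= ch <= "~" for ch in name):
        # Printable ASCII only: usable as-is, no separate Unicode name needed
        return name, None
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    ascii_stem = _to_ascii(stem) or default_stem
    ascii_ext = _to_ascii(ext)
    fallback = ascii_stem + ("." + ascii_ext if ascii_ext else "")
    return fallback, name


print(spec_names("report.pdf"))  # ('report.pdf', None)
print(spec_names("résumé.pdf"))  # ('resume.pdf', 'résumé.pdf')
print(spec_names("報告.pdf"))     # ('attachment.pdf', '報告.pdf')
```

The resulting pair could be passed as FileSpec(file_spec_string=..., file_name=...). For related files, the ASCII name would go into an entry of f_related_files and the Unicode name into the corresponding entry of uf_related_files.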
- pyhanko.pdf_utils.embed.wrap_encrypted_payload(plaintext_payload: bytes, *, password: Optional[str] = None, certs: Optional[List[Certificate]] = None, security_handler: Optional[SecurityHandler] = None, file_spec_string: str = 'attachment.pdf', params: Optional[EmbeddedFileParams] = None, file_name: Optional[str] = None, description='Wrapped document', include_explanation_page=True) → PdfFileWriter
Include a PDF document as an encrypted attachment in a wrapper document.
This function sets certain flags in the wrapper document’s collection dictionary to instruct compliant PDF viewers to display the attachment instead of the wrapping document. Viewers that do not fully support PDF collections will display a landing page instead, explaining how to open the attachment manually.
Using this method mitigates some weaknesses in the PDF standard’s encryption provisions, and makes it harder to manipulate the encrypted attachment without knowing the encryption key.
Danger
Until PDF supports authenticated encryption mechanisms, this is a mitigation strategy, not a foolproof defence mechanism.
Warning
While users of viewers that do not support PDF collections can still open the attached file manually, the viewer still has to support PDF files where only the attachments are encrypted.
Note
This is not quite the same as the “unencrypted wrapper document” pattern discussed in the PDF 2.0 specification. The latter is intended to support nonstandard security handlers. This function uses a standard security handler on the wrapper document to encrypt the attachment as a binary blob. Moreover, this approach also works in PDF 1.7 viewers.
- Parameters
plaintext_payload – The plaintext payload (a binary representation of a PDF document).
security_handler – The security handler to use on the wrapper document. If None, a security handler will be constructed based on the password or certs parameter.
password – Password to encrypt the attachment with. Will be ignored if security_handler is provided.
certs – Encrypt the file using PDF public-key encryption, targeting the keys in the provided certificates. Will be ignored if security_handler is provided.
file_spec_string – PDFDocEncoded file spec string for the attachment.
params – Embedded file parameters to use.
file_name – Unicode file name for the attachment.
description – Description for the attachment.
include_explanation_page – If False, do not generate an explanation page in the wrapper document. This setting could be useful if you want to customise the wrapper document’s behaviour yourself.
- Returns
A PdfFileWriter representing the wrapper document.
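The description of wrap_encrypted_payload() mentions flags in the wrapper document's collection dictionary. In the PDF specification, a catalog-level /Collection dictionary can name an initial document (/D, a key in the /EmbeddedFiles name tree) and set /View to /H. This hides the collection view and presents that document instead. The sketch below shows such a dictionary with plain Python dicts. set_initial_attachment() is hypothetical, and which keys wrap_encrypted_payload() actually writes is an assumption not confirmed by this documentation.

```python
from typing import Any, Dict


def set_initial_attachment(catalog: Dict[str, Any], key: str) -> None:
    """Hypothetical helper: ask collection-aware viewers to open attachment `key`."""
    ef_names = catalog.get("/Names", {}).get("/EmbeddedFiles", {}).get("/Names", [])
    if key not in ef_names[0::2]:
        # /D must refer to an entry of the /EmbeddedFiles name tree
        raise KeyError(f"no embedded file registered under {key!r}")
    catalog["/Collection"] = {
        "/Type": "/Collection",
        "/D": key,      # document to present initially
        "/View": "/H",  # hidden collection view: show /D rather than the file list
    }


catalog: Dict[str, Any] = {
    "/Type": "/Catalog",
    "/Names": {"/EmbeddedFiles": {"/Names": ["attachment.pdf", {"/Type": "/Filespec"}]}},
}
set_initial_attachment(catalog, "attachment.pdf")
print(catalog["/Collection"]["/View"])  # /H
```

Viewers that ignore /Collection render the wrapper document's own pages instead, which is the situation the explanation page (controlled by include_explanation_page) is meant to cover.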