pyhanko.pdf_utils.embed module

Utility classes for handling embedded files in PDFs.

New in version 0.7.0.

pyhanko.pdf_utils.embed.embed_file(pdf_writer: pyhanko.pdf_utils.writer.BasePdfFileWriter, spec: pyhanko.pdf_utils.embed.FileSpec)

Embed a file in the document-wide embedded file registry of a PDF writer.

Parameters
  • pdf_writer – PDF writer to house the embedded file.

  • spec – File spec describing the embedded file.

class pyhanko.pdf_utils.embed.EmbeddedFileObject(pdf_writer: pyhanko.pdf_utils.writer.BasePdfFileWriter, dict_data=None, stream_data=None, encoded_data=None, params: Optional[pyhanko.pdf_utils.embed.EmbeddedFileParams] = None, mime_type: Optional[str] = None)

Bases: pyhanko.pdf_utils.generic.StreamObject

classmethod from_file_data(pdf_writer: pyhanko.pdf_utils.writer.BasePdfFileWriter, data: bytes, compress=True, params: Optional[pyhanko.pdf_utils.embed.EmbeddedFileParams] = None, mime_type: Optional[str] = None) pyhanko.pdf_utils.embed.EmbeddedFileObject

Construct an embedded file object from file data.

This is a very thin wrapper around the constructor, with a slightly less intimidating API.

Note

This method does not register the embedded file in the document’s embedded file namespace; use embed_file() for that.

Parameters
  • pdf_writer – PDF writer to use.

  • data – File contents, as a bytes object.

  • compress – Whether to compress the embedded file’s contents.

  • params – Optional embedded file parameters.

  • mime_type – Optional MIME type string.

Returns

An embedded file object.

write_to_stream(stream, handler=None, container_ref=None)

Render this object to an output stream.

Parameters
  • stream – An output stream.

  • handler – Security handler.

  • container_ref – Local encryption key.

class pyhanko.pdf_utils.embed.EmbeddedFileParams(embed_size: bool = True, embed_checksum: bool = True, creation_date: Optional[datetime.datetime] = None, modification_date: Optional[datetime.datetime] = None)

Bases: object

embed_size: bool = True

If true, record the file size of the embedded file.

Note

This value is computed over the file content before PDF filters are applied. If the stream contents are supplied in pre-encoded form, they have to be decoded first, which may affect performance.

embed_checksum: bool = True

If true, add an MD5 checksum of the file contents.

Note

This value is computed over the file content before PDF filters are applied. If the stream contents are supplied in pre-encoded form, they have to be decoded first, which may affect performance.
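
To illustrate what "before PDF filters are applied" means for embed_size and embed_checksum, the sketch below computes both values over the raw bytes and compares them with a Flate-compressed copy. It uses only the standard library and does not reproduce pyHanko's internal code; the sample data is an assumption.

```python
import hashlib
import zlib

# Assumption: sample file contents, 28 bytes repeated 100 times.
raw = b"example attachment contents\n" * 100

# Roughly what a Flate filter would store in the PDF stream.
encoded = zlib.compress(raw)

# Size and MD5 digest are taken over the raw bytes, not the encoded ones.
size = len(raw)
checksum = hashlib.md5(raw).digest()

# Starting from pre-encoded data, the raw bytes must be recovered first.
recovered = zlib.decompress(encoded)
```

The final step is the source of the performance caveat: when only the encoded form is at hand, it has to be decoded before either value can be derived.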

creation_date: Optional[datetime.datetime] = None

Record the creation date of the embedded file.

modification_date: Optional[datetime.datetime] = None

Record the modification date of the embedded file.

class pyhanko.pdf_utils.embed.FileSpec(file_spec_string: str, file_name: Optional[str] = None, embedded_data: Optional[pyhanko.pdf_utils.embed.EmbeddedFileObject] = None, description: Optional[str] = None, af_relationship: Optional[pyhanko.pdf_utils.generic.NameObject] = None, f_related_files: Optional[List[pyhanko.pdf_utils.embed.RelatedFileSpec]] = None, uf_related_files: Optional[List[pyhanko.pdf_utils.embed.RelatedFileSpec]] = None)

Bases: object

Dataclass modelling an embedded file description in a PDF.

file_spec_string: str

A path-like file specification string, or URL.

Note

For backwards compatibility, this string should be encodable in PDFDocEncoding. For names that require general Unicode support, refer to file_name.

file_name: Optional[str] = None

A path-like Unicode file name.
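
One way to decide when file_name is needed is sketched below. The helper spec_names is hypothetical and not part of pyHanko. It treats printable ASCII as a conservative stand-in for "encodable in PDFDocEncoding", since the Python standard library ships no PDFDocEncoding codec, and it substitutes underscores to build a fallback file_spec_string.

```python
from typing import Optional, Tuple


def spec_names(display_name: str) -> Tuple[str, Optional[str]]:
    """Return (file_spec_string, file_name) for a desired display name.

    Hypothetical helper: printable ASCII is used as a conservative subset
    of PDFDocEncoding, so some encodable names also get a file_name.
    """
    if all(0x20 <= ord(ch) <= 0x7E for ch in display_name):
        # Safe for file_spec_string; no separate Unicode name required.
        return display_name, None
    # Replace anything outside printable ASCII for the legacy string,
    # and keep the full name for the Unicode-capable field.
    fallback = "".join(
        ch if 0x20 <= ord(ch) <= 0x7E else "_" for ch in display_name
    )
    return fallback, display_name


plain = spec_names("report.pdf")
mixed = spec_names("テスト.pdf")
```

The two results can then be passed as the file_spec_string and file_name arguments of FileSpec.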

embedded_data: Optional[pyhanko.pdf_utils.embed.EmbeddedFileObject] = None

Reference to a stream object containing the file’s data, as embedded in the PDF file.

description: Optional[str] = None

Textual description of the file.

af_relationship: Optional[pyhanko.pdf_utils.generic.NameObject] = None

Associated file relationship specifier.

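f_related_files: Optional[List[pyhanko.pdf_utils.embed.RelatedFileSpec]] = None

Related files with PDFDocEncoded names.

uf_related_files: Optional[List[pyhanko.pdf_utils.embed.RelatedFileSpec]] = None

Related files with Unicode-encoded names.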

as_pdf_object() pyhanko.pdf_utils.generic.DictionaryObject

Represent the file spec as a PDF dictionary.

class pyhanko.pdf_utils.embed.RelatedFileSpec(name: str, embedded_data: pyhanko.pdf_utils.embed.EmbeddedFileObject)

Bases: object

Dataclass modelling a RelatedFile construct in PDF.

name: str

Name of the related file.

Note

The encoding requirements of this field depend on whether the related file is included via the /F or /UF key.

embedded_data: pyhanko.pdf_utils.embed.EmbeddedFileObject

Reference to a stream object containing the file’s data, as embedded in the PDF file.

pyhanko.pdf_utils.embed.wrap_encrypted_payload(plaintext_payload: bytes, *, password: Optional[str] = None, certs: Optional[List[asn1crypto.x509.Certificate]] = None, security_handler: Optional[pyhanko.pdf_utils.crypt.api.SecurityHandler] = None, file_spec_string: str = 'attachment.pdf', params: Optional[pyhanko.pdf_utils.embed.EmbeddedFileParams] = None, file_name: Optional[str] = None, description='Wrapped document', include_explanation_page=True) pyhanko.pdf_utils.writer.PdfFileWriter

Include a PDF document as an encrypted attachment in a wrapper document.

This function sets certain flags in the wrapper document’s collection dictionary to instruct compliant PDF viewers to display the attachment instead of the wrapping document. Viewers that do not fully support PDF collections will display a landing page instead, explaining how to open the attachment manually.

Using this method mitigates some weaknesses in the PDF standard’s encryption provisions, and makes it harder to manipulate the encrypted attachment without knowing the encryption key.

Danger

Until PDF supports authenticated encryption mechanisms, this is a mitigation strategy, not a foolproof defence mechanism.

Warning

Users of viewers that do not support PDF collections can still open the attached file manually, but the viewer must still support PDF files in which only the attachments are encrypted.

Note

This is not quite the same as the “unencrypted wrapper document” pattern discussed in the PDF 2.0 specification. The latter is intended to support nonstandard security handlers. This function uses a standard security handler on the wrapping document to encrypt the attachment as a binary blob. Moreover, the functionality in this function is available in PDF 1.7 viewers as well.

Parameters
  • plaintext_payload – The plaintext payload (a binary representation of a PDF document).

  • security_handler – The security handler to use on the wrapper document. If None, a security handler will be constructed based on the password or certs parameter.

  • password – Password to encrypt the attachment with. Will be ignored if security_handler is provided.

  • certs – Encrypt the file using PDF public-key encryption, targeting the keys in the provided certificates. Will be ignored if security_handler is provided.

  • file_spec_string – PDFDocEncoded file spec string for the attachment.

  • params – Embedded file parameters to use.

  • file_name – Unicode file name for the attachment.

  • description – Description for the attachment.

  • include_explanation_page – If False, do not generate an explanation page in the wrapper document. This setting could be useful if you want to customise the wrapper document’s behaviour yourself.

Returns

A PdfFileWriter representing the wrapper document.