pyhanko.pdf_utils.filters module

Implementation of stream filters for PDF.

Taken from PyPDF2 with modifications. See here for the original license of the PyPDF2 project.

Note that not all decoders specified in the standard are supported. In particular, /Crypt and /LZWDecode are missing.

class pyhanko.pdf_utils.filters.Decoder

Bases: object

General filter/decoder interface.

decode(data: bytes, decode_params: dict) → bytes

Decode a stream.

Parameters
  • data – Data to decode.

  • decode_params – Decoder parameters, sourced from the /DecodeParms entry associated with this filter.

Returns

Decoded data.

encode(data: bytes, decode_params: dict) → bytes

Encode a stream.

Parameters
  • data – Data to encode.

  • decode_params – Encoder parameters, sourced from the /DecodeParms entry associated with this filter.

Returns

Encoded data.
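
The two-method contract above can be illustrated with a minimal sketch. The Decoder class below is a local stand-in that mirrors the documented signatures rather than an import from pyHanko, and ReverseFilter is a hypothetical toy filter invented for this example. It shows only that encode() and decode() are expected to be inverses of one another.

```python
class Decoder:
    """Local stand-in for the documented interface; not imported from pyHanko."""

    def decode(self, data: bytes, decode_params: dict) -> bytes:
        raise NotImplementedError

    def encode(self, data: bytes, decode_params: dict) -> bytes:
        raise NotImplementedError


class ReverseFilter(Decoder):
    """Hypothetical toy filter: reverses the byte order and ignores its parameters."""

    def decode(self, data: bytes, decode_params: dict) -> bytes:
        return data[::-1]

    def encode(self, data: bytes, decode_params: dict) -> bytes:
        return data[::-1]


toy = ReverseFilter()
encoded = toy.encode(b"stream", {})   # bytes in reversed order
decoded = toy.decode(encoded, {})     # original bytes restored
```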

class pyhanko.pdf_utils.filters.ASCII85Decode

Bases: pyhanko.pdf_utils.filters.Decoder

Implementation of the base 85 encoding scheme specified in ISO 32000-1.

encode(data: bytes, decode_params=None) → bytes

Encode a stream.

Parameters
  • data – Data to encode.

  • decode_params – Encoder parameters, sourced from the /DecodeParms entry associated with this filter.

Returns

Encoded data.

decode(data, decode_params=None)

Decode a stream.

Parameters
  • data – Data to decode.

  • decode_params – Decoder parameters, sourced from the /DecodeParms entry associated with this filter.

Returns

Decoded data.
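
To give a feel for what a base-85 round trip looks like, here is a small sketch using the Ascii85 functions in Python's standard library. It does not call pyHanko, and it is an assumption that the output matches this class byte for byte. In particular, PDF streams use ~> as an end-of-data marker, which this sketch does not add.

```python
import base64

data = b"pyHanko stream data"

# Ascii85 maps each group of 4 input bytes to 5 printable ASCII characters;
# a trailing partial group of n bytes becomes n + 1 characters.
encoded = base64.a85encode(data)
decoded = base64.a85decode(encoded)
```

The encoded form is roughly 25% longer than the input, which is the price of making binary data safe to embed as text.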

class pyhanko.pdf_utils.filters.ASCIIHexDecode

Bases: pyhanko.pdf_utils.filters.Decoder

Wrapper around binascii.hexlify() that implements the Decoder interface.

encode(data: bytes, decode_params=None) → bytes

Encode a stream.

Parameters
  • data – Data to encode.

  • decode_params – Encoder parameters, sourced from the /DecodeParms entry associated with this filter.

Returns

Encoded data.

decode(data, decode_params=None)

Decode a stream.

Parameters
  • data – Data to decode.

  • decode_params – Decoder parameters, sourced from the /DecodeParms entry associated with this filter.

Returns

Decoded data.
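
Since this class is described as a wrapper around binascii.hexlify(), the underlying transformation can be sketched with the standard library alone. The snippet below does not use pyHanko, and pairing hexlify() with unhexlify() for the decoding direction is an assumption of this example.

```python
import binascii

data = b"PDF"

# Each input byte becomes two hexadecimal digits.
encoded = binascii.hexlify(data)
# unhexlify() reverses the transformation.
decoded = binascii.unhexlify(encoded)
```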

class pyhanko.pdf_utils.filters.FlateDecode

Bases: pyhanko.pdf_utils.filters.Decoder

Implementation of the /FlateDecode filter.

Warning

Currently not all predictor values are supported. This may cause problems when extracting image data from PDF files.

decode(data: bytes, decode_params)

Decode a stream.

Parameters
  • data – Data to decode.

  • decode_params – Decoder parameters, sourced from the /DecodeParms entry associated with this filter.

Returns

Decoded data.

encode(data, decode_params=None)

Encode a stream.

Parameters
  • data – Data to encode.

  • decode_params – Encoder parameters, sourced from the /DecodeParms entry associated with this filter.

Returns

Encoded data.
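
Flate compression is the zlib/deflate format, so the core of a round trip can be sketched with Python's zlib module. This is a standard-library illustration only: it does not call pyHanko and it applies no predictor, so it corresponds to the simplest case, in which no decoder parameters are given.

```python
import zlib

# Repetitive content stream data, which compresses well.
data = b"BT /F1 12 Tf (Hello) Tj ET\n" * 50

compressed = zlib.compress(data)
restored = zlib.decompress(compressed)
```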

pyhanko.pdf_utils.filters.get_generic_decoder(name: str) → pyhanko.pdf_utils.filters.Decoder

Instantiate a specific stream filter decoder type by (PDF) name.

The following names are recognised:

  • /FlateDecode or /Fl for the decoder implementing Flate compression.

  • /ASCIIHexDecode or /AHx for the decoder that converts bytes to their hexadecimal representations.

  • /ASCII85Decode or /A85 for the decoder that converts byte strings to a base-85 textual representation.

Warning

/Crypt is a special case because it requires access to the document’s security handler.

Warning

LZW compression is currently unsupported, as are most compression methods that are used specifically for image data.

Parameters

name – Name of the decoder to instantiate.
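
The name-based lookup described above can be sketched as a dictionary dispatch. Everything in this example is a stand-in: lookup_decoder, ALIASES and STANDIN_DECODERS are hypothetical names, the table holds plain standard-library functions rather than pyHanko Decoder instances, and the ValueError raised for unknown names is this sketch's choice rather than pyHanko's documented behaviour.

```python
import base64
import binascii
import zlib

# Abbreviated names listed above, mapped to their full filter names.
ALIASES = {
    "/Fl": "/FlateDecode",
    "/AHx": "/ASCIIHexDecode",
    "/A85": "/ASCII85Decode",
}

# Hypothetical stand-ins: stdlib functions instead of pyHanko Decoder objects.
STANDIN_DECODERS = {
    "/FlateDecode": zlib.decompress,
    "/ASCIIHexDecode": binascii.unhexlify,
    "/ASCII85Decode": base64.a85decode,
}


def lookup_decoder(name: str):
    """Resolve a PDF filter name (or its abbreviation) to a decode function."""
    full_name = ALIASES.get(name, name)
    try:
        return STANDIN_DECODERS[full_name]
    except KeyError:
        # Assumption: the error type is this sketch's choice, not pyHanko's.
        raise ValueError(f"Unsupported stream filter: {name}") from None


hex_decode = lookup_decoder("/AHx")
result = hex_decode(b"504446")
```

Names outside the table, such as /LZWDecode or /Crypt, fall through to the error branch, which matches the warnings above about unsupported filters.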