pyhanko.pdf_utils.misc module

Utility functions for the PDF library. Adapted from PyPDF2 with modifications and additions; refer to the PyPDF2 project for its original license.

Generally, all of these constitute internal API, except for the exception classes.

exception pyhanko.pdf_utils.misc.PdfError(msg: str, *args)

Bases: Exception

exception pyhanko.pdf_utils.misc.PdfReadError(msg: str, *args)

Bases: PdfError

exception pyhanko.pdf_utils.misc.PdfStrictReadError(msg: str, *args)

Bases: PdfReadError

exception pyhanko.pdf_utils.misc.PdfWriteError(msg: str, *args)

Bases: PdfError

exception pyhanko.pdf_utils.misc.PdfStreamError(msg: str, *args)

Bases: PdfReadError

exception pyhanko.pdf_utils.misc.IndirectObjectExpected(msg: Optional[str] = None)

Bases: PdfReadError

pyhanko.pdf_utils.misc.get_and_apply(dictionary: dict, key, function: Callable, *, default=None)

class pyhanko.pdf_utils.misc.OrderedEnum(value)

Bases: Enum

Ordered enum (from the Python documentation)
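The recipe referred to here is the OrderedEnum example from the Python enum documentation, reproduced below as a sketch. That pyHanko's class matches it line for line is an assumption, and Severity is a hypothetical enum used only for illustration.

```python
from enum import Enum


class OrderedEnum(Enum):
    """Enum whose members compare by value, but only within the same class."""

    def __ge__(self, other):
        if self.__class__ is other.__class__:
            return self.value >= other.value
        return NotImplemented

    def __gt__(self, other):
        if self.__class__ is other.__class__:
            return self.value > other.value
        return NotImplemented

    def __le__(self, other):
        if self.__class__ is other.__class__:
            return self.value <= other.value
        return NotImplemented

    def __lt__(self, other):
        if self.__class__ is other.__class__:
            return self.value < other.value
        return NotImplemented


class Severity(OrderedEnum):  # hypothetical enum, not part of pyHanko
    LOW = 1
    MEDIUM = 2
    HIGH = 3


print(Severity.LOW < Severity.HIGH)  # True
```

Comparing members of two different OrderedEnum subclasses returns NotImplemented from each method, so Python raises TypeError instead of silently comparing unrelated values.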

class pyhanko.pdf_utils.misc.StringWithLanguage(value: str, lang_code: Optional[str] = None, country_code: Optional[str] = None)

Bases: object

A string with a language attached to it.

value: str
lang_code: Optional[str] = None
country_code: Optional[str] = None

pyhanko.pdf_utils.misc.is_regular_character(byte_value: int)

pyhanko.pdf_utils.misc.read_non_whitespace(stream, seek_back=False, allow_eof=False)

Finds and reads the next non-whitespace character (ignores whitespace).

pyhanko.pdf_utils.misc.read_until_whitespace(stream, maxchars: Optional[int] = None) -> bytes

Reads non-whitespace characters and returns them. Stops upon encountering whitespace, or, if maxchars is not None, when maxchars is reached.

Parameters:
  • stream – stream to read

  • maxchars – maximal number of bytes to read before returning
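The behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not pyHanko's implementation: read_until_whitespace_sketch is a hypothetical name, and consuming the terminating whitespace byte is an assumption of this sketch.

```python
import io
from typing import Optional

# The six whitespace characters defined by the PDF specification:
# NUL, HT, LF, FF, CR and SP.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def read_until_whitespace_sketch(stream, maxchars: Optional[int] = None) -> bytes:
    """Collect bytes until whitespace, end of stream, or maxchars bytes."""
    result = bytearray()
    while maxchars is None or len(result) < maxchars:
        tok = stream.read(1)
        # An empty read means end of stream; whitespace ends the token.
        # (Assumption: the whitespace byte itself is consumed.)
        if not tok or tok in PDF_WHITESPACE:
            break
        result += tok
    return bytes(result)


print(read_until_whitespace_sketch(io.BytesIO(b"endobj 12")))
print(read_until_whitespace_sketch(io.BytesIO(b"endobj 12"), maxchars=3))
```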

pyhanko.pdf_utils.misc.read_until_delimiter(stream) -> bytes

Read until a token delimiter (i.e. a delimiter character or a PDF whitespace character) is encountered, and rewind the stream to the previous character.

Parameters:

stream – A stream.

Returns:

The bytes read.
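To illustrate the rewind step, here is a minimal standard-library sketch. It is not pyHanko's code: read_until_delimiter_sketch is a hypothetical name, and the character sets are the whitespace and delimiter characters from the PDF specification.

```python
import io

PDF_WHITESPACE = b"\x00\t\n\x0c\r "
PDF_DELIMITERS = b"()<>[]{}/%"


def read_until_delimiter_sketch(stream) -> bytes:
    """Read one token and leave the cursor on the delimiter that ended it."""
    result = bytearray()
    while True:
        tok = stream.read(1)
        if not tok:
            break  # end of stream: nothing to rewind
        if tok in PDF_WHITESPACE or tok in PDF_DELIMITERS:
            # Step back one byte so the delimiter is the next byte read.
            stream.seek(-1, io.SEEK_CUR)
            break
        result += tok
    return bytes(result)


stream = io.BytesIO(b"Type/Catalog")
print(read_until_delimiter_sketch(stream))  # the token before the slash
print(stream.read(1))                       # the delimiter is still unread
```

Leaving the delimiter unread lets the caller decide how to handle it, for example by dispatching to a name, array or dictionary parser based on that byte.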

pyhanko.pdf_utils.misc.read_until_regex(stream, regex, ignore_eof: bool = False)

Reads until the regular expression pattern matches (the match itself is excluded from the result). Raises PdfStreamError on premature end-of-file.

Parameters:
  • stream – stream to search

  • regex – regex to match

  • ignore_eof – if true, ignore end-of-file and return immediately

Raises:

PdfStreamError – on premature EOF
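A rough sketch of this contract using only the standard library follows. All names are hypothetical: SketchStreamError stands in for PdfStreamError, the chunk_size parameter is not part of the documented signature, and leaving the cursor at the start of the match is an assumption of this sketch.

```python
import io
import re


class SketchStreamError(Exception):
    """Stand-in for PdfStreamError in this self-contained sketch."""


def read_until_regex_sketch(stream, regex, ignore_eof=False, chunk_size=16):
    """Return the bytes that precede the first match of regex."""
    start = stream.tell()
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            if ignore_eof:
                return buf  # no match, but the caller tolerates EOF
            raise SketchStreamError("stream ended before the pattern matched")
        buf += chunk
        # Search the accumulated buffer so that a match spanning two
        # chunks is still found.
        match = regex.search(buf)
        if match is not None:
            # Assumption: position the cursor at the start of the match.
            stream.seek(start + match.start())
            return buf[:match.start()]


stream = io.BytesIO(b"abc def endobj rest")
print(read_until_regex_sketch(stream, re.compile(rb"endobj")))
print(stream.read(6))
```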

pyhanko.pdf_utils.misc.skip_over_whitespace(stream, stop_after_eol=False) -> bool

Similar to read_non_whitespace(), but returns a bool indicating whether more than one whitespace character was read.

Will return the cursor to before the first non-whitespace character encountered, or after the first end-of-line sequence if one is encountered.

pyhanko.pdf_utils.misc.skip_over_comment(stream) -> bool

Skip over a comment and position the cursor at the first byte after the EOL sequence following the comment. If there is no comment under the cursor, do nothing.

Parameters:

stream – stream to read

Returns:

True if a comment was read.
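The cursor handling described above can be sketched as follows. This is an illustration under stated assumptions, not pyHanko's implementation: skip_over_comment_sketch is a hypothetical name, and treating end of stream as the end of a comment is a choice made for this sketch.

```python
import io


def skip_over_comment_sketch(stream) -> bool:
    """Skip a %-comment, including its EOL sequence, if one starts here."""
    tok = stream.read(1)
    if tok != b"%":
        # No comment under the cursor: restore the position and do nothing.
        if tok:
            stream.seek(-1, io.SEEK_CUR)
        return False
    while True:
        tok = stream.read(1)
        if not tok or tok == b"\n":
            return True  # EOF (assumed to end the comment) or LF
        if tok == b"\r":
            # CR may be followed by LF; consume the LF only if present.
            nxt = stream.read(1)
            if nxt and nxt != b"\n":
                stream.seek(-1, io.SEEK_CUR)
            return True


stream = io.BytesIO(b"% a comment\r\n1 0 obj")
print(skip_over_comment_sketch(stream))
print(stream.read(7))
```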

pyhanko.pdf_utils.misc.instance_test(cls)

pyhanko.pdf_utils.misc.peek(itr)

pyhanko.pdf_utils.misc.assert_writable_and_random_access(output)

Raise an error if the buffer in question is not writable, and return a boolean to indicate whether it supports random-access reading.

Parameters:

output – The output buffer to check.

Returns:

True if the buffer supports random-access reading, False otherwise.

pyhanko.pdf_utils.misc.prepare_rw_output_stream(output)

Prepare an output stream that supports both reading and writing. Intended to be used for writing & updating signed files: when producing a signature, we render the PDF to a byte buffer with placeholder values for the signature data, or straight to the provided output stream if possible.

More precisely: this function will return the original output stream if it is writable, readable and seekable. If the output parameter is None, not readable or not seekable, this function will return a BytesIO instance instead. If the output parameter is not None and not writable, IOError will be raised.

Parameters:

output – A writable file-like object, or None.

Returns:

A file-like object that supports reading, writing and seeking.
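The decision rules in the description can be summarised in a few lines. The sketch below uses only the standard io module; prepare_rw_output_stream_sketch and the two stream classes are hypothetical names for illustration, and the sketch assumes the output object implements the readable(), writable() and seekable() methods of io.IOBase.

```python
import io


def prepare_rw_output_stream_sketch(output):
    """Return a stream that can be read, written and seeked."""
    if output is None:
        return io.BytesIO()
    if not output.writable():
        raise IOError("Output buffer is not writable")
    if output.readable() and output.seekable():
        return output  # usable as-is
    # Writable, but not readable or not seekable: use a temporary buffer.
    return io.BytesIO()


class WriteOnlySink(io.RawIOBase):  # hypothetical write-only stream
    def writable(self):
        return True

    def write(self, data):
        return len(data)


class ReadOnlySource(io.RawIOBase):  # hypothetical read-only stream
    def readable(self):
        return True


buffer = io.BytesIO()
print(prepare_rw_output_stream_sketch(buffer) is buffer)
print(type(prepare_rw_output_stream_sketch(WriteOnlySink())).__name__)
```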

pyhanko.pdf_utils.misc.finalise_output(orig_output, returned_output)

Several internal APIs transparently replace non-readable/seekable buffers with BytesIO for signing operations, but we don’t want to expose that to the public API user. This internal API function handles the unwrapping.
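One plausible reading of "unwrapping" is sketched below: if a temporary BytesIO was substituted for the caller's stream, its contents are copied to the original stream, which is then returned. This is an assumption about the behaviour, not a transcription of pyHanko's code, and finalise_output_sketch and CollectingSink are hypothetical names.

```python
import io


def finalise_output_sketch(orig_output, returned_output):
    """Hand back the stream the caller expects to receive."""
    if orig_output is None or returned_output is orig_output:
        # Nothing was substituted, or the caller supplied no stream.
        return returned_output
    # A temporary buffer was used: copy its contents to the caller's stream.
    orig_output.write(returned_output.getvalue())
    return orig_output


class CollectingSink(io.RawIOBase):  # hypothetical write-only stream
    def __init__(self):
        super().__init__()
        self.received = b""

    def writable(self):
        return True

    def write(self, data):
        self.received += bytes(data)
        return len(data)


sink = CollectingSink()
temporary = io.BytesIO(b"%PDF-1.7")
print(finalise_output_sketch(sink, temporary) is sink)
print(sink.received)
```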

pyhanko.pdf_utils.misc.DEFAULT_CHUNK_SIZE = 4096

Default chunk size for stream I/O.

pyhanko.pdf_utils.misc.chunked_write(temp_buffer: bytearray, stream, output, max_read=None)

pyhanko.pdf_utils.misc.chunked_digest(temp_buffer: bytearray, stream, md, max_read=None)

pyhanko.pdf_utils.misc.chunk_stream(temp_buffer: bytearray, stream, max_read=None)

class pyhanko.pdf_utils.misc.ConsList(head: object, tail: 'ConsList' = None)

Bases: object

head: object
tail: ConsList = None
static empty() -> ConsList
static sing(value) -> ConsList
cons(head)

class pyhanko.pdf_utils.misc.Singleton(name, bases, dct)

Bases: type

pyhanko.pdf_utils.misc.rd(x)

pyhanko.pdf_utils.misc.isoparse(dt_str: str) -> datetime

async pyhanko.pdf_utils.misc.lift_iterable_async(i: Iterable[X]) -> CancelableAsyncIterator[X]