Pdfnaut

Latest version: v0.6.0

0.6

This is the first "stable" release of pdfnaut in the sense that it has been tested on Python 3.9 through 3.12. It is not ready for production at all, but it is at least usable.

This is because we now use Ruff for linting and formatting, and Tox for running tests across multiple Python versions. Please assume that pdfnaut 0.5 and below will **not** run on Python 3.10 or below. pdfnaut 0.6 has been properly tested (hopefully).

Features

- Added **automatic reference resolution** to dictionaries and arrays, which are now `PdfDictionary` and `PdfArray` (rather than `dict` and `list`). These objects behave like `dict` and `list` respectively, but they also resolve references automatically. This was done mainly for convenience and for parity with other PDF processors.
- The `typings/` package has been removed in favor of more complete objects. Currently, partial implementations of `Page`, `Info`, and `Annotation` are included.
- Added support for PDF dates, text strings, and `PDFDocEncoding`.
- A cache system was added to the PDF parser. It is not currently used, but it will become useful once PDF writing is supported.
- **Python 3.8 support has been dropped.**
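
To illustrate what automatic reference resolution means, here is a minimal sketch of a `dict` subclass that follows indirect references when a value is read. It is not pdfnaut's `PdfDictionary`: `Reference`, `ResolvingDict`, and the toy object table are hypothetical, and PDF names are modeled as plain strings.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Reference:
    """Hypothetical stand-in for an indirect reference such as `5 0 R`."""

    object_number: int
    generation: int = 0


class ResolvingDict(dict):
    """A dict that resolves Reference values when they are read.

    Only __getitem__ and get() resolve references in this sketch;
    values() and items() still return the raw Reference objects.
    """

    def __init__(self, data: Dict[Any, Any], resolver: Callable[[Reference], Any]) -> None:
        super().__init__(data)
        self._resolver = resolver

    def _resolve(self, value: Any) -> Any:
        # Follow the reference only if the stored value is one.
        return self._resolver(value) if isinstance(value, Reference) else value

    def __getitem__(self, key: Any) -> Any:
        return self._resolve(super().__getitem__(key))

    def get(self, key: Any, default: Any = None) -> Any:
        return self._resolve(super().get(key, default))


# A toy object table standing in for the parser's cross-reference lookup.
objects = {Reference(5): 1024}
stream_dict = ResolvingDict({"Filter": "FlateDecode", "Length": Reference(5)}, objects.__getitem__)

length = stream_dict["Length"]  # 1024, not Reference(5)
```

An array equivalent would override `list.__getitem__` in the same way.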

Fixes

- Fixed a bug where the tokenizer skipped a character when processing octal character code escapes (`\ddd`).
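
For context, an octal escape in a literal string is a backslash followed by one to three octal digits. The hypothetical helpers below sketch one way to decode it so that the cursor lands exactly on the first character after the escape; other escapes such as `\n` are left out for brevity, and this is not pdfnaut's tokenizer code.

```python
from typing import Tuple

OCTAL_DIGITS = b"01234567"


def read_octal_escape(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an octal escape whose first digit is at data[pos].

    Returns (byte_value, next_pos). At most three digits are consumed, and
    next_pos already points past the escape, so the caller must not advance again.
    """
    end = pos
    while end < len(data) and end - pos < 3 and data[end] in OCTAL_DIGITS:
        end += 1
    # High-order overflow is ignored, as the PDF spec requires (\777 -> 0xFF).
    value = int(data[pos:end], 8) & 0xFF
    return value, end


def unescape_literal(body: bytes) -> bytes:
    """Decode octal escapes in a literal string body (other escapes omitted)."""
    out = bytearray()
    pos = 0
    while pos < len(body):
        is_backslash = body[pos] == 0x5C
        if is_backslash and pos + 1 < len(body) and body[pos + 1] in OCTAL_DIGITS:
            value, pos = read_octal_escape(body, pos + 1)
            out.append(value)
        else:
            out.append(body[pos])
            pos += 1
    return bytes(out)


decoded = unescape_literal(rb"\101\53x")  # b"A+x"
```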

0.5

Features

- Added `ContentStreamIterator` to help with iterating over content streams.
- Renamed the following:
  - `PdfIndirectRef` -> `PdfReference`
  - `PdfParser.version` -> `PdfParser.header_version`
- The `objects` package was moved into the `cos` subdirectory. This was done to make space for a higher-level objects package.
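
Content streams are sequences of operands followed by the operator that consumes them (for example, `/F1 12 Tf`). As a rough sketch of that grouping, and not of how `ContentStreamIterator` is actually implemented, the hypothetical `iter_operations` below assumes a whitespace-separated stream in which only numbers and names appear as operands; strings, arrays, dictionaries, and inline images are not handled.

```python
from typing import Iterator, List, Tuple


def _is_operand(token: str) -> bool:
    """Rough operand test covering only names and numbers."""
    if token.startswith("/"):
        return True
    try:
        float(token)
    except ValueError:
        return False
    return True


def iter_operations(content: str) -> Iterator[Tuple[str, List[str]]]:
    """Yield (operator, operands) pairs from a simplified content stream."""
    operands: List[str] = []
    for token in content.split():
        if _is_operand(token):
            operands.append(token)
        else:
            # An operator closes the current group of operands.
            yield token, operands
            operands = []


operations = list(iter_operations("q 1 0 0 1 72 720 cm BT /F1 12 Tf ET Q"))
```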

Internals

- `list.index()` calls when parsing indirect objects and streams caused notable slowdowns. These calls have since been replaced with faster equivalents from the lexer.
- The lexer was split up into smaller reusable functions such as `skip_while()` and `consume_while()`. `skip_next_eol()` and `peek_line()` were added to replace `current_to_eol()` and `next_eol()`. Other functions like `peek()` (previously `advance()`) were cleaned up.
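
The two helpers can be pictured as a predicate-driven cursor over the input bytes. The `Lexer` class and the signatures below are assumptions for illustration, not pdfnaut's lexer.

```python
from typing import Callable


class Lexer:
    """Minimal byte cursor with skip_while()/consume_while()-style helpers."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.position = 0

    def peek(self, n: int = 1) -> bytes:
        """Return the next n bytes without advancing."""
        return self.data[self.position : self.position + n]

    def skip_while(self, predicate: Callable[[bytes], bool]) -> None:
        """Advance past single bytes for which predicate(byte) is true."""
        while self.position < len(self.data) and predicate(self.peek()):
            self.position += 1

    def consume_while(self, predicate: Callable[[bytes], bool]) -> bytes:
        """Like skip_while(), but return the bytes that were skipped."""
        start = self.position
        self.skip_while(predicate)
        return self.data[start : self.position]


lexer = Lexer(b"   123 0 R")
lexer.skip_while(bytes.isspace)
object_number = lexer.consume_while(bytes.isdigit)  # b"123"
```

Building `consume_while()` on top of `skip_while()` keeps the scanning loop in one place.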

0.4

Features

- `PdfDocument` now has an `access_level` attribute that lets consumers determine the general permissions of a document.
- `PdfSerializer` can now write XRef streams using `write_compressed_xref_table`.
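
An XRef stream stores each entry as a fixed-width binary row whose field sizes come from the stream's `/W` array: type 0 for free objects, type 1 for objects at a byte offset, and type 2 for objects stored inside an object stream. The hypothetical `pack_xref_stream_rows` below sketches only the row packing; it is not `write_compressed_xref_table` itself.

```python
from typing import List, Tuple

# (type, field 2, field 3), interpreted according to the entry type.
Entry = Tuple[int, int, int]


def pack_xref_stream_rows(entries: List[Entry], widths: Tuple[int, int, int]) -> bytes:
    """Pack xref stream entries into binary rows described by a /W array.

    Each field is written big-endian using the byte width given in `widths`.
    The result still needs a filter (e.g. FlateDecode) and a stream dictionary
    (/Type /XRef, /W, /Size, ...) before it can be written to a file.
    """
    rows = bytearray()
    for entry in entries:
        for value, width in zip(entry, widths):
            rows += value.to_bytes(width, "big")
    return bytes(rows)


rows = pack_xref_stream_rows(
    [
        (0, 0, 65535),  # free-list head: next free object 0, generation 65535
        (1, 15, 0),     # in use at byte offset 15, generation 0
        (2, 7, 3),      # stored at index 3 inside object stream 7
    ],
    widths=(1, 2, 2),
)
```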

Organization

- `PdfTokenizer`, `PdfParser`, and `PdfSerializer` have been moved to the `cos` package.
- The standard security handler and its providers are now separate and live in the `security` package.

Renaming

The following attributes have been renamed to maintain parity with other PDF packages or to better reflect their intended use.

- `PdfTokenizer.is_content_stream` -> `PdfTokenizer.parse_operators`
- `PdfParser.resolve_reference` -> `PdfParser.get_object` (same with `PdfDocument`)
- `PdfSerializer.generate_standard_xref_table` -> `PdfSerializer.generate_xref_table`
- `PdfStream.decompress` -> `PdfStream.decode`

Fixes & Other Changes

- `typing-extensions` was previously only installed on Python 3.11 and below. This was a mistake. It is now required for all versions.
- pdfnaut now aims to become a PDF 2.0 processor rather than a PDF 1.7 processor. Both are fairly backwards-compatible so PDF 1.7 files should still work.

0.3.1

This is a very small update that addresses an issue when serializing standard XRef tables using `generate_standard_xref_table`. Inputs that would produce tables with multiple subsections previously failed. This should now be fixed.
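
A standard XRef table splits its entries into subsections, each introduced by a `first count` line covering a run of consecutive object numbers, and each entry is exactly 20 bytes long including its end-of-line marker. The hypothetical functions below sketch that grouping; they are not pdfnaut's serializer code.

```python
from typing import Dict, List, Tuple


def group_subsections(object_numbers: List[int]) -> List[Tuple[int, int]]:
    """Group object numbers into (first, count) runs of consecutive numbers."""
    subsections: List[Tuple[int, int]] = []
    for number in sorted(object_numbers):
        if subsections and number == subsections[-1][0] + subsections[-1][1]:
            first, count = subsections[-1]
            subsections[-1] = (first, count + 1)
        else:
            subsections.append((number, 1))
    return subsections


def write_xref_table(entries: Dict[int, Tuple[int, int, str]]) -> str:
    """Render an xref section from {object_number: (offset, generation, "n" or "f")}."""
    lines = ["xref"]
    for first, count in group_subsections(list(entries)):
        lines.append(f"{first} {count}")
        for number in range(first, first + count):
            offset, generation, kind = entries[number]
            # 18 characters here plus the 2-byte CRLF gives a 20-byte entry.
            lines.append(f"{offset:010d} {generation:05d} {kind}")
    return "\r\n".join(lines) + "\r\n"


table = write_xref_table({0: (0, 65535, "f"), 1: (15, 0, "n"), 3: (200, 0, "n")})
```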

0.3

0.3 includes major changes that should hopefully improve developer experience when reading PDF files.

- pdfnaut now includes typings for many common PDF dictionaries including the trailer, catalog, outlines, and others. (this change introduces a new dependency, `typing-extensions`, for users of Python 3.11 or earlier)
- `PdfDocument` has been added as a foundation of the high-level API. It allows quick access to common PDF objects (such as pages).
- pdfnaut now has a "strict" mode that warns or fails when parsing non-spec-compliant documents. Outside strict mode (`strict=False`), pdfnaut is now more lenient towards these files.
- Document lexing should be slightly faster for references since we now first check for a digit before invoking regex.
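
The digit check amounts to a cheap pre-test: an indirect reference such as `3 0 R` always starts with a digit, so any other byte can be rejected without running the regex. The pattern and `try_parse_reference` below are a simplified, hypothetical sketch (for instance, they do not check that `R` is followed by a delimiter).

```python
import re
from typing import Optional, Tuple

# Matches an indirect reference such as b"12 0 R" at a given position.
REFERENCE_RE = re.compile(rb"(\d+)\s+(\d+)\s+R")


def try_parse_reference(data: bytes, pos: int) -> Optional[Tuple[int, int, int]]:
    """Return (object_number, generation, end) if a reference starts at pos."""
    # Cheap rejection before invoking the regex engine.
    if pos >= len(data) or not data[pos : pos + 1].isdigit():
        return None
    match = REFERENCE_RE.match(data, pos)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.end()


parsed = try_parse_reference(b"/Pages 3 0 R", 7)  # (3, 0, 12)
```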

0.2

This update adds support for encoding and encrypting objects.

Additions:

- All filters (except Crypt) now support encoding.
- Added a decoder for the RunLengthDecode filter.
- The security handler was reworked to allow for encryption of objects.
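
RunLengthDecode is simple enough to sketch in full. Per the PDF specification, a length byte of 0 to 127 is followed by that many plus one literal bytes, a length byte of 129 to 255 is followed by one byte to be repeated 257 minus the length times, and 128 marks the end of data. The `run_length_decode` function below is an illustrative decoder, not pdfnaut's filter class.

```python
def run_length_decode(data: bytes) -> bytes:
    """Decode RunLengthDecode-encoded data."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        length = data[pos]
        pos += 1
        if length == 128:
            # End-of-data marker: anything after it is ignored.
            break
        if length < 128:
            # Copy the next length + 1 bytes literally.
            out += data[pos : pos + length + 1]
            pos += length + 1
        else:
            # Repeat the next single byte 257 - length times.
            out += data[pos : pos + 1] * (257 - length)
            pos += 1
    return bytes(out)


encoded = bytes([2]) + b"abc" + bytes([254]) + b"z" + bytes([128])
decoded = run_length_decode(encoded)  # b"abczzz"
```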

Removals:

- The TIFF predictor function used previously was implemented improperly. It has now been removed.

Fixes:

- The PNG predictor function previously worked on entire samples rather than on bytes, which led to only one of the color components being preserved. This has now been fixed.
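
To show why working on bytes matters: PNG predictors compare each byte with the byte one whole pixel earlier (`bpp` bytes back) and with the same byte in the previous row, so every color component is reconstructed independently. The hypothetical `png_unpredict` below sketches this using the `/Columns`, `/Colors`, and `/BitsPerComponent` decode parameters, assuming every row starts with its PNG filter-type byte (as it does for `/Predictor` values of 10 and above).

```python
def paeth(a: int, b: int, c: int) -> int:
    """PNG Paeth predictor: pick the neighbour closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def png_unpredict(data: bytes, columns: int, colors: int = 1, bits_per_component: int = 8) -> bytes:
    """Reverse PNG row predictors (None, Sub, Up, Average, Paeth) byte by byte."""
    bpp = max(1, (colors * bits_per_component + 7) // 8)  # bytes per whole pixel
    row_length = (columns * colors * bits_per_component + 7) // 8
    stride = row_length + 1  # one filter-type byte per row
    previous = bytearray(row_length)
    out = bytearray()
    for start in range(0, len(data), stride):
        filter_type = data[start]
        row = bytearray(data[start + 1 : start + stride])
        for i in range(len(row)):
            left = row[i - bpp] if i >= bpp else 0  # already-decoded byte
            up = previous[i]
            upper_left = previous[i - bpp] if i >= bpp else 0
            if filter_type == 1:
                row[i] = (row[i] + left) & 0xFF
            elif filter_type == 2:
                row[i] = (row[i] + up) & 0xFF
            elif filter_type == 3:
                row[i] = (row[i] + (left + up) // 2) & 0xFF
            elif filter_type == 4:
                row[i] = (row[i] + paeth(left, up, upper_left)) & 0xFF
            elif filter_type != 0:
                raise ValueError(f"unknown PNG filter type {filter_type}")
        out += row
        previous = row
    return bytes(out)


# Two RGB pixels per row: row 1 uses Sub (type 1), row 2 uses Up (type 2).
encoded = bytes([1, 10, 20, 30, 5, 5, 5, 2, 1, 2, 3, 1, 2, 3])
decoded = png_unpredict(encoded, columns=2, colors=3)
```

Because `left` is taken three bytes back for RGB data, the red, green, and blue components each accumulate separately rather than collapsing into one value.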