Pdfnaut

Latest version: v0.8.0


0.8

pdfnaut 0.8 brings a bunch of new features for use with PdfDocument, as well as quality-of-life changes that should improve performance and the developer experience.

Features

- pdfnaut can now read and write XMP metadata. :tada:
- pdfnaut can now create documents (PdfDocument.new) :tada:
- pdfnaut now supports Python 3.13.
- PdfDate has been removed in favor of separate date utilities.
- Added partial date encoding support.
- Added caching for objects within object streams. This should reduce the time needed to access them.
- Added the remaining page boundaries (BleedBox, ArtBox, and TrimBox) to the Page object.
- Added `UserAccessPermissions` and `PdfDocument.access_permissions`.
- Added `color` field to Annotation.
- The fields module has been moved to a common folder alongside utilities previously part of the parser module.
- Added PdfStream.modify for in-place modification of streams.
- PdfDocument.save now accepts saving to in-memory objects or to pathlib.Path instances.
- Renamed `pdf.info` to `pdf.doc_info` and `pdf.metadata` to `pdf.xmp_info`.
- PdfDocument.save now always writes an ID entry in the trailer.
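
Partial date encoding relies on the PDF date string format `D:YYYYMMDDHHmmSSOHH'mm`, where only the year is required and any later field may appear only if every field before it is present. The following is a standalone sketch of that rule, not pdfnaut's date utilities; `encode_pdf_date` and its parameters are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from datetime import timedelta


def encode_pdf_date(
    year: int,
    month: int | None = None,
    day: int | None = None,
    hour: int | None = None,
    minute: int | None = None,
    second: int | None = None,
    utc_offset: timedelta | None = None,
) -> str:
    """Encode a possibly partial date as a PDF date string (hypothetical helper)."""
    result = f"D:{year:04d}"
    # A field may only appear if all preceding fields are present,
    # so encoding stops at the first missing one (the offset included).
    for value in (month, day, hour, minute, second):
        if value is None:
            return result
        result += f"{value:02d}"

    if utc_offset is not None:
        total_minutes = int(utc_offset.total_seconds() // 60)
        if total_minutes == 0:
            result += "Z"  # UTC
        else:
            sign = "+" if total_minutes > 0 else "-"
            hours, minutes = divmod(abs(total_minutes), 60)
            result += f"{sign}{hours:02d}'{minutes:02d}"
    return result
```

For example, `encode_pdf_date(2024, 3)` yields `D:202403`, while a full date with `timedelta(hours=-8)` ends in `-08'00`.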

Fixes

- PdfStream.create now honors Crypt filter parameters.
- Contiguous octal escape sequences in literal strings are now processed correctly. Octal escape codes are also now encoded as exactly one byte in all scenarios, rather than one or two bytes.
- The freelist created when writing a document is now populated with correct values rather than placeholders.
- AnnotationFlags now uses correct bit positions.
- pdfnaut now gracefully handles circular reference scenarios that could lead to infinite loops.
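
The PDF specification numbers annotation flag bits starting at 1, so bit *n* corresponds to the mask `1 << (n - 1)`. The sketch below shows that mapping for the annotation `/F` entry; `AnnotFlags` and `bit` are hypothetical names and this is not pdfnaut's `AnnotationFlags` definition.

```python
from enum import IntFlag


def bit(position: int) -> int:
    """Convert a 1-based bit position, as numbered in the PDF spec, to a mask."""
    return 1 << (position - 1)


class AnnotFlags(IntFlag):
    """Annotation flags (the /F entry), following the annotation flags table in ISO 32000."""

    INVISIBLE = bit(1)
    HIDDEN = bit(2)
    PRINT = bit(3)
    NO_ZOOM = bit(4)
    NO_ROTATE = bit(5)
    NO_VIEW = bit(6)
    READ_ONLY = bit(7)
    LOCKED = bit(8)
    TOGGLE_NO_VIEW = bit(9)
    LOCKED_CONTENTS = bit(10)
```

With this mapping an annotation with `/F 4` is printable; an off-by-one shift such as `1 << position` would misread the same value as Hidden.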

**Full Changelog**: https://github.com/aescarias/pdfnaut/compare/v0.7...v0.8

0.7

After a while in the works, pdfnaut 0.7 has finally been released.

Features

- pdfnaut can now save documents. :tada:
- pdfnaut can now read hybrid-reference files correctly (i.e. files with an `XRefStm` entry).
- Added [pyca/cryptography](https://cryptography.io/en/latest/) as a supported crypt provider.
- Added `PdfStream.create` for easier stream encoding.
- Added encoding support for `RunLengthFilter`.
- A new abstraction for adding and removing objects, known as the object map, has been added. It is available as `PdfParser.objects`.
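
The RunLengthDecode format is simple: a length byte of 0 to 127 means "copy the next length + 1 bytes literally", 129 to 255 means "repeat the next byte 257 − length times", and 128 marks end of data. The following is a standalone greedy encoder (with a matching decoder for round-tripping) under that format; it is not pdfnaut's `RunLengthFilter`, and the function names are hypothetical.

```python
def run_length_encode(data: bytes) -> bytes:
    """Encode bytes in the PDF RunLengthDecode format (greedy sketch)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (at most 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)  # 129..255: repeat the next byte
            out.append(data[i])
            i += run
            continue
        # Collect literal bytes until a repeated pair starts or 128 bytes are taken.
        start = i
        i += 1
        while i < n and i - start < 128:
            if i + 1 < n and data[i] == data[i + 1]:
                break
            i += 1
        out.append(i - start - 1)  # 0..127: copy the next (length + 1) bytes
        out += data[start:i]
    out.append(128)  # end-of-data marker
    return bytes(out)


def run_length_decode(data: bytes) -> bytes:
    """Decode the PDF RunLengthDecode format."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        if length == 128:
            break
        if length < 128:
            out += data[i + 1 : i + 2 + length]
            i += 2 + length
        else:
            out += bytes([data[i + 1]]) * (257 - length)
            i += 2
    return bytes(out)
```

This encoder treats any pair of equal bytes as a run, which never makes the output larger than a literal would.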

Other Changes

- `PdfDocument` now inherits from `PdfParser` to avoid redundancy.
- pdfnaut now accounts for documents that do not start with a PDF header.
- High-level PDF objects now use a field system. The field system helps developers implement their own PDF objects while reducing boilerplate.
- The XRef system has been reworked to align more with the spec.
- `PdfParser.updates` no longer uses tuples but rather `PdfXRefSection`s.
- The serializer no longer uses 4-item tuples but rather 2-item tuples for writing.
- Most mentions of "table" have been replaced with "section".

0.6

This is the first "stable" release of pdfnaut in the sense that it has been tested on Python 3.9 through 3.12. It's not ready for production at all, but it is at least usable.

This is possible because we now use Ruff for linting and formatting and Tox for running tests across multiple Python versions. Please assume that pdfnaut 0.5 and below will **not** run under Python 3.10 or below. pdfnaut 0.6 has been properly tested (hopefully).

Features

- Added **automatic reference resolution** to dictionaries and arrays, which are now `PdfDictionary` and `PdfArray` (rather than `dict` and `list`). These objects behave similarly to `dict` and `list` respectively, but they include this additional behavior. This has been done mainly for convenience and parity with other PDF processors.
- The `typings/` package has been removed in favor of more complete objects. Currently, partial implementations of `Page`, `Info`, and `Annotation` are included.
- Added support for PDF dates, text strings, and `PDFDocEncoding`.
- A cache system was added to the PDF parser. It is not currently used but it will become useful once PDF writing is supported.
- **3.8 support has been dropped.**
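
Automatic reference resolution can be pictured as a `dict` subclass that swaps an indirect reference for the object it points to whenever a value is read. The sketch below illustrates the idea under stated assumptions: `Ref`, `ResolvingDict`, and the `resolver` callback are hypothetical stand-ins, not pdfnaut's `PdfReference` or `PdfDictionary`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Ref:
    """Hypothetical indirect reference: object number and generation."""

    object_number: int
    generation: int = 0


class ResolvingDict(dict):
    """dict that resolves Ref values on item access via a resolver callback."""

    def __init__(self, *args: Any, resolver: Callable[[Ref], Any], **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self._resolver = resolver

    def __getitem__(self, key: Any) -> Any:
        value = super().__getitem__(key)
        if isinstance(value, Ref):
            return self._resolver(value)
        return value

    def get(self, key: Any, default: Any = None) -> Any:
        # dict.get does not go through __getitem__, so route it explicitly.
        # Note: values() and items() still return unresolved Ref objects here.
        if key in self:
            return self[key]
        return default
```

Usage: `ResolvingDict({"Parent": Ref(5)}, resolver=objects.__getitem__)["Parent"]` returns whatever object 5 maps to, while `dict.__getitem__` still exposes the raw reference.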

Fixes

- Fixed a bug where the tokenizer skipped a character when processing octal character code escapes (`\ddd`).
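
In a PDF literal string, `\ddd` takes one to three octal digits and produces exactly one byte (high-order overflow is ignored), so `\101\102` must decode to `AB` without skipping or merging characters. The sketch below decodes escapes in the bytes between a literal string's outer parentheses; `unescape_literal` is a hypothetical helper, not pdfnaut's tokenizer, and it does not normalize unescaped end-of-line markers.

```python
_SIMPLE_ESCAPES = {
    ord("n"): b"\n",
    ord("r"): b"\r",
    ord("t"): b"\t",
    ord("b"): b"\b",
    ord("f"): b"\f",
    ord("("): b"(",
    ord(")"): b")",
    ord("\\"): b"\\",
}
_OCTAL_DIGITS = b"01234567"


def unescape_literal(body: bytes) -> bytes:
    """Decode backslash escapes inside a PDF literal string body (sketch)."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != 0x5C:  # not a backslash: copy as-is
            out.append(ch)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(body):
            break  # a trailing lone backslash is dropped in this sketch
        nxt = body[i]
        if nxt in _OCTAL_DIGITS:
            # Consume one to three octal digits, and nothing beyond them.
            j = i
            while j < len(body) and j - i < 3 and body[j] in _OCTAL_DIGITS:
                j += 1
            out.append(int(body[i:j], 8) & 0xFF)  # keep only the low byte
            i = j
        elif nxt in _SIMPLE_ESCAPES:
            out += _SIMPLE_ESCAPES[nxt]
            i += 1
        elif nxt == 0x0D:  # backslash + CR or CRLF: line continuation
            i += 1
            if i < len(body) and body[i] == 0x0A:
                i += 1
        elif nxt == 0x0A:  # backslash + LF: line continuation
            i += 1
        else:
            out.append(nxt)  # unknown escape: the backslash is ignored
            i += 1
    return bytes(out)
```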

0.5

Features

- Added `ContentStreamIterator` to help with iterating over content streams.
- Renamed the following:
  - `PdfIndirectRef` -> `PdfReference`
  - `PdfParser.version` -> `PdfParser.header_version`
- The `objects` package was moved into the `cos` subdirectory. This was done to make space for a higher-level objects package.

Internals

- `list.index()` calls when parsing indirect objects and streams caused notable slowdowns. These calls have since been replaced with faster equivalents from the lexer.
- The lexer was split up into smaller reusable functions such as `skip_while()` and `consume_while()`. `skip_next_eol()` and `peek_line()` were added to replace `current_to_eol()` and `next_eol()`. Other functions like `peek()` (previously `advance()`) were cleaned up.
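
The idea behind small reusable lexer primitives can be sketched as a cursor over bytes with predicate-driven helpers. The `Lexer` class below is hypothetical: it reuses the function names mentioned above, but their signatures and behavior here are assumptions, not pdfnaut's actual lexer. The whitespace set is the one defined by the PDF specification.

```python
from __future__ import annotations

from typing import Callable

# PDF whitespace characters: NUL, HT, LF, FF, CR, SP.
WHITESPACE = b"\x00\t\n\x0c\r "


class Lexer:
    """Minimal cursor-based lexer over bytes (hypothetical sketch)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.position = 0

    def peek(self, n: int = 1) -> bytes:
        """Return the next n bytes without advancing."""
        return self.data[self.position : self.position + n]

    def skip_while(self, predicate: Callable[[int], bool]) -> int:
        """Advance past bytes matching predicate; return how many were skipped."""
        start = self.position
        while self.position < len(self.data) and predicate(self.data[self.position]):
            self.position += 1
        return self.position - start

    def consume_while(self, predicate: Callable[[int], bool]) -> bytes:
        """Like skip_while, but return the bytes that were consumed."""
        start = self.position
        self.skip_while(predicate)
        return self.data[start : self.position]

    def skip_next_eol(self) -> None:
        """Skip one end-of-line marker (CRLF, LF, or CR) if present."""
        if self.peek(2) == b"\r\n":
            self.position += 2
        elif self.peek() in (b"\r", b"\n"):
            self.position += 1
```

Because each helper only moves the cursor, they compose without scanning the input twice, which is the same motivation given above for dropping `list.index()` calls.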

0.4

Features

- PdfDocument now has an `access_level` attribute that allows consumers to determine the general permissions of a document.
- PdfSerializer can now write XRef streams using `write_compressed_xref_table`.
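
Document permissions under the standard security handler come from the `/P` entry of the encryption dictionary, a signed 32-bit integer whose bits 3 to 6 and 9 to 12 (1-based) grant specific rights. The following is a standalone sketch of decoding that value, relevant both to `access_level` and to the later `UserAccessPermissions`; the `Permissions` flag and `decode_permissions` helper are hypothetical and do not reproduce either pdfnaut API.

```python
from enum import IntFlag


class Permissions(IntFlag):
    """User access permission bits of the /P entry (1-based bit numbers in comments)."""

    PRINT = 1 << 2                  # bit 3
    MODIFY = 1 << 3                 # bit 4
    COPY_CONTENT = 1 << 4           # bit 5
    MODIFY_ANNOTATIONS = 1 << 5     # bit 6
    FILL_FORMS = 1 << 8             # bit 9
    EXTRACT_ACCESSIBILITY = 1 << 9  # bit 10
    ASSEMBLE = 1 << 10              # bit 11
    PRINT_HIGH_QUALITY = 1 << 11    # bit 12


KNOWN_BITS = 0xF3C  # union of all the flags above


def decode_permissions(p: int) -> Permissions:
    """Interpret a (possibly negative) /P value as permission flags."""
    unsigned = p & 0xFFFFFFFF  # reinterpret as an unsigned 32-bit value
    return Permissions(unsigned & KNOWN_BITS)  # drop reserved bits
```

Since reserved high bits are set to 1, typical `/P` values are negative: `-4` grants every permission, while `-3904` grants none.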

Organization

- `PdfTokenizer`, `PdfParser`, and `PdfSerializer` have been moved to the `cos` package.
- The standard security handler and its providers are now separate and live in the `security` package.

Renaming

The following attributes have been renamed to maintain parity with other PDF packages or to better reflect their intended use.

- `PdfTokenizer.is_content_stream` -> `PdfTokenizer.parse_operators`
- `PdfParser.resolve_reference` -> `PdfParser.get_object` (same with `PdfDocument`)
- `PdfSerializer.generate_standard_xref_table` -> `PdfSerializer.generate_xref_table`
- `PdfStream.decompress` -> `PdfStream.decode`

Fixes & Other Changes

- `typing-extensions` was previously only installed on Python 3.11 and below. This was a mistake. It is now required for all versions.
- pdfnaut now aims to become a PDF 2.0 processor rather than a PDF 1.7 processor. PDF 2.0 is largely backwards-compatible with PDF 1.7, so PDF 1.7 files should still work.

0.3.1

This is a very small update that addresses an issue when serializing standard XRef tables using `generate_standard_xref_table`. Inputs that would produce tables with multiple subsections would previously fail. This should now be fixed.
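
A standard XRef table is split into subsections, each introduced by a `first count` header line and covering a run of consecutive object numbers, with every entry exactly 20 bytes long. The sketch below shows how non-contiguous object numbers produce multiple subsections; `XRefEntry`, `group_subsections`, and `generate_xref_table` are hypothetical names, not pdfnaut's `PdfSerializer` API.

```python
from __future__ import annotations

from typing import NamedTuple


class XRefEntry(NamedTuple):
    """Hypothetical entry: byte offset (or next free object number) and generation."""

    offset: int
    generation: int
    in_use: bool = True


def group_subsections(numbers: list[int]) -> list[tuple[int, int]]:
    """Split sorted object numbers into (first, count) runs of consecutive numbers."""
    groups: list[tuple[int, int]] = []
    for num in numbers:
        if groups and groups[-1][0] + groups[-1][1] == num:
            first, count = groups[-1]
            groups[-1] = (first, count + 1)
        else:
            groups.append((num, 1))
    return groups


def generate_xref_table(entries: dict[int, XRefEntry]) -> bytes:
    """Serialize entries as a standard XRef table with one subsection per run."""
    lines = [b"xref\r\n"]
    for first, count in group_subsections(sorted(entries)):
        lines.append(f"{first} {count}\r\n".encode())
        for num in range(first, first + count):
            entry = entries[num]
            kind = "n" if entry.in_use else "f"
            # 10-digit offset, 5-digit generation, type, 2-byte EOL = 20 bytes.
            lines.append(f"{entry.offset:010d} {entry.generation:05d} {kind}\r\n".encode())
    return b"".join(lines)
```

Objects 0, 1, and 3, for instance, yield two subsections: `0 2` and `3 1`.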
