
Latest version: v0.3.2



* Improved standalone decompression speed ~5% by storing a size hint for the count of numbers in the whole file.
* Thanks to the size hint, reduced the default chunk size at no performance cost, improving compression ratio.
* Improved compression speed ~15% with optimized writer logic.
* Substantially increased compression and decompression speed in special cases when steps can be skipped.


* Breaking changes:
  * format: replaced GCD mode with int mult mode. This simplifies the format (int mult mode is very similar to float mult mode) and is more robust in the ways we care about. However, GCD-encoded data from v0.0.0 can no longer be decompressed. This could have been made backward compatible, but since v0.0.0 has relatively few downloads and GCD data is rare, I decided it was better to break compatibility than to keep dead code around forever. Int mult gets an 11% better compression ratio on the total_cents bench dataset than GCD did.
  * API: removed GCD-related metadata such as `Bin::gcd` and replaced configurations with int mult equivalents.
  * API: renamed `Progress.finished_page` to `Progress.finished`, since it sometimes refers to different units.
* Improved decompression speed with SIMD offset reads.
* Added `standalone::simple_decompress_into`.
* Fixed a rare bug in compression that caused it to become lossy on nearly-linear sequences of floats with floating point errors.
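
To make the GCD vs. int mult distinction above concrete, here is a minimal Python sketch of the general idea, not pco's actual format or API. It assumes int mult means writing each integer as `mult * base + adj`; the helper names `split_int_mult` and `join_int_mult` and the sample data are hypothetical. It shows one way such a decomposition can tolerate outliers that would collapse a strict GCD to 1.

```python
from functools import reduce
from math import gcd


def split_int_mult(nums, base):
    """Split each integer x into (mult, adj) with x == mult * base + adj.

    Hypothetical helper: illustrates the decomposition only, not pco's encoding.
    """
    return [divmod(x, base) for x in nums]


def join_int_mult(pairs, base):
    """Inverse of split_int_mult: rebuild the original integers exactly."""
    return [mult * base + adj for mult, adj in pairs]


# Made-up "cents" data: mostly multiples of 100, with one outlier (1299).
cents = [1000, 2500, 700, 1299, 4000]

# A strict GCD over the whole sequence collapses to 1 because of the outlier,
# so a GCD-style transform would find nothing to factor out.
strict_gcd = reduce(gcd, cents)

# With a base of 100, the outlier only shows up as a nonzero adjustment.
pairs = split_int_mult(cents, 100)
restored = join_int_mult(pairs, 100)
```

Most `adj` values are zero in this sample, so the two resulting streams are each more regular than the raw numbers, and the split remains lossless.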


* Improved decompression performance ~70% on aarch64, ~10% on x86_64.
* Added support for consuming any `BetterBufRead` implementation during decompression, rather than only `&[u8]`.
* Changed the API for `wrapped::PageDecompressor` and `standalone::ChunkDecompressor` to own `src`, since these parts of the file need to be read in order and contiguously.
* Updated docs, including real-world benchmarks on air quality, taxi, and r/place datasets.


With lower-level unit testing, found and fixed 3 serious bugs:

* encoding more than one page per chunk failed; it tried to encode the whole chunk every time
* decoding one batch at a time failed because the code path asserted the reader would be byte aligned
* decoding with most limits through the CLI failed because it created a bad count of numbers for pco


* Revamped the API into separate structs for File, Chunk, and Page compressors/decompressors.
* Fixed a known bug that caused panics for 32-bit architectures.
* Made decompression almost-zero-copy, increasing performance slightly.
* Made standalone actually just a minimal wrapped format with no access to private functionality.


Changed the format to contain tiny batches (256 numbers each) with contiguous 4-way interleaved tANS codes and contiguous offsets. This increased the buffer space needed, but allowed decent CPU utilization during tANS decoding and excellent SIMD utilization during offset decoding, approximately a 30% decompression speedup overall.
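
The batch layout described above can be pictured with a small Python sketch. It only illustrates the bookkeeping: splitting numbers into batches of 256 and assigning positions to 4 lanes. It does not implement tANS, and the rule that position `i` within a batch belongs to lane `i % 4` is an assumption made for illustration; all function names are hypothetical.

```python
BATCH_SIZE = 256  # batch size stated in the changelog entry
N_LANES = 4       # "4-way interleaved"


def split_batches(nums):
    """Cut a sequence into consecutive batches of at most BATCH_SIZE numbers."""
    return [nums[i:i + BATCH_SIZE] for i in range(0, len(nums), BATCH_SIZE)]


def deinterleave(batch):
    """Assign position i of a batch to lane i % N_LANES (assumed layout)."""
    return [batch[lane::N_LANES] for lane in range(N_LANES)]


def interleave(lanes):
    """Inverse of deinterleave: merge the lanes back into their original order."""
    out = [None] * sum(len(values) for values in lanes)
    for lane, values in enumerate(lanes):
        out[lane::N_LANES] = values
    return out


nums = list(range(600))
batches = split_batches(nums)
lanes = deinterleave(batches[0])
round_trip = [x for batch in batches for x in interleave(deinterleave(batch))]
```

Because each lane depends only on its own state, a decoder can advance the four lanes independently, which is the usual reason interleaving helps keep the CPU busy.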
