Biglist

Latest version: v0.9.7

0.7.9

Removed

- Removed previously deprecated methods related to `multiplexer` on `Biglist`.
- Removed previously deprecated methods related to `concurrent_iter` on `FileSeq`.
- Removed `orjson` related storage formats.

Added

- Optional dependency `lz4`
- Storage format `pickle-lz4`
- Pickling behavior control for `FileReader` via `__getstate__` and `__setstate__`.
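The pickling hooks named in the last item follow the standard Python protocol. The sketch below is only an illustration of that protocol, not biglist's implementation: `ToyFileReader` and its attributes are hypothetical, and the choice to drop loaded data when pickling is an assumption about one common use of these hooks.

```python
import pickle


class ToyFileReader:
    """Hypothetical stand-in for a file reader; not biglist's FileReader."""

    def __init__(self, path):
        self.path = path
        self._data = None  # filled lazily by load()

    def load(self):
        if self._data is None:
            # Stand-in for actually reading the file at self.path.
            self._data = [f"{self.path}:{i}" for i in range(3)]
        return self._data

    def __getstate__(self):
        # Assumption: pickle only what is needed to re-open the file,
        # leaving out any data already loaded into memory.
        return {"path": self.path}

    def __setstate__(self, state):
        self.path = state["path"]
        self._data = None  # the unpickled copy reloads on demand


reader = ToyFileReader("data/0.pickle")
reader.load()
clone = pickle.loads(pickle.dumps(reader))
```

With hooks like these, sending a reader to another process transfers only the small state dict, and the receiving side loads the data when it first needs it.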

Changed or enhanced

- Dependency `zstandard` is now optional.
- `Multiplexer` uses 'parquet' as its default storage format.
- Classmethods `get_gcsfs` and `load_data_file` are moved from `ParquetBiglist` to `ParquetFileReader`;
the latter is renamed to `load_file`. Some related simplifications are made to `FileReader`, `ParquetFileReader`,
and `ParquetFileSeq`.
- Persist GCP credentials in `ParquetFileReader` so that authentication is not repeated when unnecessary.

0.7.8

Added

- Expose `make_parquet_schema`, `make_parquet_field`, `make_parquet_type` as public API on the `biglist` package level.

Changed

- `ParquetSerializer.serialize` returns a file-like object.

0.7.7

Removed

- Removed many deprecated methods.
- `Biglist.register_storage_format` no longer takes the parameter `overwrite`; that is, it no longer allows changing
the definition of a "built-in" format.

Deprecated

- Deprecated "concurrent_iter" methods from `FileSeq` and "multiplexer" methods from `Biglist`.
- Deprecated function `write_parquet_file` (use `write_arrays_to_parquet` instead).

Added

- `BiglistBase.new` gets a new parameter `init_info`.
- `Biglist.new` accepts extra parameters for (de)serialization.
- New class `Multiplexer`.
- Allow schema spec when 'storage_format' is 'parquet' for `Biglist`.
- Function `write_parquet_file` was renamed `write_arrays_to_parquet`; added new function `write_pylist_to_parquet`.
- Added orjson serializers, preparing for their removal from `upathlib`.

0.7.6

- Fix a bug in the `Biglist.__init__` backward-compatibility fix introduced in 0.7.5.

0.7.5

- Fix a bug introduced in 0.7.4 concerning `Biglist.info['data_files_info']`.
- `arrow` became a mandatory (rather than optional) dependency, hence Parquet-related functionality is available in a regular install.

0.7.4

This release contains a large refactor, creating classes `Seq` and `FileSeq` and using them in many places
in the code.

`BiglistBase` gets a new method `files`, returning a `FileSeq`.
The functions related to iterating over `FileReader`s (sequentially or concurrently) are moved to
`FileSeq`. Many related methods in `BiglistBase` are deprecated.

There are other deprecations and renamings, for example,

- Class renamings: `ListView` -> `Slicer`; `ChainedList` -> `Chain`.
- Deprecated the parameter `thread_pool_executor` to `__init__` and `__new__`.

The new class `Seq` and the renamed classes `Chain` and `Slicer` are in the module `_util`.

Breaking changes

Previously, as new data items were `append`ed to a `Biglist`,
data items that were not yet `flush`ed, i.e. not persisted and hence only in the memory buffer,
were immediately included in item access (by `__getitem__`), iteration (by `__iter__`),
and the length of the Biglist. Now these elements are not included in these operations until they are flushed.
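The new visibility rule can be illustrated with a toy model. This is a sketch of the semantics only, not biglist's code: `ToyBiglist`, its attributes, and the automatic flush when the buffer reaches `batch_size` are all hypothetical and chosen for illustration.

```python
class ToyBiglist:
    """Hypothetical toy model of the visibility rule; not the real Biglist."""

    def __init__(self, batch_size=3):
        self._batch_size = batch_size
        self._persisted = []  # stands in for data files already written
        self._buffer = []     # appended but not yet flushed

    def append(self, item):
        self._buffer.append(item)
        # Assumption: a full buffer is flushed automatically.
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        self._persisted.extend(self._buffer)
        self._buffer = []

    # Length, item access, and iteration see persisted items only.
    def __len__(self):
        return len(self._persisted)

    def __getitem__(self, idx):
        return self._persisted[idx]

    def __iter__(self):
        return iter(self._persisted)


big = ToyBiglist(batch_size=3)
for x in range(4):
    big.append(x)

len_before = len(big)  # the fourth item is still in the buffer
big.flush()
len_after = len(big)   # now every appended item is visible
```

Under the old behavior, `len_before` would already have counted the buffered item. Code that reads back what it has just appended therefore needs an explicit `flush` first.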

Removed

- `BiglistBase.{resolve_path, lockfile}`. These methods are replaced by direct calls to functions from `upathlib`.
- Parameter `require_exists` to `BiglistBase.__init__`.

Added

- `Biglist` gets a new `storage_format`, 'parquet', for simple data structures.
