Versioned-hdf5

Latest version: v2.0.2



2.0.2

Minor Changes

- Fixed a regression that caused a crash when invoking `resize()` with a tuple of
`numpy.int64` values instead of a tuple of ints, such as one constructed
from `h5py.Dataset.size`.
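
On a release that predates this fix, one possible workaround is to coerce each dimension to a built-in `int` before calling `resize()`. The sketch below is an illustration only and is not part of versioned-hdf5: `normalize_shape` is a hypothetical helper, and `FakeInt64` merely stands in for `numpy.int64` so that the snippet runs without NumPy.

```python
import operator


class FakeInt64:
    """Stand-in for numpy.int64: an integer-like object that is not an int."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value


def normalize_shape(shape):
    """Convert every dimension to a built-in int (hypothetical helper)."""
    # operator.index accepts any object implementing __index__, which
    # includes NumPy integer scalars, and rejects floats.
    return tuple(operator.index(n) for n in shape)


new_shape = normalize_shape((FakeInt64(100), 20))
print(new_shape)
```

The result could then be passed on as `ds.resize(normalize_shape(shape))`, where `ds` is a staged dataset.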

2.0.1

Minor Changes

- Fixed regression, introduced in v2.0.0, which would cause the chunk hash map to become
corrupted when calling `resize()` to shrink a dataset followed by `delete_versions()`.

2.0.0

Major Changes

- `stage_dataset` has been reimplemented from scratch. The new engine is
expected to be much faster in most cases. [Read more here](staged_changes).
- `__getitem__` on staged datasets previously never cached data when reading from
unmodified datasets (before the first call to `__setitem__` or `resize()`) and
cached the whole loaded area on modified datasets (where the user had
previously changed a single point anywhere within the same staged version).

This has now been changed to always use the libhdf5 cache. Because this cache is very
small by default, users on slow disk backends may observe a slowdown in
read-update-write use cases that don't overwrite whole chunks, e.g. `ds[::2] += 1`.
They should experiment with sizing the libhdf5 cache so that it's larger than the
work area, e.g.:

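```python
import h5py
from versioned_hdf5 import VersionedHDF5File

# 1 GiB chunk cache with 100,000 hash slots; tune both to the work area.
with h5py.File(path, "r+", rdcc_nbytes=2**30, rdcc_nslots=100_000) as f:
    vf = VersionedHDF5File(f)
    with vf.stage_version("r123") as sv:
        sv["some_ds"][::2] += 1
```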


(this recommendation applies to plain h5py datasets too).

Note that this change exclusively impacts `stage_dataset`; `current_version`,
`get_version_by_name`, and `get_version_by_timestamp` are not impacted and
continue not to cache anything regardless of libhdf5 cache size.
- Added support for Ellipsis (...) in indexing.
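
To make the cache-sizing advice in the `__getitem__` entry above more concrete, here is a rough back-of-the-envelope sketch. It assumes the work area is the whole dataset, as it would be for `ds[::2] += 1` on a one-dimensional dataset. `estimate_chunk_cache` is a hypothetical helper, not a versioned-hdf5 or h5py function; the 1.5x headroom is an arbitrary assumption, and the ratio of roughly 100 hash slots per cached chunk is a commonly cited rule of thumb for `rdcc_nslots` rather than a requirement.

```python
import math


def estimate_chunk_cache(shape, chunks, itemsize, headroom=1.5):
    """Estimate libhdf5 chunk cache settings that hold a whole dataset.

    Hypothetical helper: shape and chunks are tuples of ints, itemsize is
    the size of one element in bytes.
    """
    # Number of chunks along each axis, rounding partial edge chunks up.
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    chunk_nbytes = math.prod(chunks) * itemsize
    return {
        # Total cache size in bytes, with some headroom (assumption).
        "rdcc_nbytes": int(n_chunks * chunk_nbytes * headroom),
        # Roughly 100 hash slots per cached chunk (rule of thumb).
        "rdcc_nslots": n_chunks * 100,
    }


# One million float64 values stored in chunks of 4096 elements.
settings = estimate_chunk_cache((1_000_000,), (4096,), itemsize=8)
print(settings)
```

The resulting dictionary could be unpacked into the file constructor, e.g. `h5py.File(path, "r+", **settings)`. Measure before and after, since the best values depend on the access pattern and the storage backend.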

1.8.2

Major Changes

- Integer array and boolean array indices are transparently converted to slices when
possible, either globally or locally to each chunk.
This can result in major speedups.
- Monotonic ascending integer array indices have been sped up from O(n^2) to O(n log n)
(where n is the number of chunks along the indexed axis).
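
The idea behind converting array indices to slices can be sketched in plain Python. This is not the library's implementation: `indices_to_slice` and `mask_to_slice` are hypothetical names, and the sketch only handles the global case of a single axis with non-negative indices.

```python
def indices_to_slice(indices):
    """Return a slice selecting exactly `indices`, or None if impossible.

    Hypothetical helper: `indices` is a list of non-negative ints.
    """
    if not indices:
        return slice(0, 0)
    if any(i < 0 for i in indices):
        return None
    if len(indices) == 1:
        return slice(indices[0], indices[0] + 1)
    step = indices[1] - indices[0]
    # A slice with a positive step needs strictly ascending, evenly
    # spaced indices.
    if step <= 0:
        return None
    if any(b - a != step for a, b in zip(indices, indices[1:])):
        return None
    return slice(indices[0], indices[-1] + 1, step)


def mask_to_slice(mask):
    """Same conversion for a boolean mask (list of bools)."""
    return indices_to_slice([i for i, flag in enumerate(mask) if flag])


data = list(range(10))
s = indices_to_slice([2, 4, 6])
print(s, data[s])
```

When no single slice fits the whole index, the same test can be applied separately to the portion of the index that falls inside each chunk, which corresponds to the "locally to each chunk" case in the entry above.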

Minor Changes

- `as_subchunk_map` has been reimplemented in Cython, providing a speedup
- Improved the exceptions raised by `create_dataset`
- Fixed a libhdf5 resource leak in `build_data_dict`;
the function has also been sped up.
- Slightly sped up hashing algorithm

1.8.0

Major Changes

- `slicetools` has been reimplemented in Cython, providing a significant speedup
- Only sdist will be published from here on out due to the dependency on MPI.
- Improved read/write performance for `InMemoryDataset`

Minor Changes

- Force the master branch to be targeted when building docs
- `__version__` dunder added back in
- Update build workflows to test with `numpy==1.24` in addition to `numpy>=2`
- Chunk reuse verification fixed for string dtype arrays
- Cleaned up `pytest` configuration; added additional debugging output in test CI job
- Fixed a bug where `InMemoryGroup` child groups were not closed when the parent
group was closed
- Nondefault compression handling is now supported
- Performance improvements to Hashtable initialization
- Various refinements to the documentation

1.7.0

Major Changes

- Added a new `VersionedHDF5File.get_diff` method
- Added a new `VersionedHDF5File.versions` property
- Updates to the build system to use `meson-python`
- Added numpy 2.0 support
- Make the `InMemoryGroup` repr more informative

Minor Changes

- Optimizations to `_recreate_raw_dataset`, `InMemoryDataset.resize`
- Added an optional check for verifying that reused chunks contain the expected
data. It can be turned on by setting the environment variable
`ENABLE_CHUNK_REUSE_VALIDATION = 1`
- The documentation for all published versions is now available!
- Various developer-experience improvements: `pre-commit`, `pygrep-hooks` fixes, and
tests that no longer produce unwanted artifacts
- Dataset names are now checked against a blocklist to avoid colliding with
reserved words
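
As a conceptual illustration of the chunk reuse check mentioned above, the sketch below stores chunks keyed by a hash and, when validation is enabled, compares the stored bytes with the incoming bytes before reusing a chunk. Everything here is an assumption made for illustration: `ChunkStore` and `validation_enabled` are hypothetical, SHA-256 over raw bytes is only one plausible hashing choice, and the way the library itself reads the environment variable may differ.

```python
import hashlib
import os


def validation_enabled():
    """Read the flag from the environment (parsing rule is an assumption)."""
    return os.environ.get("ENABLE_CHUNK_REUSE_VALIDATION", "0") == "1"


class ChunkStore:
    """Toy deduplicating store: identical chunks are written only once."""

    def __init__(self):
        self._by_hash = {}

    def write(self, chunk: bytes) -> str:
        digest = hashlib.sha256(chunk).hexdigest()
        stored = self._by_hash.get(digest)
        if stored is None:
            # First time this content is seen: store it.
            self._by_hash[digest] = chunk
        elif validation_enabled() and stored != chunk:
            # The hash matched, but the stored data is not what we expected.
            raise ValueError(f"reused chunk {digest[:8]} has unexpected data")
        return digest


os.environ["ENABLE_CHUNK_REUSE_VALIDATION"] = "1"
store = ChunkStore()
first = store.write(b"\x00" * 16)
second = store.write(b"\x00" * 16)  # reused, and validated on reuse
print(first == second)
```

Such a check costs an extra comparison on every reuse, which is presumably why it is opt-in.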

