Dask-deltatable

Latest version: v0.3.2

0.3.3

Fix imports to work with deltalake 0.20 (https://github.com/dask-contrib/dask-deltatable/pull/84)

0.3.2

Bug fix
Fix the argument order of the `to_deltalake` call in the example (77)
Fix up some mypy errors (76)
Sort filenames (71)
Fix `mypy` error (62)

Project hygiene
Synchronise with dask-expr, newer Dask and newer deltalake (69)
Support auto-setting AWS credentials for storage options (78)
Compatibility with latest dask, pyarrow and deltalake (68)
Add path to tokenization (67)
Clarify readme for reading in deltalake (66)
Add conda installation instructions to README (6)
Add URL to setuptools metadata (60)

0.3.1post1

This version contains a patch that fixes a problem when reading datasets on a distributed cluster.

0.3

New Features and Enhancements
- More efficient Dask Graph generation (https://github.com/dask-contrib/dask-deltatable/pull/24)
- Transactional write support for append-only write operations with `to_deltalake` (https://github.com/dask-contrib/dask-deltatable/pull/29)
- Reader now supports partition pruning to only load files that match the provided filters (https://github.com/dask-contrib/dask-deltatable/pull/30)
- DAT reader acceptance testing against spark generated data (https://github.com/dask-contrib/dask-deltatable/pull/47)
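
The partition pruning idea can be illustrated with a minimal sketch in plain Python. This is not the library's implementation: the `prune_files` helper, the file list and the filter tuples are hypothetical, and only filters combined with AND are modelled. A filter on a column that is not a partition column cannot exclude a file, so such files are kept.

```python
import operator

# Comparison operators supported by this sketch (a hypothetical subset).
_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def prune_files(files, filters):
    """Return the paths whose partition values satisfy every filter.

    files:   list of (path, {partition_column: value}) pairs
    filters: list of (column, op, value) tuples, combined with AND
    """
    kept = []
    for path, partition_values in files:
        if all(
            # A filter on a non-partition column cannot prune the file.
            column not in partition_values
            or _OPS[op](partition_values[column], value)
            for column, op, value in filters
        ):
            kept.append(path)
    return kept


# Hypothetical table layout, partitioned by `year`.
files = [
    ("year=2022/part-0.parquet", {"year": 2022}),
    ("year=2023/part-0.parquet", {"year": 2023}),
    ("year=2024/part-0.parquet", {"year": 2024}),
]
selected = prune_files(files, [("year", ">=", 2023)])
print(selected)
```

Only the files kept by such a step would need to be opened by the reader; the rest are skipped without any I/O.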

Breaking changes

- Removed the `vaccum_table` (https://github.com/dask-contrib/dask-deltatable/issues/16) and `history` (https://github.com/dask-contrib/dask-deltatable/issues/17) commands. Use the native `delta-rs` functionality instead; see https://delta-io.github.io/delta-rs/python/usage.html#vacuuming-tables and https://delta-io.github.io/delta-rs/python/usage.html#history
- Minimum supported Python version is now 3.9
- Renamed `read_delta_table` to `read_deltatable`
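
A rough migration sketch for the breaking changes above, assuming `dask-deltatable` 0.3 or later and the `deltalake` Python package are installed. The wrapper function names (`read_table`, `table_history`, `vacuum_dry_run`) are hypothetical helpers for illustration, and the 168-hour retention value is only an example.

```python
def read_table(path):
    """Read a Delta table into a Dask DataFrame with the renamed reader."""
    # Imported lazily so this module loads even without the dependency.
    import dask_deltatable as ddt

    # `read_deltatable` replaces the former `read_delta_table`.
    return ddt.read_deltatable(path)


def table_history(path, limit=None):
    """Return commit history through delta-rs instead of the removed command."""
    from deltalake import DeltaTable

    return DeltaTable(path).history(limit)


def vacuum_dry_run(path, retention_hours=168):
    """List the files a vacuum would delete, without deleting anything."""
    from deltalake import DeltaTable

    # dry_run=True only reports candidates; pass dry_run=False to delete.
    return DeltaTable(path).vacuum(retention_hours=retention_hours, dry_run=True)
```

Keeping the vacuum as a dry run by default is a deliberate choice in this sketch, since deleting files is irreversible.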

0.2alpha

This release builds a wrapper around the Rust package `delta-rs` and uses Dask for parallel reading.

Features:
1. Reads the Parquet files referenced by the Delta log in parallel using Dask
2. Supports three filesystems: s3, azurefs and gcsfs
3. Supports some Delta features, such as
- Time travel
- Schema evolution
- Parquet filters (row filter and partition filter)
4. Queries Delta commit info (history)
5. Vacuums old/unused Parquet files
6. Loads different versions of the data by datetime
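
How a reader can resolve which Parquet files belong to a given table version (the basis of time travel) can be sketched as follows. This is a simplified in-memory model, not the `delta-rs` implementation: the `active_files` helper and the sample commits are hypothetical, and only the `add` and `remove` actions of the Delta log are considered.

```python
def active_files(commits, version=None):
    """Replay add/remove actions up to `version` (inclusive).

    commits: list where commits[i] holds the action dicts of version i
    """
    if version is None:
        version = len(commits) - 1  # default to the latest version
    if not 0 <= version < len(commits):
        raise ValueError(f"version {version} does not exist")
    live = set()
    for actions in commits[: version + 1]:
        for action in actions:
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    # Sorted so the resulting file list is deterministic.
    return sorted(live)


# Hypothetical three-commit log: two appends, then a rewrite of part-0.
commits = [
    [{"add": {"path": "part-0.parquet"}}],
    [{"add": {"path": "part-1.parquet"}}],
    [{"remove": {"path": "part-0.parquet"}}, {"add": {"path": "part-2.parquet"}}],
]
latest = active_files(commits)
as_of_v1 = active_files(commits, version=1)
```

Each returned path can then be read as an independent task, which is what makes the parallel read straightforward.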

0.1.2alpha

DeltaTable reader using Dask

1. Reads Delta tables in parallel using Dask
2. Can read from different filesystems such as S3, Azurefs and gcsfs
3. Supports some Delta features, such as
- Time travel
- Schema evolution
- Parquet filters, such as row and partition filters
