Daft

Latest version: v0.4.9


Page 10 of 14

0.2.5

Changes

👾 Bug Fixes

- [BUG] Check queue state while waiting to place inside samster25 (1678)
- [BUG] Parametrize dataframe unit-tests with Parquet data jaychia (1610)

🧰 Maintenance

- [CHORE] Favor traversal over visitors samster25 (1677)
- [CHORE] Bring in TreeNode and Refactor Expression Traversal to use TreeNode samster25 (1676)

⬆️ Dependencies

- Bump indexmap from 2.0.2 to 2.1.0 dependabot (1669)

0.2.4

Changes

✨ New Features

- [FEAT] show number of truncated columns samster25 (1673)
- [FEAT] add retries to s3 credential provider timeouts samster25 (1663)
- [FEAT] Dynamic Responsive Printing of Tables, Schema and Series samster25 (1662)
- [FEAT] Print the results of a df.show() to stdout if running in non-interactive mode jaychia (1655)
- [FEAT] 1606 - Adding hour expression in date util suriya-ganesh (1637)
- [FEAT] [CSV Reader] Bulk CSV reader + general CSV reader refactor clarkzinzow (1614)
- [FEAT] Use cached preview from `df.collect()` in `df.show()`. clarkzinzow (1651)

🚀 Performance Improvements

- [PERF] Remove calls to `remote_len_partition` jaychia (1660)

👾 Bug Fixes

- [BUG] Add an allowlist of DataTypes that ColumnRangeStatistics supports and validation of TableStatistics jaychia (1632)
- [BUG] favor char indices instead of slicing to deal with unicode samster25 (1664)
- [BUG] pass in pyarrow dtype manually into parquet read samster25 (1650)
- [CHORE] Fixed bug in ray version dioptre (1649)

🧰 Maintenance

- [CHORE] pin pandas for 3.8 samster25 (1661)
- [CHORE] pin ray to 2.7.1 if less than 3.8 samster25 (1657)
- [CHORE] enable refresh on tqdm total updates samster25 (1654)

⬆️ Dependencies

<details>
<summary>8 changes</summary>

- Bump chrono-tz from 0.8.3 to 0.8.4 dependabot (1670)
- Bump pytest from 7.4.1 to 7.4.3 dependabot (1644)
- Bump pandas from 2.0.3 to 2.1.3 dependabot (1643)
- Bump azure-storage-blob from 12.17.0 to 12.19.0 dependabot (1645)
- Bump async-compression from 0.4.4 to 0.4.5 dependabot (1638)
- Bump serde\_json from 1.0.107 to 1.0.108 dependabot (1639)
- Bump base64 from 0.21.4 to 0.21.5 dependabot (1640)
- Bump dyn-clone from 1.0.14 to 1.0.16 dependabot (1642)
</details>

0.2.3

Changes

✨ New Features

- Enabling quote, comment and escape character suriya-ganesh (1582)
- [FEAT] Iceberg Scan Operator samster25 (1561)
- [FEAT] Enable Progress Bars for PyRunner and RayRunner samster25 (1609)

👾 Bug Fixes

- [BUG] Fix CSV roundtrip for decimals (actually an f64->decimal casting bug) jaychia (1626)
- [BUG] Filter out size-0 directory marker files during s3 globs jaychia (1629)
- [BUG] raise error if non valid parquet file (less than parquet footer size) samster25 (1628)
- [BUG] Fix parquet timestamp tz roundtrip inference jaychia (1625)
- [BUG] Roundtrip tests for CSVs and Parquet jaychia (1616)
- [BUG] Self-concat breaks with the RayRunner jaychia (1617)
- [BUG] Add better handling for case where glob of parquet files returns empty jaychia (1615)
- [BUG] enable fixed size binary ingest to daft binary samster25 (1612)
- [BUG] Manually specify region in tutorial read\_json jaychia (1608)
- [BUG] remove f strings from logging samster25 (1611)

📖 Documentation

- [BUG] Manually specify region in tutorial read\_json jaychia (1608)

🧰 Maintenance

- [CHORE] Fix style lints from 1582 jaychia (1635)
- [CHORE] add ray client to deps samster25 (1631)
- [CHORE] update fsspecs (s3, gcs, adlfs) in lockstep samster25 (1620)
- [CHORE] update azure storage blobs to 0.17.0 samster25 (1622)
- [CHORE] delete old rule runners samster25 (1619)
- [CHORE] drop ray default dep to make room for Pydantic > 2.0 samster25 (1618)

⬆️ Dependencies

- Bump moonrepo/setup-rust from 0 to 1 dependabot (1237)
- Bump google-cloud-storage from 0.13.1 to 0.14.0 dependabot (1549)
- Bump async-compat from 0.2.2 to 0.2.3 dependabot (1567)

0.2.2

Changes

- [CHORE] Edit 'make-hooks' command to install pre-commit script colin-ho (1602)
- [CHORE] Improve error messages when calling aggregation methods on dataframe without input columns colin-ho (1587)

✨ New Features

- [FEAT] Add translation of IOConfig to PyArrow filesystem arguments jaychia (1592)
- [FEAT] [Scan Operator] Refactor planning and execution code to use shared `Pushdowns` struct. clarkzinzow (1595)
- [FEAT] [Scan Operator] Add `ChunkSpec` for specifying format-specific per-file row subset selection for `ScanTask`s. clarkzinzow (1590)
- [FEAT] [Scan Operator] Integrate `size_bytes` with `ScanOperator`s clarkzinzow (1586)
- [FEAT] [Scan Operator] Add Python I/O support (+ JSON) to `MicroPartition` reads clarkzinzow (1578)
- [FEAT][ScanOperator 1/3] Add MVP e2e `ScanOperator` integration. clarkzinzow (1559)

🚀 Performance Improvements

- [PERF][REVERT] Reverts: use pyarrow table for pickling rather than ChunkedArray (1488) jaychia (1605)
- [PERF] Speed Up MicroPartition Ops when we know the result is empty samster25 (1604)

👾 Bug Fixes

- [BUG] clean up ray scheduler threads after computing partial results samster25 (1597)
- [BUG] Update requirements for typing\_extensions jaychia (1593)
- [BUG] Fix Deadlock with ScanOperators in `to_physical_plan_scheduler` and show iostats for glob and from\_scan\_task samster25 (1581)
- [BUG] add allow threads for io pool operations samster25 (1580)

🧰 Maintenance

- [CHORE] delete unused wheel tools samster25 (1603)
- [CHORE] add IOStats to all micropartition ops samster25 (1584)
- [CHORE] Use DAFT\_MICROPARTITIONS as shared feature flag for data catalog support jaychia (1579)
- [CHORE] Convert GlobScanOperator to perform streaming into result and take a list of glob paths jaychia (1577)

⬆️ Dependencies

- Bump numpy from 1.25.2 to 1.26.2 dependabot (1596)

0.2.1

Changes

- [FEAT] Support disabling using doubled quotes to escape in CSV ravern (1544)
- [DOCS]: fix typo in doc amir-f (1534)

✨ New Features

- [FEAT] GlobScanOperator jaychia (1550)
- [FEAT] [New Query Planner] [2/N] Push partition spec into physical plan, remove Coalesce logical op. clarkzinzow (1540)

👾 Bug Fixes

- [BUG] Fix reads of empty parquet files jaychia (1555)
- [BUG] Bump Parquet reader max\_page\_size to 256MB jaychia (1553)
- [BUG] add sort after running passes samster25 (1545)
- [BUG] Fix credentials issues in colab/CI jaychia (1539)

📖 Documentation

- [BUG] Fix credentials issues in colab/CI jaychia (1539)

🧰 Maintenance

- [CHORE] Fix bad merge conflict in GlobScanOperator wrt CSV schema inference jaychia (1556)
- [CHORE] Revert "Bump pandas from 2.0.3 to 2.1.2" jaychia (1554)
- [CHORE] [New Query Planner] [1/N] Remove Python query planner. clarkzinzow (1538)
- [CHORE] changes to partition field and field creation samster25 (1537)
- [CHORE] Move code from daft-csv to daft-decoding jaychia (1533)

⬆️ Dependencies

<details>
<summary>6 changes</summary>

- Bump pandas from 2.0.3 to 2.1.2 dependabot (1542)
- Bump tempfile from 3.8.0 to 3.8.1 dependabot (1548)
- Bump opencv-python from 4.8.0.76 to 4.8.1.78 dependabot (1546)
- Bump aws-actions/configure-aws-credentials from 3 to 4 dependabot (1384)
- Bump async-trait from 0.1.71 to 0.1.74 dependabot (1496)
- Bump serde from 1.0.188 to 1.0.190 dependabot (1541)
</details>

0.3.0

We're proud to release version 0.3.0 of Daft! Please note that with this minor version increment, v0.3 contains several breaking changes:
- `daft.read_delta_lake`
  - This function was deprecated in favor of `daft.read_deltalake` in v0.2.26 and is now removed. (2663)
- `daft.read_parquet` / `daft.read_csv` / `daft.read_json`
  - Schema hints are deprecated in favor of `infer_schema` (whether to turn on schema inference) and `schema` (a definitive schema if `infer_schema` is False; otherwise it is used as a schema hint that is applied after inference). (2326)
- `Expression.str.normalize()`
  - Parameters are now all False by default and must be toggled on individually. (2647)
- `DataFrame.agg` / `GroupedDataFrame.agg`
  - Tuple syntax for aggregations was deprecated in v0.2.18 and is no longer supported. Please use aggregation expressions instead. (2663)
  - Ex: `df.agg([(col("x"), "sum"), (col("y"), "mean")])` should be written instead as `df.agg(col("x").sum(), col("y").mean())`
- `DataFrame.count`
  - Calling `.count()` with no arguments now returns a DataFrame with a column “count” containing the length of the entire DataFrame, instead of the count for each column. (1996)
- `DataFrame.with_column`
  - Resource requests should now be specified on UDF expressions (`udf(num_gpus=…)`) instead of on Projections (through `.with_column(..., resource_request=...)`). (2654)
- `DataFrame.join`
  - When joining two DataFrames, columns will now be merged only if they exactly match join keys. (2631)
  - Ex:

```python
import daft
from daft import col

df1 = daft.from_pydict({
    "a": ["x", "y"],
    "b": [1, 2]
})

df2 = daft.from_pydict({
    "a": ["y", "z"],
    "b": [20, 30]
})

result_df = df1.join(
    df2,
    left_on=[col("a"), col("b")],
    right_on=[col("a"), col("b")/10],  # NOTE THE "/10"
    how="outer"
)

result_df.sort("a").collect()
```



```
before
╭──────┬───────╮
│ a    ┆ b     │
│ ---  ┆ ---   │
│ Utf8 ┆ Int64 │
╞══════╪═══════╡
│ x    ┆ 1     │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ y    ┆ 2     │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ z    ┆ 30    │
╰──────┴───────╯

after
╭──────┬───────┬─────────╮
│ a    ┆ b     ┆ right.b │
│ ---  ┆ ---   ┆ ---     │
│ Utf8 ┆ Int64 ┆ Int64   │
╞══════╪═══════╪═════════╡
│ x    ┆ 1     ┆ None    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ y    ┆ 2     ┆ 20      │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ z    ┆ None  ┆ 30      │
╰──────┴───────┴─────────╯
```
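
For the `DataFrame.agg` change listed above, migrating existing call sites is mostly mechanical. The sketch below is one possible way to generate the replacement call text from old-style pairs. `tuple_aggs_to_expr_source` is a hypothetical helper, not part of Daft; it assumes columns are given as plain names and that each operation name matches an expression method of the same name (as `sum` and `mean` do in the release note's example). It only builds source text and never imports Daft.

```python
import json
from typing import Iterable, Tuple


def tuple_aggs_to_expr_source(
    aggs: Iterable[Tuple[str, str]], frame: str = "df"
) -> str:
    """Render (column_name, op) pairs as an expression-style agg call.

    Hypothetical migration helper: it returns Python source text only.
    Assumes each op is also the name of a method on a column expression.
    """
    calls = []
    for column, op in aggs:
        # Reject anything that could not be a method name.
        if not op.isidentifier():
            raise ValueError(f"not a valid method name: {op!r}")
        # json.dumps yields a double-quoted, escaped string literal.
        calls.append(f"col({json.dumps(column)}).{op}()")
    return f"{frame}.agg({', '.join(calls)})"


migrated = tuple_aggs_to_expr_source([("x", "sum"), ("y", "mean")])
print(migrated)
```

The generated text still needs review before it replaces a real call site, since the helper does not check that the named method exists on Daft expressions.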



Changes

✨ New Features

- [FEAT] Ellipsize scan task sources if too many Vince7778 (2695)
- [FEAT] Allow user provided schema and schema inference length for read\_sql colin-ho (2676)
- [FEAT] Add dataframe iteration on rows and change default buffer size jaychia (2685)
- [FEAT]: add to\_arrow\_iter universalmind303 (2681)
- [FEAT] Example Analyze for Local Execution Engine samster25 (2648)
- [FEAT] (ACTORS-1) Add DAFT\_ENABLE\_ACTOR\_POOL\_PROJECTS=1 feature flag and specifying concurrency jaychia (2668)
- [FEAT]: sql like \& ilike universalmind303 (2666)
- [FEAT] Changes the default count() behavior to perform a global row count instead jaychia (2653)
- [FEAT] Support passing in column name strings to `to_struct` Vince7778 (2671)
- [FEAT]: refactor tree display to get more info into physicalplan universalmind303 (2640)
- [FEAT] Add `to_struct` function for merging columns into a struct Vince7778 (2662)
- [FEAT] Add hashing and groupby on structs Vince7778 (2657)
- [FEAT]: `daft.sql_expr` universalmind303 (2656)
- [FEAT] Deprecates usage of resource\_request on df.with\_column API jaychia (2654)
- [FEAT] Add input batching for UDFs Vince7778 (2651)
- [FEAT] Add `cbrt` expression raunakab (2646)
- [FEAT] use ObfuscatedString to hide creds when Display IOConfig samster25 (2645)
- [FEAT]: more sql functions universalmind303 (2596)
- [FEAT] Support \_\_init\_\_ arguments for StatefulUDFs jaychia (2634)
- [FEAT] Move resource requests to UDFs instead of on with\_column jaychia (2632)
- [FEAT] Add wildcards in column expressions Vince7778 (2629)
- [FEAT] factor mermaid builder into its own module to use independently samster25 (2636)
- [FEAT] Remote parquet streaming colin-ho (2620)
- [FEAT]: mermaid formatter universalmind303 (2619)
- [FEAT] Add ActorPoolProject logical and physical plans jaychia (2601)
- [FEAT] Enable broadcast strategy on anti and semi joins kevinzwang (2621)
- [FEAT] Add `.list.sort()` for sorting lists within a list column Vince7778 (2589)
- [FEAT] Streaming Local Parquet Reads colin-ho (2592)

🚀 Performance Improvements

- [PERF] Add ability to automatically choose broadcast for anti/semi joins kevinzwang (2699)
- [PERF] Swordfish Dynamic Pipelines samster25 (2599)
- [PERF] Dyn Compare + Probe Table samster25 (2618)

👾 Bug Fixes

- [BUG] Fix Parquet reads with chunk sizing desmondcheongzx (2658)
- [BUG]: repr mermaid fix universalmind303 (2688)
- [BUG] Use Daft Pickle instead of Ray Pickle and use bincode for serializing samster25 (2693)
- [BUG] Add timeout to analytics client raunakab (2670)
- [BUG] Fix swordfish inner joins colin-ho (2678)
- [BUG] Fix struct `.hash()` naming bug Vince7778 (2673)
- [BUG] Fix filter pushdown into non-inner joins kevinzwang (2659)
- [BUG] Fix issues where we check "is\_ray\_runner" on non-initialized contexts jaychia (2652)
- [BUG] Fix nested parquet reads for .show() and .limit() desmondcheongzx (2643)
- [BUG] Fix join op names and join key definition kevinzwang (2631)
- [BUG] Fix projection pushdowns not working with limits Vince7778 (2635)
- [BUG] Fix Expr::with\_new\_children for ScalarFunction kevinzwang (2624)
- [BUG] Fix pushdown past monotonically increasing id Vince7778 (2622)

📖 Documentation

- [CHORE] Fix FOTW 001 images notebook jaychia (2697)
- [DOCS] Add join types, renaming behavior, and example to join docs kevinzwang (2691)
- [FEAT] Add dataframe iteration on rows and change default buffer size jaychia (2685)
- [DOCS]: add docs for cosine\_distance universalmind303 (2675)
- [FEAT] Add `to_struct` function for merging columns into a struct Vince7778 (2662)
- [CHORE] Turn v0.3 deprecations into breaking changes kevinzwang (2663)
- [FEAT] Add `cbrt` expression raunakab (2646)
- [FEAT] Support \_\_init\_\_ arguments for StatefulUDFs jaychia (2634)
- [FEAT] Move resource requests to UDFs instead of on with\_column jaychia (2632)
- [FEAT] Add wildcards in column expressions Vince7778 (2629)
- [DOCS] Enable doc tests in CI colin-ho (2615)
- [FEAT] Add `.list.sort()` for sorting lists within a list column Vince7778 (2589)
- docs: Add fotw tutorial on working with images avriiil (2490)

🧰 Maintenance

- [CHORE] fix merge conflict in repr tests samster25 (2700)
- [CHORE] Fix FOTW 001 images notebook jaychia (2697)
- [CHORE] Deprecate schema hints colin-ho (2655)
- [CHORE] Add error snafus for local executor colin-ho (2660)
- [FEAT]: refactor tree display to get more info into physicalplan universalmind303 (2640)
- [CHORE] Turn v0.3 deprecations into breaking changes kevinzwang (2663)
- [CHORE]: Drop use of deprecated form "default\_features" universalmind303 (2665)
- [CHORE] bump dev version to 0.3.0 samster25 (2664)
- [CHORE]: fix feature flags universalmind303 (2661)
- [CHORE] Set `Expression.str.normalize()` options to False by default Vince7778 (2647)
- [CHORE] Improve swordfish error handling colin-ho (2628)
- [CHORE] Add ignore for helix editor raunakab (2642)
- [CHORE] Add toolchain check to Makefile Vince7778 (2641)
- [CHORE] Upgrade Rust toolchain to 2024-08-01 Vince7778 (2639)
- [CHORE] Track memory for swordfish tpch colin-ho (2633)
- [CHORE] Split resource-request and hashable-float-wrapper into utility crates jaychia (2630)
- [CHORE] Use parquet for native tpch benchmarks colin-ho (2609)
- [CHORE] Refactor UDFs to separate stateful and stateless jaychia (2597)
