Polars

Latest version: v0.20.23

0.7.0

🛠️ Other improvements

- Pin PyPI release action (59)
- Bump toolchain (58)
- Set version to 0.7.0 and update dependencies (57)

0.6.0

🛠️ Other improvements

- Set version to 0.6.0 and update dependencies (50)
- Build `manylinux_2_28` wheels (42)

0.5.1

🛠️ Other improvements

- Set version to 0.5.1 and update dependencies (41)
- Add option to bump patch version (40)
- Fix GitHub release commit (37)

0.5.0

🛠️ Other improvements

- Set version to 0.5.0 and update dependencies (36)
- Use app token (33)
- Update workflow for bumping versions (32)
- Update README (31)
- Add release to PyPI (30)
- Allow installation through `pip` (29)
- Fix release upload (28)
- Fix GH publish (27)

0.4.0

🛠️ Other improvements

- Bump reedline from 0.22.0 to 0.24.0 (18)
- Bump sqlparser from 0.36.1 to 0.38.0 (19)
- Bump clap from 4.3.21 to 4.4.2 (8)
- Bump package version to `0.4.0` (26)
- Add release workflow (25)
- Extend gitignore (24)
- Pin polars to version rather than git main (23)
- Bump reedline from 0.23.0 to 0.24.0 (18)
- Bump sqlparser from 0.37.0 to 0.38.0 (19)
- Bump toolchain (22)
- Bump polars from `c4ac859` to `d3e320e` (7)
- Bump reedline from 0.22.0 to 0.23.0 (10)
- Bump sqlparser from 0.36.1 to 0.37.0 (9)
- Bump clap from 4.3.21 to 4.4.2 (8)
- Expand installation section in README (4)
- Add version bump workflow (5)
- Unfix reedline (3)
- Change version help text when starting the CLI (2)
- Add CI pipeline (1)


py-0.20.23
🚀 Performance improvements

- Don't rechunk in parallel collection (15907)
- Improve non-trivial list aggregations (15888)
- Ensure we hit specialized gather for binary/strings (15886)
- Limit the cache size for `to_datetime` (15826)
- skip initial null items and don't recompute `slope` in `interpolate` (15819)

✨ Enhancements

- don't require pyarrow for converting pandas to Polars if all columns have simple numpy-backed datatypes (15933)
- Add option to disable globbing in csv (15930)
- Add option to disable globbing in parquet (15928)
- Expressify `dt.round` (15861)
- Improve error messages in context stack (15881)
- Add dynamic literals to ensure schema correctness (15832)
- Add a low-friction `sql` method for DataFrame and LazyFrame (15783)
- add timestamp time travel in delta scan/read (15813)

๐Ÿž Bug fixes

- Set default limit for String column display to 30 and fix edge cases (15934)
- Change recognition of numba ufunc (15916)
- series.search\_sorted could support more types of input (15940)
- Remove fsspec from parquet reader (15927)
- avoid WRITE+EXEC for CPUID check (15912)
- fix inconsistent decimal formatting (15457)
- Preserve NULLs for `is_not_nan` (15889)
- double projection check should only take the upstream projections into account (15901)
- Ensure we don't create invalid frames when combining unit lit + … (15903)
- Clear cached rename schema (15902)
- Fix OOB in struct lit/agg aggregation (15891)
- Refine interaction of "schema\_overrides" with `read_excel` when using "calamine" engine (15827)
- Don't modify user-supplied `storage_options` dict (take a shallow-copy) (15859)
- create (q)cut labels in fixed order (15843)
- Tag `shrink_dtype` as non-streaming (15828)

📖 Documentation

- improve graphviz install documentation/error message (15791)
- Extend docstring examples for asof\_join (15810)

📦 Build system

- Don't import jemalloc (15942)
- Use default allocator for lts-cpu (15941)
- replace all macos-latest referrals with macos-13 (15926)
- pin mimalloc and macos-13 (15925)
- use jemalloc in lts-cpu (15913)
- Update Cargo.lock (15865)
- Bump `ruff` version and improve `make clean` on the Python side (15858)
- Exclude `rust-toolchain.toml` from wheels (15840)

🛠️ Other improvements

- Replace copy/paste import handling with `import_optional` utility function (15906)
- Reorganize from\_iter and dispatch to collect\_ca when possible (15904)
- More PyO3 0.21 bound APIs (15872)
- Improve type-coercion (15879)
- Move type coercion to IR conversion phase (15868)
- Fix Python test coverage upload (15853)
- More upgrades to PyO3 0.21 Bound\<> APIs (15790)
- Use uv for installing Python dependencies in CI (15848)
- Update benchmark tests (15825)

Thank you to all our contributors for making this release possible!
JulianCologne, MarcoGorelli, NedJWestern, NexVeridian, alexander-beedie, deanm0000, dependabot, dependabot[bot], ion-elgreco, itamarst, jr200, nameexhaustion, orlp, reswqa, ritchie46 and stinodego


py-0.20.22
🚀 Performance improvements

- Improved type-inference for `read_excel` and `read_ods`, use calamine engine for `read_ods` (15808)
- Fix quadratic in binview growable same source (15734)
- use two binary searches for equality mask when data is sorted (15702)
- improve filter parallelism (15686)
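
The "two binary searches" entry above can be illustrated in plain Python. This is a minimal sketch of the general idea only, not the Polars implementation: on sorted data, every element equal to the target sits in one contiguous run, so two binary searches find the run's bounds and the mask can be filled without comparing each element. The function name `equality_mask_sorted` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def equality_mask_sorted(values, target):
    """Build a boolean mask of `values == target` for ascending-sorted input.

    Hypothetical helper for illustration; assumes `values` is a sorted list.
    """
    # First binary search: index of the first element >= target.
    lo = bisect_left(values, target)
    # Second binary search: index of the first element > target.
    hi = bisect_right(values, target)
    # Everything in [lo, hi) equals target; the rest does not.
    return [False] * lo + [True] * (hi - lo) + [False] * (len(values) - hi)


mask = equality_mask_sorted([1, 2, 2, 3, 5], 2)
```

The two searches cost O(log n) comparisons, after which the mask is written in runs rather than computed element by element.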

✨ Enhancements

- Minor type-inference update for `read_database` (15809)
- Improved type-inference for `read_excel` and `read_ods`, use calamine engine for `read_ods` (15808)
- `dt.truncate` supports broadcasting lhs (15768)
- Expressify `str.json_path_match` (15764)
- raise if `storage_options` is passed to read\_csv but `fsspec` isn't available (15778)
- Support decimal float parsing in CSV (15774)
- Add context trace to `LazyFrame` conversion errors (15761)
- Improve error message when passing invalid input to `lit` (15718)
- Remove outdated join validation checks (15701)

๐Ÿž Bug fixes

- drop-nulls edge case; remove drop-nulls special case (15815)
- ewm\_mean\_by was skipping initial nulls when it was already sorted by "by" column (15812)
- Consult cgroups to determine free memory (15798)
- raise if index count like 2i is used when performing rolling, group\_by\_dynamic, upsample, or other temporal operations (15751)
- Don't deduplicate sort that has slice pushdown (15784)
- Allow passing files opened by fsspec in `read_parquet` (15770)
- Fix incorrect `is_between` pushdown to `scan_pyarrow_dataset` (15769)
- Handle null index correctly for list take (15737)
- Preserve lexical ordering on concat (15753)
- Remove incorrect unsafe pointer cast for int -> enum (15740)
- pass series name to apply for cut/qcut (15715)
- count of null column shouldn't panic in agg context (15710)
- manual cache (15711)
- Ensure we don't hold onto Mutex when grabbing join tuples (15704)
- allow null dtypes in UDFs if they match the schema (15699)
- Respect join\_null argument for semi/anti joins (15696)
- Ensure we don't hold RwLock when spawning group parallelism in w… (15697)
- Ensure empty with\_columns is a no-op (15694)
- Include predicate in cache state union (15693)
- Add the missing feature flag for `ewm_mean_by` (15687)
- 8/16-bits int could also apply in place for log expr (15680)
- `prepare_expression_for_context` shouldn't panic if exceptions raised from optimizer (15681)

📖 Documentation

- Add docstring examples for datetimes (13161) (15804)
- Fix a typo in categorical section of the user guide (15777)
- Fix a docstring mistake for DataType.is\_float (15773)
- Remove incorrect "1i (1 index count)" from some docs methods (15750)
- Add example for `Config.set_tbl_width_chars` (15566)
- Align docstring phrasing in `Series/Expr.dt.truncate/round` (15698)
- Various deprecation docstring improvements (15648)

🛠️ Other improvements

- Always expand horizontal\_any/all (15816)
- Rename decimal\_float to decimal\_comma (15817)
- Split coverage calculation (15780)
- Update readme (15787)
- Start at using new Bound\<> API from PyO3 (15752)
- Make `json_path_match` expr non-anonymous (15682)

Thank you to all our contributors for making this release possible!
MarcoGorelli, NedJWestern, Robinsane, TobiasDummschat, alexander-beedie, c-peters, dependabot, dependabot[bot], gasmith, henryharbeck, itamarst, kszlim, mbuhidar, nameexhaustion, orlp, reswqa, ritchie46, stinodego and wsyxbcl


rs-0.39.2
🚀 Performance improvements

- use two binary searches for equality mask when data is sorted (15702)
- improve filter parallelism (15686)

✨ Enhancements

- Remove outdated join validation checks (15701)

๐Ÿž Bug fixes

- manual cache (15711)
- Ensure we don't hold onto Mutex when grabbing join tuples (15704)
- allow null dtypes in UDFs if they match the schema (15699)
- Respect join\_null argument for semi/anti joins (15696)
- Ensure we don't hold RwLock when spawning group parallelism in w… (15697)
- Ensure empty with\_columns is a no-op (15694)
- Include predicate in cache state union (15693)
- Add the missing feature flag for `ewm_mean_by` (15687)
- 8/16-bits int could also apply in place for log expr (15680)
- `prepare_expression_for_context` shouldn't panic if exceptions raised from optimizer (15681)

🛠️ Other improvements

- Make `json_path_match` expr non-anonymous (15682)

Thank you to all our contributors for making this release possible!
henryharbeck, kszlim, orlp, reswqa and ritchie46


py-0.20.22-rc.1
🚀 Performance improvements

- improve filter parallelism (15686)

๐Ÿž Bug fixes

- Add the missing feature flag for `ewm_mean_by` (15687)
- 8/16-bits int could also apply in place for log expr (15680)
- `prepare_expression_for_context` shouldn't panic if exceptions raised from optimizer (15681)

📖 Documentation

- Various deprecation docstring improvements (15648)

🛠️ Other improvements

- Make `json_path_match` expr non-anonymous (15682)

Thank you to all our contributors for making this release possible!
henryharbeck, reswqa and ritchie46


rs-0.39.1
🚀 Performance improvements

- Fix regression that led to using only a single thread (15667)

✨ Enhancements

- add ewm\_mean\_by (15638)

๐Ÿž Bug fixes

- Ensure profile of simple-projection only take own runtime (15671)
- Panic if invalid array in object (15664)
- Ensure 'CachedSchema' doesn't get synced between plans (15661)
- `group_by` multiple null columns produce phantom row (15659)
- rolling\_\* aggs were behaving as if they return scalars in group-by (15657)
- Correct the unsoundness slice range of `arr.min/max` (15654)
- `list.mean` fast path shouldn't produce NaN (15652)
- Fix Display implementation of Duration (15647)

📖 Documentation

- Fix typo in legacy install instructions (15662)
- Include prelude import in the example (15633)

🛠️ Other improvements

- Remove the remaining usage of deprecated `numpy` crate APIs (15668)
- make Duration.is\_constant\_duration less strict for non-timezone-aware case (15639)
- Fix some typos in comments (15665)
- remove unnecessary unsafe in list mean/sum (15660)

Thank you to all our contributors for making this release possible!
MarcoGorelli, Priyansh4444, StevenMia, itamarst, mcrumiller, orlp, reswqa, ritchie46 and stinodego


py-0.20.21
🚀 Performance improvements

- Fix regression that led to using only a single thread (15667)

✨ Enhancements

- add ewm\_mean\_by (15638)

๐Ÿž Bug fixes

- Ensure profile of simple-projection only take own runtime (15671)
- Panic if invalid array in object (15664)
- Ensure 'CachedSchema' doesn't get synced between plans (15661)
- `group_by` multiple null columns produce phantom row (15659)
- rolling\_\* aggs were behaving as if they return scalars in group-by (15657)
- Correct the unsoundness slice range of `arr.min/max` (15654)
- `list.mean` fast path shouldn't produce NaN (15652)

📖 Documentation

- Add missing deprecation warning to `DataFrame.replace` (15612)
- Fix typo in legacy install instructions (15662)

🛠️ Other improvements

- make Duration.is\_constant\_duration less strict for non-timezone-aware case (15639)
- Fix some typos in comments (15665)
- remove unnecessary unsafe in list mean/sum (15660)
- fixup failing test due to `offset` deprecation in `upsample` (15636)

Thank you to all our contributors for making this release possible!
MarcoGorelli, Priyansh4444, StevenMia, eitsupi, itamarst, mcrumiller, orlp, reswqa, ritchie46 and stinodego


rs-0.39.0
๐Ÿ† Highlights

- Full plan CSE (15264)

💥 Breaking changes

- rename `memmap` -> `memory_map` to match Python (15642)
- pref(rust!, python): Unify `sort` with `SortOptions` and `SortMultipleOptions` (15590)
- Update the argument name from `dims` to `dimensions` in `reshape` (15561)
- Allow specifying Hive schema in `read/scan_parquet` (15434)
- Raise error when `schema_overrides` contains nonexistent column name (15290)
- Rename `Chunk` to `RecordBatch` (15298)
- Refactor AnyValue supertype logic (15280)
- Rename `group_by_rolling` to `rolling` and improve related error messages (14765)
- Rename `ChunkedArray.try_apply` to `try_apply_values` (14947)
- Rename parameter `by` to `group_by` in `DataFrame.upsample/group_by_dynamic/rolling` (14840)

🚀 Performance improvements

- Fix cross join batch size when one of the DataFrames is tiny (14347)
- Fix binview growable complexity O(n\*m) -> O(n) (15628)
- Remove extra thread spawn from row group fetcher (15626)
- Use vertical parallelism if input is chunked for `Filter`,`Select`,`WithColumns` (15608)
- read\_ipc memory usage tests, and writing fix (15599)
- Refactor CSV serialization to not go through `AnyValue` (15576)
- don't use dynamic dispatch in visitors (15607)
- Improve Bitmap construction performance (15570)
- join by row-encoding (15559)
- Replace std::thread spawn with tokio block\_in\_place (15517)
- speed up offset\_by when a single offset is passed (15493)
- Avoid allocation in the hot path for struct JSON serialization (15449)
- avoid double-allocation in rolling\_apply\_agg\_window (15423)
- Make LogicalPlan immutable (15416)
- Add non-order preserving variable row-encoding (15414)
- Use row-encoding for multiple key group by (15392)
- load bits one word at a time for BitmapIter (15333)
- Ipc exec multiple paths (15040)
- add SIMD support for if-then-else kernels (15131)

✨ Enhancements

- add Expr.dt.add\_business\_days and Series.dt.add\_business\_days (15595)
- Add `str.head` and `str.tail` (14425)
- Extended `BytecodeParser` to handle additional math functions, and imports from the global namespace (15627)
- Push down `is_between` expressions to Arrow (15180)
- add holidays argument to business\_day\_count (15580)
- change default to write parquet statistics (15597)
- Expressify `to_integer` (15604)
- Optimizer; remove double SORT and redundant projections (15573)
- Add `null_on_oob` parameter to `expr.array.get` (15426)
- support weekend argument in business\_day\_count (15544)
- Enable `is_first/last_distinct` for not nested non-numeric list (15552)
- Turn off cse if cache node found (15554)
- Tag concat list as elementwise (15545)
- Support list group-by of non numeric lists (15540)
- add business\_day\_count function (15512)
- Add SQL support for `MEDIAN` aggfunc (15519)
- Implement `string`, `boolean` and `binary` dtype in `top_k` (15488)
- Add SQL support for `TRUNCATE TABLE` command (15513)
- Add SQL support for `GREATEST` and `LEAST` (15511)
- Allow specifying Hive schema in `read/scan_parquet` (15434)
- Implements `agg_list` for `NullChunked` (15439)
- Supports `explode_by_offsets` for decimal (15417)
- Add `null_on_oob` parameter to `expr.list.get` (15395)
- CSV-writer escape carriage return (15399)
- Remove 'FileCacher' optimization (15357)
- check input type in entropy (15351)
- Implements `arr.n_unique` (15296)
- CSE don't scan share if predicate pushdown predicates don't match (15328)
- Remove cached nodes when finished (15310)
- Full plan CSE (15264)
- Add IR for expressions. (15168)
- Warn if `map_elements` is called without `return_dtype` specified (15188)
- Rename `group_by_rolling` to `rolling` and improve related error messages (14765)
- Rename `ChunkedArray.try_apply` to `try_apply_values` (14947)
- Implement strict AnyValue construction for temporal types (15146)

๐Ÿž Bug fixes

- Return appropriate data type for time `mean` and `median` (14471)
- Support index upsampling (13621)
- Fix issue in `write_excel` that could lead to incorrect spanning range determination (15631)
- Output correct dtype for `mean_horizontal` on a single column (15118)
- Recompute RowIndex schema after projection pd (15625)
- Mean of boolean in streaming group\_by incorrectly always gave NULL (15616)
- Include cloud creds in cache key (15609)
- Fix elementwise-apply if any input is `AggregatedScalar` (15606)
- Explode list should take validity into account (15572)
- use larger recursive stack in debug mode (15593)
- SQL interface "off-by-one" indexing error with `GROUP BY` clauses that use position ordinals (15584)
- Enable missing features in polars-time (15558)
- Handle quoted identifiers when registering CTEs in the SQL engine (15564)
- Decompress moved out of schema initialization (15550)
- Turn off cse if cache node found (15554)
- Resolve function names and prune all aliases. (15522)
- `list.get` should take validity into account (15516)
- block decimal in streaming (15520)
- `group_by` partitioned with literal `Series` panic (15487)
- Initialize validity for `GroupsProxy::Slice` windows (15509)
- Fix struct name resolving (15507)
- `pow` return type evaluation (15506)
- Allow selectors inside frame-level `.filter()` (15445)
- Don't prune alias in AnonymousFunction subtree (15453)
- Fix deadlock in async parquet scan (15440)
- datetime operations (e.g. .dt.year) were raising when null values were backed by out-of-range integers (15420)
- Ensure Binary -> Binview cast doesn't overflow the buffer size (15408)
- Don't prune alias in function subtree (15406)
- Return 0 for `n_unique()` in group-by context when group is empty (15289)
- Unset UpdateGroups after group-sensitive expression (15400)
- `to_any_value` should support all LiteralValue types (15387)
- Hash failure combining hash of two numeric columns containing equal values (15397)
- Add FixedSizeBinary to arrow field conversion (15389)
- Conversion of expr\_ir in partition fast path (15388)
- `sort` for series with unsupported dtype should raise instead of panic (15385)
- Return correct dtype for `s.clear()` when dtype is `Object` (15315)
- ensure first datapoint is always included in group\_by\_dynamic (15312)
- Non-exhaustive patterns: arrow-schema::DataType in polars-arrow (15250)
- use dynamic stacks for problematic recursive functions (15355)
- Raise error when `schema_overrides` contains nonexistent column name (15290)
- Fix cache dot visualization (15311)
- Properly propagate `strict` flag when constructing a Struct Series from any values (15302)
- ensure `eq` for `BinaryViewArray` checks all elements (15268)
- Raise when join projects name with suffix that doesn't exist (15256)
- fix kurtosis/skew (15137)
- Ensure ooc\_start is set (15255)
- Fix bug where rolling operations were ignoring `check_sorted` in some cases (15227)
- Fix lazy schema for `rle` expression (15248)
- incorrect negative offset in multi-byte string slicing (15140)
- do not clamp negative offsets to start of array prematurely (15242)
- allow null index in list.get and array.get (15239)
- properly support nulls\_last + descending (15212)
- Block rounding/truncating to negative durations (15175)
- Make parse\_url work on windows with object\_store (15191)
- divide by zero in download speed computation (15182)

📖 Documentation

- Add legacy CPU install instructions in user guide (13676)
- Various minor updates to User Guide's SQL intro section (15557)
- Add `outer_coalesce` join strategy in the user guide (15405)
- Improve docs for `Series::new` with `AnyValue` input (15306)
- Fix formatting in `Series::from_any_values_and_dtype` docs (15244)
- Correct the definition of an expression in the user guide (14750)

📦 Build system

- Fix a feature gate for `lz4` compression in `polars-parquet` (15565)
- Update Rust toolchain (15353)

🛠️ Other improvements

- rename `memmap` -> `memory_map` to match Python (15642)
- fixup failing test due to `offset` deprecation in `upsample` (15636)
- use bound api (15630)
- Don't run streaming group-by in partitionable gb (15611)
- pref(rust!, python): Unify `sort` with `SortOptions` and `SortMultipleOptions` (15590)
- remove try\_binary\_elementwise\_values (15592)
- remove raw pointers from visitors. (15579)
- rename to IR (15571)
- Update the argument name from `dims` to `dimensions` in `reshape` (15561)
- Rename ALogicalPlan to FullAccessIR (15553)
- Set up CodSpeed (15537)
- make dsl immutable and cheap to clone (15394)
- use recursive crate, add missing recursive tag (15393)
- Update CODEOWNERS (polars-sql) (15384)
- Update Rust toolchain (15353)
- Update CODEOWNERS (15352)
- remove try\_apply\_values (15336)
- always use non-legacy float\_sum for mean (15343)
- remove legacy bitmap module (15335)
- More clippy in Makefile (15340)
- Rename `Cache[count]` to `Cache[cache_hits]` (15300)
- Cleanup file\_caching optimization call (15299)
- Rename `Chunk` to `RecordBatch` (15298)
- Refactor AnyValue supertype logic (15280)
- reuse message parsing in IPC (15265)
- remove 'fast-projection' node (15253)
- cleanup column names in optimizer (15252)
- remove left\_most\_input\_name from expr ir (15251)
- add AlignedBitmapSlice (15171)
- Refactor AnyValue construction for Categorical/Enum dtype (15220)
- Move ConsecutiveCountState into support module (15186)
- Run non-benchmark tests in benchmark workflow (15207)
- Add `wrapping_abs` to arithmetic kernel (15210)
- remove raw buffers from BinViewArray (15206)
- Enable `RUST_BACKTRACE=1` in the CI test suite (15204)
- Rename parameter `by` to `group_by` in `DataFrame.upsample/group_by_dynamic/rolling` (14840)
- Set dual license for `polars-arrow` and `polars-parquet` (15173)
- remove parts of legacy bit\_util (15169)
- remove legacy arrow compute (15164)

Thank you to all our contributors for making this release possible!
CanglongCl, ChayimFriedman2, Fokko, JamesCE2001, MarcoGorelli, NedJWestern, Sol-Hee, TrevorWinstral, alexander-beedie, braaannigan, c-peters, cmdlineluser, cojmeister, deanm0000, dependabot, dependabot[bot], douglas-raillard-arm, eitsupi, filabrazilska, henryharbeck, i-aki-y, itamarst, kszlim, leoforney, mbuhidar, mcrumiller, mickvangelderen, nameexhaustion, orlp, ozgrakkurt, petrosbar, reswqa, ritchie46, rob-sil, sportfloh, stinodego, thomaslin2020 and yutannihilation


py-0.20.20
🚀 Performance improvements

- Fix cross join batch size when one of the DataFrames is tiny (14347)
- Fix binview growable complexity O(n\*m) -> O(n) (15628)
- Remove extra thread spawn from row group fetcher (15626)
- Use vertical parallelism if input is chunked for `Filter`,`Select`,`WithColumns` (15608)
- Refactor CSV serialization to not go through `AnyValue` (15576)
- don't use dynamic dispatch in visitors (15607)
- Improve Bitmap construction performance (15570)
- join by row-encoding (15559)

✨ Enhancements

- add Expr.dt.add\_business\_days and Series.dt.add\_business\_days (15595)
- Add `str.head` and `str.tail` (14425)
- Add `union`/`or` operator for `pl.Enum` (14965)
- Extended `BytecodeParser` to handle additional math functions, and imports from the global namespace (15627)
- Push down `is_between` expressions to Arrow (15180)
- add holidays argument to business\_day\_count (15580)
- change default to write parquet statistics (15597)
- Expressify `to_integer` (15604)
- Optimizer; remove double SORT and redundant projections (15573)
- Add `null_on_oob` parameter to `expr.array.get` (15426)
- support weekend argument in business\_day\_count (15544)
- Enable `is_first/last_distinct` for not nested non-numeric list (15552)
- Turn off cse if cache node found (15554)
- Tag concat list as elementwise (15545)
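
The `business_day_count` entries above (with their `weekend` and `holidays` arguments) describe a calculation that can be sketched in plain Python. This is a rough illustration, not the Polars implementation: the function name `count_business_days`, the half-open interval (start included, end excluded), and the use of `date.weekday()` numbers (Monday is 0) for the weekend are all assumptions made for this sketch.

```python
from datetime import date, timedelta


def count_business_days(start, end, weekend=(5, 6), holidays=()):
    """Count business days in [start, end).

    Hypothetical helper: `weekend` holds `date.weekday()` numbers to skip
    (5 = Saturday, 6 = Sunday), `holidays` is an iterable of dates to skip.
    """
    skipped = set(holidays)
    count = 0
    day = start
    while day < end:
        # A day counts only if it is neither a weekend day nor a holiday.
        if day.weekday() not in weekend and day not in skipped:
            count += 1
        day += timedelta(days=1)
    return count


# Monday 2024-01-01 up to (not including) Monday 2024-01-08.
plain = count_business_days(date(2024, 1, 1), date(2024, 1, 8))
with_holiday = count_business_days(
    date(2024, 1, 1), date(2024, 1, 8), holidays=[date(2024, 1, 1)]
)
```

The day-by-day loop keeps the sketch easy to check; a production version would compute whole weeks arithmetically instead of iterating.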

๐Ÿž Bug fixes

- Return appropriate data type for time `mean` and `median` (14471)
- Fix issue in `write_excel` that could lead to incorrect spanning range determination (15631)
- Output correct dtype for `mean_horizontal` on a single column (15118)
- Recompute RowIndex schema after projection pd (15625)
- Mean of boolean in streaming group\_by incorrectly always gave NULL (15616)
- Include cloud creds in cache key (15609)
- Fix elementwise-apply if any input is `AggregatedScalar` (15606)
- Explode list should take validity into account (15572)
- use larger recursive stack in debug mode (15593)
- SQL interface "off-by-one" indexing error with `GROUP BY` clauses that use position ordinals (15584)
- Enable missing features in polars-time (15558)
- Handle quoted identifiers when registering CTEs in the SQL engine (15564)
- Decompress moved out of schema initialization (15550)
- Turn off cse if cache node found (15554)

📖 Documentation

- Add legacy CPU install instructions in user guide (13676)
- Examples for errors (13724)
- Add docstring examples for reading json (14481)
- Add security warning in LazyFrame.deserialize() docstring (15282)
- Various minor updates to User Guide's SQL intro section (15557)

🛠️ Other improvements

- Replace most deprecated calls with bounded version (15632)
- use bound api (15630)
- Initial PyO3 0.21 support (15622)
- Don't run streaming group-by in partitionable gb (15611)
- pref(rust!, python): Unify `sort` with `SortOptions` and `SortMultipleOptions` (15590)
- Set up CodSpeed (15537)

Thank you to all our contributors for making this release possible!
CanglongCl, ChayimFriedman2, Fokko, JamesCE2001, MarcoGorelli, NedJWestern, TrevorWinstral, alexander-beedie, deanm0000, douglas-raillard-arm, eitsupi, filabrazilska, i-aki-y, itamarst, leoforney, mcrumiller, nameexhaustion, orlp, ozgrakkurt, reswqa, ritchie46 and stinodego


py-0.20.19
🚀 Performance improvements

- Replace std::thread spawn with tokio block\_in\_place (15517)
- speed up offset\_by when a single offset is passed (15493)
- Avoid allocation in the hot path for struct JSON serialization (15449)

✨ Enhancements

- Support list group-by of non numeric lists (15540)
- add business\_day\_count function (15512)
- Add SQL support for `MEDIAN` aggfunc (15519)
- Implement `string`, `boolean` and `binary` dtype in `top_k` (15488)
- Add SQL support for `TRUNCATE TABLE` command (15513)
- Add SQL support for `GREATEST` and `LEAST` (15511)
- Allow specifying Hive schema in `read/scan_parquet` (15434)
- Implements `agg_list` for `NullChunked` (15439)

๐Ÿž Bug fixes

- dot product of two integer series is cast to float (15502)
- Resolve function names and prune all aliases. (15522)
- Pass `skip_rows_after_header` to pyarrow csv reader (15533)
- No longer error when `schema_overrides` contains nonexistent columns (15528)
- `list.get` should take validity into account (15516)
- block decimal in streaming (15520)
- `group_by` partitioned with literal `Series` panic (15487)
- Initialize validity for `GroupsProxy::Slice` windows (15509)
- Fix struct name resolving (15507)
- `pow` return type evaluation (15506)
- Address issue with `read_database` draining iter\_batches early (15504)
- Allow selectors inside frame-level `.filter()` (15445)
- Don't prune alias in AnonymousFunction subtree (15453)
- Raise if pass a negative `n` into `clear` (15432)
- Fix deadlock in async parquet scan (15440)

📖 Documentation

- Update leftover references of `by` parameter to `group_by` in `DataFrame/LazyFrame.upsample/group_by_dynamic/rolling` (15527)
- Add `make docs` command, DataType docs/layout tweak, minor README updates (15386)
- Add example for `Series.list.median`. (15451)

🛠️ Other improvements

- Remove unused code paths in `read_parquet` (15532)
- Organize utils for I/O functionality (15529)
- Remove private `DataFrame._read` classmethods (15521)
- Move dedicated inference code out of `io.database` executor module (15526)
- Add unstable warning to `hive_schema` functionality (15508)

Thank you to all our contributors for making this release possible!
CanglongCl, ChayimFriedman2, MarcoGorelli, alexander-beedie, cmdlineluser, dependabot, dependabot[bot], henryharbeck, mbuhidar, nameexhaustion, reswqa, ritchie46, rob-sil and stinodego


py-0.20.18
🚀 Performance improvements

- CSV reading memory usage tests and fixes (15422)
- avoid double-allocation in rolling\_apply\_agg\_window (15423)
- Make LogicalPlan immutable (15416)
- Add non-order preserving variable row-encoding (15414)
- Use row-encoding for multiple key group by (15392)

✨ Enhancements

- Supports `explode_by_offsets` for decimal (15417)
- Add `read_clipboard` and `DataFrame.write_clipboard` (15272)
- Add `null_on_oob` parameter to `expr.list.get` (15395)
- make Series.\_\_bool\_\_ error message Rusttier (15407)
- CSV-writer escape carriage return (15399)
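
The `null_on_oob` entry above concerns what happens when an index falls outside a list. The behaviour such a flag usually selects between can be sketched in plain Python. This is an assumption-labelled illustration rather than Polars code: the helper name `list_get` is hypothetical, and the default chosen here is arbitrary and not a statement about the library's default.

```python
def list_get(values, index, null_on_oob=False):
    """Return values[index], handling out-of-bounds indices explicitly.

    Hypothetical helper: with null_on_oob=True an out-of-bounds index yields
    None (standing in for null); otherwise it raises IndexError.
    """
    n = len(values)
    # Negative indices count from the end, as in ordinary Python indexing.
    if -n <= index < n:
        return values[index]
    if null_on_oob:
        return None
    raise IndexError(f"index {index} is out of bounds for list of length {n}")


in_bounds = list_get([10, 20, 30], -1)
out_of_bounds = list_get([10, 20, 30], 5, null_on_oob=True)
```

Making the choice explicit lets callers decide whether a bad index is a data-quality error or an expected gap.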

๐Ÿž Bug fixes

- datetime operations (e.g. .dt.year) were raising when null values were backed by out-of-range integers (15420)
- Ensure Binary -> Binview cast doesn't overflow the buffer size (15408)
- Don't prune alias in function subtree (15406)
- Return 0 for `n_unique()` in group-by context when group is empty (15289)
- Unset UpdateGroups after group-sensitive expression (15400)
- `to_any_value` should support all LiteralValue types (15387)
- Hash failure combining hash of two numeric columns containing equal values (15397)
- Add FixedSizeBinary to arrow field conversion (15389)
- Conversion of expr\_ir in partition fast path (15388)
- fix panic when doing a scan\_parquet with hive partitioning (15381)
- `sort` for series with unsupported dtype should raise instead of panic (15385)

📖 Documentation

- Added example for `explode` mapping strategy in `pl.Expr.over` (15402)
- Add `outer_coalesce` join strategy in the user guide (15405)
- Change the example to series for `series/array.py` (15383)
- Add "See Also" for `arg_sort` and `arg_sort_by` (15348)

🛠️ Other improvements

- make dsl immutable and cheap to clone (15394)
- use recursive crate, add missing recursive tag (15393)
- Update CODEOWNERS (polars-sql) (15384)

Thank you to all our contributors for making this release possible!
CanglongCl, JamesCE2001, MarcoGorelli, Sol-Hee, alexander-beedie, dependabot, dependabot[bot], itamarst, kszlim, mcrumiller, nameexhaustion, orlp, reswqa, ritchie46, rob-sil and thomaslin2020


py-0.20.17
๐Ÿ† Highlights

- Full plan CSE (15264)

⚠️ Deprecations

- Rename parameter `by` to `group_by` in `DataFrame.upsample/group_by_dynamic/rolling` (14840)
- Rename `from_repr` parameter from `tbl` to `data` (15156)

🚀 Performance improvements

- load bits one word at a time for BitmapIter (15333)
- Ipc exec multiple paths (15040)
- add SIMD support for if-then-else kernels (15131)

✨ Enhancements

- Remove 'FileCacher' optimization (15357)
- check input type in entropy (15351)
- Implements `arr.n_unique` (15296)
- CSE don't scan share if predicate pushdown predicates don't match (15328)
- Add `read_database` support for `SurrealDB` ("ws" and "http") (15269)
- Only allow inputs of type `Sequence` in `from_records` (15329)
- In hypothesis testing strategies, enable Decimal strategy by default (15321)
- Remove cached nodes when finished (15310)
- Full plan CSE (15264)
- More robust handling of `async` database calls (15202)
- Add `name` parameter to `GroupBy.len` method (15235)
- Add IR for expressions. (15168)
- Improve `read_database` when reading from Kùzu graph database (15218)
- Warn if `map_elements` is called without `return_dtype` specified (15188)
- Add support for `async` SQLAlchemy connections to `read_database` (15162)
- Infer `time_unit` in `pl.duration` when nanoseconds is specified (14987)
- Add `strict` parameter to `from_dict/from_records` (15158)

๐Ÿž Bug fixes

- Return correct dtype for `s.clear()` when dtype is `Object` (15315)
- ensure first datapoint is always included in group\_by\_dynamic (15312)
- Non-exhaustive patterns: arrow-schema::DataType in polars-arrow (15250)
- use dynamic stacks for problematic recursive functions (15355)
- Adding default ddof for `Series.list.std` and `Series.list.var` (15267)
- Raise properly for slices not supported by `LazyFrame` (15331)
- Propagate strictness in `from_dicts` (15344)
- Raise error when `schema_overrides` contains nonexistent column name (15290)
- Enforce integer `dtype` input for `int_range` and `int_ranges` (15339)
- Preserve Decimal precision when constructing empty Series (15320)
- Fix cache dot visualization (15311)
- Handle special case correctly when slicing a `LazyFrame` (15297)
- Properly propagate `strict` flag when constructing a Struct Series from any values (15302)
- Consistent expansion of nested struct data during `DataFrame` init from dict (15217)
- Raise when join projects name with suffix that doesn't exist (15256)
- Ensure ooc\_start is set (15255)
- Fix bug where rolling operations were ignoring `check_sorted` in some cases (15227)
- Fix lazy schema for `rle` expression (15248)
- incorrect negative offset in multi-byte string slicing (15140)
- do not clamp negative offsets to start of array prematurely (15242)
- allow null index in list.get and array.get (15239)
- Avoid loading all columns in `read_parquet` when `columns` parameter is specified (15229)
- properly support nulls\_last + descending (15212)
- fix nested runtime panic (15216)
- Block rounding/truncating to negative durations (15175)
- Ensure the `cs.temporal()` selector uses wildcard time zone matching for `Datetime` (13683)
- Consistently raise `TypeError` on constructor failure (15178)
- Properly propagate strictness in some constructor cases (15166)
- Fix constructing a Series from a list of Series with given dtype (15144)

📖 Documentation

- Fix time unit in `timestamp` example (15281)
- Fix link to renamed method (.list.lengths -> .list.len) (15228)
- Update Excel and database pages in user guide (14721)
- Add examples for `Series.search_sorted` (14737)
- Correct the definition of an expression in the user guide (14750)
- Add a note about the behaviour of lower/upper bounds for `is_between`, and add an example (15197)

📦 Build system

- Update Cargo lock (15370)

🛠️ Other improvements

- Memory usage test infrastructure, plus a test for 15098 (15285)
- Update CODEOWNERS (15352)
- remove try\_apply\_values (15336)
- always use non-legacy float\_sum for mean (15343)
- remove legacy bitmap module (15335)
- Fix test not writing to temporary directory (15318)
- Reorganize tests for `clear` operation (15304)
- Rename `Cache[count]` to `Cache[cache_hits]` (15300)
- Cleanup file\_caching optimization call (15299)
- Minor refactor of `PyDataFrame.from_dicts` (15274)
- remove 'fast-projection' node (15253)
- cleanup column names in optimizer (15252)
- remove left\_most\_input\_name from expr ir (15251)
- add AlignedBitmapSlice (15171)
- Run non-benchmark tests in benchmark workflow (15207)
- Add `wrapping_abs` to arithmetic kernel (15210)
- remove raw buffers from BinViewArray (15206)
- Enable `RUST_BACKTRACE=1` in the CI test suite (15204)
- Split `read_database` functionality into cleaner module structure (15201)
- Clean up some of the AnyValue conversion logic (15190)
- remove parts of legacy bit\_util (15169)
- remove legacy arrow compute (15164)
- Split up `dataframe` module in PyO3 bindings (15165)
- Remove unused private constructors (15160)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, braaannigan, c-peters, cojmeister, deanm0000, dependabot, dependabot[bot], itamarst, kszlim, mbuhidar, mcrumiller, mickvangelderen, orlp, petrosbar, reswqa, ritchie46, rob-sil, sportfloh, stinodego and yutannihilation


rs-0.38.3
🚀 Performance improvements

- add new when-then-otherwise kernels (15089)
- Coerce sorted flag of unit arrays during concat (15104)
- Use sorted flag for `(first|last)_non_null` (15050)
- OOC sort improvements (14994)

✨ Enhancements

- raise if both `closed` and `by` are passed to `rolling_*` aggregations (15108)
- raise informative error for rolling\_\* aggs with `by` of invalid dtype (15088)
- add `non_existent` arg to `replace_time_zone` (15062)
- Support single nested row encodings (15105)
- make ooc sort configurable (15084)
- Make `register_plugin` a standalone function and include shared lib discovery (14804)
- Async parquet: Decode parquet on a blocking thread pool (15083)
- let "ambiguous" take "null" value (14961)
- Raise informative error message when join would introduce duplicate column name (15042)
- Allow cast of decimal to boolean (15015)
- Return error when no supertype can be determined in AnyValue constructor when `strict=false` (15025)
- Implement IpcReaderAsync (14984)
- Support Array statistics in parquet (15031)
- Support decimal groupby (15000)
- Add thread names to rayon thread pool (15024)
- Support decimal uniq (15001)
- expose timings in verbose state of OOC sort (14979)

🐞 Bug fixes

- Fix Series construction from nested list with mixed data types (15046)
- Support BinaryView in row decoder to prevent a panic in streaming group by (15117)
- Binview chunked gather; don't modify inlined view (15124)
- Fix chunked\_id gather for binview buffers (15123)
- Don't cache HTTP object stores as they maintain URL state (15121)
- use wrapping\_add in csv line snooping (15109)
- Output `u32` when `sum_horizontal` provided with single boolean column (15114)
- Ensure `eprintln!` is only called within debug/verbose context (15100)
- Propagate error instead of panicking when calling `product` on an invalid type (15093)
- Raise error when casting Array to different width (14995)
- Fix file scan bugs for ipc, csv and parquet that occur with combinations of glob paths, row indices and predicates (15065)
- Incorrectly preserved sorted flag when concatenating sorted series containing nulls (15082)
- Return largest non-NaN value for `max()` on sorted float arrays if it exists instead of NaN (15060)
- return NaN for all-NaN min/max (15066)
- Prevent "index out of range for slice" error in parquet reader (15021)
- Respect `nulls_last` in streaming sort (15061)
- Don't count nulls in streaming `count` agg (15051)
- agg\_list on decimal lost scale (15054)
- Block predicate pushdown on equality that are used in join (15055)
- Enum equality based on categories (15053)
- Strict cast in when/then/otherwise operation (15052)
- Don't panic in `string_addition_to_linear_concat` (15006)
- CSV do utf8-validation after escaping fields (15004)
- Use primitive constructors to create a Series of lists when dtype is provided (15002)
- replace\_time\_zone with single-null-element "ambiguous" was panicking (14971)

📖 Documentation

- Fix typo in comment (14997)

🛠️ Other improvements

- Extend and speed up scan tests (15127)
- always assert on ChunkedArray::get (15120)
- Use ObjectStore instead of AsyncRead in parquet get metadata (15069)
- Minor refactor of Rust any value constructors (15077)
- Simplify streaming execution (15039)
- Ensure we hit the spilled source path in ooc sort test (15010)
- Refactor constructor code (15009)
- Apply `clippy:assigning_clones` lint (14999)
- fix features (14977)

Thank you to all our contributors for making this release possible!
JackRolfe, MKisilyov, MarcoGorelli, alexander-beedie, c-peters, flisky, jqnatividad, mcrumiller, mickvangelderen, nameexhaustion, orlp, petrosbar, ritchie46, stinodego and trueb2


py-0.20.16
🚀 Performance improvements

- add new when-then-otherwise kernels (15089)
- Coerce sorted flag of unit arrays during concat (15104)
- Use sorted flag for `(first|last)_non_null` (15050)
- OOC sort improvements (14994)

✨ Enhancements

- improved dtype inference/refinement for `read_database` results (15126)
- raise if both `closed` and `by` are passed to `rolling_*` aggregations (15108)
- raise informative error for rolling\_\* aggs with `by` of invalid dtype (15088)
- add `non_existent` arg to `replace_time_zone` (15062)
- Support single nested row encodings (15105)
- make ooc sort configurable (15084)
- Make `register_plugin` a standalone function and include shared lib discovery (14804)
- Expose `infer_schema_length` parameter on `read_database` (15076)
- Async parquet: Decode parquet on a blocking thread pool (15083)
- let "ambiguous" take "null" value (14961)
- Raise informative error message when join would introduce duplicate column name (15042)
- Allow cast of decimal to boolean (15015)
- Add `strict` parameter to `DataFrame` constructor to allow non-strict construction (15034)
- Support Array statistics in parquet (15031)
- Support decimal groupby (15000)
- Add thread names to rayon thread pool (15024)
- Support decimal uniq (15001)
- expose timings in verbose state of OOC sort (14979)

🐞 Bug fixes

- Support BinaryView in row decoder to prevent a panic in streaming group by (15117)
- Binview chunked gather; don't modify inlined view (15124)
- Fix chunked\_id gather for binview buffers (15123)
- Don't cache HTTP object stores as they maintain URL state (15121)
- Output `u32` when `sum_horizontal` provided with single boolean column (15114)
- Propagate error instead of panicking when calling `product` on an invalid type (15093)
- Raise error when casting Array to different width (14995)
- Fix file scan bugs for ipc, csv and parquet that occur with combinations of glob paths, row indices and predicates (15065)
- Incorrectly preserved sorted flag when concatenating sorted series containing nulls (15082)
- Return largest non-NaN value for `max()` on sorted float arrays if it exists instead of NaN (15060)
- return NaN for all-NaN min/max (15066)
- Prevent "index out of range for slice" error in parquet reader (15021)
- Respect `nulls_last` in streaming sort (15061)
- Fix Series construction from nested list with mixed data types (15046)
- Don't count nulls in streaming `count` agg (15051)
- agg\_list on decimal lost scale (15054)
- Block predicate pushdown on equality that are used in join (15055)
- Enum equality based on categories (15053)
- Don't panic in `string_addition_to_linear_concat` (15006)
- CSV do utf8-validation after escaping fields (15004)
- Use primitive constructors to create a Series of lists when dtype is provided (15002)
- replace\_time\_zone with single-null-element "ambiguous" was panicking (14971)

📖 Documentation

- Update write\_database code blocks in user guide (15106)
- Add missing docstring examples in the Struct namespace (15071)
- Improve API reference landing page (14888)
- improve join\_asof example (14993)
- Fix inadvertent swap of `new` and `old` parameters in `replace` description (15019)

🛠️ Other improvements

- Extend and speed up scan tests (15127)
- Add parameterized-scan-tests (15057)
- Simplify streaming execution (15039)
- Ensure we hit the spilled source path in ooc sort test (15010)
- Refactor constructor code (15009)
- fix features (14977)
- Revert pinning PyPI publish action (14975)

Thank you to all our contributors for making this release possible!
JackRolfe, MKisilyov, MarcoGorelli, alexander-beedie, c-peters, flisky, jqnatividad, mcrumiller, mickvangelderen, nameexhaustion, orlp, petrosbar, ritchie46, stinodego and trueb2


py-0.20.16-rc.1
🚀 Performance improvements

- Use sorted flag for `(first|last)_non_null` (15050)
- OOC sort improvements (14994)

✨ Enhancements

- let "ambiguous" take "null" value (14961)
- Raise informative error message when join would introduce duplicate column name (15042)
- Allow cast of decimal to boolean (15015)
- Add `strict` parameter to `DataFrame` constructor to allow non-strict construction (15034)
- Support Array statistics in parquet (15031)
- Support decimal groupby (15000)
- Add thread names to rayon thread pool (15024)
- Support decimal uniq (15001)
- expose timings in verbose state of OOC sort (14979)

🐞 Bug fixes

- Prevent "index out of range for slice" error in parquet reader (15021)
- Respect `nulls_last` in streaming sort (15061)
- Fix Series construction from nested list with mixed data types (15046)
- Don't count nulls in streaming `count` agg (15051)
- agg\_list on decimal lost scale (15054)
- Block predicate pushdown on equality that are used in join (15055)
- Enum equality based on categories (15053)
- Don't panic in `string_addition_to_linear_concat` (15006)
- CSV do utf8-validation after escaping fields (15004)
- Use primitive constructors to create a Series of lists when dtype is provided (15002)
- replace\_time\_zone with single-null-element "ambiguous" was panicking (14971)

📖 Documentation

- Improve API reference landing page (14888)
- improve join\_asof example (14993)
- Fix inadvertent swap of `new` and `old` parameters in `replace` description (15019)

🛠️ Other improvements

- Add parameterized-scan-tests (15057)
- Simplify streaming execution (15039)
- Ensure we hit the spilled source path in ooc sort test (15010)
- Refactor constructor code (15009)
- fix features (14977)
- Revert pinning PyPI publish action (14975)

Thank you to all our contributors for making this release possible!
MKisilyov, MarcoGorelli, alexander-beedie, c-peters, flisky, jqnatividad, mcrumiller, mickvangelderen, nameexhaustion, petrosbar, ritchie46, stinodego and trueb2


rs-0.38.2
🏆 Highlights

- Streaming outer joins (14828)

🚀 Performance improvements

- Ensure parallel encoding/compression in `sink_parquet` (14964)
- hoist errors out of iterators in parquet (14945)
- add basic AVX-512 filters (14892)
- improve join-asof materialization (14884)
- Optimize chunked-id gather for binaryviews (14878)
- rework scalar filter kernels (14865)
- Reduce size of optional join-indexes (14856)
- Streaming outer joins (14828)
- Set sorted flag for `cum_count` on columns (14849)

✨ Enhancements

- Support writing `Array` type in parquet (14943)
- Sort decimal fields (14649)
- Import `NamedFrom` in `df!` macro (14860)
- try-improve concurrency tuner (14827)
- Streaming outer joins (14828)
- Set sorted flag for `cum_count` on columns (14849)
- Ensure binview types are rle-encoded in parquet write (14818)
- Implement strict/nonstrict conversion for primitive AnyValues (14186)
- Disable timeouts (14809)
- cleanup spill disks in process (14807)

🐞 Bug fixes

- Fix invalid partitionable query (14966)
- allow nonstrict cast of categorical/enum to enum (14910)
- `count_rows` multi-threaded under-counting in parser.rs (14963)
- raise proper error instead of panicking when result of truncation is non-existent datetime (14958)
- ooc-sort issues (14959)
- Do not raise when constructing from a list of Series with Nones (14942)
- Don't access out-of-bounds for null indices in bitmap gather (14932)
- std when ddof>=n\_values returns None even in rolling context (11750)
- Don't rechunk categoricals when moving to physical (14934)
- parquet rle boolean decoder (14931)
- boolean filter gave overly large buffers to Bitmap::from\_u8\_vec (14924)
- Fix sliced dictionary state in parquet (14917)
- Fix possibly incorrect order of columns when using ipc stream `with_columns` (14859)
- Fully qualify `polars_bail!` in `polars_ensure!` (14901)
- Fix `DataFrame.min`/`max` for decimals (14890)
- Assert chunks are equal after physical cast to prevent OOB (14873)
- not all cpu feature flag tests were mocked (14864)

📖 Documentation

- Remove some repetition in comments/docstrings (14912)
- Update contributing link (14882)
- Fix some word-repetition in code comments (14825)
- Separate `asof` from join strategy, change parameter from `strategy` to `how` in user guide (14793)

🛠️ Other improvements

- fix features (14977)
- fix chrono deprecation warnings (14928)
- Update Cargo.lock and remove cmake limit workaround (14905)
- Simplify streaming placeholder replacement. (14915)
- Optional deps should include `fastexcel` (14907)
- Deduplicate `POLARS_FORCE_ASYNC` env var parsing (14909)
- Make assumption about column name to index conversion having occurred explicit (14894)
- Make assumption about wildcards having been resolved explicit (14899)
- reactivate argminmax simd (14679)
- sort by 'idx' after outer join (14867)
- Simplify computation of `with_columns` attribute in physical csv scanner of default engine. (14837)
- centrally define IdxSize (14854)
- run and fix pext64\_polyfill test (14852)
- introduce partitioned table (14819)
- add missing deprecation directive in groupby.count (14817)
- Extract key value construction (14812)
- Fix Makefile build commands (14806)

Thank you to all our contributors for making this release possible!
MarcoGorelli, Sol-Hee, alexander-beedie, ambidextrous, battmdpkq, deanm0000, dependabot, dependabot[bot], eitsupi, flisky, geekvest, mcrumiller, mickvangelderen, nameexhaustion, orlp, petrosbar, ritchie46 and stinodego


py-0.20.15
🚀 Performance improvements

- Ensure parallel encoding/compression in `sink_parquet` (14964)
- hoist errors out of iterators in parquet (14945)
- add basic AVX-512 filters (14892)

✨ Enhancements

- Support writing `Array` type in parquet (14943)
- Add `drop_first` parameter to `Series.to_dummies` (14846)
- Add "execute\_options" support for `read_database_uri` (14682)

🐞 Bug fixes

- Fix invalid partitionable query (14966)
- allow nonstrict cast of categorical/enum to enum (14910)
- `count_rows` multi-threaded under-counting in parser.rs (14963)
- raise proper error instead of panicking when result of truncation is non-existent datetime (14958)
- ooc-sort issues (14959)
- Do not raise when constructing from a list of Series with Nones (14942)
- Don't access out-of-bounds for null indices in bitmap gather (14932)
- std when ddof>=n\_values returns None even in rolling context (11750)
- Don't rechunk categoricals when moving to physical (14934)
- Ensure consistent `read_database` behaviour with empty ODBC "iter\_batches" (14918)
- parquet rle boolean decoder (14931)
- Fix frame init from single `RecordBatch` objects when `pyarrow <= 12` (14922)
- boolean filter gave overly large buffers to Bitmap::from\_u8\_vec (14924)
- Fix sliced dictionary state in parquet (14917)
- `read_database` now properly handles empty result sets from `arrow-odbc` (14916)
- Fix possibly incorrect order of columns when using ipc stream `with_columns` (14859)

📖 Documentation

- Add note about `include_index` in `from_pandas` regarding "default indices" (14920)
- Remove some repetition in comments/docstrings (14912)

🛠️ Other improvements

- Update Cargo.lock and remove cmake limit workaround (14905)
- Simplify streaming placeholder replacement. (14915)
- Optional deps should include `fastexcel` (14907)
- Deduplicate `POLARS_FORCE_ASYNC` env var parsing (14909)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, ambidextrous, battmdpkq, mcrumiller, mickvangelderen, orlp, petrosbar and ritchie46


py-0.20.14
🏆 Highlights

- Streaming outer joins (14828)

⚠️ Deprecations

- Deprecate `overwrite_schema` parameter for `DataFrame.write_delta` (14879)

🚀 Performance improvements

- improve join-asof materialization (14884)
- Optimize chunked-id gather for binaryviews (14878)
- rework scalar filter kernels (14865)
- Reduce size of optional join-indexes (14856)
- Streaming outer joins (14828)
- Set sorted flag for `cum_count` on columns (14849)

✨ Enhancements

- Sort decimal fields (14649)
- Revert addition of `__slots__` to Polars classes (14857)
- Add `fastexcel` to `show_versions` (14869)
- try-improve concurrency tuner (14827)
- Streaming outer joins (14828)
- Set sorted flag for `cum_count` on columns (14849)
- support use of KùzuDB via `pl.read_database` (14822)
- Ensure binview types are rle-encoded in parquet write (14818)
- Disable timeouts (14809)
- cleanup spill disks in process (14807)
- Implement compression and skipping for binview IPC (14789)

🐞 Bug fixes

- Fix `DataFrame.min`/`max` for decimals (14890)
- Assert chunks are equal after physical cast to prevent OOB (14873)
- not all cpu feature flag tests were mocked (14864)
- Remove custom `__reduce__` implementation on `DataType` object (14778)
- Allow non-strict construction / initialization of Enum columns (14728)
- Fix streaming parquet limit (14783)

📖 Documentation

- Update contributing link (14882)
- update to use `ambiguous` instead of `use_earliest` (14820)
- Separate `asof` from join strategy, change parameter from `strategy` to `how` in user guide (14793)

🛠️ Other improvements

- Pin PyPI publish action to commit (14896)
- reactivate argminmax simd (14679)
- sort by 'idx' after outer join (14867)
- run and fix pext64\_polyfill test (14852)
- add missing deprecation directive in groupby.count (14817)
- Fix Makefile build commands (14806)
- Bump ruff from 0.2.0 to 0.3.0 in /py-polars (14800)
- Rename `utils` module to `_utils` to explicitly mark it as private (14772)
- Add test coverage for `_cpu_check` module (14768)

Thank you to all our contributors for making this release possible!
MarcoGorelli, Sol-Hee, alexander-beedie, c-peters, deanm0000, dependabot, dependabot[bot], eitsupi, flisky, geekvest, mcrumiller, mickvangelderen, nameexhaustion, orlp, petrosbar, ritchie46 and stinodego


rs-0.38.1
🚀 Performance improvements

- Elide utf8/binary cast in Parquet reading (14757)

✨ Enhancements

- Implement compression and skipping for binview IPC (14789)

🐞 Bug fixes

- fix feature flags (14802)
- Allow non-strict construction / initialization of Enum columns (14728)
- Fix streaming parquet limit (14783)

📦 Build system

- bump rayon from 1.8.1 to 1.9.0 (14797)

Thank you to all our contributors for making this release possible!
alexander-beedie, c-peters, dependabot, dependabot[bot], ritchie46 and stinodego


rs-0.38.0
🏆 Highlights

- fast path for COUNT(\*) queries (14574)
- Implemented tree formatting for LogicalPlan (14221)

💥 Breaking changes

- Infer `values` columns in `DataFrame.pivot` when `values` is None (14477)
- Mark `DataFrame::new_no_checks` and `DataFrame::new_no_length_checks` unsafe (14443)
- Remove `DatetimeChunked::convert_time_zone` (14046)
- Rename `LiteralValue::to_anyvalue` to `LiteralValue::to_any_value` (14033)

🚀 Performance improvements

- auto-tune concurrency budget (14753)
- Don't materialize for broadcasting `fill_null` value and default value of `replace` (14736)
- Improve performance of boolean filters `1-100x`. (14746)
- fix accidental quadratic utf8 validation in parquet (14705)
- fast path for COUNT(\*) queries (14574)
- Elide the total order wrapper for non-(float/option) types (14648)
- add utf8-validation fast paths for utf8view (14644)
- don't reassign chunks back to df owner (14633)
- If there are many small chunks in write\_parquet(), convert to a single chunk (14484) (14487)
- Polars thread pool was not used properly in various functions (14583)
- use owned arithmetic in horizontal\_sum (14525)
- Combine small chunks in sinks for streaming pipelines (14346)
- reduce heap allocs in expression/logical-plan iteration (14440)
- simplify and speed up cum\_sum and cum\_prod (14409)
- simplify negated predicates to improve row groups skipping (14370)
- prune parquet row groups when `is_not_null` is used (14260)
- use is\_between to skip parquet row groups (14244)
- Use a compression API that is designed for this use case (11699) (14194)
- Use `UnitVec` in polars-plan traversal (14199)
- use `UnitVec` in streaming joins (14197)
- improve `ChunkId` (14175)
- improve iteration performance (14126)
- elide unneeded work in window? (14108)
- run window functions more in parallel (14095)
- improve skip row group using statistics condition (14056)

✨ Enhancements

- Change default for maximum number of Series items printed to 10 to match DataFrame (14703)
- Infer `values` columns in `DataFrame.pivot` when `values` is None (14477)
- fast path for COUNT(\*) queries (14574)
- let `rolling` accept `index_column` of type UInt32 or UInt64 (14669)
- Treat float -0.0 == 0.0 and -NaN == NaN in group-by, joins and unique (14617)
- Properly cache object-stores (14598)
- Mark `DataFrame::new_no_checks` and `DataFrame::new_no_length_checks` unsafe (14443)
- flatten aliases (14512)
- Make formatting more consistent in DOT graphs (14486)
- add `flush` operator to streaming operators (14500)
- Increase verbosity of duplicate column error message (11899)
- change print to warn in reading csv from python file like object (14469)
- Raise if `pivot` would introduce duplicate column names (14431)
- apply negate in simplify expression pass (14436)
- restrict more cloud interop to semaphore budget (14435)
- Implement `min`/`max` for categorical dtype (14112)
- add boolean rle decoding for parquet (14403)
- Allow brackets in SQL join conditions (14263)
- Improve panic message for missing struct feature in `DataType::from_arrow` (14392)
- Implement the `IntoLazy` trait for LazyFrame (14323)
- Implemented tree formatting for LogicalPlan (14221)
- Implement `mean_horizontal` expression (14369)
- support decimal comparison (14338)
- Implements `arr.shift` (14298)
- Implements `list.n_unique` (14306)
- Do not panic when casting from an empty Series to pl.Decimal (14330)
- add `u8`/`i8`/`u16`/`i16` parsers to CSV reader (14241)
- Implements `list.gather_every` (14253)
- Implements `prefix/suffix_fields` (14251)
- Polish decimal arithmetic (14172)
- Introduce `arr.to_struct` (14202)
- Supports map fields name of struct (14203)
- make `IdxVec` generic as `UnitVec` (14196)
- add new arithmetic kernels (14026)
- Supports `unique` and `hash_rows` for `null` column (14111)
- Implement arithmetic operations for `Null` columns (14107)
- Add strict/non-strict construction of Boolean/Binary series (14073)
- Improve `Series::from_any_values` logic (14052)
- Adapt extend\_constant to function expr architecture and expressify it (14058)
- add integer negation (14049)
- `list` \& `array` measures of dispersion (13245)
- gc binview when writing ipc (14035)
- When calling `convert_time_zone` on time-zone-naive datetime, convert as if converting from UTC (13960)

🐞 Bug fixes

- fix hashing specialization (14754)
- Sum after filter in aggregation context sometimes returned NULL (14752)
- Allow `list.contains()` for list of categoricals (14744)
- Fix bug where alias was ignored in COUNT(\*) optimization (14738)
- Fix `DataFrame.sum` for decimals (14732)
- Fix parallel strategy for LazyFrame not being applied (14696)
- Block slice pushdown past non-literal projections or when the projection doesn't contain any columns from the input (14684)
- Fix number of rows printed in `DataFrame/Series` repr (edge cases) (14548)
- Fix contention panics in file gc threads (14690)
- Fix feature combination (14688)
- Only push predicates depending on the subset columns past `unique()` (14668)
- Reading RLE\_DICTIONARY-encoded parquet incorrectly coalesced NULL to empty string in some cases (14670)
- use correct flooring division/modulo operator in literal optimizer and const\_lhs \<> series ops (14671)
- Enable `is_in` for string in categorical/enum (14576)
- Polars thread pool was not used properly in various functions (14583)
- Semi-join and multiple keys outer-join did not respect POLARS\_MAX\_THREADS (14571)
- Correct sorted flag of chunked gather (14570)
- ensure the streaming dispatcher can replace placeholders in unions (14537)
- Ensure series are contiguous prior to `transpose` (14527)
- write csv header if necessary when finishing sinks (14518)
- fix logical dtypes in take\_chunked (14517)
- fix binary-offset row-encode (14514)
- race conditions in OOC writing (14510)
- don't gc after variadic buffers are written (14473)
- Increase verbosity of duplicate column error message (11899)
- Return appropriate data type for duration `mean` and `median` (14376)
- change print to warn in reading csv from python file like object (14469)
- regression in out-of-core group-by by new string-type (14464)
- DataFrame.pivot was returning incorrect results when multiple columns were passed to `index` and one of them was Struct (14438)
- remove literal `Series` from projection state (14437)
- pivot was producing incorrect results when (single) `index` was Struct (14308)
- Error on some invalid `clip` inputs (14416)
- Series.hist panicking on empty/all-null (14407)
- rechunk series when apply\_lambda (14406)
- don't make column from filenames, don't ignore directories with (.) (14317)
- Remove duplicated content in error messages (8107)
- Fix `set_operation` if the input is sliced and be broadcast (14303)
- Wrap `par_iter` in `list.to_struct` by `POOL.install` (14304)
- Do not panic when casting from an empty Series to pl.Decimal (14330)
- Preserve name when casting to Enum (14320)
- `list.get` does not work on list of decimals (14276)
- relax precision when up scaling (14270)
- Allow format object series with registry (14272)
- deduplicate recursive growables (14264)
- Fix `glimpse` overload signature (14258)
- allow set operations on list of categoricals (14110)
- `any/all_horizontal` with single input has incorrect type (14256)
- load numpy array with np array values 14237 (14238)
- Fix join validation for String types (14229)
- make csv parser more robust to edge cases (14210)
- Fix for `set_operations` of binary dtype (14152)
- fix read\_csv date/datetime inference and parsing (14113)
- don't see files as hive partitions (14128)
- allow eval on list of categoricals (14132)
- add missing conditional compile flag for `StringFunction::Find` (14129)
- Forbid casting from `Date` to `Time` and vice versa (14127)
- preserve old naming convention for multi-value pivot (this will change in 1.0 to no longer redundantly have the column name in the middle) (14120)
- Implements `gt/lt` cmp for null dtype (14119)
- ignore comments at beginning of csv if schema provided (14115)
- fix pivot when multiple columns are passed. Output is now aligned with what tidyverse / pandas.pivot\_table would do (14048)
- some temporal conversion errors for datetimes earlier than `1970-01-01` (14050)
- Preserve name when casting from categorical (14085)
- fix cse bug when window function is nested (14070)
- Fix `melt` panic when there are no value vars (14057)
- `json_encode` should respect the logical type (14063)
- improve skip row group using statistics condition (14056)
- Raise for .dt.epoch and .dt.timestamp for Duration dtype (13962)
- handle `SliceSink` with empty data (14025)
- correct field type schema inference (using read\_csv) (14042)
- Map `AnyValue::Null` to datatype `Null` (14045)
- Use int formatter for unsigned ints (14043)
- quick fix for multiple chunks binary reverse (14024)
- count matches on list categorical (14021)
- `list.min/max` with empty and/or None elements (14018)

📖 Documentation

- Link to plugins tutorial more prominently (14727)
- Separate "writing a plugin" from "registering an expression" in user guide, add some extra links, don't use deprecated \_register\_plugin (14621)
- Remove some outdated information in `polars` crate docs (14608)
- Fix code block path for group by example in getting started guide (14612)
- Add missing 'string' column in reading-writing Rust example to match Python example (14597)
- Fix typo of "Cartesian" product (14585)
- Mention in contributing guide that PR titles should start with an uppercase letter (14584)
- Fix markdown newline for rendering function description in VSCode (14567)
- Clarify doc summary of `upsample_stable` (13623)
- Clean up grammar and capitalization in `README.md` (14488)
- Fix typo in plugins section (14402)
- Add debugging section to contributing docs (10576)
- Fix some typos (14394)
- Realign file structure of user guide (14360)
- Rust examples for data structures in user guide (14339)
- Add deprecation period policy example for post-1.0.0 (14184)
- Fix capitalization of user guide references (14291)
- fix code block in user-guide/lazy/schemas (14228)
- Fix typo in contributing guide (14181)
- Small improvements Ecosystem page (14176)
- fix code blocks in user-guide/concepts/data-structures (14146)
- Fix bullet point formatting in CI contributing guide (14117)
- Remove outdated reference to horizontal concat feature (14105)
- Replace alternatives page with more objective comparison (13784)

📦 Build system

- update ahash (14731)
- Limit CMake threads to fix crash compiling `libz-ng-sys` on macOS (14715)
- Fix `json` feature for `polars-sql` crate (14501)
- Enable feature nightly with optional sql feature (14222)

🛠️ Other improvements

- update ahash (14731)
- replace transmute with bytemuck cast (14747)
- Limit CMake threads to fix crash compiling `libz-ng-sys` on macOS (14715)
- Refactor `AnyValue` casting logic (13140)
- update rustc (14678)
- redundant imports all crates (14662)
- remove redundant imports up to polars-io, polars-time, polars-ops (14658)
- remove redundant imports (up until polars-core) (14646)
- Simplify compressed\_chunk\_size calculation and leave comments to explain for rle encode (14634)
- Rename coverage file (14607)
- Format safety sections in Rust docstrings (14446)
- Refactor code coverage workflow (14563)
- Disable status from code coverage (14545)
- Add code coverage CI (14532)
- Format safety comments (14447)
- Bump release drafter to v6 (14429)
- Bump `setup-graphviz` action to v2 (14418)
- Update `make clean` command (14408)
- Minor refactor to satisfy clippy (14364)
- make gather\_chunked completely generic (14195)
- Add `.cargo` directory to .gitignore (14191)
- `take_chunked` to polars-ops (14185)
- Enable `clippy` lint to warn on debug macros (14178)
- Run `cargo update` (14160)
- merge take kernels (14137)
- improve From\<Ca> -> Vec (14123)
- hoist boolean -> string cast (14122)
- Remove `DatetimeChunked::convert_time_zone` (14046)
- More generic way to present an expression tree diagram (14020)
- Rename `LiteralValue::to_anyvalue` to `LiteralValue::to_any_value` (14033)

Thank you to all our contributors for making this release possible!
BGR360, CBell045, CaselIT, FBruzzesi, JulianCologne, Kylea650, MarcoGorelli, Migi, NedJWestern, Object905, Vincenthays, Wainberg, alexander-beedie, apcamargo, braaannigan, bsubei, c-peters, dannyfriar, deanm0000, dependabot, dependabot[bot], dpinol, eLVas, edavisau, eitsupi, engdoreis, flisky, grinya007, i-aki-y, ion-elgreco, itamarst, janosh, jdanford, kalekundert, lukemanley, mbuhidar, mcrumiller, nameexhaustion, orlp, petrosbar, r-brink, rben01, reswqa, rijkvp, ritchie46, stinodego, taki-mekhalfa and thomasfrederikhoeck


py-0.20.13
🚀 Performance improvements

- Elide utf8/binary cast in Parquet reading (14757)

🐞 Bug fixes

- Add missing "pclmulqdq" instruction to `_cpu_check` ("read\_cpu\_flags") (14758)

🛠️ Other improvements

- Test release wheels on x86-64 (14761)

Thank you to all our contributors for making this release possible!
alexander-beedie, ritchie46 and stinodego


py-0.20.12
> [!WARNING]
> **This release was deleted from PyPI.** Please use the [0.20.13](https://github.com/pola-rs/polars/releases/tag/py-0.20.13) release instead.

🚀 Performance improvements

- auto-tune concurrency budget (14753)
- Don't materialize for broadcasting `fill_null` value and default value of `replace` (14736)
- Improve performance of boolean filters `1-100x`. (14746)

🐞 Bug fixes

- fix hashing specialization (14754)
- Sum after filter in aggregation context sometimes returned NULL (14752)
- Allow `list.contains()` for list of categoricals (14744)
- Fix bug where alias was ignored in COUNT(\*) optimization (14738)
- Fix `DataFrame.sum` for decimals (14732)

📖 Documentation

- Link to plugins tutorial more prominently (14727)

📦 Build system

- update ahash (14731)

🛠️ Other improvements

- update ahash (14731)
- Use `datetime_to_int` util for AnyValue conversion (14743)
- Refactor `utils/convert.py` module (14739)

Thank you to all our contributors for making this release possible!
MarcoGorelli, c-peters, nameexhaustion, orlp, petrosbar, ritchie46 and stinodego


py-0.20.11
> [!WARNING]
> **This release was deleted from PyPI.** Please use the [0.20.13](https://github.com/pola-rs/polars/releases/tag/py-0.20.13) release instead.

🏆 Highlights

- fast path for COUNT(\*) queries (14574)

⚠️ Deprecations

- Deprecate passing `time_unit=None` to `Datetime` constructor (14708)
- Rename `Expr.meta.write_json/Expr.from_json` to `Expr.meta.serialize/Expr.deserialize` (14490)
- Deprecate default value for `ignore_nulls` for `ewm` methods (14663)
- Deprecate `DataFrame/LazyFrame.approx_n_unique` (14594)

🚀 Performance improvements

- 2-3x speedup in creating literals/Series of type `Date` (14716)
- fix accidental quadratic utf8 validation in parquet (14705)
- Add `__slots__` to most Polars classes (13236)
- fast path for COUNT(\*) queries (14574)
- Elide the total order wrapper for non-(float/option) types (14648)
- add utf8-validation fast paths for utf8view (14644)
- don't reassign chunks back to df owner (14633)
- If there are many small chunks in write\_parquet(), convert to a single chunk (14484) (14487)
- Polars thread pool was not used properly in various functions (14583)

✨ Enhancements

- Change default for maximum number of Series items printed to 10 to match DataFrame (14703)
- Change default number of rows printed in Notebooks for DataFrame/Series to 10 (14536)
- Infer `values` columns in `DataFrame.pivot` when `values` is None (14477)
- fast path for COUNT(\*) queries (14574)
- let `rolling` accept `index_column` of type UInt32 or UInt64 (14669)
- Treat float -0.0 == 0.0 and -NaN == NaN in group-by, joins and unique (14617)
- Improve consistency of `dtype` inference from Python types (14600)
- Properly cache object-stores (14598)
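
The `-0.0 == 0.0` / `-NaN == NaN` entry above (14617) concerns how float keys compare when grouping, joining, or deduplicating. As a rough illustration in plain Python (not Polars's actual implementation; `canonical_key` and `group_floats` are hypothetical helpers), float keys can be normalized before hashing so that both zeros share one group and all NaNs share another:

```python
import math
import struct
from collections import defaultdict


def canonical_key(x: float) -> bytes:
    """Return a hashable key that merges -0.0 with 0.0 and all NaNs with each other."""
    if math.isnan(x):
        x = math.nan  # replace any NaN (including negative NaN) with one canonical NaN
    elif x == 0.0:
        x = 0.0  # -0.0 compares equal to 0.0, so this drops the sign bit
    return struct.pack("<d", x)  # raw bits of the normalized value


def group_floats(values):
    """Group values by their canonical key, preserving first-seen order."""
    groups = defaultdict(list)
    for v in values:
        groups[canonical_key(v)].append(v)
    return list(groups.values())


groups = group_floats([0.0, -0.0, math.nan, -math.nan, 1.5])
print(len(groups))
```

Without the normalization step, the raw bit patterns of `0.0` and `-0.0` differ, so a hash of the bits would place them in separate groups.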

🐞 Bug fixes

- Fix parallel strategy for LazyFrame not being applied (14696)
- Block slice pushdown past non-literal projections or when the projection doesn't contain any columns from the input (14684)
- Fix number of rows printed in `DataFrame/Series` repr (edge cases) (14548)
- Fix contention panics in file gc threads (14690)
- Fix feature combination (14688)
- Only push predicates depending on the subset columns past `unique()` (14668)
- Properly handle a single empty `RecordBatch` in `from_arrow` (14683)
- More accurate type hints for binary file-like inputs (14674)
- Reading RLE\_DICTIONARY-encoded parquet incorrectly coalesced NULL to empty string in some cases (14670)
- use correct flooring division/modulo operator in literal optimizer and const\_lhs \<> series ops (14671)
- Enable `is_in` for string in categorical/enum (14576)
- Fixes a `read_database` issue loading specific datetime types from SQL Server backends (14627)
- Polars thread pool was not used properly in various functions (14583)
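
The flooring division/modulo fix above (14671) matters because the two common integer-division conventions disagree for negative operands. A small plain-Python sketch of the difference (illustrative only; `trunc_divmod` and `floor_divmod` are hypothetical helpers, not Polars functions):

```python
def trunc_divmod(a: int, b: int):
    """Quotient rounded toward zero, as integer `/` and `%` behave in C or Rust."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q, a - q * b


def floor_divmod(a: int, b: int):
    """Quotient rounded toward negative infinity, as Python's `//` and `%` behave."""
    return a // b, a % b


print(trunc_divmod(-7, 2))  # (-3, -1)
print(floor_divmod(-7, 2))  # (-4, 1)
```

Both conventions satisfy `a == q * b + r`; they differ only in which way the quotient is rounded, which is why an optimizer that folds constants must use the same convention as the runtime operator.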

📖 Documentation

- Improve some DataType docstrings (14719)
- Fix bad link due to boldness in `pl.count` (14691)
- Improve docstrings for `ewm_*` and `rolling_*` methods (14667)
- Improve examples for `Series.binary.encode` and `Series.binary.decode`. (14579)
- Add examples for `Series.kurtosis` (14681)
- Fix docstring for `LazyGroupBy.len` (14661)
- Separate "writing a plugin" from "registering an expression" in user guide, add some extra links, don't use deprecated \_register\_plugin (14621)
- Fix code block path for group by example in getting started guide (14612)
- Add missing 'string' column in reading-writing Rust example to match Python example (14597)

📦 Build system

- Limit CMake threads to fix crash compiling `libz-ng-sys` on macOS (14715)

🛠️ Other improvements

- Limit CMake threads to fix crash compiling `libz-ng-sys` on macOS (14715)
- Fix `make requirements` when conda environment is active (14693)
- update rustc (14678)
- redundant imports all crates (14662)
- Avoid unnecessary cast in Series constructor (14650)
- Add test on selecting Enum columns (14628)
- Use `uv` for `make requirements` (14618)
- Rename coverage file (14607)
- Add a lint-only `Makefile` option (14602)
- No longer use `SeriesView` in `Series.to_numpy` (14588)

Thank you to all our contributors for making this release possible!
Kylea650, MarcoGorelli, Object905, alexander-beedie, bsubei, c-peters, eLVas, itamarst, mbuhidar, mcrumiller, nameexhaustion, orlp, rijkvp, ritchie46 and stinodego


py-0.20.10
⚠️ Deprecations

- Add `allow_copy` parameter to `DataFrame.to_numpy` (14569)

🚀 Performance improvements

- Avoid loading pandas in `from_arrow` when array has 0 chunks (14562)

✨ Enhancements

- Warn on inefficient use of `map_elements` for additional string functions (14565)
- Add `allow_copy` parameter to `DataFrame.to_numpy` (14569)
- Improve `read_database` interop with sqlalchemy `Session` and various `Result` objects (14557)
- Warn on inefficient use of `map_elements` for temporal attributes/methods (14529)

🐞 Bug fixes

- Semi-join and multiple keys outer-join did not respect POLARS\_MAX\_THREADS (14571)
- Correct sorted flag of chunked gather (14570)

📖 Documentation

- Fix typo of "Cartesian" product (14585)
- Mention in contributing guide that PR titles should start with an uppercase letter (14584)
- Fix markdown newline for rendering function description in VSCode (14567)

🛠️ Other improvements

- Refactor code coverage workflow (14563)
- Disable status from code coverage (14545)

Thank you to all our contributors for making this release possible!
CBell045, alexander-beedie, c-peters, nameexhaustion, ritchie46 and stinodego


py-0.20.9
🚀 Performance improvements

- use owned arithmetic in horizontal\_sum (14525)

✨ Enhancements

- Add `writable` flag to `DataFrame.to_numpy` (14520)
- flatten aliases (14512)
- Make formatting more consistent in DOT graphs (14486)
- add `flush` operator to streaming operators (14500)

🐞 Bug fixes

- ensure the streaming dispatcher can replace placeholders in unions (14537)
- Ensure series are contiguous prior to `transpose` (14527)
- write csv header if necessary when finishing sinks (14518)
- fix logical dtypes in take\_chunked (14517)
- fix binary-offset row-encode (14514)
- race conditions in OOC writing (14510)
- Remove `is_numeric` check on `Series.std/var` (14493)
- Error on invalid `schema` input in DataFrame constructor (14483)

📖 Documentation

- Fix docstring example for `Config.save_to_file` (14533)
- Fix `infer_schema_length` param description (14233)
- Clean up grammar and capitalization in `README.md` (14488)
- Add examples for `Series.bin.ends_with`, `Series.bin.starts_with`, `Series.bin.decode`, `Series.bin.encode`. (14478)

🛠️ Other improvements

- add code coverage CI (14532)
- Re-enable streaming OOC tests (14522)
- Use constant for checking Sphinx building (14502)

Thank you to all our contributors for making this release possible!
FBruzzesi, NedJWestern, c-peters, dannyfriar, i-aki-y, jdanford, mbuhidar, mcrumiller, ritchie46, stinodego and taki-mekhalfa


py-0.20.8
🏆 Highlights

- Implemented tree formatting for LogicalPlan (14221)

⚠️ Deprecations

- Deprecate positional args in `pivot` to prepare new functionality (14428)

🚀 Performance improvements

- Combine small chunks in sinks for streaming pipelines (14346)
- reduce heap allocs in expression/logical-plan iteration (14440)
- simplify and speed up cum\_sum and cum\_prod (14409)
- simplify negated predicates to improve row groups skipping (14370)

✨ Enhancements

- Increase verbosity of duplicate column error message (11899)
- change print to warn in reading csv from python file like object (14469)
- Raise if `pivot` would introduce duplicate column names (14431)
- apply negate in simplify expression pass (14436)
- restrict more cloud interop to semaphore budget (14435)
- Implement `min`/`max` for categorical dtype (14112)
- Hide `polars.testing.*` in pytest stack traces (14399)
- expose numpy view to integer types (14405)
- Allow column name input in `clip` (14410)
- add boolean rle decoding for parquet (14403)
- Allow brackets in SQL join conditions (14263)
- Implemented tree formatting for LogicalPlan (14221)
- Implement `mean_horizontal` expression (14369)
- support decimal comparison (14338)
- Implements `arr.shift` (14298)
- Implements `list.n_unique` (14306)
- Do not panic when casting from an empty Series to pl.Decimal (14330)
- unset WRITEABLE flag in zero-copy output (14283)
- Support `Categorical/Enum` in `Series.to_numpy` (14275)
- add parametric testing support for the `Array` dtype (14265)

🐞 Bug fixes

- don't gc after variadic buffers are written (14473)
- Increase verbosity of duplicate column error message (11899)
- Return appropriate data type for duration `mean` and `median` (14376)
- change print to warn in reading csv from python file like object (14469)
- regression in out-of-core group-by by new string-type (14464)
- DataFrame.pivot was returning incorrect results when multiple columns were passed to `index` and one of them was Struct (14438)
- remove literal `Series` from projection state (14437)
- pivot was producing incorrect results when (single) `index` was Struct (14308)
- Error on some invalid `clip` inputs (14416)
- Series.hist panicking on empty/all-null (14407)
- rechunk series when apply\_lambda (14406)
- Raise if invalid strategy is passed to `map_elements` (14397)
- Require exact checking for Decimals in assertion utils (14357)
- fix ufunc for unlimited column args (14328)
- Handle chunked Series in `Series.to_numpy` (14341)
- Remove duplicated content in error messages (8107)
- Fix `set_operation` if the input is sliced and broadcast (14303)
- Wrap `par_iter` in `list.to_struct` by `POOL.install` (14304)
- Do not panic when casting from an empty Series to pl.Decimal (14330)
- Preserve name when casting to Enum (14320)
- `list.get` does not work on list of decimals (14276)
- relax precision when up scaling (14270)
- Allow format object series with registry (14272)

📖 Documentation

- Update `read_database` docstring note about getting the connection URI string for sqlalchemy (14461)
- Fix typo in plugins section (14402)
- Add debugging section to contributing docs (10576)
- Define what a 'character' means in `slice` / `len_chars` (14395)
- Clarify behavior of `DataFrame.rows_by_key` (14149)
- Fix some typos (14394)
- Realign file structure of user guide (14360)
- Rust examples for data structures in user guide (14339)
- Add deprecation period policy example for post-1.0.0 (14184)
- Add example for `Series.bin.contains` (14297)
- Small clarifications in the contributing guide (14310)
- Fix capitalization of user guide references (14291)
- Fix explode docstring mentioning String types (14285)
- Update deltalake docstrings to new link (14282)

🛠️ Other improvements

- Ignore unclosed file warnings for now (14467)
- Raise better error in import timings test (14441)
- Refactor `arg_min/max` test case (14439)
- Skip some OOC tests that fail randomly in the CI (14434)
- Bump release drafter to v6 (14429)
- Set specific temp dir for OOC tests (14420)
- Bump `setup-graphviz` action to v2 (14418)
- Minor test refactor (14404)
- Update `make clean` command (14408)
- Internal rename of `_or` to `or_` in PyO3 (same for `_xor/_and`) (14393)
- Minor refactor of `DataFrame.to_numpy` structured code (14348)
- Update `Series.to_numpy` to handle Decimal/Time types in Rust (14296)
- Add test for `Series.to_numpy` with timezones (14337)
- Bump ruff version to 0.2.0 (14294)
- Temporarily fix failing deltalake test (14288)
- remove dataframe consortium standard api entrypoint (14279)

Thank you to all our contributors for making this release possible!
BGR360, CaselIT, MarcoGorelli, Migi, NedJWestern, Vincenthays, alexander-beedie, deanm0000, dependabot, dependabot[bot], engdoreis, flisky, grinya007, itamarst, janosh, kalekundert, lukemanley, mbuhidar, mcrumiller, petrosbar, r-brink, rben01, reswqa, ritchie46, stinodego, taki-mekhalfa and thomasfrederikhoeck


py-0.20.7
⚠️ Deprecations

- Rename `threadpool_size` to `thread_pool_size` (14236)

🚀 Performance improvements

- prune parquet row groups when `is_not_null` is used (14260)
- Avoid unnecessary copies in `Series.to_numpy` for boolean/temporal types (14261)
- use is\_between to skip parquet row groups (14244)
- Use a compression API that is designed for this use case (11699) (14194)
- Use `UnitVec` in polars-plan traversal (14199)
- use `UnitVec` in streaming joins (14197)
- improve `ChunkId` (14175)
- improve iteration performance (14126)
- elide unneeded work in window? (14108)
- run window functions more in parallel (14095)
- improve skip row group using statistics condition (14056)
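
Several entries above (14260, 14244, 14056) concern skipping Parquet row groups based on their statistics. The general idea can be sketched in plain Python as follows; the `RowGroupStats` type and both helper functions are hypothetical and only illustrate the reasoning, not Polars internals:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowGroupStats:
    """Per-row-group column statistics (hypothetical, simplified)."""
    num_rows: int
    null_count: int
    min_value: Optional[int]  # None when every value in the group is null
    max_value: Optional[int]


def can_skip_is_between(stats: RowGroupStats, low: int, high: int) -> bool:
    """True if no row in the group can satisfy low <= x <= high."""
    if stats.min_value is None or stats.max_value is None:
        return True  # only nulls, and a null never satisfies the range
    return stats.max_value < low or stats.min_value > high


def can_skip_is_not_null(stats: RowGroupStats) -> bool:
    """True if every row in the group is null."""
    return stats.null_count == stats.num_rows


groups = [
    RowGroupStats(num_rows=100, null_count=0, min_value=1, max_value=50),
    RowGroupStats(num_rows=100, null_count=100, min_value=None, max_value=None),
    RowGroupStats(num_rows=100, null_count=10, min_value=200, max_value=300),
]

# Indices of the row groups that still have to be read for each predicate.
read_for_between = [i for i, g in enumerate(groups) if not can_skip_is_between(g, 40, 60)]
read_for_not_null = [i for i, g in enumerate(groups) if not can_skip_is_not_null(g)]
print(read_for_between, read_for_not_null)
```

Statistics can only prove that a group contains no matching rows; a group that survives the check still has to be read and filtered row by row.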

✨ Enhancements

- add `u8`/`i8`/`u16`/`i16` parsers to CSV reader (14241)
- move `F-order` data in and out of numpy to polars zero copy (14259)
- read arrow-c-interface without requiring pyarrow (14254)
- Implements `list.gather_every` (14253)
- Implements `prefix/suffix_fields` (14251)
- Change `Series.to_numpy` to return `f64` for `Int32/UInt32` Series with nulls instead of `f32` (14240)
- Polish decimal arithmetic (14172)
- improved `read_excel` format detection, and support for excel 97-2004 workbooks (14234)
- Introduce `arr.to_struct` (14202)
- Support mapping the field names of a struct (14203)
- make `IdxVec` generic as `UnitVec` (14196)
- add new arithmetic kernels (14026)
- Supports `unique` and `hash_rows` for `null` column (14111)
- Implement arithmetic operations for `Null` columns (14107)
- support pd.Index in from\_pandas and elsewhere (14087)
- Allow renaming expressions with keyword syntax in `group_by` (14071)
- raise more informative error message if someone lands on Expr.\_\_bool\_\_ (14067)
- Adapt extend\_constant to function expr architecture and expressify it (14058)
- add integer negation (14049)
- `list` \& `array` measures of dispersion (13245)
- gc binview when writing ipc (14035)
- When calling `convert_time_zone` on time-zone-naive datetime, convert as if converting from UTC (13960)
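
On the `Series.to_numpy` change above (14240): a 32-bit float has a 24-bit significand, so it cannot represent every 32-bit integer exactly, whereas a 64-bit float can. The following standard-library sketch shows the loss; it does not call Polars or NumPy, and `roundtrip` is a hypothetical helper:

```python
import struct


def roundtrip(value: int, fmt: str) -> int:
    """Store an integer as a float in the given struct format and read it back."""
    packed = struct.pack(fmt, float(value))
    return int(struct.unpack(fmt, packed)[0])


value = 2**24 + 1  # 16_777_217, well inside the Int32 range

as_f32 = roundtrip(value, "<f")  # 32-bit float storage
as_f64 = roundtrip(value, "<d")  # 64-bit float storage
print(value, as_f32, as_f64)
```

The 64-bit round trip returns the original integer, while the 32-bit one does not, which is the motivation for widening the output type.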

🐞 Bug fixes

- deduplicate recursive growables (14264)
- Fix `glimpse` overload signature (14258)
- allow set operations on list of categoricals (14110)
- `any/all_horizontal` with single input has incorrect type (14256)
- load numpy array with np array values 14237 (14238)
- Make `Series.to_numpy` on booleans without nulls return `bool` type (14239)
- fix ufunc in agg (change \_\_ufunc\_array\_\_ so it uses `is_elementwise=True` parameter) (14135)
- Fix join validation for String types (14229)
- enable windows test coverage for `read_excel` "calamine" (fastexcel) engine (14171)
- make csv parser more robust to edge cases (14210)
- Fix for `set_operations` of binary dtype (14152)
- fix read\_csv date/datetime inference and parsing (14113)
- don't see files as hive partitions (14128)
- allow eval on list of categoricals (14132)
- Forbid casting from `Date` to `Time` and vice versa (14127)
- preserve old naming convention for multi-value pivot (this will change in 1.0 to no longer redundantly have the column name in the middle) (14120)
- Implements `gt/lt` cmp for null dtype (14119)
- ignore comments at beginning of csv if schema provided (14115)
- fix pivot when multiple columns are passed. Output is now aligned with what tidyverse / pandas.pivot\_table would do (14048)
- multiple `read_excel` updates (14039)
- some temporal conversion errors for datetimes earlier than `1970-01-01` (14050)
- Preserve name when casting from categorical (14085)
- respect `Object` dtype designation (14072)
- fix cse bug when window function is nested (14070)
- Fix `melt` panic when there are no value vars (14057)
- `json_encode` should respect the logical type (14063)
- improve skip row group using statistics condition (14056)
- Raise for .dt.epoch and .dt.timestamp for Duration dtype (13962)
- handle `SliceSink` with empty data (14025)
- Allow `Series.to_pandas` for categorical types (14028)
- correct field type schema inference (using read\_csv) (14042)
- Use int formatter for unsigned ints (14043)

📖 Documentation

- fix code block in user-guide/lazy/schemas (14228)
- Add visualization page to user guide (13052)
- Fix typo in contributing guide (14181)
- Small improvements Ecosystem page (14176)
- fix code blocks in user-guide/concepts/data-structures (14146)
- Document that Kleene logic is followed in `any_horizontal` and `all_horizontal` (14148)
- Fix description of `return_dtype` parameter for `map_elements` and `map_batches` (14114)
- Fix bullet point formatting in CI contributing guide (14117)
- Add documentation on replacement strings to `str.replace` and `str.replace_all` (13382)
- Replace alternatives page with more objective comparison (13784)
- Note that only one `name` operation is allowed per expression (14075)
- Improve deprecation message of `dtype_if_empty` param (14068)
- fix more docstring bullet points (14065)
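
The Kleene-logic note above (14148) refers to three-valued logic in which a null means "unknown". A plain-Python sketch of the rule, using `None` for null (the helper names are hypothetical and this is not the Polars implementation):

```python
from typing import Iterable, Optional


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """True if any value is True; otherwise None if any value is unknown; else False."""
    seen_null = False
    for v in values:
        if v is True:
            return True
        if v is None:
            seen_null = True
    return None if seen_null else False


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """False if any value is False; otherwise None if any value is unknown; else True."""
    seen_null = False
    for v in values:
        if v is False:
            return False
        if v is None:
            seen_null = True
    return None if seen_null else True


print(kleene_any([True, None]))   # True: one True decides the result
print(kleene_any([False, None]))  # None: the unknown could still be True
print(kleene_all([False, None]))  # False: one False decides the result
print(kleene_all([True, None]))   # None: the unknown could still be False
```

A null only propagates to the result when the known values do not already determine it.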

🛠️ Other improvements

- Reorganize NumPy interop tests (14257)
- additional dataframe test coverage (14243)
- Remove `*args` in `Series.to_numpy` (14248)
- Move metadata utils to `meta` module (14230)
- remove unused method DataFrame.\_from\_dicts (14212)
- make gather\_chunked completely generic (14195)
- Add `.cargo` directory to .gitignore (14191)
- `take_chunked` to polars-ops (14185)
- Issue a warning when running doctests on Python 3.11 or lower (14187)
- Run `cargo update` (14160)
- merge take kernels (14137)
- improve From\<Ca> -> Vec (14123)
- hoist boolean -> string cast (14122)
- remove unused argument (14014)

Thank you to all our contributors for making this release possible!
JulianCologne, MarcoGorelli, Vincenthays, Wainberg, alexander-beedie, apcamargo, braaannigan, c-peters, deanm0000, dependabot, dependabot[bot], dpinol, edavisau, eitsupi, flisky, grinya007, ion-elgreco, itamarst, lukemanley, mcrumiller, orlp, r-brink, reswqa, ritchie46, stinodego and taki-mekhalfa


rs-0.37.0
🏆 Highlights

- new implementation for `String/Binary` type. (13748)

💥 Breaking changes

- Remove `DatetimeChunked::convert_time_zone` (14046)
- Rename `LiteralValue::to_anyvalue` to `LiteralValue::to_any_value` (14033)
- Rename `drop_columns` to `drop` (13754)
- Rename `pl.count()` to `pl.len()` (13719)
- Rename `row_count_name`/`row_count_offset` parameters in IO functions to `row_index_*` (13563)
- Rename `with_row_count` to `with_row_index` (13494)

🚀 Performance improvements

- prune parquet row groups when `is_not_null` is used (14260)
- use is\_between to skip parquet row groups (14244)
- Use a compression API that is designed for this use case (11699) (14194)
- Use `UnitVec` in polars-plan traversal (14199)
- use `UnitVec` in streaming joins (14197)
- improve `ChunkId` (14175)
- improve iteration performance (14126)
- elide unneeded work in window? (14108)
- run window functions more in parallel (14095)
- improve skip row group using statistics condition (14056)
- improve string/binary reverse performance (14016)
- optimize `DataFrame.describe` by presorting columns (13822)
- elide redundant bound checks. (13909)
- speedup boolean filter (13905)
- speedup binview filter (13902)
- improve binview filter (13878)
- apply string view GC more conservatively (13850)
- add optimized BinaryViewArray comparison kernels (13839)
- lazy cache binview bytes len (13830)
- fast-path for eager int\_range (13811)
- Optimize `arr.sum` for inner non-null bool (13800)
- directly embed data ptr in Buffer (13744)
- elide parallelism restriction on generic rolling expressions (13662)
- ensure time groups are parallelized (13660)
- do not eagerly compute bitcount (13562)
- optimise SQL engine string concat (13499)
- remove lifetime requirement from CategoricalChunkedBuilder (13319)

✨ Enhancements

- add `u8`/`i8`/`u16`/`i16` parsers to CSV reader (14241)
- Implements `list.gather_every` (14253)
- Implements `prefix/suffix_fields` (14251)
- Polish decimal arithmetic (14172)
- Introduce `arr.to_struct` (14202)
- Support mapping the field names of a struct (14203)
- make `IdxVec` generic as `UnitVec` (14196)
- add new arithmetic kernels (14026)
- Supports `unique` and `hash_rows` for `null` column (14111)
- Implement arithmetic operations for `Null` columns (14107)
- Add strict/non-strict construction of Boolean/Binary series (14073)
- Improve `Series::from_any_values` logic (14052)
- Adapt extend\_constant to function expr architecture and expressify it (14058)
- add integer negation (14049)
- `list` \& `array` measures of dispersion (13245)
- gc binview when writing ipc (14035)
- When calling `convert_time_zone` on time-zone-naive datetime, convert as if converting from UTC (13960)
- DataFrame supports explode by array column (13958)
- improve binary formatting (13981)
- preserve Enum information when going to IPC (13943)
- support kwargs in plugin 'field' functions and raise error on unsupported binview layout (13944)
- support cast decimal to utf8 (13829)
- add SQL support for `timestamp` precision modifier (13936)
- support negative indexing and expressions for `LEFT`, `RIGHT` and `SUBSTR` SQL string funcs (13888)
- Introduce `explode` for `ArrayNameSpace` (13923)
- raise better error message for .dt.time on Date column (13932)
- List set\_operations supports float (13920)
- Add `ignore_nulls` for `arr.join` (13919)
- register 'set\_sorted' as batch/elementwise (13896)
- move Enum/Categorical categories to binview (13882)
- Add `ignore_nulls` for `list.join` (13701)
- Add `ignore_nulls` for `pl.concat_str` (13877)
- fix parquet for binview (13873)
- support mmap for binview in OOC (13872)
- implement ffi for `binview` (13871)
- Support zero fill null strategy for binary and string columns (13869)
- Implement/fix unary minus operator `-pl.col(...)` (13776)
- extend SQL `EXTRACT` with "century", "millennium", and "timezone" parts (13634)
- fix binview ipc format (13842)
- add SQL support for `numeric` and/or `decimal` types (13739)
- improve panic message (13836)
- Expressify `str.zfill` (13790)
- new implementation for `String/Binary` type. (13748)
- Add `nulls_last` for `Series.sort` (13794)
- Impl `count_matches` for array namespace (13675)
- Add `nulls_last` for `list/array.sort` (13795)
- Rename `drop_columns` to `drop` (13754)
- convert fixed-offset timezones to respective Etc timezone from time zone database (13738)
- Expressify `str.slice` (13747)
- implement binview for polars-row (13736)
- implement binview for polars-json (13737)
- add architecture for polars-flavored IPC (13734)
- implement binview comparison kernels (13715)
- raise default frame/series repr height from 8 to 10 (13699)
- write parquet ColumnOrder (13672)
- Impl `contains` for ArrayNameSpace (13638)
- improve `rolling()` expression formatting (13657)
- Implement `is_between` in Rust (11945)
- Expressify `pattern` of `str.extract` (13607)
- Impl `join` for ArrayNameSpace (13586)
- add SQL engine support for string cast to `json` (13624)
- add SQL engine support for `EXTRACT` and `DATE_PART` (13603)
- add `BinaryView` to `parquet` writer/reader. (13489)
- add SQL engine support for `POSITION` and `STRPOS` (13585)
- `is_in` support for array dtype (13559)
- add new `str.find` expression, returning the index of a regex pattern or literal substring (13561)
- add SQL engine support for `LIKE` and `ILIKE` pattern matching (13522)
- improve hive partition pruning (13358) (13426)
- don't rechunk by default in lazy scans (13518)
- Add `cum_count` expression function (13478)
- add SQL engine support for `IF` control flow function (13491)
- add SQL engine support for `MOD` function (13502)
- return datetime for datetime mean \& median (13417)
- add SQL engine support for `CONCAT_WS` string function (13483)
- `BinaryView`/`Utf8View` IPC support (13464)
- Implement wasm Pool::scope (13476)
- add SQL engine support for `RIGHT` and `REVERSE` string functions (13461)
- implement `BinaryView` and `Utf8View` in `polars-arrow` (13243)
- add SQL engine support for variadic string `CONCAT` function (13428)
- add support for AND in SQL join-clause context (13242)
- Impl ordering ops for array namespace (13414)
- add SQL engine support for `REPLACE` string function (13431)
- add SQL engine support for `SIGN` function (13429)
- add SQL engine support for `IFNULL` function (13432)
- additional SQL support for `bytes`, `bit`, and `hex` literals (13389)

🐞 Bug fixes

- deduplicate recursive growables (14264)
- Fix `glimpse` overload signature (14258)
- allow set operations on list of categoricals (14110)
- `any/all_horizontal` with single input has incorrect type (14256)
- load numpy array with np array values 14237 (14238)
- Fix join validation for String types (14229)
- make csv parser more robust to edge cases (14210)
- Fix for `set_operations` of binary dtype (14152)
- fix read\_csv date/datetime inference and parsing (14113)
- don't see files as hive partitions (14128)
- allow eval on list of categoricals (14132)
- add missing conditional compile flag for `StringFunction::Find` (14129)
- Forbid casting from `Date` to `Time` and vice versa (14127)
- preserve old naming convention for multi-value pivot (this will change in 1.0 to no longer redundantly have the column name in the middle) (14120)
- Implements `gt/lt` cmp for null dtype (14119)
- ignore comments at beginning of csv if schema provided (14115)
- fix pivot when multiple columns are passed. Output is now aligned with what tidyverse / pandas.pivot\_table would do (14048)
- some temporal conversion errors for datetimes earlier than `1970-01-01` (14050)
- Preserve name when casting from categorical (14085)
- fix cse bug when window function is nested (14070)
- Fix `melt` panic when there are no value vars (14057)
- `json_encode` should respect the logical type (14063)
- improve skip row group using statistics condition (14056)
- Raise for .dt.epoch and .dt.timestamp for Duration dtype (13962)
- handle `SliceSink` with empty data (14025)
- correct field type schema inference (using read\_csv) (14042)
- Map `AnyValue::Null` to datatype `Null` (14045)
- Use int formatter for unsigned ints (14043)
- quick fix for multiple chunks binary reverse (14024)
- count matches on list categorical (14021)
- `list.min/max` with empty and/or None elements (14018)
- allow get access to list of categoricals (14015)
- Fix casting from categorical to numeric (13957)
- read\_csv preserve whitespace and newlines (13934)
- append decimal with different scale (13977)
- Allow casting integer types to Enum (13955)
- `arg_min/max` on categoricals should respect ordering (13998)
- serialize decimal type (13997)
- check input type for `arr/list.contains` (13959)
- Allow dtype merge when inner dtype is enum (13938)
- recurse less in streaming shared sinks (13930)
- ensure order is preserved if streaming from different sources (13922)
- Fix `is_not_null` for Struct columns (13921)
- make 100 \* pl.col(pl.Boolean).mean() work (13725)
- allow extract of numeric from str AnyValue (13865)
- single-element .dt.time() and .dt.date() should always preserve sortedness (13808)
- prune empty chunks before set operations (13898)
- treat null columns as zero in `sum_horizontal` (13880)
- include null count in rolling window validity with `min_periods` (13863)
- don't return NaN as free memory fraction (13860)
- parquet hybrid RLE encoding did not always align to bit width (13883)
- Add `ignore_nulls` for `list.join` (13701)
- .dt.time() was panicking for datetimes prior to unix epoch (13812)
- Correct err message of `check_map_output_len` (13854)
- allow list creation of decimals (13851)
- Implement `abs` for Decimal, error on Date/Time/Datetime (13821)
- decompress the right number of rows when reading compressed CSVs (13721)
- rolling nested groups deadlock (13835)
- `gather_every` should work on agg context (13810)
- When reading Parquet or Arrow, convert +00:00 timezone to UTC (13816)
- Fix segfault of `is_in` (13814)
- don't panic on full null qcut (13815)
- do not read data for zero-length compressed buffer (13791)
- Fix the non-null test of `transpose` (13783)
- Raise error instead of panic when joining on wildcard/nth (13742)
- `str.concat` correctly ignore single null value (13751)
- Selectors `by_name` and `by_dtype` should allow empty list as input (11024)
- Use `NonZeroUsize` for `batch_size` parameter in `write_csv/sink_csv/scan_ndjson` (13726)
- error instead of panicking in sql if empty function (13691)
- gather.get schema (13679)
- ensure we hit proper cache in nested `rolling` expressions (13666)
- Allow `av_buffer` cast numeric record to temporal type (13661)
- streaming cross join if swapped is hit (13656)
- Make sure rolling key is projected when process projection (13622)
- fix schema inference for json (13637)
- Empty series of AggregatedList should also have list dtype (13620)
- fallback to cast kernel if `inline_cast` AnyValue raise (13595)
- `LazyFrame::join()` no longer ignores 3 `JoinArgs` parameters (13570)
- fix reverse variable row decoding (13587)
- Fix `scatter` for null values (13578)
- Fix `cum_count` with regards to start value / null values (13535)
- Fix precision/scale handling and invalid numbers in string-to-decimal conversions. (13548)
- Treat Python `None` as null value for `Object` dtype (13564)
- `Expr.replace` to single value did not replace NULLs (13551)
- `AnyValue::StructOwned` panic when hashing (13553)
- improve hive partition pruning (13358) (13426)
- fix projection pushdown for new outer join schema (13527)
- ensure size-hint of TrueIdxIter is correct (13508)
- correct 'outer\_coalesce' logic in case of duplicate names (13501)
- raise for out-of-range datetimes in to\_datetime/strptime (13403)
- Keep logical type when getting values from list (13456)
- Handle duplicate/ambiguous inputs for `replace` (13217)
- skip null/empty values if replace\_lit\_n\_char (13400)
- fix is\_in operator when comparing string with global categoricals (13412)
- use different generics for `shift_and_fill` parameters (13379)

📖 Documentation

- fix code block in user-guide/lazy/schemas (14228)
- Fix typo in contributing guide (14181)
- Small improvements Ecosystem page (14176)
- fix code blocks in user-guide/concepts/data-structures (14146)
- Fix bullet point formatting in CI contributing guide (14117)
- Remove outdated reference to horizontal concat feature (14105)
- Replace alternatives page with more objective comparison (13784)
- Improve structure of user guide (13951)
- Improve structure of user guide (13639)
- Introduce ecosystem page in user guide (13903)
- Mention deltalake write support in README (13890)
- Fix typo in deprecation message of `with_row_count` (13793)
- Fix incorrect "coming from pandas" syntax (13767)
- Improve streaming section of the user guide (13750)
- fix linking to feature flags in user guide (13644)
- Improve documentation on broadcasting (13394)
- Add note about toolchain issue under native Windows (13590)
- update SQL section of the README (13529)
- update polars-business > polars-xdt link (13509)

📦 Build system

- Enable feature nightly with optional sql feature (14222)
- remove horizontal\_concat feature (13390)

🛠️ Other improvements

- make gather\_chunked completely generic (14195)
- Add `.cargo` directory to .gitignore (14191)
- `take_chunked` to polars-ops (14185)
- Enable `clippy` lint to warn on debug macros (14178)
- Run `cargo update` (14160)
- merge take kernels (14137)
- improve From\<Ca> -> Vec (14123)
- hoist boolean -> string cast (14122)
- Remove `DatetimeChunked::convert_time_zone` (14046)
- More generic way to present an expression tree diagram (14020)
- Rename `LiteralValue::to_anyvalue` to `LiteralValue::to_any_value` (14033)
- make Enums an actual datatype (14011)
- update rustc (13947)
- move `filter` to `polars-compute` (13897)
- bump object\_store to 0.9 (13857)
- Make functions in `expr/general` non-anonymous (13832)
- Fix doctests (13831)
- Refactor Python release workflow (13807)
- Make `pl.duration` non-anonymous (13762)
- Rename `pl.count()` to `pl.len()` (13719)
- Deprecate `dt.with_time_unit` in favor of `cast(pl.Int64).cast(pl.Datetime(time_unit, time_zone))` (13667)
- Auto-add 'needs triage' label to bugs (13671)
- make rolling index column visible to optimizer (13658)
- Rename `lazy-regex` feature to `regex` to align `polars` with `polars-lazy` crate (13647)
- Add `Documentation` / `Build system` sections to the changelog (13594)
- Filter unhelpful messages in `make build` (13579)
- Remove extra line break between checkboxes in GitHub bug report issues (13576)
- Rename `row_count_name`/`row_count_offset` parameters in IO functions to `row_index_*` (13563)
- Rename `with_row_count` to `with_row_index` (13494)
- simplify parquet binary ordering function (13488)
- don't panic if `ambiguous` is of wrong type (13388)

Thank you to all our contributors for making this release possible!
29antonioac, Bromeon, ByteNybbler, JulianCologne, MarcNuebel, MarcoGorelli, NedJWestern, ShivMunagala, Vincenthays, Wainberg, aaarrti, alexander-beedie, apcamargo, bchalk101, braaannigan, c-peters, cgevans, cmdlineluser, collinprince, deanm0000, dependabot, dependabot[bot], dpinol, edavisau, eitsupi, flisky, grinya007, hamishs, henryharbeck, ion-elgreco, itamarst, jacksonthall22, jcrozum, kstoneriv3, langestefan, lukemanley, mcrumiller, mkucijan, nameexhaustion, orlp, petrosbar, r-brink, reswqa, ritchie46, s-banach, shritesh, stinodego, taki-mekhalfa, thomasaarholt, tim-stephenson, universalmind303, valorien and wjandrea


py-0.20.6
🏆 Highlights

- new implementation for `String/Binary` type. (13748)

⚠️ Deprecations

- Deprecate `dtype_if_empty` parameter for `Series` constructor (13976)

🚀 Performance improvements

- improve string/binary reverse performance (14016)
- add "calamine" support to `read_excel`, using `fastexcel` (~8-10x speedup) (14000)
- optimize `DataFrame.describe` by presorting columns (13822)
- elide redundant bound checks. (13909)
- speedup boolean filter (13905)
- speedup binview filter (13902)
- allow python threads in read\_ functions (13886)
- improve binview filter (13878)
- apply string view GC more conservatively (13850)
- add optimized BinaryViewArray comparison kernels (13839)
- lazy cache binview bytes len (13830)
- fast-path for eager int\_range (13811)
- Optimize `arr.sum` for inner non-null bool (13800)

✨ Enhancements

- Add `UnstableWarning` for unstable functionality (13948)
- DataFrame supports explode by array column (13958)
- add "calamine" support to `read_excel`, using `fastexcel` (~8-10x speedup) (14000)
- improve binary formatting (13981)
- preserve Enum information when going to IPC (13943)
- support calling `describe` on a `LazyFrame` (13982)
- support kwargs in plugin 'field' functions and raise error on unsupported binview layout (13944)
- support cast decimal to utf8 (13829)
- add SQL support for `timestamp` precision modifier (13936)
- support negative indexing and expressions for `LEFT`, `RIGHT` and `SUBSTR` SQL string funcs (13888)
- Introduce `explode` for `ArrayNameSpace` (13923)
- unify Series/DataFrame `describe` code (13720)
- raise better error message for .dt.time on Date column (13932)
- List set\_operations supports float (13920)
- Add `ignore_nulls` for `arr.join` (13919)
- register 'set\_sorted' as batch/elementwise (13896)
- move Enum/Categorical categories to binview (13882)
- Add `ignore_nulls` for `list.join` (13701)
- Add `ignore_nulls` for `pl.concat_str` (13877)
- Align `int_range` and `int_ranges` signatures (13867)
- fix parquet for binview (13873)
- support mmap for binview in OOC (13872)
- implement ffi for `binview` (13871)
- Support zero fill null strategy for binary and string columns (13869)
- allow df.rename and lf.rename to take a renaming function (13708)
- Implement/fix unary minus operator `-pl.col(...)` (13776)
- extend SQL `EXTRACT` with "century", "millennium", and "timezone" parts (13634)
- fix binview ipc format (13842)
- add SQL support for `numeric` and/or `decimal` types (13739)
- improve panic message (13836)
- Expressify `str.zfill` (13790)
- new implementation for `String/Binary` type. (13748)
- Add typing to hvplot plot namespace (13813)
- Add `nulls_last` for `Series.sort` (13794)
- allow `ftp` URLs, improve URL check (13781)

🐞 Bug fixes

- count matches on list categorical (14021)
- `list.min/max` with empty and/or None elements (14018)
- Make `to_pandas()` work for Dataframe and Series with dtype `Object` (13910)
- raise for `pl.concat(how="align")` when no columns are shared between frames (13941)
- Fix casting from categorical to numeric (13957)
- read\_csv preserve whitespace and newlines (13934)
- omit implicit 'site' from import-timing test (14009)
- append decimal with different scale (13977)
- Use `date_as_object=False` as default for `Series.to_pandas` (just like `DataFrame.to_pandas`) (13984)
- serialize decimal type (13997)
- check input type for `arr/list.contains` (13959)
- Fix `max_colname_length` formatting in `glimpse()` (13969)
- Allow dtype merge when inner dtype is enum (13938)
- recurse less in streaming shared sinks (13930)
- ensure order is preserved if streaming from different sources (13922)
- Fix `is_not_null` for Struct columns (13921)
- convert object-dtyped NumPy str/bytes arrays to pl.String/pl.Binary instead of pl.Object (13712)
- allow extract of numeric from str AnyValue (13865)
- single-element .dt.time() and .dt.date() should always preserve sortedness (13808)
- prune empty chunks before set operations (13898)
- treat null columns as zero in `sum_horizontal` (13880)
- include null count in rolling window validity with `min_periods` (13863)
- Fix interchange protocol for new String type (13881)
- parquet hybrid RLE encoding did not always align to bit width (13883)
- Add `ignore_nulls` for `list.join` (13701)
- .dt.time() was panicking for datetimes prior to unix epoch (13812)
- allow list creation of decimals (13851)
- ensure kwargs `filter` behaviour matches docstring (expect equivalence with `eq`) (13864)
- Implement `abs` for Decimal, error on Date/Time/Datetime (13821)
- rolling nested groups deadlock (13835)
- `gather_every` should work on agg context (13810)
- Fix segfault of `is_in` (13814)
- don't panic on full null qcut (13815)
- validate operator arithmetic with `None`, fix `Series` edge-case (13780)

📖 Documentation

- Add missing doc entries (14006)
- add missing len to rst file (13999)
- Improve structure of user guide (13951)
- Improve structure of user guide (13639)
- Introduce ecosystem page in user guide (13903)
- Mention deltalake write support in README (13890)
- use proper argument names in the code blocks of api.rst (13866)

🛠️ Other improvements

- make Enums an actual datatype (14011)
- omit implicit 'site' from import-timing test (14009)
- Constructor improvements - part 1 (14001)
- Add `glimpse` test (13979)
- Move PyO3 ChunkedArray conversion logic into its own module (13973)
- Fix xdist streaming group (13974)
- Fix spurious test failures (13961)
- minor `describe` tidy-up, and slight rewording of some Exception docstrings (13942)
- Fix pip warning filter return code (13935)
- Minor refactor of PyO3 conversions module (13929)
- move `filter` to `polars-compute` (13897)
- Revert pandas warning filter (13893)
- Make functions in `expr/general` non-anonymous (13832)
- Fix doctests (13831)
- Refactor Python release workflow (13807)

Thank you to all our contributors for making this release possible!
ByteNybbler, JulianCologne, MarcoGorelli, Wainberg, alexander-beedie, c-peters, dependabot, dependabot[bot], edavisau, flisky, ion-elgreco, itamarst, jacksonthall22, kstoneriv3, mcrumiller, mkucijan, nameexhaustion, orlp, petrosbar, r-brink, reswqa, ritchie46, stinodego, taki-mekhalfa, thomasaarholt and valorien


py-0.20.6-rc.1
🏆 Highlights

- new implementation for `String/Binary` type. (13748)

🚀 Performance improvements

- speedup boolean filter (13905)
- speedup binview filter (13902)
- allow python threads in read\_ functions (13886)
- improve binview filter (13878)
- apply string view GC more conservatively (13850)
- add optimized BinaryViewArray comparison kernels (13839)
- lazy cache binview bytes len (13830)
- fast-path for eager int\_range (13811)
- Optimize `arr.sum` for inner non-null bool (13800)

✨ Enhancements

- register 'set\_sorted' as batch/elementwise (13896)
- move Enum/Categorical categories to binview (13882)
- Add `ignore_nulls` for `list.join` (13701)
- Add `ignore_nulls` for `pl.concat_str` (13877)
- Align `int_range` and `int_ranges` signatures (13867)
- fix parquet for binview (13873)
- support mmap for binview in OOC (13872)
- implement ffi for `binview` (13871)
- Support zero fill null strategy for binary and string columns (13869)
- allow df.rename and lf.rename to take a renaming function (13708)
- Implement/fix unary minus operator `-pl.col(...)` (13776)
- extend SQL `EXTRACT` with "century", "millennium", and "timezone" parts (13634)
- fix binview ipc format (13842)
- add SQL support for `numeric` and/or `decimal` types (13739)
- improve panic message (13836)
- Expressify `str.zfill` (13790)
- new implementation for `String/Binary` type. (13748)
- Add typing to hvplot plot namespace (13813)
- Add `nulls_last` for `Series.sort` (13794)
- allow `ftp` URLs, improve URL check (13781)

🐞 Bug fixes

- prune empty chunks before set operations (13898)
- treat null columns as zero in `sum_horizontal` (13880)
- include null count in rolling window validity with `min_periods` (13863)
- Fix interchange protocol for new String type (13881)
- parquet hybrid RLE encoding did not always align to bit width (13883)
- Add `ignore_nulls` for `list.join` (13701)
- .dt.time() was panicking for datetimes prior to unix epoch (13812)
- allow list creation of decimals (13851)
- ensure kwargs `filter` behaviour matches docstring (expect equivalence with `eq`) (13864)
- Implement `abs` for Decimal, error on Date/Time/Datetime (13821)
- rolling nested groups deadlock (13835)
- `gather_every` should work on agg context (13810)
- Fix segfault of `is_in` (13814)
- don't panic on full null qcut (13815)
- validate operator arithmetic with `None`, fix `Series` edge-case (13780)

📖 Documentation

- Mention deltalake write support in README (13890)
- use proper argument names in the code blocks of api.rst (13866)

🛠️ Other improvements

- move `filter` to `polars-compute` (13897)
- Revert pandas warning filter (13893)
- Make functions in `expr/general` non-anonymous (13832)
- Fix doctests (13831)
- Refactor Python release workflow (13807)

Thank you to all our contributors for making this release possible!
ByteNybbler, MarcoGorelli, Wainberg, alexander-beedie, dependabot, dependabot[bot], edavisau, flisky, ion-elgreco, itamarst, kstoneriv3, mcrumiller, mkucijan, nameexhaustion, orlp, reswqa, ritchie46, stinodego, taki-mekhalfa and thomasaarholt


py-0.20.5
⚠️ Deprecations

- Deprecate default delimiter value for `str.concat` (13690)
- Rename `pl.count()` to `pl.len()` (13719)
- Deprecate `dt.with_time_unit` in favor of `cast(pl.Int64).cast(pl.Datetime(time_unit, time_zone))` (13667)

🚀 Performance improvements

- directly embed data ptr in Buffer (13744)

✨ Enhancements

- Impl `count_matches` for array namespace (13675)
- Add `nulls_last` for `list/array.sort` (13795)
- convert fixed-offset timezones to respective Etc timezone from time zone database (13738)
- allow `read_excel` to load from remote http locations (13753)
- Expressify `str.slice` (13747)
- implement binview for polars-row (13736)
- implement binview for polars-json (13737)
- add architecture for polars-flavored IPC (13734)
- implement binview comparison kernels (13715)
- raise default frame/series repr height from 8 to 10 (13699)
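
The fixed-offset conversion (13738) maps onto the tz database's `Etc/GMT±N` names, whose signs are inverted relative to ISO 8601 offsets. Below is a minimal sketch of that naming convention for whole-hour offsets. `fixed_offset_to_etc` is a hypothetical helper written for illustration, not a Polars function, and this changelog does not say how Polars treats offsets that are not whole hours.

```python
import re


def fixed_offset_to_etc(offset: str) -> str:
    """Map an ISO 8601 offset such as "+01:00" to a tz database Etc name.

    Hypothetical helper for illustration only; not part of the Polars API.
    """
    match = re.fullmatch(r"([+-])(\d{2}):(\d{2})", offset)
    if match is None:
        raise ValueError(f"not a fixed offset: {offset!r}")
    sign = match.group(1)
    hours = int(match.group(2))
    minutes = int(match.group(3))
    if minutes != 0:
        # The Etc/GMT±N zones only cover whole hours.
        raise ValueError(f"no Etc zone for offset {offset!r}")
    if hours == 0:
        return "UTC"
    # The tz database defines Etc/GMT-14 through Etc/GMT+12 only.
    limit = 14 if sign == "+" else 12
    if hours > limit:
        raise ValueError(f"offset out of range: {offset!r}")
    # Etc names use the POSIX sign convention: UTC+01:00 is "Etc/GMT-1".
    flipped = "-" if sign == "+" else "+"
    return f"Etc/GMT{flipped}{hours}"


print(fixed_offset_to_etc("+01:00"))
print(fixed_offset_to_etc("-05:00"))
```

The sign flip is the part that usually surprises readers: a zone one hour ahead of UTC is named `Etc/GMT-1`.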

🐞 Bug fixes

- do not read data for zero-length compressed buffer (13791)
- Fix the non-null test of `transpose` (13783)
- Raise error instead of panic when joining on wildcard/nth (13742)
- `str.concat` correctly ignore single null value (13751)
- Selectors `by_name` and `by_dtype` should allow empty list as input (11024)
- Keep Series attributes docstrings when read by Sphinx (13731)
- fix error message when creating DataFrame from 0-dimensional NumPy array (13729)
- support corr() for single-column DataFrames (13728)
- Use `NonZeroUsize` for `batch_size` parameter in `write_csv/sink_csv/scan_ndjson` (13726)
- error instead of panicking in sql if empty function (13691)

📖 Documentation

- Fix typo in deprecation message of `with_row_count` (13793)
- Fix incorrect "coming from pandas" syntax (13767)
- Improve streaming section of the user guide (13750)
- improve `n_unique` and `approx_n_unique` docs (13752)
- add missing Series.str.find reference (13717)
- Be more explicit about behaviour in `str.strip_chars` / `strip_chars_start` / `strip_chars_end` docstrings (13697)
- Add doc example for `datetime_ranges` (13695)
- document %A and %B to get day name and month name (13678)

🛠️ Other improvements

- Make `pl.duration` non-anonymous (13762)
- Add test for `describe` on Object types (13689)
- Only run bytecode parser CI workflow for Python 3.9/3.10 (13664)

Thank you to all our contributors for making this release possible!
29antonioac, MarcoGorelli, NedJWestern, Wainberg, alexander-beedie, cgevans, henryharbeck, langestefan, orlp, petrosbar, r-brink, reswqa, ritchie46, stinodego and universalmind303


py-0.20.23
🚀 Performance improvements

- Don't rechunk in parallel collection (15907)
- Improve non-trivial list aggregations (15888)
- Ensure we hit specialized gather for binary/strings (15886)
- Limit the cache size for `to_datetime` (15826)
- skip initial null items and don't recompute `slope` in `interpolate` (15819)

✨ Enhancements

- don't require pyarrow for converting pandas to Polars if all columns have simple numpy-backed datatypes (15933)
- Add option to disable globbing in csv (15930)
- Add option to disable globbing in parquet (15928)
- Expressify `dt.round` (15861)
- Improve error messages in context stack (15881)
- Add dynamic literals to ensure schema correctness (15832)
- Add a low-friction `sql` method for DataFrame and LazyFrame (15783)
- add timestamp time travel in delta scan/read (15813)

🐞 Bug fixes

- Set default limit for String column display to 30 and fix edge cases (15934)
- Change recognition of numba ufunc (15916)
- series.search\_sorted could support more types of input (15940)
- Remove fsspec from parquet reader (15927)
- avoid WRITE+EXEC for CPUID check (15912)
- fix inconsistent decimal formatting (15457)
- Preserve NULLs for `is_not_nan` (15889)
- double projection check should only take the upstream projections into account (15901)
- Ensure we don't create invalid frames when combining unit lit + … (15903)
- Clear cached rename schema (15902)
- Fix OOB in struct lit/agg aggregation (15891)
- Refine interaction of "schema\_overrides" with `read_excel` when using "calamine" engine (15827)
- Don't modify user-supplied `storage_options` dict (take a shallow-copy) (15859)
- create (q)cut labels in fixed order (15843)
- Tag `shrink_dtype` as non-streaming (15828)

📖 Documentation

- improve graphviz install documentation/error message (15791)
- Extend docstring examples for asof\_join (15810)

📦 Build system

- Don't import jemalloc (15942)
- Use default allocator for lts-cpu (15941)
- replace all macos-latest referrals with macos-13 (15926)
- pin mimalloc and macos-13 (15925)
- use jemalloc in lts-cpu (15913)
- Update Cargo.lock (15865)
- Bump `ruff` version and improve `make clean` on the Python side (15858)
- Exclude `rust-toolchain.toml` from wheels (15840)

🛠️ Other improvements

- Replace copy/paste import handling with `import_optional` utility function (15906)
- Reorganize from\_iter and dispatch to collect\_ca when possible (15904)
- More PyO3 0.21 bound APIs (15872)
- Improve type-coercion (15879)
- Move type coercion to IR conversion phase (15868)
- Fix Python test coverage upload (15853)
- More upgrades to PyO3 0.21 Bound\<> APIs (15790)
- Use uv for installing Python dependencies in CI (15848)
- Update benchmark tests (15825)

Thank you to all our contributors for making this release possible!
JulianCologne, MarcoGorelli, NedJWestern, NexVeridian, alexander-beedie, deanm0000, dependabot, dependabot[bot], ion-elgreco, itamarst, jr200, nameexhaustion, orlp, reswqa, ritchie46 and stinodego


py-0.20.22
🚀 Performance improvements

- Improved type-inference for `read_excel` and `read_ods`, use calamine engine for `read_ods` (15808)
- Fix quadratic in binview growable same source (15734)
- use two binary searches for equality mask when data is sorted (15702)
- improve filter parallelism (15686)
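
The idea behind "use two binary searches for equality mask when data is sorted" (15702) can be sketched in plain Python: on sorted data, every value equal to the target sits in one contiguous run, so two binary searches find its bounds without comparing each element. `equality_mask_sorted` is a hypothetical helper for illustration and assumes ascending order; it is not the Polars implementation.

```python
from bisect import bisect_left, bisect_right


def equality_mask_sorted(values, target):
    """Return a boolean mask marking where values == target.

    Assumes `values` is sorted ascending. Hypothetical helper, not Polars code.
    """
    lo = bisect_left(values, target)   # first index with values[i] >= target
    hi = bisect_right(values, target)  # first index with values[i] > target
    n = len(values)
    # Everything in [lo, hi) equals the target; everything else does not.
    return [False] * lo + [True] * (hi - lo) + [False] * (n - hi)


mask = equality_mask_sorted([1, 2, 2, 2, 5, 8], 2)
print(mask)
```

Locating the run takes O(log n) comparisons; only filling in the mask is linear.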

✨ Enhancements

- Minor type-inference update for `read_database` (15809)
- Improved type-inference for `read_excel` and `read_ods`, use calamine engine for `read_ods` (15808)
- `dt.truncate` supports broadcasting lhs (15768)
- Expressify `str.json_path_match` (15764)
- raise if `storage_options` is passed to read\_csv but `fsspec` isn't available (15778)
- Support decimal float parsing in CSV (15774)
- Add context trace to `LazyFrame` conversion errors (15761)
- Improve error message when passing invalid input to `lit` (15718)
- Remove outdated join validation checks (15701)

🐞 Bug fixes

- drop-nulls edge case; remove drop-nulls special case (15815)
- ewm\_mean\_by was skipping initial nulls when it was already sorted by "by" column (15812)
- Consult cgroups to determine free memory (15798)
- raise if index count like 2i is used when performing rolling, group\_by\_dynamic, upsample, or other temporal operations (15751)
- Don't deduplicate sort that has slice pushdown (15784)
- Allow passing files opened by fsspec in `read_parquet` (15770)
- Fix incorrect `is_between` pushdown to `scan_pyarrow_dataset` (15769)
- Handle null index correctly for list take (15737)
- Preserve lexical ordering on concat (15753)
- Remove incorrect unsafe pointer cast for int -> enum (15740)
- pass series name to apply for cut/qcut (15715)
- count of null column shouldn't panic in agg context (15710)
- manual cache (15711)
- Ensure we don't hold onto Mutex when grabbing join tuples (15704)
- allow null dtypes in UDFs if they match the schema (15699)
- Respect join\_null argument for semi/anti joins (15696)
- Ensure we don't hold RwLock when spawning group parallelism in w… (15697)
- Ensure empty with\_columns is a no-op (15694)
- Include predicate in cache state union (15693)
- Add the missing feature flag for `ewm_mean_by` (15687)
- 8/16-bits int could also apply in place for log expr (15680)
- `prepare_expression_for_context` shouldn't panic if exceptions raised from optimizer (15681)

📖 Documentation

- Add docstring examples for datetimes (13161) (15804)
- Fix a typo in categorical section of the user guide (15777)
- Fix a docstring mistake for DataType.is\_float (15773)
- Remove incorrect "1i (1 index count)" from some docs methods (15750)
- Add example for `Config.set_tbl_width_chars` (15566)
- Align docstring phrasing in `Series/Expr.dt.truncate/round` (15698)
- Various deprecation docstring improvements (15648)

🛠️ Other improvements

- Always expand horizontal\_any/all (15816)
- Rename decimal\_float to decimal\_comma (15817)
- Split coverage calculation (15780)
- Update readme (15787)
- Start at using new Bound\<> API from PyO3 (15752)
- Make `json_path_match` expr non-anonymous (15682)

Thank you to all our contributors for making this release possible!
MarcoGorelli, NedJWestern, Robinsane, TobiasDummschat, alexander-beedie, c-peters, dependabot, dependabot[bot], gasmith, henryharbeck, itamarst, kszlim, mbuhidar, nameexhaustion, orlp, reswqa, ritchie46, stinodego and wsyxbcl


rs-0.39.2
🚀 Performance improvements

- use two binary searches for equality mask when data is sorted (15702)
- improve filter parallelism (15686)

✨ Enhancements

- Remove outdated join validation checks (15701)

🐞 Bug fixes

- manual cache (15711)
- Ensure we don't hold onto Mutex when grabbing join tuples (15704)
- allow null dtypes in UDFs if they match the schema (15699)
- Respect join\_null argument for semi/anti joins (15696)
- Ensure we don't hold RwLock when spawning group parallelism in w… (15697)
- Ensure empty with\_columns is a no-op (15694)
- Include predicate in cache state union (15693)
- Add the missing feature flag for `ewm_mean_by` (15687)
- 8/16-bits int could also apply in place for log expr (15680)
- `prepare_expression_for_context` shouldn't panic if exceptions raised from optimizer (15681)

🛠️ Other improvements

- Make `json_path_match` expr non-anonymous (15682)

Thank you to all our contributors for making this release possible!
henryharbeck, kszlim, orlp, reswqa and ritchie46


py-0.20.22-rc.1
🚀 Performance improvements

- improve filter parallelism (15686)

🐞 Bug fixes

- Add the missing feature flag for `ewm_mean_by` (15687)
- 8/16-bits int could also apply in place for log expr (15680)
- `prepare_expression_for_context` shouldn't panic if exceptions raised from optimizer (15681)

📖 Documentation

- Various deprecation docstring improvements (15648)

🛠️ Other improvements

- Make `json_path_match` expr non-anonymous (15682)

Thank you to all our contributors for making this release possible!
henryharbeck, reswqa and ritchie46


rs-0.39.1
🚀 Performance improvements

- Fix regression that led to using only a single thread (15667)

✨ Enhancements

- add ewm\_mean\_by (15638)

🐞 Bug fixes

- Ensure profile of simple-projection only take own runtime (15671)
- Panic if invalid array in object (15664)
- Ensure 'CachedSchema' doesn't get synced between plans (15661)
- `group_by` multiple null columns produce phantom row (15659)
- rolling\_\* aggs were behaving as if they return scalars in group-by (15657)
- Correct the unsoundness slice range of `arr.min/max` (15654)
- `list.mean` fast path shouldn't produce NaN (15652)
- Fix Display implementation of Duration (15647)

📖 Documentation

- Fix typo in legacy install instructions (15662)
- Include prelude import in the example (15633)

🛠️ Other improvements

- Remove the remaining usage of deprecated `numpy` crate APIs (15668)
- make Duration.is\_constant\_duration less strict for non-timezone-aware case (15639)
- Fix some typos in comments (15665)
- remove unnecessary unsafe in list mean/sum (15660)

Thank you to all our contributors for making this release possible!
MarcoGorelli, Priyansh4444, StevenMia, itamarst, mcrumiller, orlp, reswqa, ritchie46 and stinodego


py-0.20.21
🚀 Performance improvements

- Fix regression that led to using only a single thread (15667)

✨ Enhancements

- add ewm\_mean\_by (15638)

🐞 Bug fixes

- Ensure profile of simple-projection only take own runtime (15671)
- Panic if invalid array in object (15664)
- Ensure 'CachedSchema' doesn't get synced between plans (15661)
- `group_by` multiple null columns produce phantom row (15659)
- rolling\_\* aggs were behaving as if they return scalars in group-by (15657)
- Correct the unsoundness slice range of `arr.min/max` (15654)
- `list.mean` fast path shouldn't produce NaN (15652)

📖 Documentation

- Add missing deprecation warning to `DataFrame.replace` (15612)
- Fix typo in legacy install instructions (15662)

🛠️ Other improvements

- make Duration.is\_constant\_duration less strict for non-timezone-aware case (15639)
- Fix some typos in comments (15665)
- remove unnecessary unsafe in list mean/sum (15660)
- fixup failing test due to `offset` deprecation in `upsample` (15636)

Thank you to all our contributors for making this release possible!
MarcoGorelli, Priyansh4444, StevenMia, eitsupi, itamarst, mcrumiller, orlp, reswqa, ritchie46 and stinodego


rs-0.39.0
🏆 Highlights

- Full plan CSE (15264)

💥 Breaking changes

- rename `memmap` -> `memory_map` to match Python (15642)
- Unify `sort` with `SortOptions` and `SortMultipleOptions` (15590)
- Update the argument name from `dims` to `dimensions` in `reshape` (15561)
- Allow specifying Hive schema in `read/scan_parquet` (15434)
- Raise error when `schema_overrides` contains nonexistent column name (15290)
- Rename `Chunk` to `RecordBatch` (15298)
- Refactor AnyValue supertype logic (15280)
- Rename `group_by_rolling` to `rolling` and improve related error messages (14765)
- Rename `ChunkedArray.try_apply` to `try_apply_values` (14947)
- Rename parameter `by` to `group_by` in `DataFrame.upsample/group_by_dynamic/rolling` (14840)

🚀 Performance improvements

- Fix cross join batch size when one of the DataFrames is tiny (14347)
- Fix binview growable complexity O(n\*m) -> O(n) (15628)
- Remove extra thread spawn from row group fetcher (15626)
- Use vertical parallelism if input is chunked for `Filter`,`Select`,`WithColumns` (15608)
- read\_ipc memory usage tests, and writing fix (15599)
- Refactor CSV serialization to not go through `AnyValue` (15576)
- don't use dynamic dispatch in visitors (15607)
- Improve Bitmap construction performance (15570)
- join by row-encoding (15559)
- Replace std::thread spawn with tokio block\_in\_place (15517)
- speed up offset\_by when a single offset is passed (15493)
- Avoid allocation in the hot path for struct JSON serialization (15449)
- avoid double-allocation in rolling\_apply\_agg\_window (15423)
- Make LogicalPlan immutable (15416)
- Add non-order preserving variable row-encoding (15414)
- Use row-encoding for multiple key group by (15392)
- load bits one word at a time for BitmapIter (15333)
- Ipc exec multiple paths (15040)
- add SIMD support for if-then-else kernels (15131)

✨ Enhancements

- add Expr.dt.add\_business\_days and Series.dt.add\_business\_days (15595)
- Add `str.head` and `str.tail` (14425)
- Extended `BytecodeParser` to handle additional math functions, and imports from the global namespace (15627)
- Push down `is_between` expressions to Arrow (15180)
- add holidays argument to business\_day\_count (15580)
- change default to write parquet statistics (15597)
- Expressify `to_integer` (15604)
- Optimizer; remove double SORT and redundant projections (15573)
- Add `null_on_oob` parameter to `expr.array.get` (15426)
- support weekend argument in business\_day\_count (15544)
- Enable `is_first/last_distinct` for not nested non-numeric list (15552)
- Turn off cse if cache node found (15554)
- Tag concat list as elementwise (15545)
- Support list group-by of non numeric lists (15540)
- add business\_day\_count function (15512)
- Add SQL support for `MEDIAN` aggfunc (15519)
- Implement `string`, `boolean` and `binary` dtype in `top_k` (15488)
- Add SQL support for `TRUNCATE TABLE` command (15513)
- Add SQL support for `GREATEST` and `LEAST` (15511)
- Allow specifying Hive schema in `read/scan_parquet` (15434)
- Implements `agg_list` for `NullChunked` (15439)
- Supports `explode_by_offsets` for decimal (15417)
- Add `null_on_oob` parameter to `expr.list.get` (15395)
- CSV-writer escape carriage return (15399)
- Remove 'FileCacher' optimization (15357)
- check input type in entropy (15351)
- Implements `arr.n_unique` (15296)
- CSE don't scan share if predicate pushdown predicates don't match (15328)
- Remove cached nodes when finished (15310)
- Full plan CSE (15264)
- Add IR for expressions. (15168)
- Warn if `map_elements` is called without `return_dtype` specified (15188)
- Rename `group_by_rolling` to `rolling` and improve related error messages (14765)
- Rename `ChunkedArray.try_apply` to `try_apply_values` (14947)
- Implement strict AnyValue construction for temporal types (15146)

🐞 Bug fixes

- Return appropriate data type for time `mean` and `median` (14471)
- Support index upsampling (13621)
- Fix issue in `write_excel` that could lead to incorrect spanning range determination (15631)
- Output correct dtype for `mean_horizontal` on a single column (15118)
- Recompute RowIndex schema after projection pd (15625)
- Mean of boolean in streaming group\_by incorrectly always gave NULL (15616)
- Include cloud creds in cache key (15609)
- Fix elementwise-apply if any input is `AggregatedScalar` (15606)
- Explode list should take validity into account (15572)
- use larger recursive stack in debug mode (15593)
- SQL interface "off-by-one" indexing error with `GROUP BY` clauses that use position ordinals (15584)
- Enable missing features in polars-time (15558)
- Handle quoted identifiers when registering CTEs in the SQL engine (15564)
- Decompress moved out of schema initialization (15550)
- Turn off cse if cache node found (15554)
- Resolve function names and prune all aliases. (15522)
- `list.get` should take validity into account (15516)
- block decimal in streaming (15520)
- `group_by` partitioned with literal `Series` panic (15487)
- Initialize validity for `GroupsProxy::Slice` windows (15509)
- Fix struct name resolving (15507)
- `pow` return type evaluation (15506)
- Allow selectors inside frame-level `.filter()` (15445)
- Don't prune alias in AnonymousFunction subtree (15453)
- Fix deadlock in async parquet scan (15440)
- datetime operations (e.g. .dt.year) were raising when null values were backed by out-of-range integers (15420)
- Ensure Binary -> Binview cast doesn't overflow the buffer size (15408)
- Don't prune alias in function subtree (15406)
- Return 0 for `n_unique()` in group-by context when group is empty (15289)
- Unset UpdateGroups after group-sensitive expression (15400)
- `to_any_value` should support all LiteralValue types (15387)
- Hash failure combining hash of two numeric columns containing equal values (15397)
- Add FixedSizeBinary to arrow field conversion (15389)
- Conversion of expr\_ir in partition fast path (15388)
- `sort` for series with unsupported dtype should raise instead of panic (15385)
- Return correct dtype for `s.clear()` when dtype is `Object` (15315)
- ensure first datapoint is always included in group\_by\_dynamic (15312)
- Non-exhaustive patterns: arrow-schema::DataType in polars-arrow (15250)
- use dynamic stacks for problematic recursive functions (15355)
- Raise error when `schema_overrides` contains nonexistent column name (15290)
- Fix cache dot visualization (15311)
- Properly propagate `strict` flag when constructing a Struct Series from any values (15302)
- ensure `eq` for `BinaryViewArray` checks all elements (15268)
- Raise when join projects name with suffix that doesn't exist (15256)
- fix kurtosis/skew (15137)
- Ensure ooc\_start is set (15255)
- Fix bug where rolling operations were ignoring `check_sorted` in some cases (15227)
- Fix lazy schema for `rle` expression (15248)
- incorrect negative offset in multi-byte string slicing (15140)
- do not clamp negative offsets to start of array prematurely (15242)
- allow null index in list.get and array.get (15239)
- properly support nulls\_last + descending (15212)
- Block rounding/truncating to negative durations (15175)
- Make parse\_url work on windows with object\_store (15191)
- divide by zero in download speed computation (15182)
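
The `GROUP BY` ordinal fix (15584) concerns SQL's 1-based positional references: `GROUP BY 1` refers to the first item in the `SELECT` list. The sketch below is a plain-Python model of that mapping, not Polars' SQL engine; `resolve_group_by_ordinals` is a hypothetical helper named for illustration only.

```python
def resolve_group_by_ordinals(select_items, group_by):
    """Map 1-based GROUP BY ordinals onto SELECT-list items.

    Hypothetical helper for illustration; not part of Polars.
    """
    resolved = []
    for key in group_by:
        if isinstance(key, int):
            if not 1 <= key <= len(select_items):
                raise ValueError(f"GROUP BY position {key} is not in the select list")
            # SQL ordinals start at 1, Python indices start at 0.
            resolved.append(select_items[key - 1])
        else:
            # Named keys pass through unchanged.
            resolved.append(key)
    return resolved


columns = ["region", "product", "SUM(sales)"]
keys = resolve_group_by_ordinals(columns, [1, 2])
```

Forgetting the `key - 1` shift is the classic way to produce an off-by-one result here: `GROUP BY 1, 2` would then group by `product` and `SUM(sales)` instead of `region` and `product`.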

📖 Documentation

- Add legacy CPU install instructions in user guide (13676)
- Various minor updates to User Guide's SQL intro section (15557)
- Add `outer_coalesce` join strategy in the user guide (15405)
- Improve docs for `Series::new` with `AnyValue` input (15306)
- Fix formatting in `Series::from_any_values_and_dtype` docs (15244)
- Correct the definition of an expression in the user guide (14750)

📦 Build system

- Fix a feature gate for `lz4` compression in `polars-parquet` (15565)
- Update Rust toolchain (15353)

🛠️ Other improvements

- Rename `memmap` to `memory_map`, as in Python (15642)
- fixup failing test due to `offset` deprecation in `upsample` (15636)
- use bound api (15630)
- Don't run streaming group-by in partitionable gb (15611)
- Unify `sort` with `SortOptions` and `SortMultipleOptions` (15590)
- remove try\_binary\_elementwise\_values (15592)
- remove raw pointers from visitors. (15579)
- rename to IR (15571)
- Update the argument name from `dims` to `dimensions` in `reshape` (15561)
- Rename ALogicalPlan to FullAccessIR (15553)
- Set up CodSpeed (15537)
- make dsl immutable and cheap to clone (15394)
- use recursive crate, add missing recursive tag (15393)
- Update CODEOWNERS (polars-sql) (15384)
- Update Rust toolchain (15353)
- Update CODEOWNERS (15352)
- remove try\_apply\_values (15336)
- always use non-legacy float\_sum for mean (15343)
- remove legacy bitmap module (15335)
- More clippy in Makefile (15340)
- Rename `Cache[count]` to `Cache[cache_hits]` (15300)
- Cleanup file\_caching optimization call (15299)
- Rename `Chunk` to `RecordBatch` (15298)
- Refactor AnyValue supertype logic (15280)
- reuse message parsing in IPC (15265)
- remove 'fast-projection' node (15253)
- cleanup column names in optimizer (15252)
- remove left\_most\_input\_name from expr ir (15251)
- add AlignedBitmapSlice (15171)
- Refactor AnyValue construction for Categorical/Enum dtype (15220)
- Move ConsecutiveCountState into support module (15186)
- Run non-benchmark tests in benchmark workflow (15207)
- Add `wrapping_abs` to arithmetic kernel (15210)
- remove raw buffers from BinViewArray (15206)
- Enable `RUST_BACKTRACE=1` in the CI test suite (15204)
- Rename parameter `by` to `group_by` in `DataFrame.upsample/group_by_dynamic/rolling` (14840)
- Set dual license for `polars-arrow` and `polars-parquet` (15173)
- remove parts of legacy bit\_util (15169)
- remove legacy arrow compute (15164)

Thank you to all our contributors for making this release possible!
CanglongCl, ChayimFriedman2, Fokko, JamesCE2001, MarcoGorelli, NedJWestern, Sol-Hee, TrevorWinstral, alexander-beedie, braaannigan, c-peters, cmdlineluser, cojmeister, deanm0000, dependabot, dependabot[bot], douglas-raillard-arm, eitsupi, filabrazilska, henryharbeck, i-aki-y, itamarst, kszlim, leoforney, mbuhidar, mcrumiller, mickvangelderen, nameexhaustion, orlp, ozgrakkurt, petrosbar, reswqa, ritchie46, rob-sil, sportfloh, stinodego, thomaslin2020 and yutannihilation


py-0.20.20
🚀 Performance improvements

- Fix cross join batch size when one of the DataFrames is tiny (14347)
- Fix binview growable complexity O(n\*m) -> O(n) (15628)
- Remove extra thread spawn from row group fetcher (15626)
- Use vertical parallelism if input is chunked for `Filter`,`Select`,`WithColumns` (15608)
- Refactor CSV serialization to not go through `AnyValue` (15576)
- don't use dynamic dispatch in visitors (15607)
- Improve Bitmap construction performance (15570)
- join by row-encoding (15559)

✨ Enhancements

- add Expr.dt.add\_business\_days and Series.dt.add\_business\_days (15595)
- Add `str.head` and `str.tail` (14425)
- Add `union`/`or` operator for `pl.Enum` (14965)
- Extended `BytecodeParser` to handle additional math functions, and imports from the global namespace (15627)
- Push down `is_between` expressions to Arrow (15180)
- add holidays argument to business\_day\_count (15580)
- change default to write parquet statistics (15597)
- Expressify `to_integer` (15604)
- Optimizer; remove double SORT and redundant projections (15573)
- Add `null_on_oob` parameter to `expr.array.get` (15426)
- support weekend argument in business\_day\_count (15544)
- Enable `is_first/last_distinct` for not nested non-numeric list (15552)
- Turn off cse if cache node found (15554)
- Tag concat list as elementwise (15545)
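
The business-day entries above (15595, 15580, 15544) all revolve around counting days while skipping weekends and holidays. The following is a minimal standard-library sketch of that idea, not Polars' implementation. It assumes a half-open range `[start, end)` and takes the weekend as a tuple of `date.weekday()` numbers; both choices are assumptions made for illustration, and the real parameter shapes may differ.

```python
from datetime import date, timedelta


def business_day_count(start, end, weekend=(5, 6), holidays=()):
    """Count business days in the half-open range [start, end).

    Illustrative model only. `weekend` uses date.weekday() numbering
    (Monday=0 ... Sunday=6); `holidays` is any iterable of dates.
    """
    holiday_set = set(holidays)
    count = 0
    day = start
    while day < end:
        if day.weekday() not in weekend and day not in holiday_set:
            count += 1
        day += timedelta(days=1)
    return count


# One full week starting on Monday 2024-01-01.
week = (date(2024, 1, 1), date(2024, 1, 8))
plain = business_day_count(*week)
with_holiday = business_day_count(*week, holidays=[date(2024, 1, 1)])
fri_sat_weekend = business_day_count(*week, weekend=(4, 5))
```

A day-by-day loop is easy to check by hand; a production implementation would typically compute whole weeks arithmetically and only walk the remainder.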

🐞 Bug fixes

- Return appropriate data type for time `mean` and `median` (14471)
- Fix issue in `write_excel` that could lead to incorrect spanning range determination (15631)
- Output correct dtype for `mean_horizontal` on a single column (15118)
- Recompute RowIndex schema after projection pd (15625)
- Mean of boolean in streaming group\_by incorrectly always gave NULL (15616)
- Include cloud creds in cache key (15609)
- Fix elementwise-apply if any input is `AggregatedScalar` (15606)
- Explode list should take validity into account (15572)
- use larger recursive stack in debug mode (15593)
- SQL interface "off-by-one" indexing error with `GROUP BY` clauses that use position ordinals (15584)
- Enable missing features in polars-time (15558)
- Handle quoted identifiers when registering CTEs in the SQL engine (15564)
- Decompress moved out of schema initialization (15550)
- Turn off cse if cache node found (15554)

📖 Documentation

- Add legacy CPU install instructions in user guide (13676)
- Examples for errors (13724)
- Add docstring examples for reading json (14481)
- Add security warning in LazyFrame.deserialize() docstring (15282)
- Various minor updates to User Guide's SQL intro section (15557)

🛠️ Other improvements

- Replace most deprecated calls with bounded version (15632)
- use bound api (15630)
- Initial PyO3 0.21 support (15622)
- Don't run streaming group-by in partitionable gb (15611)
- Unify `sort` with `SortOptions` and `SortMultipleOptions` (15590)
- Set up CodSpeed (15537)

Thank you to all our contributors for making this release possible!
CanglongCl, ChayimFriedman2, Fokko, JamesCE2001, MarcoGorelli, NedJWestern, TrevorWinstral, alexander-beedie, deanm0000, douglas-raillard-arm, eitsupi, filabrazilska, i-aki-y, itamarst, leoforney, mcrumiller, nameexhaustion, orlp, ozgrakkurt, reswqa, ritchie46 and stinodego


py-0.20.19
🚀 Performance improvements

- Replace std::thread spawn with tokio block\_in\_place (15517)
- speed up offset\_by when a single offset is passed (15493)
- Avoid allocation in the hot path for struct JSON serialization (15449)

✨ Enhancements

- Support list group-by of non numeric lists (15540)
- add business\_day\_count function (15512)
- Add SQL support for `MEDIAN` aggfunc (15519)
- Implement `string`, `boolean` and `binary` dtype in `top_k` (15488)
- Add SQL support for `TRUNCATE TABLE` command (15513)
- Add SQL support for `GREATEST` and `LEAST` (15511)
- Allow specifying Hive schema in `read/scan_parquet` (15434)
- Implements `agg_list` for `NullChunked` (15439)

🐞 Bug fixes

- dot product of two integer series is cast to float (15502)
- Resolve function names and prune all aliases. (15522)
- Pass `skip_rows_after_header` to pyarrow csv reader (15533)
- No longer error when `schema_overrides` contains nonexistent columns (15528)
- `list.get` should take validity into account (15516)
- block decimal in streaming (15520)
- `group_by` partitioned with literal `Series` panic (15487)
- Initialize validity for `GroupsProxy::Slice` windows (15509)
- Fix struct name resolving (15507)
- `pow` return type evaluation (15506)
- Address issue with `read_database` draining iter\_batches early (15504)
- Allow selectors inside frame-level `.filter()` (15445)
- Don't prune alias in AnonymousFunction subtree (15453)
- Raise if a negative `n` is passed to `clear` (15432)
- Fix deadlock in async parquet scan (15440)

📖 Documentation

- Update leftover references of `by` parameter to `group_by` in `DataFrame/LazyFrame.upsample/group_by_dynamic/rolling` (15527)
- Add `make docs` command, DataType docs/layout tweak, minor README updates (15386)
- Add example for `Series.list.median`. (15451)

🛠️ Other improvements

- Remove unused code paths in `read_parquet` (15532)
- Organize utils for I/O functionality (15529)
- Remove private `DataFrame._read` classmethods (15521)
- Move dedicated inference code out of `io.database` executor module (15526)
- Add unstable warning to `hive_schema` functionality (15508)

Thank you to all our contributors for making this release possible!
CanglongCl, ChayimFriedman2, MarcoGorelli, alexander-beedie, cmdlineluser, dependabot, dependabot[bot], henryharbeck, mbuhidar, nameexhaustion, reswqa, ritchie46, rob-sil and stinodego


py-0.20.18
🚀 Performance improvements

- CSV reading memory usage tests and fixes (15422)
- avoid double-allocation in rolling\_apply\_agg\_window (15423)
- Make LogicalPlan immutable (15416)
- Add non-order preserving variable row-encoding (15414)
- Use row-encoding for multiple key group by (15392)

✨ Enhancements

- Supports `explode_by_offsets` for decimal (15417)
- Add `read_clipboard` and `DataFrame.write_clipboard` (15272)
- Add `null_on_oob` parameter to `expr.list.get` (15395)
- make Series.\_\_bool\_\_ error message Rusttier (15407)
- CSV-writer escape carriage return (15399)
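
The `null_on_oob` parameter added to `expr.list.get` (15395) selects between two behaviours for an out-of-bounds index: yield null, or raise. The sketch below models that switch in plain Python, with `None` standing in for null. `list_get` is a hypothetical stand-in rather than the Polars function, and the sketch makes no claim about which value Polars uses as the default.

```python
def list_get(values, index, null_on_oob):
    """Model of a per-row element lookup with a `null_on_oob` switch.

    Illustrative only; `None` plays the role of a null value.
    """
    if values is None or index is None:
        # A null list or a null index yields null.
        return None
    n = len(values)
    # Negative indices count from the end of the list.
    pos = index + n if index < 0 else index
    if 0 <= pos < n:
        return values[pos]
    if null_on_oob:
        return None
    raise IndexError(f"index {index} is out of bounds for a list of length {n}")


rows = [[1, 2, 3], [4], []]
second_items = [list_get(row, 1, null_on_oob=True) for row in rows]
```

With `null_on_oob=True` the short rows simply produce nulls; with `null_on_oob=False` the same lookup fails loudly on the first short row, which is useful when a missing element indicates bad data.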

🐞 Bug fixes

- datetime operations (e.g. .dt.year) were raising when null values were backed by out-of-range integers (15420)
- Ensure Binary -> Binview cast doesn't overflow the buffer size (15408)
- Don't prune alias in function subtree (15406)
- Return 0 for `n_unique()` in group-by context when group is empty (15289)
- Unset UpdateGroups after group-sensitive expression (15400)
- `to_any_value` should support all LiteralValue types (15387)
- Hash failure combining hash of two numeric columns containing equal values (15397)
- Add FixedSizeBinary to arrow field conversion (15389)
- Conversion of expr\_ir in partition fast path (15388)
- fix panic when doing a scan\_parquet with hive partitioning (15381)
- `sort` for series with unsupported dtype should raise instead of panic (15385)

📖 Documentation

- Added example for `explode` mapping strategy in `pl.Expr.over` (15402)
- Add `outer_coalesce` join strategy in the user guide (15405)
- Change the example to series for `series/array.py` (15383)
- Add "See Also" for `arg_sort` and `arg_sort_by` (15348)

🛠️ Other improvements

- make dsl immutable and cheap to clone (15394)
- use recursive crate, add missing recursive tag (15393)
- Update CODEOWNERS (polars-sql) (15384)

Thank you to all our contributors for making this release possible!
CanglongCl, JamesCE2001, MarcoGorelli, Sol-Hee, alexander-beedie, dependabot, dependabot[bot], itamarst, kszlim, mcrumiller, nameexhaustion, orlp, reswqa, ritchie46, rob-sil and thomaslin2020


py-0.20.17
🏆 Highlights

- Full plan CSE (15264)

⚠️ Deprecations

- Rename parameter `by` to `group_by` in `DataFrame.upsample/group_by_dynamic/rolling` (14840)
- Rename `from_repr` parameter from `tbl` to `data` (15156)

🚀 Performance improvements

- load bits one word at a time for BitmapIter (15333)
- Ipc exec multiple paths (15040)
- add SIMD support for if-then-else kernels (15131)

✨ Enhancements

- Remove 'FileCacher' optimization (15357)
- check input type in entropy (15351)
- Implements `arr.n_unique` (15296)
- CSE don't scan share if predicate pushdown predicates don't match (15328)
- Add `read_database` support for `SurrealDB` ("ws" and "http") (15269)
- Only allow inputs of type `Sequence` in `from_records` (15329)
- In hypothesis testing strategies, enable Decimal strategy by default (15321)
- Remove cached nodes when finished (15310)
- Full plan CSE (15264)
- More robust handling of `async` database calls (15202)
- Add `name` parameter to `GroupBy.len` method (15235)
- Add IR for expressions. (15168)
- Improve `read_database` when reading from Kùzu graph database (15218)
- Warn if `map_elements` is called without `return_dtype` specified (15188)
- Add support for `async` SQLAlchemy connections to `read_database` (15162)
- Infer `time_unit` in `pl.duration` when nanoseconds is specified (14987)
- Add `strict` parameter to `from_dict/from_records` (15158)

🐞 Bug fixes

- Return correct dtype for `s.clear()` when dtype is `Object` (15315)
- ensure first datapoint is always included in group\_by\_dynamic (15312)
- Non-exhaustive patterns: arrow-schema::DataType in polars-arrow (15250)
- use dynamic stacks for problematic recursive functions (15355)
- Adding default ddof for `Series.list.std` and `Series.list.var` (15267)
- Raise properly for slices not supported by `LazyFrame` (15331)
- Propagate strictness in `from_dicts` (15344)
- Raise error when `schema_overrides` contains nonexistent column name (15290)
- Enforce integer `dtype` input for `int_range` and `int_ranges` (15339)
- Preserve Decimal precision when constructing empty Series (15320)
- Fix cache dot visualization (15311)
- Handle special case correctly when slicing a `LazyFrame` (15297)
- Properly propagate `strict` flag when constructing a Struct Series from any values (15302)
- Consistent expansion of nested struct data during `DataFrame` init from dict (15217)
- Raise when join projects name with suffix that doesn't exist (15256)
- Ensure ooc\_start is set (15255)
- Fix bug where rolling operations were ignoring `check_sorted` in some cases (15227)
- Fix lazy schema for `rle` expression (15248)
- incorrect negative offset in multi-byte string slicing (15140)
- do not clamp negative offsets to start of array prematurely (15242)
- allow null index in list.get and array.get (15239)
- Avoid loading all columns in `read_parquet` when `columns` parameter is specified (15229)
- properly support nulls\_last + descending (15212)
- fix nested runtime panic (15216)
- Block rounding/truncating to negative durations (15175)
- Ensure the `cs.temporal()` selector uses wildcard time zone matching for `Datetime` (13683)
- Consistently raise `TypeError` on constructor failure (15178)
- Properly propagate strictness in some constructor cases (15166)
- Fix constructing a Series from a list of Series with given dtype (15144)
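
Two of the slicing fixes above (15140, 15242) deal with negative offsets. The sketch below is a plain-Python model of one reasonable reading of the intended behaviour, not the Polars kernel. It assumes that offsets count characters rather than UTF-8 bytes, and that clamping happens only after both ends of the window are known. `slice_chars` is a hypothetical helper.

```python
def slice_chars(s, offset, length=None):
    """Slice a string by character offset, allowing negative offsets.

    Illustrative model; the clamping order is an assumption.
    """
    n = len(s)  # len() on a Python str counts code points, not bytes
    start = offset + n if offset < 0 else offset
    stop = n if length is None else start + length
    # Clamp only once both ends are computed, so a negative offset that
    # reaches past the start shortens the window instead of shifting it.
    start = max(0, min(start, n))
    stop = max(0, min(stop, n))
    return s[start:stop]


word = "café"  # 4 characters, 5 bytes in UTF-8
last_char = slice_chars(word, -1)
middle = slice_chars(word, -2, 1)
overshoot = slice_chars("abc", -5, 3)
```

Clamping the start to zero before adding the length would turn the last call into a slice of the first three characters; computing the window first leaves only the part that actually overlaps the string.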

📖 Documentation

- Fix time unit in `timestamp` example (15281)
- Fix link to renamed method (.list.lengths -> .list.len) (15228)
- Update Excel and database pages in user guide (14721)
- Add examples for `Series.search_sorted` (14737)
- Correct the definition of an expression in the user guide (14750)
- Add a note about the behaviour of lower/upper bounds for `is_between`, and add an example (15197)

📦 Build system

- Update Cargo lock (15370)

🛠️ Other improvements

- Memory usage test infrastructure, plus a test for 15098 (15285)
- Update CODEOWNERS (15352)
- remove try\_apply\_values (15336)
- always use non-legacy float\_sum for mean (15343)
- remove legacy bitmap module (15335)
- Fix test not writing to temporary directory (15318)
- Reorganize tests for `clear` operation (15304)
- Rename `Cache[count]` to `Cache[cache_hits]` (15300)
- Cleanup file\_caching optimization call (15299)
- Minor refactor of `PyDataFrame.from_dicts` (15274)
- remove 'fast-projection' node (15253)
- cleanup column names in optimizer (15252)
- remove left\_most\_input\_name from expr ir (15251)
- add AlignedBitmapSlice (15171)
- Run non-benchmark tests in benchmark workflow (15207)
- Add `wrapping_abs` to arithmetic kernel (15210)
- remove raw buffers from BinViewArray (15206)
- Enable `RUST_BACKTRACE=1` in the CI test suite (15204)
- Split `read_database` functionality into cleaner module structure (15201)
- Clean up some of the AnyValue conversion logic (15190)
- remove parts of legacy bit\_util (15169)
- remove legacy arrow compute (15164)
- Split up `dataframe` module in PyO3 bindings (15165)
- Remove unused private constructors (15160)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, braaannigan, c-peters, cojmeister, deanm0000, dependabot, dependabot[bot], itamarst, kszlim, mbuhidar, mcrumiller, mickvangelderen, orlp, petrosbar, reswqa, ritchie46, rob-sil, sportfloh, stinodego and yutannihilation


rs-0.38.3
🚀 Performance improvements

- add new when-then-otherwise kernels (15089)
- Coerce sorted flag of unit arrays during concat (15104)
- Use sorted flag for `(first|last)_non_null` (15050)
- OOC sort improvements (14994)

✨ Enhancements

- raise if both `closed` and `by` are passed to `rolling_*` aggregations (15108)
- raise informative error for rolling\_\* aggs with `by` of invalid dtype (15088)
- add `non_existent` arg to `replace_time_zone` (15062)
- Support single nested row encodings (15105)
- make ooc sort configurable (15084)
- Make `register_plugin` a standalone function and include shared lib discovery (14804)
- Async parquet: Decode parquet on a blocking thread pool (15083)
- let "ambiguous" take "null" value (14961)
- Raise informative error message when join would introduce duplicate column name (15042)
- Allow cast of decimal to boolean (15015)
- Return error when no supertype can be determined in AnyValue constructor when `strict=false` (15025)
- Implement IpcReaderAsync (14984)
- Support Array statistics in parquet (15031)
- Support decimal groupby (15000)
- Add thread names to rayon thread pool (15024)
- Support decimal uniq (15001)
- expose timings in verbose state of OOC sort (14979)

🐞 Bug fixes

- Fix Series construction from nested list with mixed data types (15046)
- Support BinaryView in row decoder to prevent a panic in streaming group by (15117)
- Binview chunked gather; don't modify inlined view (15124)
- Fix chunked\_id gather for binview buffers (15123)
- Don't cache HTTP object stores as they maintain URL state (15121)
- use wrapping\_add in csv line snooping (15109)
- Output `u32` when `sum_horizontal` provided with single boolean column (15114)
- Ensure `eprintln!` is only called within debug/verbose context (15100)
- Propagate error instead of panicking when calling `product` on an invalid type (15093)
- Raise error when casting Array to different width (14995)
- Fix file scan bugs for ipc, csv and parquet that occur with combinations of glob paths, row indices and predicates (15065)
- Incorrectly preserved sorted flag when concatenating sorted series containing nulls (15082)
- Return largest non-NaN value for `max()` on sorted float arrays if it exists instead of NaN (15060)
- return NaN for all-NaN min/max (15066)
- Prevent "index out of range for slice" error in parquet reader (15021)
- Respect `nulls_last` in streaming sort (15061)
- Don't count nulls in streaming `count` agg (15051)
- agg\_list on decimal lost scale (15054)
- Block predicate pushdown on equalities that are used in a join (15055)
- Enum equality based on categories (15053)
- Strict cast in when/then/otherwise operation (15052)
- Don't panic in `string_addition_to_linear_concat` (15006)
- CSV do utf8-validation after escaping fields (15004)
- Use primitive constructors to create a Series of lists when dtype is provided (15002)
- replace\_time\_zone with single-null-element "ambiguous" was panicking (14971)

📖 Documentation

- Fix typo in comment (14997)

🛠️ Other improvements

- Extend and speed up scan tests (15127)
- always assert on ChunkedArray::get (15120)
- Use ObjectStore instead of AsyncRead in parquet get metadata (15069)
- Minor refactor of Rust any value constructors (15077)
- Simplify streaming execution (15039)
- Ensure we hit the spilled source path in ooc sort test (15010)
- Refactor constructor code (15009)
- Apply `clippy:assigning_clones` lint (14999)
- fix features (14977)

Thank you to all our contributors for making this release possible!
JackRolfe, MKisilyov, MarcoGorelli, alexander-beedie, c-peters, flisky, jqnatividad, mcrumiller, mickvangelderen, nameexhaustion, orlp, petrosbar, ritchie46, stinodego and trueb2


py-0.20.16
🚀 Performance improvements

- add new when-then-otherwise kernels (15089)
- Coerce sorted flag of unit arrays during concat (15104)
- Use sorted flag for `(first|last)_non_null` (15050)
- OOC sort improvements (14994)

✨ Enhancements

- improved dtype inference/refinement for `read_database` results (15126)
- raise if both `closed` and `by` are passed to `rolling_*` aggregations (15108)
- raise informative error for rolling\_\* aggs with `by` of invalid dtype (15088)
- add `non_existent` arg to `replace_time_zone` (15062)
- Support single nested row encodings (15105)
- make ooc sort configurable (15084)
- Make `register_plugin` a standalone function and include shared lib discovery (14804)
- Expose `infer_schema_length` parameter on `read_database` (15076)
- Async parquet: Decode parquet on a blocking thread pool (15083)
- let "ambiguous" take "null" value (14961)
- Raise informative error message when join would introduce duplicate column name (15042)
- Allow cast of decimal to boolean (15015)
- Add `strict` parameter to `DataFrame` constructor to allow non-strict construction (15034)
- Support Array statistics in parquet (15031)
- Support decimal groupby (15000)
- Add thread names to rayon thread pool (15024)
- Support decimal uniq (15001)
- expose timings in verbose state of OOC sort (14979)

🐞 Bug fixes

- Support BinaryView in row decoder to prevent a panic in streaming group by (15117)
- Binview chunked gather; don't modify inlined view (15124)
- Fix chunked\_id gather for binview buffers (15123)
- Don't cache HTTP object stores as they maintain URL state (15121)
- Output `u32` when `sum_horizontal` provided with single boolean column (15114)
- Propagate error instead of panicking when calling `product` on an invalid type (15093)
- Raise error when casting Array to different width (14995)
- Fix file scan bugs for ipc, csv and parquet that occur with combinations of glob paths, row indices and predicates (15065)
- Incorrectly preserved sorted flag when concatenating sorted series containing nulls (15082)
- Return largest non-NaN value for `max()` on sorted float arrays if it exists instead of NaN (15060)
- return NaN for all-NaN min/max (15066)
- Prevent "index out of range for slice" error in parquet reader (15021)
- Respect `nulls_last` in streaming sort (15061)
- Fix Series construction from nested list with mixed data types (15046)
- Don't count nulls in streaming `count` agg (15051)
- agg\_list on decimal lost scale (15054)
- Block predicate pushdown on equalities that are used in a join (15055)
- Enum equality based on categories (15053)
- Don't panic in `string_addition_to_linear_concat` (15006)
- CSV do utf8-validation after escaping fields (15004)
- Use primitive constructors to create a Series of lists when dtype is provided (15002)
- replace\_time\_zone with single-null-element "ambiguous" was panicking (14971)
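
The two NaN-related fixes above (15060, 15066) describe a consistent rule for float `max()`: prefer the largest non-NaN value, and return NaN only when every value is NaN. The sketch below states that rule in plain Python. It is not the Polars kernel, it ignores nulls entirely, and `nan_aware_max` is a hypothetical name.

```python
import math


def nan_aware_max(values):
    """Largest non-NaN value; NaN if all values are NaN; None if empty.

    Illustrative model of the rule described in the changelog entries.
    """
    non_nan = [v for v in values if not math.isnan(v)]
    if non_nan:
        return max(non_nan)
    # Either no values at all, or nothing but NaN.
    return math.nan if values else None


mixed = nan_aware_max([1.0, 3.0, math.nan])
all_nan = nan_aware_max([math.nan, math.nan])
```

Filtering first avoids relying on comparisons with NaN, which are always false and make the result of a naive `max()` depend on where the NaN sits in the input.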

📖 Documentation

- Update write\_database code blocks in user guide (15106)
- Add missing docstring examples in the Struct namespace (15071)
- Improve API reference landing page (14888)
- improve join\_asof example (14993)
- Fix inadvertent swap of `new` and `old` parameters in `replace` description (15019)

🛠️ Other improvements

- Extend and speed up scan tests (15127)
- Add parameterized-scan-tests (15057)
- Simplify streaming execution (15039)
- Ensure we hit the spilled source path in ooc sort test (15010)
- Refactor constructor code (15009)
- fix features (14977)
- Revert pinning PyPI publish action (14975)

Thank you to all our contributors for making this release possible!
JackRolfe, MKisilyov, MarcoGorelli, alexander-beedie, c-peters, flisky, jqnatividad, mcrumiller, mickvangelderen, nameexhaustion, orlp, petrosbar, ritchie46, stinodego and trueb2


py-0.20.16-rc.1
🚀 Performance improvements

- Use sorted flag for `(first|last)_non_null` (15050)
- OOC sort improvements (14994)

✨ Enhancements

- let "ambiguous" take "null" value (14961)
- Raise informative error message when join would introduce duplicate column name (15042)
- Allow cast of decimal to boolean (15015)
- Add `strict` parameter to `DataFrame` constructor to allow non-strict construction (15034)
- Support Array statistics in parquet (15031)
- Support decimal groupby (15000)
- Add thread names to rayon thread pool (15024)
- Support decimal uniq (15001)
- expose timings in verbose state of OOC sort (14979)

🐞 Bug fixes

- Prevent "index out of range for slice" error in parquet reader (15021)
- Respect `nulls_last` in streaming sort (15061)
- Fix Series construction from nested list with mixed data types (15046)
- Don't count nulls in streaming `count` agg (15051)
- agg\_list on decimal lost scale (15054)
- Block predicate pushdown on equalities that are used in a join (15055)
- Enum equality based on categories (15053)
- Don't panic in `string_addition_to_linear_concat` (15006)
- CSV do utf8-validation after escaping fields (15004)
- Use primitive constructors to create a Series of lists when dtype is provided (15002)
- replace\_time\_zone with single-null-element "ambiguous" was panicking (14971)

📖 Documentation

- Improve API reference landing page (14888)
- improve join\_asof example (14993)
- Fix inadvertent swap of `new` and `old` parameters in `replace` description (15019)

🛠️ Other improvements

- Add parameterized-scan-tests (15057)
- Simplify streaming execution (15039)
- Ensure we hit the spilled source path in ooc sort test (15010)
- Refactor constructor code (15009)
- fix features (14977)
- Revert pinning PyPI publish action (14975)

Thank you to all our contributors for making this release possible!
MKisilyov, MarcoGorelli, alexander-beedie, c-peters, flisky, jqnatividad, mcrumiller, mickvangelderen, nameexhaustion, petrosbar, ritchie46, stinodego and trueb2


rs-0.38.2
🏆 Highlights

- Streaming outer joins (14828)

🚀 Performance improvements

- Ensure parallel encoding/compression in `sink_parquet` (14964)
- hoist errors out of iterators in parquet (14945)
- add basic AVX-512 filters (14892)
- improve join-asof materialization (14884)
- Optimize chunked-id gather for binaryviews (14878)
- rework scalar filter kernels (14865)
- Reduce size of optional join-indexes (14856)
- Streaming outer joins (14828)
- Set sorted flag for `cum_count` on columns (14849)

✨ Enhancements

- Support writing `Array` type in parquet (14943)
- Sort decimal fields (14649)
- Import `NamedFrom` in `df!` macro (14860)
- try-improve concurrency tuner (14827)
- Streaming outer joins (14828)
- Set sorted flag for `cum_count` on columns (14849)
- Ensure binview types are rle-encoded in parquet write (14818)
- Implement strict/nonstrict conversion for primitive AnyValues (14186)
- Disable timeouts (14809)
- cleanup spill disks in process (14807)

🐞 Bug fixes

- Fix invalid partitionable query (14966)
- allow nonstrict cast of categorical/enum to enum (14910)
- `count_rows` multi-threaded under-counting in parser.rs (14963)
- raise proper error instead of panicking when result of truncation is non-existent datetime (14958)
- ooc-sort issues (14959)
- Do not raise when constructing from a list of Series with Nones (14942)
- Don't access out-of-bounds for null indices in bitmap gather (14932)
- std when ddof>=n\_values returns None even in rolling context (11750)
- Don't rechunk categoricals when moving to physical (14934)
- parquet rle boolean decoder (14931)
- boolean filter gave overly large buffers to Bitmap::from\_u8\_vec (14924)
- Fix sliced dictionary state in parquet (14917)
- Fix possibly incorrect order of columns when using ipc stream `with_columns` (14859)
- Fully qualify `polars_bail!` in `polars_ensure!` (14901)
- Fix `DataFrame.min`/`max` for decimals (14890)
- Assert chunks are equal after physical cast to prevent OOB (14873)
- not all cpu feature flag tests were mocked (14864)

📖 Documentation

- Remove some repetition in comments/docstrings (14912)
- Update contributing link (14882)
- Fix some word-repetition in code comments (14825)
- Separate `asof` from join strategy, change parameter from `strategy` to `how` in user guide (14793)

🛠️ Other improvements

- fix features (14977)
- fix chrono deprecation warnings (14928)
- Update Cargo.lock and remove cmake limit workaround (14905)
- Simplify streaming placeholder replacement. (14915)
- Optional deps should include `fastexcel` (14907)
- Deduplicate `POLARS_FORCE_ASYNC` env var parsing (14909)
- Make assumption about column name to index conversion having occurred explicit (14894)
- Make assumption about wildcards having been resolved explicit (14899)
- reactivate argminmax simd (14679)
- sort by 'idx' after outer join (14867)
- Simplify computation of `with_columns` attribute in physical csv scanner of default engine. (14837)
- centrally define IdxSize (14854)
- run and fix pext64\_polyfill test (14852)
- introduce partitioned table (14819)
- add missing deprecation directive in groupby.count (14817)
- Extract key value construction (14812)
- Fix Makefile build commands (14806)

Thank you to all our contributors for making this release possible!
MarcoGorelli, Sol-Hee, alexander-beedie, ambidextrous, battmdpkq, deanm0000, dependabot, dependabot[bot], eitsupi, flisky, geekvest, mcrumiller, mickvangelderen, nameexhaustion, orlp, petrosbar, ritchie46 and stinodego


py-0.20.15
🚀 Performance improvements

- Ensure parallel encoding/compression in `sink_parquet` (14964)
- hoist errors out of iterators in parquet (14945)
- add basic AVX-512 filters (14892)

✨ Enhancements

- Support writing `Array` type in parquet (14943)
- Add `drop_first` parameter to `Series.to_dummies` (14846)
- Add "execute\_options" support for `read_database_uri` (14682)

🐞 Bug fixes

- Fix invalid partitionable query (14966)
- allow nonstrict cast of categorical/enum to enum (14910)
- `count_rows` multi-threaded under-counting in parser.rs (14963)
- raise proper error instead of panicking when result of truncation is non-existent datetime (14958)
- ooc-sort issues (14959)
- Do not raise when constructing from a list of Series with Nones (14942)
- Don't access out-of-bounds for null indices in bitmap gather (14932)
- std when ddof>=n\_values returns None even in rolling context (11750)
- Don't rechunk categoricals when moving to physical (14934)
- Ensure consistent `read_database` behaviour with empty ODBC "iter\_batches" (14918)
- parquet rle boolean decoder (14931)
- Fix frame init from single `RecordBatch` objects when `pyarrow <= 12` (14922)
- boolean filter gave overly large buffers to Bitmap::from\_u8\_vec (14924)
- Fix sliced dictionary state in parquet (14917)
- `read_database` now properly handles empty result sets from `arrow-odbc` (14916)
- Fix possibly incorrect order of columns when using ipc stream `with_columns` (14859)
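
The `std` entry above (11750) concerns the case where `ddof` is at least the number of values in a window. As a rough illustration of that rule only, here is a plain-Python sketch; the helper names `std` and `rolling_std` are hypothetical, the expanding partial windows are a simplification, and none of this is the Polars implementation.

```python
import math
from typing import List, Optional, Sequence


def std(values: Sequence[float], ddof: int = 1) -> Optional[float]:
    """Standard deviation that yields None when ddof >= number of values."""
    n = len(values)
    # With ddof >= n the divisor (n - ddof) is zero or negative, so the
    # estimate is undefined; report a missing value instead of raising.
    if ddof >= n:
        return None
    mean = sum(values) / n
    squared_dev = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_dev / (n - ddof))


def rolling_std(values: Sequence[float], window: int, ddof: int = 1) -> List[Optional[float]]:
    """Apply the same rule to every (possibly partial) trailing window."""
    return [
        std(values[max(0, end - window):end], ddof)
        for end in range(1, len(values) + 1)
    ]


result = rolling_std([1.0, 2.0, 4.0], window=2, ddof=1)
# The first window holds a single value, so with ddof=1 its entry is None.
```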

📖 Documentation

- Add note about `include_index` in `from_pandas` regarding "default indices" (14920)
- Remove some repetition in comments/docstrings (14912)

🛠️ Other improvements

- Update Cargo.lock and remove cmake limit workaround (14905)
- Simplify streaming placeholder replacement. (14915)
- Optional deps should include `fastexcel` (14907)
- Deduplicate `POLARS_FORCE_ASYNC` env var parsing (14909)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, ambidextrous, battmdpkq, mcrumiller, mickvangelderen, orlp, petrosbar and ritchie46


py-0.20.14
🏆 Highlights

- Streaming outer joins (14828)

⚠️ Deprecations

- Deprecate `overwrite_schema` parameter for `DataFrame.write_delta` (14879)

🚀 Performance improvements

- improve join-asof materialization (14884)
- Optimize chunked-id gather for binaryviews (14878)
- rework scalar filter kernels (14865)
- Reduce size of optional join-indexes (14856)
- Streaming outer joins (14828)
- Set sorted flag for `cum_count` on columns (14849)

✨ Enhancements

- Sort decimal fields (14649)
- Revert addition of `__slots__` to Polars classes (14857)
- Add `fastexcel` to `show_versions` (14869)
- try-improve concurrency tuner (14827)
- Streaming outer joins (14828)
- Set sorted flag for `cum_count` on columns (14849)
- support use of KùzuDB via `pl.read_database` (14822)
- Ensure binview types are rle-encoded in parquet write (14818)
- Disable timeouts (14809)
- cleanup spill disks in process (14807)
- Implement compression and skipping for binview IPC (14789)

🐞 Bug fixes

- Fix `DataFrame.min`/`max` for decimals (14890)
- Assert chunks are equal after physical cast to prevent OOB (14873)
- not all cpu feature flag tests were mocked (14864)
- Remove custom `__reduce__` implementation on `DataType` object (14778)
- Allow non-strict construction / initialization of Enum columns (14728)
- Fix streaming parquet limit (14783)

📖 Documentation

- Update contributing link (14882)
- update to use `ambiguous` instead of `use_earliest` (14820)
- Separate `asof` from join strategy, change parameter from `strategy` to `how` in user guide (14793)

🛠️ Other improvements

- Pin PyPI publish action to commit (14896)
- reactivate argminmax simd (14679)
- sort by 'idx' after outer join (14867)
- run and fix pext64\_polyfill test (14852)
- add missing deprecation directive in groupby.count (14817)
- Fix Makefile build commands (14806)
- Bump ruff from 0.2.0 to 0.3.0 in /py-polars (14800)
- Rename `utils` module to `_utils` to explicitly mark it as private (14772)
- Add test coverage for `_cpu_check` module (14768)

Thank you to all our contributors for making this release possible!
MarcoGorelli, Sol-Hee, alexander-beedie, c-peters, deanm0000, dependabot, dependabot[bot], eitsupi, flisky, geekvest, mcrumiller, mickvangelderen, nameexhaustion, orlp, petrosbar, ritchie46 and stinodego


rs-0.38.1
🚀 Performance improvements

- Elide utf8/binary cast in Parquet reading (14757)

✨ Enhancements

- Implement compression and skipping for binview IPC (14789)

🐞 Bug fixes

- fix feature flags (14802)
- Allow non-strict construction / initialization of Enum columns (14728)
- Fix streaming parquet limit (14783)

📦 Build system

- bump rayon from 1.8.1 to 1.9.0 (14797)

Thank you to all our contributors for making this release possible!
alexander-beedie, c-peters, dependabot, dependabot[bot], ritchie46 and stinodego


rs-0.38.0
🏆 Highlights

- fast path for COUNT(\*) queries (14574)
- Implemented tree formatting for LogicalPlan (14221)

💥 Breaking changes

- Infer `values` columns in `DataFrame.pivot` when `values` is None (14477)
- Mark `DataFrame::new_no_checks` and `DataFrame::new_no_length_checks` unsafe (14443)
- Remove `DatetimeChunked::convert_time_zone` (14046)
- Rename `LiteralValue::to_anyvalue` to `LiteralValue::to_any_value` (14033)

🚀 Performance improvements

- auto-tune concurrency budget (14753)
- Don't materialize for broadcasting `fill_null` value and default value of `replace` (14736)
- Improve performance of boolean filters `1-100x`. (14746)
- fix accidental quadratic utf8 validation in parquet (14705)
- fast path for COUNT(\*) queries (14574)
- Elide the total order wrapper for non-(float/option) types (14648)
- add utf8-validation fast paths for utf8view (14644)
- don't reassign chunks back to df owner (14633)
- If there are many small chunks in write\_parquet(), convert to a single chunk (14484) (14487)
- Polars thread pool was not used properly in various functions (14583)
- use owned arithmetic in horizontal\_sum (14525)
- Combine small chunks in sinks for streaming pipelines (14346)
- reduce heap allocs in expression/logical-plan iteration (14440)
- simplify and speed up cum\_sum and cum\_prod (14409)
- simplify negated predicates to improve row groups skipping (14370)
- prune parquet row groups when `is_not_null` is used (14260)
- use is\_between to skip parquet row groups (14244)
- Use a compression API that is designed for this use case (11699) (14194)
- Use `UnitVec` in polars-plan traversal (14199)
- use `UnitVec` in streaming joins (14197)
- improve `ChunkId` (14175)
- improve iteration performance (14126)
- elide unneeded work in window? (14108)
- run window functions more in parallel (14095)
- improve skip row group using statistics condition (14056)

✨ Enhancements

- Change default for maximum number of Series items printed to 10 to match DataFrame (14703)
- Infer `values` columns in `DataFrame.pivot` when `values` is None (14477)
- fast path for COUNT(\*) queries (14574)
- let `rolling` accept `index_column` of type UInt32 or UInt64 (14669)
- Treat float -0.0 == 0.0 and -NaN == NaN in group-by, joins and unique (14617)
- Properly cache object-stores (14598)
- Mark `DataFrame::new_no_checks` and `DataFrame::new_no_length_checks` unsafe (14443)
- flatten aliases (14512)
- Make formatting more consistent in DOT graphs (14486)
- add `flush` operator to streaming operators (14500)
- Increase verbosity of duplicate column error message (11899)
- change print to warn in reading csv from python file like object (14469)
- Raise if `pivot` would introduce duplicate column names (14431)
- apply negate in simplify expression pass (14436)
- restrict more cloud interop to semaphore budget (14435)
- Implement `min`/`max` for categorical dtype (14112)
- add boolean rle decoding for parquet (14403)
- Allow brackets in SQL join conditions (14263)
- Improve panic message for missing struct feature in `DataType::from_arrow` (14392)
- Implement the `IntoLazy` trait for LazyFrame (14323)
- Implemented tree formatting for LogicalPlan (14221)
- Implement `mean_horizontal` expression (14369)
- support decimal comparison (14338)
- Implements `arr.shift` (14298)
- Implements `list.n_unique` (14306)
- Do not panic when casting from an empty Series to pl.Decimal (14330)
- add `u8`/`i8`/`u16`/`i16` parsers to CSV reader (14241)
- Implements `list.gather_every` (14253)
- Implements `prefix/suffix_fields` (14251)
- Polish decimal arithmetic (14172)
- Introduce `arr.to_struct` (14202)
- Supports map fields name of struct (14203)
- make `IdxVec` generic as `UnitVec` (14196)
- add new arithmetic kernels (14026)
- Supports `unique` and `hash_rows` for `null` column (14111)
- Implement arithmetic operations for `Null` columns (14107)
- Add strict/non-strict construction of Boolean/Binary series (14073)
- Improve `Series::from_any_values` logic (14052)
- Adapt extend\_constant to function expr architecture and expressify it (14058)
- add integer negation (14049)
- `list` \& `array` measures of dispersion (13245)
- gc binview when writing ipc (14035)
- When calling `convert_time_zone` on time-zone-naive datetime, convert as if converting from UTC (13960)

🐞 Bug fixes

- fix hashing specialization (14754)
- Sum after filter in aggregation context sometimes returned NULL (14752)
- Allow `list.contains()` for list of categoricals (14744)
- Fix bug where alias was ignored in COUNT(\*) optimization (14738)
- Fix `DataFrame.sum` for decimals (14732)
- Fix parallel strategy for LazyFrame not being applied (14696)
- Block slice pushdown past non-literal projections or when the projection doesn't contain any columns from the input (14684)
- Fix number of rows printed in `DataFrame/Series` repr (edge cases) (14548)
- Fix contention panics in file gc threads (14690)
- Fix feature combination (14688)
- Only push predicates depending on the subset columns past `unique()` (14668)
- Reading RLE\_DICTIONARY-encoded parquet incorrectly coalesced NULL to empty string in some cases (14670)
- use correct flooring division/modulo operator in literal optimizer and const\_lhs \<> series ops (14671)
- Enable `is_in` for string in categorical/enum (14576)
- Polars thread pool was not used properly in various functions (14583)
- Semi-join and multiple keys outer-join did not respect POLARS\_MAX\_THREADS (14571)
- Correct sorted flag of chunked gather (14570)
- ensure the streaming dispatcher can replace placeholders in unions (14537)
- Ensure series are contiguous prior to `transpose` (14527)
- write csv header if necessary when finishing sinks (14518)
- fix logical dtypes in take\_chunked (14517)
- fix binary-offset row-encode (14514)
- race conditions in OOC writing (14510)
- don't gc after variadic buffers are written (14473)
- Increase verbosity of duplicate column error message (11899)
- Return appropriate data type for duration `mean` and `median` (14376)
- change print to warn in reading csv from python file like object (14469)
- regression in out-of-core group-by by new string-type (14464)
- DataFrame.pivot was returning incorrect results when multiple columns were passed to `index` and one of them was Struct (14438)
- remove literal `Series` from projection state (14437)
- pivot was producing incorrect results when (single) `index` was Struct (14308)
- Error on some invalid `clip` inputs (14416)
- Series.hist panicking on empty/all-null (14407)
- rechunk series when apply\_lambda (14406)
- don't make column from filenames, don't ignore directories with (.) (14317)
- Remove duplicated content in error messages (8107)
- Fix `set_operation` if the input is sliced and be broadcast (14303)
- Wrap `par_iter` in `list.to_struct` by `POOL.install` (14304)
- Do not panic when casting from an empty Series to pl.Decimal (14330)
- Preserve name when casting to Enum (14320)
- `list.get` does not work on list of decimals (14276)
- relax precision when up scaling (14270)
- Allow format object series with registry (14272)
- deduplicate recursive growables (14264)
- Fix `glimpse` overload signature (14258)
- allow set operations on list of categoricals (14110)
- `any/all_horizontal` with single input has incorrect type (14256)
- load numpy array with np array values 14237 (14238)
- Fix join validation for String types (14229)
- make csv parser more robust to edge cases (14210)
- Fix for `set_operations` of binary dtype (14152)
- fix read\_csv date/datetime inference and parsing (14113)
- don't see files as hive partitions (14128)
- allow eval on list of categoricals (14132)
- add missing conditional compile flag for `StringFunction::Find` (14129)
- Forbid casting from `Date` to `Time` and vice versa (14127)
- preserve old naming convention for multi-value pivot (this will change in 1.0 to no longer redundantly have the column name in the middle) (14120)
- Implements `gt/lt` cmp for null dtype (14119)
- ignore comments at beginning of csv if schema provided (14115)
- fix pivot when multiple columns are passed. Output is now aligned with what tidyverse / pandas.pivot\_table would do (14048)
- some temporal conversion errors for datetimes earlier than `1970-01-01` (14050)
- Preserve name when casting from categorical (14085)
- fix cse bug when window function is nested (14070)
- Fix `melt` panic when there are no value vars (14057)
- `json_encode` should respect the logical type (14063)
- improve skip row group using statistics condition (14056)
- Raise for .dt.epoch and .dt.timestamp for Duration dtype (13962)
- handle `SliceSink` with empty data (14025)
- correct field type schema inference (using read\_csv) (14042)
- Map `AnyValue::Null` to datatype `Null` (14045)
- Use int formatter for unsigned ints (14043)
- quick fix for multiple chunks binary reverse (14024)
- count matches on list categorical (14021)
- `list.min/max` with empty and/or None elements (14018)

📖 Documentation

- Link to plugins tutorial more prominently (14727)
- Separate "writing a plugin" from "registering an expression" in user guide, add some extra links, don't use deprecated \_register\_plugin (14621)
- Remove some outdated information in `polars` crate docs (14608)
- Fix code block path for group by example in getting started guide (14612)
- Add missing 'string' column in reading-writing Rust example to match Python example (14597)
- Fix typo of "Cartesian" product (14585)
- Mention in contributing guide that PR titles should start with an uppercase letter (14584)
- Fix markdown newline for rendering function description in VSCode (14567)
- Clarify doc summary of `upsample_stable` (13623)
- Clean up grammar and capitalization in `README.md` (14488)
- Fix typo in plugins section (14402)
- Add debugging section to contributing docs (10576)
- Fix some typos (14394)
- Realign file structure of user guide (14360)
- Rust examples for data structures in user guide (14339)
- Add deprecation period policy example for post-1.0.0 (14184)
- Fix capitalization of user guide references (14291)
- fix code block in user-guide/lazy/schemas (14228)
- Fix typo in contributing guide (14181)
- Small improvements Ecosystem page (14176)
- fix code blocks in user-guide/concepts/data-structures (14146)
- Fix bullet point formatting in CI contributing guide (14117)
- Remove outdated reference to horizontal concat feature (14105)
- Replace alternatives page with more objective comparison (13784)

📦 Build system

- update ahash (14731)
- Limit CMake threads to fix crash compiling `libz-ng-sys` on macOS (14715)
- Fix `json` feature for `polars-sql` crate (14501)
- Enable feature nightly with optional sql feature (14222)

🛠️ Other improvements

- update ahash (14731)
- replace transmute with bytemuck cast (14747)
- Limit CMake threads to fix crash compiling `libz-ng-sys` on macOS (14715)
- Refactor `AnyValue` casting logic (13140)
- update rustc (14678)
- redundant imports all crates (14662)
- remove redundant imports up to polars-io, polars-time, polars-ops (14658)
- remove redundant imports (up until polars-core) (14646)
- Simplify compressed\_chunk\_size calculation and leave comments to explain for rle encode (14634)
- Rename coverage file (14607)
- Format safety sections in Rust docstrings (14446)
- Refactor code coverage workflow (14563)
- Disable status from code coverage (14545)
- Add code coverage CI (14532)
- Format safety comments (14447)
- Bump release drafter to v6 (14429)
- Bump `setup-graphviz` action to v2 (14418)
- Update `make clean` command (14408)
- Minor refactor to satisfy clippy (14364)
- make gather\_chunked completely generic (14195)
- Add `.cargo` directory to .gitignore (14191)
- `take_chunked` to polars-ops (14185)
- Enable `clippy` lint to warn on debug macros (14178)
- Run `cargo update` (14160)
- merge take kernels (14137)
- improve From\<Ca> -> Vec (14123)
- hoist boolean -> string cast (14122)
- Remove `DatetimeChunked::convert_time_zone` (14046)
- More generic way to present an expression tree diagram (14020)
- Rename `LiteralValue::to_anyvalue` to `LiteralValue::to_any_value` (14033)

Thank you to all our contributors for making this release possible!
BGR360, CBell045, CaselIT, FBruzzesi, JulianCologne, Kylea650, MarcoGorelli, Migi, NedJWestern, Object905, Vincenthays, Wainberg, alexander-beedie, apcamargo, braaannigan, bsubei, c-peters, dannyfriar, deanm0000, dependabot, dependabot[bot], dpinol, eLVas, edavisau, eitsupi, engdoreis, flisky, grinya007, i-aki-y, ion-elgreco, itamarst, janosh, jdanford, kalekundert, lukemanley, mbuhidar, mcrumiller, nameexhaustion, orlp, petrosbar, r-brink, rben01, reswqa, rijkvp, ritchie46, stinodego, taki-mekhalfa and thomasfrederikhoeck


py-0.20.13
🚀 Performance improvements

- Elide utf8/binary cast in Parquet reading (14757)

🐞 Bug fixes

- Add missing "pclmulqdq" instruction to `_cpu_check` ("read\_cpu\_flags") (14758)

🛠️ Other improvements

- Test release wheels on x86-64 (14761)

Thank you to all our contributors for making this release possible!
alexander-beedie, ritchie46 and stinodego


py-0.20.12
> [!WARNING]
> **This release was deleted from PyPI.** Please use the [0.20.13](https://github.com/pola-rs/polars/releases/tag/py-0.20.13) release instead.

🚀 Performance improvements

- auto-tune concurrency budget (14753)
- Don't materialize for broadcasting `fill_null` value and default value of `replace` (14736)
- Improve performance of boolean filters `1-100x`. (14746)

🐞 Bug fixes

- fix hashing specialization (14754)
- Sum after filter in aggregation context sometimes returned NULL (14752)
- Allow `list.contains()` for list of categoricals (14744)
- Fix bug where alias was ignored in COUNT(\*) optimization (14738)
- Fix `DataFrame.sum` for decimals (14732)

📖 Documentation

- Link to plugins tutorial more prominently (14727)

📦 Build system

- update ahash (14731)

🛠️ Other improvements

- update ahash (14731)
- Use `datetime_to_int` util for AnyValue conversion (14743)
- Refactor `utils/convert.py` module (14739)

Thank you to all our contributors for making this release possible!
MarcoGorelli, c-peters, nameexhaustion, orlp, petrosbar, ritchie46 and stinodego


py-0.20.11
> [!WARNING]
> **This release was deleted from PyPI.** Please use the [0.20.13](https://github.com/pola-rs/polars/releases/tag/py-0.20.13) release instead.

🏆 Highlights

- fast path for COUNT(\*) queries (14574)

⚠️ Deprecations

- Deprecate passing `time_unit=None` to `Datetime` constructor (14708)
- Rename `Expr.meta.write_json/Expr.from_json` to `Expr.meta.serialize/Expr.deserialize` (14490)
- Deprecate default value for `ignore_nulls` for `ewm` methods (14663)
- Deprecate `DataFrame/LazyFrame.approx_n_unique` (14594)

🚀 Performance improvements

- 2-3x speedup in creating literals/Series of type `Date` (14716)
- fix accidental quadratic utf8 validation in parquet (14705)
- Add `__slots__` to most Polars classes (13236)
- fast path for COUNT(\*) queries (14574)
- Elide the total order wrapper for non-(float/option) types (14648)
- add utf8-validation fast paths for utf8view (14644)
- don't reassign chunks back to df owner (14633)
- If there are many small chunks in write\_parquet(), convert to a single chunk (14484) (14487)
- Polars thread pool was not used properly in various functions (14583)

✨ Enhancements

- Change default for maximum number of Series items printed to 10 to match DataFrame (14703)
- Change default number of rows printed in Notebooks for DataFrame/Series to 10 (14536)
- Infer `values` columns in `DataFrame.pivot` when `values` is None (14477)
- fast path for COUNT(\*) queries (14574)
- let `rolling` accept `index_column` of type UInt32 or UInt64 (14669)
- Treat float -0.0 == 0.0 and -NaN == NaN in group-by, joins and unique (14617)
- Improve consistency of `dtype` inference from Python types (14600)
- Properly cache object-stores (14598)
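
The entry on treating `-0.0 == 0.0` and `-NaN == NaN` (14617) matters because keys hashed by their raw bit pattern would otherwise land in separate groups. The sketch below shows one way to canonicalize float keys before grouping, in plain Python; `canonical_bits` and `group_positions` are hypothetical names chosen for illustration, and this is not how Polars itself is written.

```python
import math
import struct
from typing import Dict, List, Sequence


def canonical_bits(x: float) -> int:
    """Return the IEEE 754 bit pattern of x after canonicalizing zeros and NaNs."""
    if x == 0.0:
        x = 0.0          # folds -0.0 into +0.0
    elif math.isnan(x):
        x = math.nan     # folds every NaN (including -NaN) into one NaN
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def group_positions(keys: Sequence[float]) -> List[List[int]]:
    """Group row positions by canonicalized key, in first-seen order."""
    groups: Dict[int, List[int]] = {}
    for position, key in enumerate(keys):
        groups.setdefault(canonical_bits(key), []).append(position)
    return list(groups.values())


groups = group_positions([0.0, -0.0, math.nan, -math.nan, 1.5])
# Both zeros share a group, both NaNs share a group, and 1.5 is on its own.
```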

🐞 Bug fixes

- Fix parallel strategy for LazyFrame not being applied (14696)
- Block slice pushdown past non-literal projections or when the projection doesn't contain any columns from the input (14684)
- Fix number of rows printed in `DataFrame/Series` repr (edge cases) (14548)
- Fix contention panics in file gc threads (14690)
- Fix feature combination (14688)
- Only push predicates depending on the subset columns past `unique()` (14668)
- Properly handle a single empty `RecordBatch` in `from_arrow` (14683)
- More accurate type hints for binary file-like inputs (14674)
- Reading RLE\_DICTIONARY-encoded parquet incorrectly coalesced NULL to empty string in some cases (14670)
- use correct flooring division/modulo operator in literal optimizer and const\_lhs \<> series ops (14671)
- Enable `is_in` for string in categorical/enum (14576)
- Fixes a `read_database` issue loading specific datetime types from SQL Server backends (14627)
- Polars thread pool was not used properly in various functions (14583)
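
The flooring division/modulo entry (14671) does not spell out the faulty behaviour, so the following is only a reminder of how flooring and truncating division differ for negative operands, written in plain Python with hypothetical helper names. The two conventions agree for positive inputs and diverge once a sign is involved, which is why mixing them in an optimizer pass can change results.

```python
import math
from typing import Tuple


def floor_divmod(a: int, b: int) -> Tuple[int, int]:
    """Quotient rounded toward negative infinity; remainder takes the sign of b."""
    return a // b, a % b


def trunc_divmod(a: int, b: int) -> Tuple[int, int]:
    """Quotient rounded toward zero; remainder takes the sign of a."""
    quotient = math.trunc(a / b)
    return quotient, a - quotient * b


floored = floor_divmod(-7, 2)    # (-4, 1)
truncated = trunc_divmod(-7, 2)  # (-3, -1)
```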

📖 Documentation

- Improve some DataType docstrings (14719)
- Fix bad link due to boldness in `pl.count` (14691)
- Improve docstrings for `ewm_*` and `rolling_*` methods (14667)
- Improve examples for `Series.binary.encode` and `Series.binary.decode`. (14579)
- Add examples for `Series.kurtosis` (14681)
- Fix docstring for `LazyGroupBy.len` (14661)
- Separate "writing a plugin" from "registering an expression" in user guide, add some extra links, don't use deprecated \_register\_plugin (14621)
- Fix code block path for group by example in getting started guide (14612)
- Add missing 'string' column in reading-writing Rust example to match Python example (14597)

📦 Build system

- Limit CMake threads to fix crash compiling `libz-ng-sys` on macOS (14715)

🛠️ Other improvements

- Limit CMake threads to fix crash compiling `libz-ng-sys` on macOS (14715)
- Fix `make requirements` when conda environment is active (14693)
- update rustc (14678)
- redundant imports all crates (14662)
- Avoid unnecessary cast in Series constructor (14650)
- Add test on selecting Enum columns (14628)
- Use `uv` for `make requirements` (14618)
- Rename coverage file (14607)
- Add a lint-only `Makefile` option (14602)
- No longer use `SeriesView` in `Series.to_numpy` (14588)

Thank you to all our contributors for making this release possible!
Kylea650, MarcoGorelli, Object905, alexander-beedie, bsubei, c-peters, eLVas, itamarst, mbuhidar, mcrumiller, nameexhaustion, orlp, rijkvp, ritchie46 and stinodego


py-0.20.10
⚠️ Deprecations

- Add `allow_copy` parameter to `DataFrame.to_numpy` (14569)

🚀 Performance improvements

- Avoid loading pandas in `from_arrow` when array has 0 chunks (14562)

✨ Enhancements

- Warn on inefficient use of `map_elements` for additional string functions (14565)
- Add `allow_copy` parameter to `DataFrame.to_numpy` (14569)
- Improve `read_database` interop with sqlalchemy `Session` and various `Result` objects (14557)
- Warn on inefficient use of `map_elements` for temporal attributes/methods (14529)

🐞 Bug fixes

- Semi-join and multiple keys outer-join did not respect POLARS\_MAX\_THREADS (14571)
- Correct sorted flag of chunked gather (14570)

📖 Documentation

- Fix typo of "Cartesian" product (14585)
- Mention in contributing guide that PR titles should start with an uppercase letter (14584)
- Fix markdown newline for rendering function description in VSCode (14567)

🛠️ Other improvements

- Refactor code coverage workflow (14563)
- Disable status from code coverage (14545)

Thank you to all our contributors for making this release possible!
CBell045, alexander-beedie, c-peters, nameexhaustion, ritchie46 and stinodego


py-0.20.9
🚀 Performance improvements

- use owned arithmetic in horizontal\_sum (14525)

✨ Enhancements

- Add `writable` flag to `DataFrame.to_numpy` (14520)
- flatten aliases (14512)
- Make formatting more consistent in DOT graphs (14486)
- add `flush` operator to streaming operators (14500)

🐞 Bug fixes

- ensure the streaming dispatcher can replace placeholders in unions (14537)
- Ensure series are contiguous prior to `transpose` (14527)
- write csv header if necessary when finishing sinks (14518)
- fix logical dtypes in take\_chunked (14517)
- fix binary-offset row-encode (14514)
- race conditions in OOC writing (14510)
- Remove `is_numeric` check on `Series.std/var` (14493)
- Error on invalid `schema` input in DataFrame constructor (14483)

📖 Documentation

- Fix docstring example for `Config.save_to_file` (14533)
- Fix `infer_schema_length` param description (14233)
- Clean up grammar and capitalization in `README.md` (14488)
- Add examples for `Series.bin.ends_with`, `Series.bin.starts_with`, `Series.bin.decode`, `Series.bin.encode`. (14478)

🛠️ Other improvements

- add code coverage CI (14532)
- Re-enable streaming OOC tests (14522)
- Use constant for checking Sphinx building (14502)

Thank you to all our contributors for making this release possible!
FBruzzesi, NedJWestern, c-peters, dannyfriar, i-aki-y, jdanford, mbuhidar, mcrumiller, ritchie46, stinodego and taki-mekhalfa


py-0.20.8
🏆 Highlights

- Implemented tree formatting for LogicalPlan (14221)

⚠️ Deprecations

- Deprecate positional args in `pivot` to prepare new functionality (14428)

🚀 Performance improvements

- Combine small chunks in sinks for streaming pipelines (14346)
- reduce heap allocs in expression/logical-plan iteration (14440)
- simplify and speed up cum\_sum and cum\_prod (14409)
- simplify negated predicates to improve row groups skipping (14370)

✨ Enhancements

- Increase verbosity of duplicate column error message (11899)
- change print to warn in reading csv from python file like object (14469)
- Raise if `pivot` would introduce duplicate column names (14431)
- apply negate in simplify expression pass (14436)
- restrict more cloud interop to semaphore budget (14435)
- Implement `min`/`max` for categorical dtype (14112)
- Hide `polars.testing.*` in pytest stack traces (14399)
- expose numpy view to integer types (14405)
- Allow column name input in `clip` (14410)
- add boolean rle decoding for parquet (14403)
- Allow brackets in SQL join conditions (14263)
- Implemented tree formatting for LogicalPlan (14221)
- Implement `mean_horizontal` expression (14369)
- support decimal comparison (14338)
- Implements `arr.shift` (14298)
- Implements `list.n_unique` (14306)
- Do not panic when casting from an empty Series to pl.Decimal (14330)
- unset WRITEABLE flag in zero-copy output (14283)
- Support `Categorical/Enum` in `Series.to_numpy` (14275)
- add parametric testing support for the `Array` dtype (14265)

🐞 Bug fixes

- don't gc after variadic buffers are written (14473)
- Increase verbosity of duplicate column error message (11899)
- Return appropriate data type for duration `mean` and `median` (14376)
- change print to warn in reading csv from python file like object (14469)
- regression in out-of-core group-by by new string-type (14464)
- DataFrame.pivot was returning incorrect results when multiple columns were passed to `index` and one of them was Struct (14438)
- remove literal `Series` from projection state (14437)
- pivot was producing incorrect results when (single) `index` was Struct (14308)
- Error on some invalid `clip` inputs (14416)
- Series.hist panicking on empty/all-null (14407)
- rechunk series when apply\_lambda (14406)
- Raise if invalid strategy is passed to `map_elements` (14397)
- Require exact checking for Decimals in assertion utils (14357)
- fix ufunc for unlimited column args (14328)
- Handle chunked Series in `Series.to_numpy` (14341)
- Remove duplicated content in error messages (8107)
- Fix `set_operation` if the input is sliced and be broadcast (14303)
- Wrap `par_iter` in `list.to_struct` by `POOL.install` (14304)
- Do not panic when casting from an empty Series to pl.Decimal (14330)
- Preserve name when casting to Enum (14320)
- `list.get` does not work on list of decimals (14276)
- relax precision when up scaling (14270)
- Allow format object series with registry (14272)

📖 Documentation

- Update `read_database` docstring note about getting the connection URI string for sqlalchemy (14461)
- Fix typo in plugins section (14402)
- Add debugging section to contributing docs (10576)
- Define what a 'character' means in `slice` / `len_chars` (14395)
- Clarify behavior of `DataFrame.rows_by_key` (14149)
- Fix some typos (14394)
- Realign file structure of user guide (14360)
- Rust examples for data structures in user guide (14339)
- Add deprecation period policy example for post-1.0.0 (14184)
- Add example for `Series.bin.contains` (14297)
- Small clarifications in the contributing guide (14310)
- Fix capitalization of user guide references (14291)
- Fix explode docstring mentioning String types (14285)
- Update deltalake docstrings to new link (14282)

🛠️ Other improvements

- Ignore unclosed file warnings for now (14467)
- Raise better error in import timings test (14441)
- Refactor `arg_min/max` test case (14439)
- Skip some OOC tests that fail randomly in the CI (14434)
- Bump release drafter to v6 (14429)
- Set specific temp dir for OOC tests (14420)
- Bump `setup-graphviz` action to v2 (14418)
- Minor test refactor (14404)
- Update `make clean` command (14408)
- Internal rename of `_or` to `or_` in PyO3 (same for `_xor/_and`) (14393)
- Minor refactor of `DataFrame.to_numpy` structured code (14348)
- Update `Series.to_numpy` to handle Decimal/Time types in Rust (14296)
- Add test for `Series.to_numpy` with timezones (14337)
- Bump ruff version to 0.2.0 (14294)
- Temporarily fix failing deltalake test (14288)
- remove dataframe consortium standard api entrypoint (14279)

Thank you to all our contributors for making this release possible!
BGR360, CaselIT, MarcoGorelli, Migi, NedJWestern, Vincenthays, alexander-beedie, deanm0000, dependabot, dependabot[bot], engdoreis, flisky, grinya007, itamarst, janosh, kalekundert, lukemanley, mbuhidar, mcrumiller, petrosbar, r-brink, rben01, reswqa, ritchie46, stinodego, taki-mekhalfa and thomasfrederikhoeck


py-0.20.7
⚠️ Deprecations

- Rename `threadpool_size` to `thread_pool_size` (14236)

🚀 Performance improvements

- prune parquet row groups when `is_not_null` is used (14260)
- Avoid unnecessary copies in `Series.to_numpy` for boolean/temporal types (14261)
- use is\_between to skip parquet row groups (14244)
- Use a compression API that is designed for this use case (11699) (14194)
- Use `UnitVec` in polars-plan traversal (14199)
- use `UnitVec` in streaming joins (14197)
- improve `ChunkId` (14175)
- improve iteration performance (14126)
- elide unneeded work in window? (14108)
- run window functions more in parallel (14095)
- improve skip row group using statistics condition (14056)
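
Several entries above rely on skipping Parquet row groups from their min/max statistics, for example when an `is_between` predicate is used (14244). The general idea can be sketched in plain Python as an interval-overlap check; the `RowGroupStats` class and function names are hypothetical, and the sketch assumes a closed interval and complete statistics, which may not match the actual reader.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RowGroupStats:
    """Minimum and maximum of one column within one row group."""
    min_value: int
    max_value: int


def may_contain_between(stats: RowGroupStats, lower: int, upper: int) -> bool:
    """True if some value in [min, max] could fall inside the closed range [lower, upper]."""
    return stats.max_value >= lower and stats.min_value <= upper


def row_groups_to_read(groups: Sequence[RowGroupStats], lower: int, upper: int) -> List[int]:
    """Indices of row groups that cannot be ruled out by their statistics."""
    return [
        index
        for index, stats in enumerate(groups)
        if may_contain_between(stats, lower, upper)
    ]


groups = [RowGroupStats(0, 9), RowGroupStats(10, 19), RowGroupStats(20, 29)]
selected = row_groups_to_read(groups, lower=12, upper=21)
# Only the second and third row groups overlap [12, 21]; the first is skipped.
```

Statistics can only prove that a row group holds no matching rows. A selected group still has to be decoded and filtered row by row.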

✨ Enhancements

- add `u8`/`i8`/`u16`/`i16` parsers to CSV reader (14241)
- move `F-order` data in and out of numpy to polars zero copy (14259)
- read arrow-c-interface without requiring pyarrow (14254)
- Implements `list.gather_every` (14253)
- Implements `prefix/suffix_fields` (14251)
- Change `Series.to_numpy` to return `f64` for `Int32/UInt32` Series with nulls instead of `f32` (14240)
- Polish decimal arithmetic (14172)
- improved `read_excel` format detection, and support for excel 97-2004 workbooks (14234)
- Introduce `arr.to_struct` (14202)
- Supports map fields name of struct (14203)
- make `IdxVec` generic as `UnitVec` (14196)
- add new arithmetic kernels (14026)
- Supports `unique` and `hash_rows` for `null` column (14111)
- Implement arithmetic operations for `Null` columns (14107)
- support pd.Index in from\_pandas and elsewhere (14087)
- Allow renaming expressions with keyword syntax in `group_by` (14071)
- raise more informative error message if someone lands on Expr.\_\_bool\_\_ (14067)
- Adapt extend\_constant to function expr architecture and expressify it (14058)
- add integer negation (14049)
- `list` \& `array` measures of dispersion (13245)
- gc binview when writing ipc (14035)
- When calling `convert_time_zone` on time-zone-naive datetime, convert as if converting from UTC (13960)
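
The last entry above describes how `convert_time_zone` treats a time-zone-naive datetime: as if it were already in UTC. As a rough illustration of that rule using only the standard library (this is not Polars code, and `convert_as_if_utc` is a hypothetical helper name):

```python
from datetime import datetime, timedelta, timezone, tzinfo


def convert_as_if_utc(dt: datetime, target: tzinfo) -> datetime:
    """Convert `dt` to `target`, interpreting a naive input as UTC."""
    if dt.tzinfo is None:
        # Naive input: attach UTC without shifting the wall-clock time.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(target)


# A fixed +09:00 offset is used so the sketch needs no time zone database.
tokyo = timezone(timedelta(hours=9))
naive = datetime(2024, 1, 1, 12, 0)
converted = convert_as_if_utc(naive, tokyo)
print(converted.isoformat())  # 2024-01-01T21:00:00+09:00
```

The instant in time is unchanged; only the wall-clock representation moves by the target offset.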

🐞 Bug fixes

- deduplicate recursive growables (14264)
- Fix `glimpse` overload signature (14258)
- allow set operations on list of categoricals (14110)
- `any/all_horizontal` with single input has incorrect type (14256)
- load numpy array with np array values 14237 (14238)
- Make `Series.to_numpy` on booleans without nulls return `bool` type (14239)
- fix ufunc in agg (change \_\_ufunc\_array\_\_ so it uses `is_elementwise=True` parameter) (14135)
- Fix join validation for String types (14229)
- enable windows test coverage for `read_excel` "calamine" (fastexcel) engine (14171)
- make csv parser more robust to edge cases (14210)
- Fix for `set_operations` of binary dtype (14152)
- fix read\_csv date/datetime inference and parsing (14113)
- don't see files as hive partitions (14128)
- allow eval on list of categoricals (14132)
- Forbid casting from `Date` to `Time` and vice versa (14127)
- preserve old naming convention for multi-value pivot (this will change in 1.0 to no longer redundantly have the column name in the middle) (14120)
- Implements `gt/lt` cmp for null dtype (14119)
- ignore comments at beginning of csv if schema provided (14115)
- fix pivot when multiple columns are passed. Output is now aligned with what tidyverse / pandas.pivot\_table would do (14048)
- multiple `read_excel` updates (14039)
- some temporal conversion errors for datetimes earlier than `1970-01-01` (14050)
- Preserve name when casting from categorical (14085)
- respect `Object` dtype designation (14072)
- fix cse bug when window function is nested (14070)
- Fix `melt` panic when there are no value vars (14057)
- `json_encode` should respect the logical type (14063)
- improve skip row group using statistics condition (14056)
- Raise for .dt.epoch and .dt.timestamp for Duration dtype (13962)
- handle `SliceSink` with empty data (14025)
- Allow `Series.to_pandas` for categorical types (14028)
- correct field type schema inference (using read\_csv) (14042)
- Use int formatter for unsigned ints (14043)

📖 Documentation

- fix code block in user-guide/lazy/schemas (14228)
- Add visualization page to user guide (13052)
- Fix typo in contributing guide (14181)
- Small improvements Ecosystem page (14176)
- fix code blocks in user-guide/concepts/data-structures (14146)
- Document that Kleene logic is followed in `any_horizontal` and `all_horizontal` (14148)
- Fix description of `return_dtype` parameter for `map_elements` and `map_batches` (14114)
- Fix bullet point formatting in CI contributing guide (14117)
- Add documentation on replacement strings to `str.replace` and `str.replace_all` (13382)
- Replace alternatives page with more objective comparison (13784)
- Note that only one `name` operation is allowed per expression (14075)
- Improve deprecation message of `dtype_if_empty` param (14068)
- fix more docstring bullet points (14065)
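
One of the documentation entries above notes that `any_horizontal` and `all_horizontal` follow Kleene logic. For readers unfamiliar with the term, here is a small sketch of three-valued logic in plain Python, with `None` standing in for null. The function names `kleene_any` and `kleene_all` are hypothetical and this is not how Polars implements it.

```python
from typing import Iterable, Optional


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued OR: True wins, otherwise any null makes the result null."""
    values = list(values)
    if any(v is True for v in values):
        return True
    if any(v is None for v in values):
        return None
    return False


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued AND: False wins, otherwise any null makes the result null."""
    values = list(values)
    if any(v is False for v in values):
        return False
    if any(v is None for v in values):
        return None
    return True


print(kleene_any([False, None]))  # None: the null could have been True
print(kleene_any([True, None]))   # True: the null cannot change the outcome
print(kleene_all([True, None]))   # None: the null could have been False
print(kleene_all([False, None]))  # False: the null cannot change the outcome
```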

🛠️ Other improvements

- Reorganize NumPy interop tests (14257)
- additional dataframe test coverage (14243)
- Remove `*args` in `Series.to_numpy` (14248)
- Move metadata utils to `meta` module (14230)
- remove unused method DataFrame.\_from\_dicts (14212)
- make gather\_chunked completely generic (14195)
- Add `.cargo` directory to .gitignore (14191)
- `take_chunked` to polars-ops (14185)
- Issue a warning when running doctests on Python 3.11 or lower (14187)
- Run `cargo update` (14160)
- merge take kernels (14137)
- improve From\<Ca> -> Vec (14123)
- hoist boolean -> string cast (14122)
- remove unused argument (14014)

Thank you to all our contributors for making this release possible!
JulianCologne, MarcoGorelli, Vincenthays, Wainberg, alexander-beedie, apcamargo, braaannigan, c-peters, deanm0000, dependabot, dependabot[bot], dpinol, edavisau, eitsupi, flisky, grinya007, ion-elgreco, itamarst, lukemanley, mcrumiller, orlp, r-brink, reswqa, ritchie46, stinodego and taki-mekhalfa


rs-0.37.0
🏆 Highlights

- new implementation for `String/Binary` type. (13748)
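
The `BinaryView`/`Utf8View` entries later in this release suggest the new String/Binary type is built on the Arrow "view" layout. Assuming that is the case, the sketch below packs a single 16-byte view as described in the Arrow columnar format: values of up to 12 bytes are stored inline, while longer values keep a 4-byte prefix plus a buffer index and offset. `make_view` is a hypothetical helper for illustration and not part of Polars.

```python
import struct

INLINE_LIMIT = 12  # bytes that fit inside the view itself
VIEW_SIZE = 16     # every view occupies exactly 16 bytes


def make_view(value: bytes, buffer_index: int = 0, offset: int = 0) -> bytes:
    """Pack one little-endian view for `value`."""
    length = len(value)
    if length <= INLINE_LIMIT:
        # Short value: 4-byte length, then the bytes padded with zeros to 12.
        return struct.pack("<i12s", length, value)
    # Long value: length, 4-byte prefix, then where the full bytes are stored.
    return struct.pack("<i4sii", length, value[:4], buffer_index, offset)


short_view = make_view(b"polars")
long_value = b"a string longer than twelve bytes"
long_view = make_view(long_value, buffer_index=0, offset=0)

print(len(short_view), len(long_view))
print(struct.unpack("<i4sii", long_view))
```

Keeping the prefix in the view means many comparisons can be decided without touching the data buffer, which is consistent with the comparison and filter kernels listed among the performance improvements.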

💥 Breaking changes

- Remove `DatetimeChunked::convert_time_zone` (14046)
- Rename `LiteralValue::to_anyvalue` to `LiteralValue::to_any_value` (14033)
- Rename `drop_columns` to `drop` (13754)
- Rename `pl.count()` to `pl.len()` (13719)
- Rename `row_count_name`/`row_count_offset` parameters in IO functions to `row_index_*` (13563)
- Rename `with_row_count` to `with_row_index` (13494)

🚀 Performance improvements

- prune parquet row groups when `is_not_null` is used (14260)
- use is\_between to skip parquet row groups (14244)
- Use a compression API that is designed for this use case (11699) (14194)
- Use `UnitVec` in polars-plan traversal (14199)
- use `UnitVec` in streaming joins (14197)
- improve `ChunkId` (14175)
- improve iteration performance (14126)
- elide unneeded work in window? (14108)
- run window functions more in parallel (14095)
- improve skip row group using statistics condition (14056)
- improve string/binary reverse performance (14016)
- optimize `DataFrame.describe` by presorting columns (13822)
- elide redundant bound checks. (13909)
- speedup boolean filter (13905)
- speedup binview filter (13902)
- improve binview filter (13878)
- apply string view GC more conservatively (13850)
- add optimized BinaryViewArray comparison kernels (13839)
- lazy cache binview bytes len (13830)
- fast-path for eager int\_range (13811)
- Optimize `arr.sum` for inner non-null bool (13800)
- directly embed data ptr in Buffer (13744)
- elide parallelism restriction on generic rolling expressions (13662)
- ensure time groups are parallelized (13660)
- do not eagerly compute bitcount (13562)
- optimise SQL engine string concat (13499)
- remove lifetime requirement from CategoricalChunkedBuilder (13319)

✨ Enhancements

- add `u8`/`i8`/`u16`/`i16` parsers to CSV reader (14241)
- Implements `list.gather_every` (14253)
- Implements `prefix/suffix_fields` (14251)
- Polish decimal arithmetic (14172)
- Introduce `arr.to_struct` (14202)
- Supports map fields name of struct (14203)
- make `IdxVec` generic as `UnitVec` (14196)
- add new arithmetic kernels (14026)
- Supports `unique` and `hash_rows` for `null` column (14111)
- Implement arithmetic operations for `Null` columns (14107)
- Add strict/non-strict construction of Boolean/Binary series (14073)
- Improve `Series::from_any_values` logic (14052)
- Adapt extend\_constant to function expr architecture and expressify it (14058)
- add integer negation (14049)
- `list` \& `array` measures of dispersion (13245)
- gc binview when writing ipc (14035)
- When calling `convert_time_zone` on time-zone-naive datetime, convert as if converting from UTC (13960)
- DataFrame supports explode by array column (13958)
- improve binary formatting (13981)
- preserve Enum information when going to IPC (13943)
- support kwargs in plugin 'field' functions and raise error on unsupported binview layout (13944)
- support cast decimal to utf8 (13829)
- add SQL support for `timestamp` precision modifier (13936)
- support negative indexing and expressions for `LEFT`, `RIGHT` and `SUBSTR` SQL string funcs (13888)
- Introduce `explode` for `ArrayNameSpace` (13923)
- raise better error message for .dt.time on Date column (13932)
- List set\_operations supports float (13920)
- Add `ignore_nulls` for `arr.join` (13919)
- register 'set\_sorted' as batch/elementwise (13896)
- move Enum/Categorical categories to binview (13882)
- Add `ignore_nulls` for `list.join` (13701)
- Add `ignore_nulls` for `pl.concat_str` (13877)
- fix parquet for binview (13873)
- support mmap for binview in OOC (13872)
- implement ffi for `binview` (13871)
- Support zero fill null strategy for binary and string columns (13869)
- Implement/fix unary minus operator `-pl.col(...)` (13776)
- extend SQL `EXTRACT` with "century", "millennium", and "timezone" parts (13634)
- fix binview ipc format (13842)
- add SQL support for `numeric` and/or `decimal` types (13739)
- improve panic message (13836)
- Expressify `str.zfill` (13790)
- new implementation for `String/Binary` type. (13748)
- Add `nulls_last` for `Series.sort` (13794)
- Impl `count_matches` for array namespace (13675)
- Add `nulls_last` for `list/array.sort` (13795)
- Rename `drop_columns` to `drop` (13754)
- convert fixed-offset timezones to respective Etc timezone from time zone database (13738)
- Expressify `str.slice` (13747)
- implement binview for polars-row (13736)
- implement binview for polars-json (13737)
- add architecture for polars-flavored IPC (13734)
- implement binview comparison kernels (13715)
- raise default frame/series repr height from 8 to 10 (13699)
- write parquet ColumnOrder (13672)
- Impl `contains` for ArrayNameSpace (13638)
- improve `rolling()` expression formatting (13657)
- Implement `is_between` in Rust (11945)
- Expressify `pattern` of `str.extract` (13607)
- Impl `join` for ArrayNameSpace (13586)
- add SQL engine support for string cast to `json` (13624)
- add SQL engine support for `EXTRACT` and `DATE_PART` (13603)
- add `BinaryView` to `parquet` writer/reader. (13489)
- add SQL engine support for `POSITION` and `STRPOS` (13585)
- `is_in` support for array dtype (13559)
- add new `str.find` expression, returning the index of a regex pattern or literal substring (13561)
- add SQL engine support for `LIKE` and `ILIKE` pattern matching (13522)
- improve hive partition pruning (13358) (13426)
- don't rechunk by default in lazy scans (13518)
- Add `cum_count` expression function (13478)
- add SQL engine support for `IF` control flow function (13491)
- add SQL engine support for `MOD` function (13502)
- return datetime for datetime mean \& median (13417)
- add SQL engine support for `CONCAT_WS` string function (13483)
- `BinaryView`/`Utf8View` IPC support (13464)
- Implement wasm Pool::scope (13476)
- add SQL engine support for `RIGHT` and `REVERSE` string functions (13461)
- implement `BinaryView` and `Utf8View` in `polars-arrow` (13243)
- add SQL engine support for variadic string `CONCAT` function (13428)
- add support for AND in SQL join-clause context (13242)
- Impl ordering ops for array namespace (13414)
- add SQL engine support for `REPLACE` string function (13431)
- add SQL engine support for `SIGN` function (13429)
- add SQL engine support for `IFNULL` function (13432)
- additional SQL support for `bytes`, `bit`, and `hex` literals (13389)
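
Among the SQL additions above are `LIKE` and `ILIKE` pattern matching. As a reminder of the standard semantics (`%` matches any run of characters, `_` matches exactly one, and `ILIKE` ignores case), here is a minimal sketch that translates a pattern to a regular expression. It is not the Polars SQL engine's implementation; `like_to_regex` and `sql_like` are hypothetical names, and `ESCAPE` clauses are not handled.

```python
import re


def like_to_regex(pattern: str, case_insensitive: bool = False) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any run of characters, possibly empty
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    flags = re.DOTALL
    if case_insensitive:
        flags |= re.IGNORECASE
    return re.compile("".join(parts), flags)


def sql_like(value: str, pattern: str, case_insensitive: bool = False) -> bool:
    """LIKE must match the whole value, hence `fullmatch`."""
    return like_to_regex(pattern, case_insensitive).fullmatch(value) is not None


print(sql_like("polars", "pol%"))                         # LIKE
print(sql_like("polars", "POL%", case_insensitive=True))  # ILIKE
```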

🐞 Bug fixes

- deduplicate recursive growables (14264)
- Fix `glimpse` overload signature (14258)
- allow set operations on list of categoricals (14110)
- `any/all_horizontal` with single input has incorrect type (14256)
- load numpy array with np array values 14237 (14238)
- Fix join validation for String types (14229)
- make csv parser more robust to edge cases (14210)
- Fix for `set_operations` of binary dtype (14152)
- fix read\_csv date/datetime inference and parsing (14113)
- don't see files as hive partitions (14128)
- allow eval on list of categoricals (14132)
- add missing conditional compile flag for `StringFunction::Find` (14129)
- Forbid casting from `Date` to `Time` and vice versa (14127)
- preserve old naming convention for multi-value pivot (this will change in 1.0 to no longer redundantly have the column name in the middle) (14120)
- Implements `gt/lt` cmp for null dtype (14119)
- ignore comments at beginning of csv if schema provided (14115)
- fix pivot when multiple columns are passed. Output is now aligned with what tidyverse / pandas.pivot\_table would do (14048)
- some temporal conversion errors for datetimes earlier than `1970-01-01` (14050)
- Preserve name when casting from categorical (14085)
- fix cse bug when window function is nested (14070)
- Fix `melt` panic when there are no value vars (14057)
- `json_encode` should respect the logical type (14063)
- improve skip row group using statistics condition (14056)
- Raise for .dt.epoch and .dt.timestamp for Duration dtype (13962)
- handle `SliceSink` with empty data (14025)
- correct field type schema inference (using read\_csv) (14042)
- Map `AnyValue::Null` to datatype `Null` (14045)
- Use int formatter for unsigned ints (14043)
- quick fix for multiple chunks binary reverse (14024)
- count matches on list categorical (14021)
- `list.min/max` with empty and/or None elements (14018)
- allow get access to list of categoricals (14015)
- Fix casting from categorical to numeric (13957)
- read\_csv preserve whitespace and newlines (13934)
- append decimal with different scale (13977)
- Allow casting integer types to Enum (13955)
- `arg_min/max` on categoricals should respect ordering (13998)
- serialize decimal type (13997)
- check input type for `arr/list.contains` (13959)
- Allow dtype merge when inner dtype is enum (13938)
- recurse less in streaming shared sinks (13930)
- ensure order is preserved if streaming from different sources (13922)
- Fix `is_not_null` for Struct columns (13921)
- make 100 \* pl.col(pl.Boolean).mean() work (13725)
- allow extract of numeric from str AnyValue (13865)
- single-element .dt.time() and .dt.date() should always preserve sortedness (13808)
- prune empty chunks before set operations (13898)
- treat null columns as zero in `sum_horizontal` (13880)
- include null count in rolling window validity with `min_periods` (13863)
- don't return NaN as free memory fraction (13860)
- parquet hybrid RLE encoding did not always align to bit width (13883)
- Add `ignore_nulls` for `list.join` (13701)
- .dt.time() was panicking for datetimes prior to unix epoch (13812)
- Correct err message of `check_map_output_len` (13854)
- allow list creation of decimals (13851)
- Implement `abs` for Decimal, error on Date/Time/Datetime (13821)
- decompress the right number of rows when reading compressed CSVs (13721)
- rolling nested groups deadlock (13835)
- `gather_every` should work on agg context (13810)
- When reading Parquet or Arrow, convert +00:00 timezone to UTC (13816)
- Fix segfault of `is_in` (13814)
- don't panic on full null qcut (13815)
- do not read data for zero-length compressed buffer (13791)
- Fix the non-null test of `transpose` (13783)
- Raise error instead of panic when joining on wildcard/nth (13742)
- `str.concat` correctly ignore single null value (13751)
- Selectors `by_name` and `by_dtype` should allow empty list as input (11024)
- Use `NonZeroUsize` for `batch_size` parameter in `write_csv/sink_csv/scan_ndjson` (13726)
- error instead of panicking in sql if empty function (13691)
- gather.get schema (13679)
- ensure we hit proper cache in nested `rolling` expressions (13666)
- Allow `av_buffer` cast numeric record to temporal type (13661)
- streaming cross join if swapped is hit (13656)
- Make sure rolling key is projected when process projection (13622)
- fix schema inference for json (13637)
- Empty series of AggregatedList should also have list dtype (13620)
- fallback to cast kernel if `inline_cast` AnyValue raise (13595)
- `LazyFrame::join()` no longer ignores 3 `JoinArgs` parameters (13570)
- fix reverse variable row decoding (13587)
- Fix `scatter` for null values (13578)
- Fix `cum_count` with regards to start value / null values (13535)
- Fix precision/scale handling and invalid numbers in string-to-decimal conversions. (13548)
- Treat Python `None` as null value for `Object` dtype (13564)
- `Expr.replace` to single value did not replace NULLs (13551)
- `AnyValue::StructOwned` panic when hashing (13553)
- improve hive partition pruning (13358) (13426)
- fix projection pushdown for new outer join schema (13527)
- ensure size-hint of TrueIdxIter is correct (13508)
- correct 'outer\_coalesce' logic in case of duplicate names (13501)
- raise for out-of-range datetimes in to\_datetime/strptime (13403)
- Keep logical type when getting values from list (13456)
- Handle duplicate/ambiguous inputs for `replace` (13217)
- skip null/empty values if replace\_lit\_n\_char (13400)
- fix is\_in operator when comparing string with global categoricals (13412)
- use different generics for `shift_and_fill` parameters (13379)

📖 Documentation

- fix code block in user-guide/lazy/schemas (14228)
- Fix typo in contributing guide (14181)
- Small improvements Ecosystem page (14176)
- fix code blocks in user-guide/concepts/data-structures (14146)
- Fix bullet point formatting in CI contributing guide (14117)
- Remove outdated reference to horizontal concat feature (14105)
- Replace alternatives page with more objective comparison (13784)
- Improve structure of user guide (13951)
- Improve structure of user guide (13639)
- Introduce ecosystem page in user guide (13903)
- Mention deltalake write support in README (13890)
- Fix typo in deprecation message of `with_row_count` (13793)
- Fix incorrect "coming from pandas" syntax (13767)
- Improve streaming section of the user guide (13750)
- fix linking to feature flags in user guide (13644)
- Improve documentation on broadcasting (13394)
- Add note about toolchain issue under native Windows (13590)
- update SQL section of the README (13529)
- update polars-business > polars-xdt link (13509)

📦 Build system

- Enable feature nightly with optional sql feature (14222)
- remove horizontal\_concat feature (13390)

🛠️ Other improvements

- make gather\_chunked completely generic (14195)
- Add `.cargo` directory to .gitignore (14191)
- `take_chunked` to polars-ops (14185)
- Enable `clippy` lint to warn on debug macros (14178)
- Run `cargo update` (14160)
- merge take kernels (14137)
- improve From\<Ca> -> Vec (14123)
- hoist boolean -> string cast (14122)
- Remove `DatetimeChunked::convert_time_zone` (14046)
- More generic way to present an expression tree diagram (14020)
- Rename `LiteralValue::to_anyvalue` to `LiteralValue::to_any_value` (14033)
- make Enums an actual datatype (14011)
- update rustc (13947)
- move `filter` to `polars-compute` (13897)
- bump object\_store to 0.9 (13857)
- Make functions in `expr/general` non-anonymous (13832)
- Fix doctests (13831)
- Refactor Python release workflow (13807)
- Make `pl.duration` non-anonymous (13762)
- Rename `pl.count()` to `pl.len()` (13719)
- Deprecate `dt.with_time_unit` in favor of `cast(pl.Int64).cast(pl.Datetime(time_unit, time_zone))` (13667)
- Auto-add 'needs triage' label to bugs (13671)
- make rolling index column visible to optimizer (13658)
- Rename `lazy-regex` feature to `regex` to align `polars` with `polars-lazy` crate (13647)
- Add `Documentation` / `Build system` sections to the changelog (13594)
- Filter unhelpful messages in `make build` (13579)
- Remove extra line break between checkboxes in GitHub bug report issues (13576)
- Rename `row_count_name`/`row_count_offset` parameters in IO functions to `row_index_*` (13563)
- Rename `with_row_count` to `with_row_index` (13494)
- simplify parquet binary ordering function (13488)
- don't panic if `ambiguous` is of wrong type (13388)

Thank you to all our contributors for making this release possible!
29antonioac, Bromeon, ByteNybbler, JulianCologne, MarcNuebel, MarcoGorelli, NedJWestern, ShivMunagala, Vincenthays, Wainberg, aaarrti, alexander-beedie, apcamargo, bchalk101, braaannigan, c-peters, cgevans, cmdlineluser, collinprince, deanm0000, dependabot, dependabot[bot], dpinol, edavisau, eitsupi, flisky, grinya007, hamishs, henryharbeck, ion-elgreco, itamarst, jacksonthall22, jcrozum, kstoneriv3, langestefan, lukemanley, mcrumiller, mkucijan, nameexhaustion, orlp, petrosbar, r-brink, reswqa, ritchie46, s-banach, shritesh, stinodego, taki-mekhalfa, thomasaarholt, tim-stephenson, universalmind303, valorien and wjandrea


py-0.20.6
🏆 Highlights

- new implementation for `String/Binary` type. (13748)

⚠️ Deprecations

- Deprecate `dtype_if_empty` parameter for `Series` constructor (13976)

🚀 Performance improvements

- improve string/binary reverse performance (14016)
- add "calamine" support to `read_excel`, using `fastexcel` (~8-10x speedup) (14000)
- optimize `DataFrame.describe` by presorting columns (13822)
- elide redundant bound checks. (13909)
- speedup boolean filter (13905)
- speedup binview filter (13902)
- allow python threads in read\_ functions (13886)
- improve binview filter (13878)
- apply string view GC more conservatively (13850)
- add optimized BinaryViewArray comparison kernels (13839)
- lazy cache binview bytes len (13830)
- fast-path for eager int\_range (13811)
- Optimize `arr.sum` for inner non-null bool (13800)

✨ Enhancements

- Add `UnstableWarning` for unstable functionality (13948)
- DataFrame supports explode by array column (13958)
- add "calamine" support to `read_excel`, using `fastexcel` (~8-10x speedup) (14000)
- improve binary formatting (13981)
- preserve Enum information when going to IPC (13943)
- support calling `describe` on a `LazyFrame` (13982)
- support kwargs in plugin 'field' functions and raise error on unsupported binview layout (13944)
- support cast decimal to utf8 (13829)
- add SQL support for `timestamp` precision modifier (13936)
- support negative indexing and expressions for `LEFT`, `RIGHT` and `SUBSTR` SQL string funcs (13888)
- Introduce `explode` for `ArrayNameSpace` (13923)
- unify Series/DataFrame `describe` code (13720)
- raise better error message for .dt.time on Date column (13932)
- List set\_operations supports float (13920)
- Add `ignore_nulls` for `arr.join` (13919)
- register 'set\_sorted' as batch/elementwise (13896)
- move Enum/Categorical categories to binview (13882)
- Add `ignore_nulls` for `list.join` (13701)
- Add `ignore_nulls` for `pl.concat_str` (13877)
- Align `int_range` and `int_ranges` signatures (13867)
- fix parquet for binview (13873)
- support mmap for binview in OOC (13872)
- implement ffi for `binview` (13871)
- Support zero fill null strategy for binary and string columns (13869)
- allow df.rename and lf.rename to take a renaming function (13708)
- Implement/fix unary minus operator `-pl.col(...)` (13776)
- extend SQL `EXTRACT` with "century", "millennium", and "timezone" parts (13634)
- fix binview ipc format (13842)
- add SQL support for `numeric` and/or `decimal` types (13739)
- improve panic message (13836)
- Expressify `str.zfill` (13790)
- new implementation for `String/Binary` type. (13748)
- Add typing to hvplot plot namespace (13813)
- Add `nulls_last` for `Series.sort` (13794)
- allow `ftp` URLs, improve URL check (13781)

🐞 Bug fixes

- count matches on list categorical (14021)
- `list.min/max` with empty and/or None elements (14018)
- Make `to_pandas()` work for Dataframe and Series with dtype `Object` (13910)
- raise for `pl.concat(how="align")` when no columns are shared between frames (13941)
- Fix casting from categorical to numeric (13957)
- read\_csv preserve whitespace and newlines (13934)
- omit implicit 'site' from import-timing test (14009)
- append decimal with different scale (13977)
- Use `date_as_object=False` as default for `Series.to_pandas` (just like `DataFrame.to_pandas`) (13984)
- serialize decimal type (13997)
- check input type for `arr/list.contains` (13959)
- Fix `max_colname_length` formatting in `glimpse()` (13969)
- Allow dtype merge when inner dtype is enum (13938)
- recurse less in streaming shared sinks (13930)
- ensure order is preserved if streaming from different sources (13922)
- Fix `is_not_null` for Struct columns (13921)
- convert object-dtyped NumPy str/bytes arrays to pl.String/pl.Binary instead of pl.Object (13712)
- allow extract of numeric from str AnyValue (13865)
- single-element .dt.time() and .dt.date() should always preserve sortedness (13808)
- prune empty chunks before set operations (13898)
- treat null columns as zero in `sum_horizontal` (13880)
- include null count in rolling window validity with `min_periods` (13863)
- Fix interchange protocol for new String type (13881)
- parquet hybrid RLE encoding did not always align to bit width (13883)
- Add `ignore_nulls` for `list.join` (13701)
- .dt.time() was panicking for datetimes prior to unix epoch (13812)
- allow list creation of decimals (13851)
- ensure kwargs `filter` behaviour matches docstring (expect equivalence with `eq`) (13864)
- Implement `abs` for Decimal, error on Date/Time/Datetime (13821)
- rolling nested groups deadlock (13835)
- `gather_every` should work on agg context (13810)
- Fix segfault of `is_in` (13814)
- don't panic on full null qcut (13815)
- validate operator arithmetic with `None`, fix `Series` edge-case (13780)

📖 Documentation

- Add missing doc entries (14006)
- add missing len to rst file (13999)
- Improve structure of user guide (13951)
- Improve structure of user guide (13639)
- Introduce ecosystem page in user guide (13903)
- Mention deltalake write support in README (13890)
- use proper argument names in the code blocks of api.rst (13866)

🛠️ Other improvements

- make Enums an actual datatype (14011)
- omit implicit 'site' from import-timing test (14009)
- Constructor improvements - part 1 (14001)
- Add `glimpse` test (13979)
- Move PyO3 ChunkedArray conversion logic into its own module (13973)
- Fix xdist streaming group (13974)
- Fix spurious test failures (13961)
- minor `describe` tidy-up, and slight rewording of some Exception docstrings (13942)
- Fix pip warning filter return code (13935)
- Minor refactor of PyO3 conversions module (13929)
- move `filter` to `polars-compute` (13897)
- Revert pandas warning filter (13893)
- Make functions in `expr/general` non-anonymous (13832)
- Fix doctests (13831)
- Refactor Python release workflow (13807)

Thank you to all our contributors for making this release possible!
ByteNybbler, JulianCologne, MarcoGorelli, Wainberg, alexander-beedie, c-peters, dependabot, dependabot[bot], edavisau, flisky, ion-elgreco, itamarst, jacksonthall22, kstoneriv3, mcrumiller, mkucijan, nameexhaustion, orlp, petrosbar, r-brink, reswqa, ritchie46, stinodego, taki-mekhalfa, thomasaarholt and valorien


py-0.20.6-rc.1
🏆 Highlights

- new implementation for `String/Binary` type. (13748)

🚀 Performance improvements

- speedup boolean filter (13905)
- speedup binview filter (13902)
- allow python threads in read\_ functions (13886)
- improve binview filter (13878)
- apply string view GC more conservatively (13850)
- add optimized BinaryViewArray comparison kernels (13839)
- lazy cache binview bytes len (13830)
- fast-path for eager int\_range (13811)
- Optimize `arr.sum` for inner non-null bool (13800)

✨ Enhancements

- register 'set\_sorted' as batch/elementwise (13896)
- move Enum/Categorical categories to binview (13882)
- Add `ignore_nulls` for `list.join` (13701)
- Add `ignore_nulls` for `pl.concat_str` (13877)
- Align `int_range` and `int_ranges` signatures (13867)
- fix parquet for binview (13873)
- support mmap for binview in OOC (13872)
- implement ffi for `binview` (13871)
- Support zero fill null strategy for binary and string columns (13869)
- allow df.rename and lf.rename to take a renaming function (13708)
- Implement/fix unary minus operator `-pl.col(...)` (13776)
- extend SQL `EXTRACT` with "century", "millennium", and "timezone" parts (13634)
- fix binview ipc format (13842)
- add SQL support for `numeric` and/or `decimal` types (13739)
- improve panic message (13836)
- Expressify `str.zfill` (13790)
- new implementation for `String/Binary` type. (13748)
- Add typing to hvplot plot namespace (13813)
- Add `nulls_last` for `Series.sort` (13794)
- allow `ftp` URLs, improve URL check (13781)

🐞 Bug fixes

- prune empty chunks before set operations (13898)
- treat null columns as zero in `sum_horizontal` (13880)
- include null count in rolling window validity with `min_periods` (13863)
- Fix interchange protocol for new String type (13881)
- parquet hybrid RLE encoding did not always align to bit width (13883)
- Add `ignore_nulls` for `list.join` (13701)
- .dt.time() was panicking for datetimes prior to unix epoch (13812)
- allow list creation of decimals (13851)
- ensure kwargs `filter` behaviour matches docstring (expect equivalence with `eq`) (13864)
- Implement `abs` for Decimal, error on Date/Time/Datetime (13821)
- rolling nested groups deadlock (13835)
- `gather_every` should work on agg context (13810)
- Fix segfault of `is_in` (13814)
- don't panic on full null qcut (13815)
- validate operator arithmetic with `None`, fix `Series` edge-case (13780)

📖 Documentation

- Mention deltalake write support in README (13890)
- use proper argument names in the code blocks of api.rst (13866)

🛠️ Other improvements

- move `filter` to `polars-compute` (13897)
- Revert pandas warning filter (13893)
- Make functions in `expr/general` non-anonymous (13832)
- Fix doctests (13831)
- Refactor Python release workflow (13807)

Thank you to all our contributors for making this release possible!
ByteNybbler, MarcoGorelli, Wainberg, alexander-beedie, dependabot, dependabot[bot], edavisau, flisky, ion-elgreco, itamarst, kstoneriv3, mcrumiller, mkucijan, nameexhaustion, orlp, reswqa, ritchie46, stinodego, taki-mekhalfa and thomasaarholt


py-0.20.5
⚠️ Deprecations

- Deprecate default delimiter value for `str.concat` (13690)
- Rename `pl.count()` to `pl.len()` (13719)
- Deprecate `dt.with_time_unit` in favor of `cast(pl.Int64).cast(pl.Datetime(time_unit, time_zone))` (13667)

🚀 Performance improvements

- directly embed data ptr in Buffer (13744)

✨ Enhancements

- Impl `count_matches` for array namespace (13675)
- Add `nulls_last` for `list/array.sort` (13795)
- convert fixed-offset timezones to respective Etc timezone from time zone database (13738)
- allow `read_excel` to load from remote http locations (13753)
- Expressify `str.slice` (13747)
- implement binview for polars-row (13736)
- implement binview for polars-json (13737)
- add architecture for polars-flavored IPC (13734)
- implement binview comparison kernels (13715)
- raise default frame/series repr height from 8 to 10 (13699)

🐞 Bug fixes

- do not read data for zero-length compressed buffer (13791)
- Fix the non-null test of `transpose` (13783)
- Raise error instead of panic when joining on wildcard/nth (13742)
- `str.concat` correctly ignore single null value (13751)
- Selectors `by_name` and `by_dtype` should allow empty list as input (11024)
- Keep Series attributes docstrings when read by Sphinx (13731)
- fix error message when creating DataFrame from 0-dimensional NumPy array (13729)
- support corr() for single-column DataFrames (13728)
- Use `NonZeroUsize` for `batch_size` parameter in `write_csv/sink_csv/scan_ndjson` (13726)
- error instead of panicking in sql if empty function (13691)

📖 Documentation

- Fix typo in deprecation message of `with_row_count` (13793)
- Fix incorrect "coming from pandas" syntax (13767)
- Improve streaming section of the user guide (13750)
- improve `n_unique` and `approx_n_unique` docs (13752)
- add missing Series.str.find reference (13717)
- Be more explicit about behaviour in `str.strip_chars` / `strip_chars_start` / `strip_chars_end` docstrings (13697)
- Add doc example for `datetime_ranges` (13695)
- document %A and %B to get day name and month name (13678)

🛠️ Other improvements

- Make `pl.duration` non-anonymous (13762)
- Add test for `describe` on Object types (13689)
- Only run bytecode parser CI workflow for Python 3.9/3.10 (13664)

Thank you to all our contributors for making this release possible!
29antonioac, MarcoGorelli, NedJWestern, Wainberg, alexander-beedie, cgevans, henryharbeck, langestefan, orlp, petrosbar, r-brink, reswqa, ritchie46, stinodego and universalmind303


py-0.20.4
⚠️ Deprecations

- Fix group keys in `partition_by(as_dict=True)` / `GroupBy.__iter__` in some cases (13646)
- Rename `row_count_name`/`row_count_offset` parameters in IO functions to `row_index_*` (13563)
- Deprecate `dt.datetime` in favor of `dt.replace_time_zone(None)` (13520)
- Rename `with_row_count` to `with_row_index` (13494)
- Deprecate `Expr.where` in favor of `filter` (13440)
- Allow `drop` with no inputs as a no-op (13460)

🚀 Performance improvements

- elide parallelism restriction on generic rolling expressions (13662)
- ensure time groups are parallelized (13660)
- do not eagerly compute bitcount (13562)
- optimise SQL engine string concat (13499)
- Refactor expression parsing logic of predicates/constraints (13468)
- Represent `Enum` categories as Series (13434)
- remove lifetime requirement from CategoricalChunkedBuilder (13319)

✨ Enhancements

- write parquet ColumnOrder (13672)
- Impl `contains` for ArrayNameSpace (13638)
- improve `rolling()` expression formatting (13657)
- Implement `is_between` in Rust (11945)
- Add base `PolarsError` and `PolarsWarning` class (13615)
- typing overloads for Series operator methods `ge, gt, ...` (13167)
- Expressify `pattern` of `str.extract` (13607)
- Impl `join` for ArrayNameSpace (13586)
- add SQL engine support for string cast to `json` (13624)
- add SQL engine support for `EXTRACT` and `DATE_PART` (13603)
- Allow `drop` with no inputs as a no-op (13460)
- add SQL engine support for `POSITION` and `STRPOS` (13585)
- additional multi-column support for `pl.<function>` entries (13336)
- `is_in` support for array dtype (13559)
- add new `str.find` expression, returning the index of a regex pattern or literal substring (13561)
- Impl and dispatch arr.first/last to get (13536)
- Implement `from_dataframe` natively (interchange protocol) (10701)
- add SQL engine support for `LIKE` and `ILIKE` pattern matching (13522)
- improve hive partition pruning (13358) (13426)
- Add compact syntax for `int_range` starting from 0 (13530)
- don't rechunk by default in lazy scans (13518)
- Add `cum_count` expression function (13478)
- add SQL engine support for `IF` control flow function (13491)
- add SQL engine support for `MOD` function (13502)
- return datetime for datetime mean & median (13417)
- add SQL engine support for `CONCAT_WS` string function (13483)
- Allow `map_batches` to auto-convert output NumPy arrays to Series (13277)
- add SQL engine support for `RIGHT` and `REVERSE` string functions (13461)
- implement `BinaryView` and `Utf8View` in `polars-arrow` (13243)
- add SQL engine support for variadic string `CONCAT` function (13428)
- add support for AND in SQL join-clause context (13242)
- Impl ordering ops for array namespace (13414)
- add SQL engine support for `REPLACE` string function (13431)
- add SQL engine support for `SIGN` function (13429)
- add SQL engine support for `IFNULL` function (13432)
- additional SQL support for `bytes`, `bit`, and `hex` literals (13389)
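
To make the `str.find` entry above (13561) concrete, here is a pure-Python sketch of the described behaviour: return the index of the first match of a regex pattern or literal substring. It is not Polars code. `find_index` and its `literal` flag are hypothetical names, returning `None` when nothing matches is an assumption, and Python's `re` syntax differs from the Rust regex syntax Polars uses. Indices here count characters; the changelog does not say whether Polars counts bytes or characters.

```python
import re
from typing import Optional


def find_index(text: Optional[str], pattern: str, literal: bool = False) -> Optional[int]:
    """Index of the first match of `pattern` in `text`, or None (hypothetical helper)."""
    if text is None:
        # Assumption: a missing input stays missing.
        return None
    if literal:
        # Treat the pattern as a plain substring, not a regex.
        pos = text.find(pattern)
        return pos if pos >= 0 else None
    match = re.search(pattern, text)
    return match.start() if match else None


print(find_index("Crab", "a[bc]"))
print(find_index("a.b", ".", literal=True))
```

The `literal` switch matters for patterns such as `"."`, which matches any character as a regex but only a dot as a literal.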

🐞 Bug fixes

- gather.get schema (13679)
- Fix group keys in `partition_by(as_dict=True)` / `GroupBy.__iter__` in some cases (13646)
- ensure we hit proper cache in nested `rolling` expressions (13666)
- Allow `av_buffer` cast numeric record to temporal type (13661)
- streaming cross join if swapped is hit (13656)
- Make sure rolling key is projected when process projection (13622)
- fix schema inference for json (13637)
- Improve parsing of inputs for Expr dunders (13635)
- Empty series of AggregatedList should also have list dtype (13620)
- `Series.eq_missing` should return an Expr when the input is an Expr (13628)
- fallback to cast kernel if `inline_cast` AnyValue raise (13595)
- Fix formatting in `describe` for precise quantiles (13593)
- fix reverse variable row decoding (13587)
- Fix `scatter` for null values (13578)
- Fix `cum_count` with regards to start value / null values (13535)
- Fix precision/scale handling and invalid numbers in string-to-decimal conversions. (13548)
- Treat Python `None` as null value for `Object` dtype (13564)
- Fix `scatter` to allow single temporal inputs (13577)
- Fix interchange protocol data buffer dtype (10787)
- `Expr.replace` to single value did not replace NULLs (13551)
- improve hive partition pruning (13358) (13426)
- fix projection pushdown for new outer join schema (13527)
- don't raise when partial function is passed to `map_elements` (13524)
- improve reading of mixed string/other dtype column data from spreadsheets with `openpyxl` and `pyxlsb` engines (13495)
- ensure size-hint of TrueIdxIter is correct (13508)
- correct 'outer_coalesce' logic in case of duplicate names (13501)
- raise for out-of-range datetimes in `to_datetime`/`strptime` (13403)
- Fix Series equality for List/Array types (13477)
- Keep logical type when getting values from list (13456)
- Handle duplicate/ambiguous inputs for `replace` (13217)
- Handle empty inputs to Enum constructor (13446)
- Fix `group_by` iteration when grouping by certain selectors (13437)
- Fix `to_pandas` for 0x0 dataframe (13420)
- Fix offsets for numeric types in `from_buffer` (13398)

📖 Documentation

- Clarify documentation for the `agg_list` argument in `Expr.map_batches` (13625)
- fix linking to feature flags in user guide (13644)
- bring `sink_ndjson` docstring in line with other sink docstrings (13636)
- Update `then` and `otherwise` docstrings with "strings are parsed as column names" (13630)
- Add `sink_ndjson` to API reference. (13627)
- Improve documentation on broadcasting (13394)
- Add note about toolchain issue under native Windows (13590)
- Hint about ruff setting in VSCode (13421)
- Clarify examples for .transpose() (13581)
- Add additional `Series` docstring examples (13558)
- Doc example for `read_csv` (13161) (13545)
- Add more doc examples on how to create an index column (13532)
- update SQL section of the README (13529)
- Add note to `int_range` docs for creating an index column (13516)
- add a note to the `read_database_uri` docstring about escaping special characters in the connection string (13514)
- update polars-business > polars-xdt link (13509)
- Fix various typos, grammar and formatting in docstrings and user guide (13506)
- Doc examples for `threadpool_size` and `get_index_type` (13496)
- Add missing datetime examples to docs (13487)
- add polars-distance to plugins page (13454)
- define file-like object in `read_parquet` docstring (13463)
- Move `Series.struct.json_encode` to methods in Sphinx autosummary (13443)
- Add missing examples of `series/list.py` (13423)
- show `datetime.date` import in code block (13419)
- clarify documentation for `rle` and `rle_id` (13397)
- use named series in Series.plot example (13407)
- fix alphabetical order of documentation entries (13396)

🛠️ Other improvements

- Auto-add 'needs triage' label to bugs (13671)
- make rolling index column visible to optimizer (13658)
- Enable new error message lint to improve stack trace display (13596)
- Add `Documentation` / `Build system` sections to the changelog (13594)
- Filter unhelpful messages in `make build` (13579)
- Remove extra line break between checkboxes in GitHub bug report issues (13576)
- Narrow type hint for `get_index_type` util (13556)
- Fix some test failures/slowdowns (13504)
- pandas 2.2 compat (13467)
- Increase timeout for gevent async test (13448)
- Do not end docstrings with a blank line (13193)

Thank you to all our contributors for making this release possible!
Bromeon, MarcNuebel, MarcoGorelli, ShivMunagala, Wainberg, aaarrti, alexander-beedie, bchalk101, c-peters, cgevans, cmdlineluser, collinprince, deanm0000, hamishs, henryharbeck, ion-elgreco, jcrozum, mcrumiller, nameexhaustion, orlp, petrosbar, r-brink, reswqa, ritchie46, s-banach, shritesh, stinodego, tim-stephenson and wjandrea


rs-0.36.2
🏆 Highlights

- Add new `Enum` categorical data type which allows a fixed set of categories (11822)

💥 Breaking changes

- Rename `Utf8` data type to `String` (13224)
- Rename `set_at_idx` to `scatter` (12687)
- Preserve left and right join keys in outer joins (12963)
- Implement `dtype` parameter for `int_range` on Rust side (12940)
- Update `Expr.count` to ignore null values by default (12934)
- Change `value_counts` resulting column name from `counts` to `count` (12506)
- Change default `join` behavior with regard to nulls, add `join_nulls` parameter to keep existing behavior (12840)
- Smaller integer data types for datetime components (12070)
- Fix `NaN` ordering to make NaNs compare greater than any other float, and equal to themselves (12721)
- Rename `frame_equal`/`series_equal` to `equals` (12663)
- Rename `not_` expression to `not` on the Rust side (12587)
- Rename `str.json_extract` to `str.json_decode` (12586)
- Rename DataFrame column index methods (12542)
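
The `NaN` ordering change above (12721) is easiest to see next to ordinary IEEE 754 comparison. The pure-Python sketch below models the stated rule, where NaN compares greater than any other float and equal to itself. It does not show how Polars implements the rule, and `total_order_key` and `total_eq` are hypothetical helper names.

```python
import math


def total_order_key(x: float) -> tuple:
    """Sort key under which NaN ranks above every other float (hypothetical helper)."""
    # All NaNs collapse to the same key, so they also compare equal to each other.
    return (1, 0.0) if math.isnan(x) else (0, x)


def total_eq(a: float, b: float) -> bool:
    """Equality under the same rule: NaN equals NaN."""
    return total_order_key(a) == total_order_key(b)


nan = float("nan")
values = [3.0, nan, 1.0, float("inf")]
ordered = sorted(values, key=total_order_key)

print(ordered)             # NaN ends up last, after inf
print(nan == nan)          # plain IEEE 754 comparison
print(total_eq(nan, nan))  # comparison under the total order
```

Under plain IEEE 754 rules every comparison with NaN is false, so results of sorting and equality checks can depend on input order; a total order removes that ambiguity.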

🚀 Performance improvements

- optimize set bit count (13317)
- speed up `.dt.truncate` for large numbers of years (13310)
- don't eagerly evaluate error branches (13311)
- don't needlessly allocate validity in concat/rechunk (13288)
- add fast path to `count_bits_set_by_offsets` (13253)
- make `.dt.truncate('*mo')` more than 3x faster (13192)
- ensure single expression evaluation for replace (13147)
- Elide allocation in outer join materialization (12992)
- Ensure we reduce for `any/all_horizontal` (12976)
- Add fast paths for UTC in `truncate` (12965)
- Improve `rolling_median` algorithm (12704)
- Use fast path for non-null data in new SQL-like null matching (12874)
- improve `merge_local_rhs_categorical` traversal (12660)
- make values_size estimate correct for sliced arrays (12658)
- improve parquet utf8 validation (12655)
- parquet pre-allocate buffer in binary plain encode (12652)
- optimize dict binary decoding in parquet (12648)
- ensure we only check the values within bounds (12633)
- parquet; elide recursion in hot path (12625)
- improve cov/corr algorithm (12590)
- apply left side predicate pushdown also to right side on semi join (12565)
- ensure streaming parquet download remains concurrent `~7x` (12552)
- speed up parquet download of streaming engine (12544)

✨ Enhancements

- support negative indices in `gather` in `group_by` context (13373)
- support negative indexing in gather (select context) (13343)
- support `min_periods` for temporal rolling aggregations (13342)
- support `REGEXP` and `RLIKE` pattern matching in SQL engine (13359)
- gracefully handle panics in plugins (13329)
- Implement `unique/n_unique/unique_counts/is_unique/is_duplicated` for `Null` series (13307)
- support common variant spelling `STDEV` in the SQL engine (in addition to `STDDEV`) (13303)
- change doc links to new url docs.pola.rs (13290)
- support horizontal concatenation of LazyFrames (13139)
- Impl serde for array dtype (13168)
- dispatch strict_cast via cast (13255)
- Impl any/all for array type (13250)
- add cancellable queries (13178)
- add `offset` parameter to `gather_every` (13156)
- Support `Array` dtype AnyValue Series construction (12817)
- Allow `step` parameter in `int_ranges` to take an expression (13148)
- Implement `count` for DataFrame/LazyFrame (13153)
- Move from GA to more privacy friendly framework (13155)
- Rename `set_at_idx` to `scatter` (12687)
- prune all/any_horizontals with single inputs (13146)
- ensure we get cleaner logical plans with `any/all_horizontal` (13144)
- Add `str.contains_any` and `str.replace_many` (Aho-Corasick algorithms) (13073)
- Auto-infer credentials from `.aws` folder (13062)
- Support private cloud S3 storage in `scan_parquet` (13060)
- Allow order operators (<, >, >=, <=) on Enum types (12982)
- Reimplement `replace` expression on the Rust side (13002)
- Use tokio semaphore for concurrency handling (13026)
- Improve and expressify `hist` (13014)
- Preserve left and right join keys in outer joins (12963)
- Allow `end` before `start` in `date/time_range` (12964)
- Implement group-tuples for `Null` dtype (12975)
- Implement `dtype` parameter for `int_range` on Rust side (12940)
- Cast to an enum from int (12954)
- Move categorical ordering into dtype (12911)
- Update `Expr.count` to ignore null values by default (12934)
- Enable partial predicate pushdown past window expressions (12710)
- Add `str.reverse` (12878)
- Change `value_counts` resulting column name from `counts` to `count` (12506)
- Implement `std` and `var` for `Duration` columns (12865)
- Change default `join` behavior with regard to nulls, add `join_nulls` parameter to keep existing behavior (12840)
- Preserve base dtype when raising to `UInt` power (10446)
- Smaller integer data types for datetime components (12070)
- Support SQL subqueries for `JOIN` and `FROM` (12819)
- parquet support required deltabyte encoding (12836)
- Add new `Enum` categorical data type which allows a fixed set of categories (11822)
- support nested null in vstack/append/extend/concat (12771)
- Improve error messages on attempted Arrow conversions involving incompatible/unknown dtypes (12421)
- determine mode parallelism depending on current tasks (12764)
- enable slice push down past `with_columns` (12742)
- implement `From<LazyGroupBy>` for LazyFrame (12562)
- Rename `frame_equal`/`series_equal` to `equals` (12663)
- Join operations on local categoricals (12657)
- use RLE_DICTIONARY for integers in parquet (12647)
- Add configuration option for where Polars spills to disk (12595)
- implement RLE_DICT encoding for utf8/binary columns (reduced parquet file size) (12623)
- implement 'DeltaByteArray' decoding for parquet (12602)
- warn if `by` column is not sorted in rolling aggregations (as opposed to raising), add `warn_if_unsorted` argument (12398)
- struct -> json encoding expression (12583)
- Implement support for multi-character comments in `read_csv` (12519)
- Implement `LazyFrame.sink_ndjson` (10786)
- improve concurrency parameters (12567)
- Adds sink_ipc_cloud (12556)
- Adds sink_ipc_cloud (11008)
- In explain(), rename PIPELINE to STREAMING so it's clearer what it means (12547)
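
Among the entries above, the change to default `join` behaviour with regard to nulls (12840) is the one most likely to alter results silently. The pure-Python sketch below models the difference on plain tuples. It assumes the new default is that null keys never match, and that `join_nulls` restores null-to-null matching, as the entry's wording suggests. `inner_join` is a hypothetical helper, not the Polars API.

```python
def inner_join(left, right, join_nulls=False):
    """Inner join of (key, value) pairs; returns (key, left_value, right_value) rows."""
    out = []
    for lk, lv in left:
        for rk, rv in right:
            if lk is None or rk is None:
                # Null keys match only each other, and only when join_nulls is set.
                matched = join_nulls and lk is None and rk is None
            else:
                matched = lk == rk
            if matched:
                out.append((lk, lv, rv))
    return out


left = [(1, "a"), (None, "b")]
right = [(1, "x"), (None, "y")]

print(inner_join(left, right))                   # null keys dropped
print(inner_join(left, right, join_nulls=True))  # null keys matched
```

Code that relied on null keys matching would need to opt in explicitly after this change.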

🐞 Bug fixes

- range/ranges output name should follow lhs rule (13369)
- updated Display trait for enum categoricals (13331)
- nested dtypes: export logical type in plugins (13325)
- fix invalid dtype setting in array (13327)
- fix `csv` parser error when commented-out rows precede the header row (13318)
- invalid schema outer join after projection pd (13315)
- invalid predicate optimization (13313)
- Account for null values in categorical `unique/n_unique` (13308)
- fix schema when subtracting (13309)
- broadcasting of unit LHS in string operations (12737)
- casting list/arr to arr/list shouldn't convert chunks to logical type (13259)
- sorting categorical lexically bugs on null values (13271)
- improve replace on categoricals (13223)
- round trip to JSON and back should preserve Enum type (13267)
- enable and fix SIMD in polars-compute (13251)
- match_chunks shouldn't change the dtype (13222)
- sink_csv deadlock (13239)
- `is_in` operator for categoricals (13205)
- Better handle mismatched dtypes in `replace` (13213)
- Fix `replace` fast path by casting `old` input to the right data type (13176)
- ndjson nested null schema inference (13206)
- slice for `NullChunked` no longer force single chunk (13174)
- don't cast to unknown dtypes (13197)
- Allow casting nullable list to array (13196)
- maintain old join behavior in window expression (13179)
- Fix comparison of categoricals (13137)
- Use the name of the leftmost expression in horizontal operations (13143)
- any_value should support cast to boolean (13125)
- Update offsets of null value correctly for all `from_iter_xxx_trusted_len` (13132)
- fix neq for series cmp str (13128)
- fix category list builder append series with multiple chunks (13116)
- `repeat_by` should not raise if `by` contains nulls (13105)
- [csv] raise on single quote char (13104)
- Raise if scan zstd compressed csv file (13102)
- Don't check map length if input is literal (13098)
- use `FunctionExpr`'s scalar return type for `is_in` (13091)
- rolling_quantile can get incorrect state (13088)
- Fix off-by-one error in `quantile(method="nearest")` (13058)
- Fix incorrect schema inference on nested columns (13057)
- Don't raise for `datetime_range` if starting on ambiguous datetime and earliest was specified (13050)
- add cast safety to literals (12983)
- Parse `json_decode` per max buffer length (13029)
- Parse `00:00` time zone as UTC (13034)
- Fix timeout errors in concurrent downloads (13023)
- Fix SQL substring indexing (13016)
- Allow broadcasting in `ranges` (11900)
- Prevent deadlock in `sink_csv` (12991)
- Don't get mutable if buffer is sliced (12979)
- Dataframes with Decimal columns cannot be pickled (12955)
- Fix `truncate` when truncating by multiple weeks (12948)
- Fix segfault / memory corruption after plugins return `Err` result (12953)
- Don't panic when `ambiguous` parameter is not Utf8 (12913)
- don't panic on empty df in `merge_sort` (12923)
- Patch `rolling_var`/`rolling_std` numerical stability (12909)
- Fix incorrect Int16 `min`/`max` due to incorrect SIMD mask construction (12908)
- Fix OOB error in list set operations on empty frame (12845)
- Fix repr of `Expr.gather` (which was still showing deprecated take) (12864)
- Fix `nan_min/max` incorrectly aggregating chunks with addition (12848)
- write only one dict page per row rowgroup (12831)
- incorrect values from parquet RLE decoding (12818)
- Handle aggregation for all-NaN groups in `group_by` (12304)
- Use total float ordering in `is_in` (12800)
- Fix `NaN` ordering to make NaNs compare greater than any other float, and equal to themselves (12721)
- don't use streaming engine if aggregate is unknown (12769)
- hold align_chunks_invariant (12738)
- allow leading zero and plus in integer parsing (12744)
- csv lines iter, always return remainder (12739)
- fix oob in set operations (12736)
- undo regression in ability to read certain parquet files (12731)
- corr return nan if denominator is invalid (12708)
- parquet decimal statistics and schema (12705)
- support `append`/`extend` with null series (11824) (12686)
- fix carrying over infinity into other windows (12685)
- json null inference (12677)
- cov/corr respect f32 type (12676)
- fix ternary zip_with null broadcast (12668)
- support negative slice on eager frame (12644)
- fix concurrency budget assertion (12641)
- fix oob in set operations (12640)
- Rename `not_` expression to `not` on the Rust side (12587)
- panic reading parquet nested struct column (12614)
- features: `performant,lazy,random` (12600)
- error when invalid list to array is given (12584)
- parquet: do not extend existing nested that is already complete (12569)
- accidental panic if predicate selects no files (12575)
- fix lazy parquet slice with nested columns (12558)
- ensure stats-evaluator exists (12566)
- list schema of list `eval` (12563)
- ensure concurrency budget never locks (12555)
- Fix lazy schema for `group_by_dynamic` and `rolling` (12551)
- address overflow on vec capacity calculation for `int_ranges` with negative step (12548)

🛠️ Other improvements

- Update CODEOWNERS (13292)
- Change base url of docs/guide to `docs.pola.rs` (13281)
- Add note about Rust examples versioning in user guide (13280)
- split-up file_sink module (13256)
- Rename `Utf8` data type to `String` (13224)
- update rustc (13219)
- fix horizontal concatenation documentation (13141)
- Set minimum version for `bytemuck` to `1.11` (13191)
- bump sysinfo from 0.29.11 to 0.30.0 (13188)
- Remove `polars-algo` reference in Cargo.toml (13187)
- Use the name of the leftmost expression in horizontal operations (13143)
- make pre_agg generic (13150)
- move StaticArray to polars-arrow (13106)
- ensure we get cleaner logical plans with `any/all_horizontal` (13144)
- Update `auto_explode` param name to `returns_scalar` (13119)
- don't compile polars-ops by default (13100)
- update user-defined-functions for 0.19.x (13071)
- Linting updates (13069)
- take pl.concat out of StringCache context manager in "mismatched string cache" error message (13076)
- add Enum to dtype list (13080)
- further use TotalOrd (13046)
- Minor typo fix (13003)
- use new MinMax kernels (12961)
- Refer to arrow crate unambiguously from polars-parquet (12939)
- Fix issue with docs for `group_by_dynamic` (12906)
- Fix failing tests (12859)
- Update `make check` to only check `polars` crate (12834)
- apply TotalOrd in more places (12810)
- Use latest `atoi_simd` release (12748)
- simplify rolling_median update (12745)
- move nan_cmp and IsFloat to polars_utils (12691)
- remove utf8 code in favor of binary (12604)
- update custom allocator instructions to include macOS (12593)
- Rename `str.json_extract` to `str.json_decode` (12586)
- parquet refactors (12574)
- convert all recursive parquet deserialize to iterative (12560)
- Rename DataFrame column index methods (12542)

Thank you to all our contributors for making this release possible!
0siride, MarcoGorelli, Object905, PierreAttard, Qqwy, RoDmitry, SeanTroyUWO, TNieuwdorp, Yerachmiel-Feltzman, adamreeve, alexander-beedie, c-peters, cardoso, cjfuller, dependabot, dependabot[bot], dmitrybugakov, eitsupi, fernandocast, gab23r, ion-elgreco, itamarst, jankislinger, jeroenboeye, kszlim, mcrumiller, nameexhaustion, oli-clive-griffin, orlp, paddymul, petrosbar, r-brink, rancomp, reswqa, ritchie46, rob-sil, robvanmieghem, romanovacca, stinodego, tkarabela, uchiiii and xuestrange


py-0.20.3
🏆 Highlights

- add `plot` namespace (which defers to hvplot) (13238)

🚀 Performance improvements

- optimize set bit count (13317)
- speed up `.dt.truncate` for large numbers of years (13310)
- don't eagerly evaluate error branches (13311)
- don't trigger internal borrowing in numpy memmap (13304)
- don't needlessly allocate validity in concat/rechunk (13288)
- add fast path to `count_bits_set_by_offsets` (13253)
- make `.dt.truncate('*mo')` more than 3x faster (13192)

✨ Enhancements

- support negative indices in `gather` in `group_by` context (13373)
- validate Enum categories (13356)
- improve Series/DataFrame init from existing Series/DataFrame objects (13344)
- support negative indexing in gather (select context) (13343)
- support `min_periods` for temporal rolling aggregations (13342)
- support `REGEXP` and `RLIKE` pattern matching in SQL engine (13359)
- emit suggestion for how to replace `map_elements` sigmoid function with expressions (13347)
- Support Enum types in interchange protocol (13368)
- add plot namespace (which defers to hvplot) (13238)
- gracefully handle panics in plugins (13329)
- rework `pl.exclude` as a pure selector, allowing other selectors as input (13301)
- Implement `unique/n_unique/unique_counts/is_unique/is_duplicated` for `Null` series (13307)
- support common variant spelling `STDEV` in the SQL engine (in addition to `STDDEV`) (13303)
- enhance expression-level `filter` syntax with support for multiple predicates and kwargs (12689)
- change doc links to new url docs.pola.rs (13290)
- support horizontal concatenation of LazyFrames (13139)
- Rename `Utf8` data type to `String`, keep `Utf8` as alias (13257)
- dispatch strict_cast via cast (13255)
- Impl any/all for array type (13250)
- add cancellable queries (13178)
- add `offset` parameter to `gather_every` (13156)
- Support `Array` dtype AnyValue Series construction (12817)
- Allow `step` parameter in `int_ranges` to take an expression (13148)
- make python `map_batches` safer (13181)
- Implement `count` for DataFrame/LazyFrame (13153)
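
For the new `offset` parameter of `gather_every` listed above (13156), a list-slicing analogy may help. The sketch below is pure Python rather than Polars, and it assumes `offset` is the index of the first element taken, with every `n`-th element after it kept. The function here is a stand-in operating on lists, not the Polars method.

```python
def gather_every(values, n, offset=0):
    """Take every n-th element of a list, starting at index `offset` (list stand-in)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # Extended slicing expresses "start at offset, step by n" directly.
    return values[offset::n]


data = list(range(10))
print(gather_every(data, 3))            # starts at index 0
print(gather_every(data, 3, offset=1))  # starts at index 1
```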

🐞 Bug fixes

- don't lose track of `ones` and `zeros` dtype, improve use with `Array`, raise error if dtype invalid (13326)
- updated Display trait for enum categoricals (13331)
- nested dtypes: export logical type in plugins (13325)
- fix invalid dtype setting in array (13327)
- fix `csv` parser error when commented-out rows precede the header row (13318)
- invalid schema outer join after projection pd (13315)
- invalid predicate optimization (13313)
- Account for null values in categorical `unique/n_unique` (13308)
- fix schema when subtracting (13309)
- broadcasting of unit LHS in string operations (12737)
- sorting categorical lexically bugs on null values (13271)
- improve replace on categoricals (13223)
- round trip to JSON and back should preserve Enum type (13267)
- fix return type hint of list series any/all (13265)
- sink_csv deadlock (13239)
- Correctly use `read_parquet` for all binary inputs (13218)
- `is_in` operator for categoricals (13205)
- Better handle mismatched dtypes in `replace` (13213)
- Fix `replace` fast path by casting `old` input to the right data type (13176)
- ndjson nested null schema inference (13206)
- don't cast to unknown dtypes (13197)
- maintain old join behavior in window expression (13179)

🛠️ Other improvements

- reverse condition order in udfs `_expr` function (13348)
- Update release workflow for new upload/download artifact versions (13355)
- Allow construction of `Series` from memory buffers (13323)
- add 'pipe littering' to 'coming from pandas' section (13335)
- Refactor functionality related to Series buffers (13291)
- Restore light/darkmode switch in API reference (13312)
- Copy Makefile build commands to top level (13293)
- Fix release flags (13298)
- Re-enable consortium standard tests (13296)
- Update CODEOWNERS (13292)
- Add CPU compatibility check (13134)
- Change base url of docs/guide to `docs.pola.rs` (13281)
- Fix source link for dev docs (13279)
- fix return type hint of list series any/all (13265)
- Fix display of overloaded signatures (13258)
- clean up bytecode parsing a bit (13221)
- Add a couple of docstring examples to Series methods (13244)
- remove unnecessary arg unpacking (13241)
- update rustc (13219)
- fix horizontal concatenation documentation (13141)
- Replace blackdoc by ruff's new docstring formatter (13182)
- Update ruff & ruff settings (13126)
- Link to latest object_store docs in api doc (13180)
- Fix failing test (13171)

Thank you to all our contributors for making this release possible!
MarcoGorelli, TNieuwdorp, adamreeve, alexander-beedie, c-peters, cjfuller, dependabot, dependabot[bot], mcrumiller, nameexhaustion, orlp, petrosbar, r-brink, reswqa, ritchie46, robvanmieghem and stinodego


py-0.20.3-rc.2
🚀 Performance improvements

- don't needlessly allocate validity in concat/rechunk (13288)
- add fast path to `count_bits_set_by_offsets` (13253)
- make `.dt.truncate('*mo')` more than 3x faster (13192)

✨ Enhancements

- change doc links to new url docs.pola.rs (13290)
- support horizontal concatenation of LazyFrames (13139)
- Rename `Utf8` data type to `String`, keep `Utf8` as alias (13257)
- dispatch strict_cast via cast (13255)
- Impl any/all for array type (13250)
- add cancellable queries (13178)
- add `offset` parameter to `gather_every` (13156)
- Support `Array` dtype AnyValue Series construction (12817)
- Allow `step` parameter in `int_ranges` to take an expression (13148)
- make python `map_batches` safer (13181)
- Implement `count` for DataFrame/LazyFrame (13153)

🐞 Bug fixes

- sorting categorical lexically bugs on null values (13271)
- improve replace on categoricals (13223)
- round trip to JSON and back should preserve Enum type (13267)
- fix return type hint of list series any/all (13265)
- sink_csv deadlock (13239)
- Correctly use `read_parquet` for all binary inputs (13218)
- `is_in` operator for categoricals (13205)
- Better handle mismatched dtypes in `replace` (13213)
- Fix `replace` fast path by casting `old` input to the right data type (13176)
- ndjson nested null schema inference (13206)
- don't cast to unknown dtypes (13197)
- maintain old join behavior in window expression (13179)

🛠️ Other improvements

- Copy Makefile build commands to top level (13293)
- Fix release flags (13298)
- Re-enable consortium standard tests (13296)
- Update CODEOWNERS (13292)
- Add CPU compatibility check (13134)
- Change base url of docs/guide to `docs.pola.rs` (13281)
- Fix source link for dev docs (13279)
- fix return type hint of list series any/all (13265)
- Fix display of overloaded signatures (13258)
- clean up bytecode parsing a bit (13221)
- Add a couple of docstring examples to Series methods (13244)
- remove unnecessary arg unpacking (13241)
- update rustc (13219)
- fix horizontal concatenation documentation (13141)
- Replace blackdoc by ruff's new docstring formatter (13182)
- Update ruff & ruff settings (13126)
- Link to latest object_store docs in api doc (13180)
- Fix failing test (13171)

Thank you to all our contributors for making this release possible!
MarcoGorelli, TNieuwdorp, adamreeve, alexander-beedie, c-peters, cjfuller, dependabot, dependabot[bot], mcrumiller, orlp, petrosbar, r-brink, reswqa, ritchie46, robvanmieghem and stinodego


py-0.20.3-rc.1
🚀 Performance improvements

- add fast path to `count_bits_set_by_offsets` (13253)
- make `.dt.truncate('*mo')` more than 3x faster (13192)

✨ Enhancements

- Rename `Utf8` data type to `String`, keep `Utf8` as alias (13257)
- dispatch strict_cast via cast (13255)
- Impl any/all for array type (13250)
- add cancellable queries (13178)
- add `offset` parameter to `gather_every` (13156)
- Support `Array` dtype AnyValue Series construction (12817)
- Allow `step` parameter in `int_ranges` to take an expression (13148)
- make python `map_batches` safer (13181)
- Implement `count` for DataFrame/LazyFrame (13153)

🐞 Bug fixes

- sorting categorical lexically bugs on null values (13271)
- improve replace on categoricals (13223)
- round trip to JSON and back should preserve Enum type (13267)
- fix return type hint of list series any/all (13265)
- sink_csv deadlock (13239)
- Correctly use `read_parquet` for all binary inputs (13218)
- `is_in` operator for categoricals (13205)
- Better handle mismatched dtypes in `replace` (13213)
- Fix `replace` fast path by casting `old` input to the right data type (13176)
- ndjson nested null schema inference (13206)
- don't cast to unknown dtypes (13197)
- maintain old join behavior in window expression (13179)

🛠️ Other improvements

- Add CPU compatibility check (13134)
- Change base url of docs/guide to `docs.pola.rs` (13281)
- Fix source link for dev docs (13279)
- fix return type hint of list series any/all (13265)
- Fix display of overloaded signatures (13258)
- clean up bytecode parsing a bit (13221)
- Add a couple of docstring examples to Series methods (13244)
- remove unnecessary arg unpacking (13241)
- update rustc (13219)
- fix horizontal concatenation documentation (13141)
- Replace blackdoc by ruff's new docstring formatter (13182)
- Update ruff & ruff settings (13126)
- Link to latest object_store docs in api doc (13180)
- Fix failing test (13171)

Thank you to all our contributors for making this release possible!
MarcoGorelli, TNieuwdorp, adamreeve, alexander-beedie, c-peters, cjfuller, dependabot, dependabot[bot], mcrumiller, orlp, petrosbar, r-brink, reswqa, ritchie46, robvanmieghem and stinodego


py-0.20.2
🚀 Performance improvements

- ensure single expression evaluation for replace (13147)
- drop the pyarrow conversion path in `iter_rows`; we can now do fully native conversion ~2-3x faster (13122)

✨ Enhancements

- Move from GA to more privacy friendly framework (13155)
- prune all/any_horizontals with single inputs (13146)
- ensure we get cleaner logical plans with `any/all_horizontal` (13144)

🐞 Bug fixes

- Fix comparison of categoricals (13137)
- Use the name of the leftmost expression in horizontal operations (13143)
- any_value should support cast to boolean (13125)
- Update offsets of null value correctly for all `from_iter_xxx_trusted_len` (13132)
- fix neq for series cmp str (13128)
- Fix off-by-one error in `lit` dtype determination for integers (13129)
- fix category list builder append series with multiple chunks (13116)

🛠️ Other improvements

- Fix release LTS CPU step (13160)
- Use the name of the leftmost expression in horizontal operations (13143)
- ensure we get cleaner logical plans with `any/all_horizontal` (13144)
- Minor cleanup of PyO3 bindings (13067)
- Update `auto_explode` param name to `returns_scalar` (13119)
- Mark whether the current package is the LTS-CPU version (13068)

Thank you to all our contributors for making this release possible!
alexander-beedie, c-peters, orlp, reswqa, ritchie46 and stinodego


py-0.20.1
🐞 Bug fixes

- repeat\_by should not raise if by contains nulls (13105)
- [csv] raise on single quote char (13104)
- Raise if scan zstd compressed csv file (13102)
- allow timeunit-less dtype in `pl.lit` creation (12997)
- Don't check map length if input is literal (13098)
- rolling\_quantile can get incorrect state (13088)

🛠️ Other improvements

- Fix column name in `contains_any` example (13090)
- update user-defined-functions for 0.19.x (13071)
- Fix some links, and make `map_batches` warning more evident (13081)
- Linting updates (13069)
- take pl.concat out of StringCache context manager in "mismatched string cache" error message (13076)
- add Enum to dtype list (13080)

Thank you to all our contributors for making this release possible!
MarcoGorelli, mcrumiller, reswqa, ritchie46 and stinodego


py-0.20.0
This version includes quite a few breaking changes. We are preparing for the `1.0` release and aim to make the upgrade from `0.20` to `1.0` as smooth as possible. Therefore, we prioritized getting any breaking changes in now rather than with `1.0`.

Check out the [upgrade guide](https://pola-rs.github.io/polars/releases/upgrade/0.20/) for help navigating the upgrade to this version.

Please bear with us while we continue to make Polars the best tool it can be!

🏆 Highlights

- Add new `Enum` categorical data type which allows a fixed set of categories (11822)

💥 Breaking changes

- Use Object Store instead of fsspec for `read_parquet` (13044)
- Reimplement `replace` expression on the Rust side (13002)
- Preserve left and right join keys in outer joins (12963)
- Update `update` signature (12986)
- Update `Expr.count` to ignore null values by default (12934)
- Scheduled removal of previously deprecated functionality (12885)
- Allow all `DataType` objects to be instantiated (12470)
- Change `value_counts` resulting column name from `counts` to `count` (12506)
- Change default `join` behavior with regard to nulls, add `join_nulls` parameter to keep existing behavior (12840)
- Default to exact checking for integers in assertion utils (12331)
- Set default dtype for Series to `Null` when no data is present (12807)
- Update `lit` behavior for list/tuple inputs (12559)
- Change `DataType.is_nested` from property to classmethod (12453)
- Update constructors for Array and Decimal (12837)
- Smaller integer data types for datetime components (12070)
- Fix `NaN` ordering to make NaNs compare greater than any other float, and equal to themselves (12721)
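
The NaN ordering rule in the last item above can be illustrated with a small pure-Python sketch. It is not the Polars implementation; `total_cmp` is a hypothetical helper name, and the only behavior assumed is the one stated in the changelog entry: NaN compares greater than any other float and equal to itself.

```python
import math
from functools import cmp_to_key


def total_cmp(a: float, b: float) -> int:
    """Three-way comparison in which NaN is the largest value and equals itself."""
    a_nan, b_nan = math.isnan(a), math.isnan(b)
    if a_nan or b_nan:
        # NaN vs NaN -> 0, NaN vs number -> 1, number vs NaN -> -1
        return int(a_nan) - int(b_nan)
    # Ordinary floats (including infinities) use the usual ordering.
    return (a > b) - (a < b)


values = [2.0, float("nan"), -1.0, float("inf")]
# Under this ordering the NaN sorts last, after positive infinity.
ordered = sorted(values, key=cmp_to_key(total_cmp))
```

With plain IEEE 754 comparisons every comparison against NaN is false, so sort results depend on input order; a total ordering like this one makes them deterministic.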

⚠️ Deprecations

- Rename `write_database` parameter `if_exists` to `if_table_exists` (12783)

🚀 Performance improvements

- Avoid dispatching to expression engine for various `Series` methods (13010)
- Elide allocation in outer join materialization (12992)
- Avoid dispatching `Series.head/tail` to the expression engine (12946)
- Ensure we reduce for `any/all_horizontal` (12976)
- Add fast paths for UTC in `truncate` (12965)
- Use `select_seq` for expression dispatch (12962)
- Improve `rolling_median` algorithm (12704)
- Use fast path for non-null data in new SQL-like null matching (12874)
- Optimize `DataFrame.iter_rows` for smaller buffer sizes (12804)
- Speed up initializing `Series` from a list of NumPy arrays (12785)

✨ Enhancements

- Add `str.contains_any` and `str.replace_many` (Aho-Corasick algorithms) (13073)
- Auto-infer credentials from `.aws` folder (13062)
- Support private cloud S3 storage in `scan_parquet` (13060)
- Use Object Store instead of fsspec for `read_parquet` (13044)
- Avoid dispatching to expression engine for various `Series` methods (13010)
- Allow order operators (\<,>,>=,\<=) on Enum types (12982)
- Reimplement `replace` expression on the Rust side (13002)
- Expand set of NumPy functions which emit `inefficient map_*` warning (13039)
- Use tokio semaphore for concurrency handling (13026)
- Improve and expressify `hist` (13014)
- Update `describe` to use new `count` implementation (12990)
- Add default `to_struct` Series name consistent with the usual default Series name (empty string) (12998)
- Preserve left and right join keys in outer joins (12963)
- Clarify "inefficient `map_elements`" warning message (12978)
- Allow `end` before `start` in `date/time_range` (12964)
- Update `update` signature (12986)
- Minor update to `Array` data type repr (12973)
- Implement group-tuples for `Null` dtype (12975)
- Cast to an enum from int (12954)
- Move categorical ordering into dtype (12911)
- Avoid importing interchange module by default (12927)
- Update `Expr.count` to ignore null values by default (12934)
- Raise if expression passed as scalar to DataFrame constructor (12916)
- Update `repr` of `Struct` data type class (12922)
- Enable partial predicate pushdown past window expressions (12710)
- Add `merge` mode to `write_delta` and remove pyarrow to delta conversions (12392)
- Add `str.reverse` (12878)
- Allow all `DataType` objects to be instantiated (12470)
- Specific performance warnings from Rust to Python (12802)
- Change `value_counts` resulting column name from `counts` to `count` (12506)
- Implement `std` and `var` for `Duration` columns (12865)
- Change default `join` behavior with regard to nulls, add `join_nulls` parameter to keep existing behavior (12840)
- Enhance `write_database` return (indicate the number of rows affected by the operation) (12830)
- Add dedicated `Decimal` selector (12852)
- Preserve base dtype when raising to `UInt` power (10446)
- Default to exact checking for integers in assertion utils (12331)
- Improve `__repr__` implementation for `Expr` (12770)
- Support SQL subqueries for `JOIN` and `FROM` (12819)
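
The `join` change listed above (null handling, with `join_nulls` to keep the previous behavior) can be sketched in plain Python. This is an illustration only, not Polars code: `inner_join` is a hypothetical function, rows are modeled as `(key, value)` tuples, and it assumes the new default is SQL-like matching in which a null key never equals another null key.

```python
def inner_join(left, right, join_nulls=False):
    """Hash inner join over (key, value) rows; None plays the role of null."""
    # Build a lookup from key to all matching right-hand values.
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)

    out = []
    for key, value in left:
        # SQL-like semantics: a null key matches nothing unless join_nulls is set.
        if key is None and not join_nulls:
            continue
        for other in index.get(key, []):
            out.append((key, value, other))
    return out


left = [(1, "a"), (None, "b")]
right = [(1, "x"), (None, "y")]

sql_like = inner_join(left, right)                    # null keys are dropped
legacy = inner_join(left, right, join_nulls=True)     # null keys match each other
```

In this sketch `sql_like` holds only the row for key `1`, while `legacy` also pairs the two null-keyed rows.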

🐞 Bug fixes

- Fix off-by-one error in `quantile(method="nearest")` (13058)
- Fix incorrect schema inference on nested columns (13057)
- Don't raise for `datetime_range` if starting on ambiguous datetime and earliest was specified (13050)
- Parse `json_decode` per max buffer length (13029)
- Parse `00:00` time zone as UTC (13034)
- Fix timeout errors in concurrent downloads (13023)
- Streamline `align_frames` and fix edge-case where the identical frame object appears more than once (13007)
- Fix SQL substring indexing (13016)
- Allow broadcasting in `ranges` (11900)
- Prevent deadlock in `sink_csv` (12991)
- Don't get mutable if buffer is sliced (12979)
- Support parameterized `read_database` calls against cursors that only take positional args (12967)
- Fix `truncate` when truncating by multiple weeks (12948)
- Fix segfault / memory corruption after plugins return `Err` result (12953)
- Raise a proper python typed exception when IO writers try to write to a non-existent folder (12936)
- Don't panic when `ambiguous` parameter is not Utf8 (12913)
- Raise a proper python typed exception when the CSV writer tries to write to a non-existent folder (12919)
- Patch `rolling_var`/`rolling_std` numerical stability (12909)
- Fix incorrect Int16 `min`/`max` due to incorrect SIMD mask construction (12908)
- Improve handling of decimal conversion with `to_numpy` in the absence of pyarrow (12888)
- Fix OOB error in list set operations on empty frame (12845)
- Fix error message for uninstantiated `Enum` types (12886)
- Fix repr of `Expr.gather` (which was still showing deprecated take) (12864)
- Fix `Array` dtype equality (12853)
- Fix `nan_min/max` incorrectly aggregating chunks with addition (12848)
- Revert type hint change on expression inputs (12792)
- More accurate type hinting for `collect_all` functions (12796)
- Use total float ordering in is_in (12800)
- Handle aggregation for all-NaN groups in `group_by` (12304)

🛠️ Other improvements

- Update version switcher for `0.20` (12844)
- Add upgrade guide for Python Polars 0.20 (12872)
- Run doctests before other tests (13047)
- Update `describe` calculation of min/max (13027)
- Minor typo fix (13003)
- Resolve two interchange tests failing locally (12999)
- Update outdated links to API in Expressions/Functions page (12981)
- Expand docstrings for `count` (12960)
- Fix issue with docs for `group_by_dynamic` (12906)
- Prefer explicit `--no-cov` flag for py3.12/ubuntu test workflow (vs implicit/omitted) (12889)
- Scheduled removal of previously deprecated functionality (12885)
- Fix references in deprecation notes (12877)
- Fix typo in `hash` docstring (12879)
- Fix docstring for deprecated `list.take` (12873)
- Note that `list.take` is deprecated (12867)
- Fix failing tests (12859)
- Add quotes to `pip install` with dependencies (12799)
- Fix parameter name reference in `update` docstring (12797)

Thank you to all our contributors for making this release possible!
MarcoGorelli, Object905, Yerachmiel-Feltzman, alexander-beedie, c-peters, ion-elgreco, jankislinger, mcrumiller, nameexhaustion, oli-clive-griffin, orlp, rancomp, ritchie46, romanovacca, stinodego and xuestrange


py-0.19.19
✨ Enhancements

- Parquet support required deltabyte encoding (12836)

🐞 Bug fixes

- Fix incorrect values from parquet RLE decoding (12818)
- Write only one dict page per row group (12831)

Thank you to all our contributors for making this release possible!
nameexhaustion, ritchie46 and stinodego


py-0.19.18
✨ Enhancements

- support nested null in vstack/append/extend/concat (12771)
- Improve error messages on attempted Arrow conversions involving incompatible/unknown dtypes (12421)
- determine mode parallelism depending on current tasks (12764)
- enable slice push down past `with_columns` (12742)
- Improve `write_database`, accounting for latest `adbc` fixes/updates (12713)

🐞 Bug fixes

- don't use streaming engine if aggregate is unknown (12769)
- Enable special casing of sequence in list\_to\_struct (12759)
- hold align\_chunks\_invariant (12738)
- allow leading zero and plus in integer parsing (12744)
- csv lines iter, always return remainder (12739)
- fix oob in set operations (12736)
- undo regression in ability to read certain parquet files (12731)

🛠️ Other improvements

- Use latest `atoi_simd` release (12748)
- Fix invalid references to `xlsx2csv` dependency (12741)
- Remove pinned `aiohttp` dependency (12733)

Thank you to all our contributors for making this release possible!
0siride, PierreAttard, RoDmitry, alexander-beedie, dependabot, dependabot[bot], eitsupi, kszlim, nameexhaustion, orlp, ritchie46 and stinodego


py-0.19.17
✨ Enhancements

- Automatically wrap NumPy array as lit (12709)
- Add `DataFrame.iter_columns` (12653)
- favour showing "adbc\_driver\_manager" over "adbc\_driver\_sqlite" in `show_versions` (12690)

🐞 Bug fixes

- corr return nan if denominator is invalid (12708)
- parquet decimal statistics and schema (12705)
- support `append`/`extend` with null series (11824) (12686)
- address a numpy ndarray init regression (12701)
- fix carrying over infinity into other windows (12685)

🛠️ Other improvements

- Update URI prefix in examples (prefer "postgresql" to "postgres") (12707)
- now that `scan_parquet` supports hive partitioning, remove note pointing to `scan_pyarrow_dataset` (12706)
- Minor docstring fixes (12688)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, c-peters, ritchie46, stinodego and tkarabela


py-0.19.16
⚠️ Deprecations

- Rename `series_equal`/`frame_equal` to `equals` (12618)
- Rename `map_dict` to `replace` and change default behavior (12599)

🚀 Performance improvements

- order(s) of magnitude speedup when initialising `List` dtype `Series` from 2D numpy array (12672)
- improve `merge_local_rhs_categorical` traversal (12660)
- make values\_size estimate correct for sliced arrays (12658)
- improve parquet utf8 validation (12655)
- parquet pre-allocate buffer in binary plain encode (12652)
- optimize dict binary decoding in parquet (12648)
- ensure we only check the values within bounds (12633)
- parquet; elide recursion in hot path (12625)
- improve cov/corr algorithm (12590)

✨ Enhancements

- Join operations on local categoricals (12657)
- Implement `PySeries.from_buffer` for boolean buffers (12654)
- Implement `PySeries.from_buffer` for numeric types (12646)
- use RLE\_DICTIONARY for integers in parquet (12647)
- extend recent `filter` syntax upgrades to `when/then` construct (12603)
- implement RLE\_DICT encoding for utf8/binary columns (reduced parquet file size) (12623)
- implement 'DeltaByteArray' decoding for parquet (12602)

🐞 Bug fixes

- json null inference (12677)
- cov/corr respect f32 type (12676)
- fix ternary zip\_with null broadcast (12668)
- support negative slice on eager frame (12644)
- fix concurrency budget assertion (12641)
- fix oob in set operations (12640)
- panic reading parquet nested struct column (12614)
- Fix deprecation message for `DataFrame.sum` (12619)
- features: `performant,lazy,random` (12600)

🛠️ Other improvements

- Use `range` instead of `np.arange` in constructors (12621)
- update custom allocator instructions to include macOS (12593)

Thank you to all our contributors for making this release possible!
alexander-beedie, c-peters, cardoso, dmitrybugakov, nameexhaustion, orlp, ritchie46 and stinodego


py-0.19.15
⚠️ Deprecations

- Rename `str.json_extract` to `str.json_decode` (12586)

🚀 Performance improvements

- apply left side predicate pushdown also to right side on semi join (12565)
- ensure streaming parquet download remains concurrent `~7x` (12552)

✨ Enhancements

- warn if `by` column is not sorted in rolling aggregations (as opposed to raising), add warn\_if\_unsorted argument (12398)
- struct -> json encoding expression (12583)
- Implement support for multi-character comments in `read_csv` (12519)
- Implement `LazyFrame.sink_ndjson` (10786)
- use JEMALLOC on all unix architectures (12568)
- improve concurrency parameters (12567)
- In explain(), rename PIPELINE to STREAMING so it's clearer what it means (12547)

🐞 Bug fixes

- error when invalid list to array is given (12584)
- parquet: do not extend existing nested that is already complete (12569)
- accidental panic if predicate selects no files (12575)
- fix lazy parquet slice with nested columns (12558)
- ensure stats-evaluator exists (12566)
- list schema of list `eval` (12563)
- ensure concurrency budget never locks (12555)
- Fix lazy schema for `group_by_dynamic` and `rolling` (12551)
- address overflow on vec capacity calculation for `int_ranges` with negative step (12548)

🛠️ Other improvements

- convert all recursive parquet deserialize to iterative (12560)
- Minor cleanup in Expr class (12549)

Thank you to all our contributors for making this release possible!
MarcoGorelli, Qqwy, alexander-beedie, dmitrybugakov, fernandocast, gab23r, itamarst, nameexhaustion, ritchie46, stinodego and uchiiii


rs-0.35.0
🏆 Highlights

- improve join performance through radix partitioned join (12270)

💥 Breaking changes

- Rename cumulative functions `cumsum -> cum_sum` and similar (12513)
- Rename `take` to `gather` (12528)
- Add dedicated horizontal aggregation methods to `DataFrame` (12492)
- Rename `take_every` to `gather_every` (12531)
- Deprecate `parse_int` in favor of `to_integer` (12464)
- plugins add version and context (12433)
- Fix `scan_csv` error type (12355)
- Rename `write_csv` parameter `has_header` to `include_header` (12351)
- Rename `is_signed` to `is_signed_integer` (12220)
- Rename `dt.seconds` to `dt.total_seconds` (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (12179)
- Rename `ljust`/`rjust` to `pad_end`/`pad_start` (11975)

🚀 Performance improvements

- speed up cov/corr with SIMD + strength-reduction `~3x 0.19.13/ ~2x numpy` (12471)
- apply predicates and statistics of parquet files in streaming mode (12439)
- use online algorithm for cov/corr `~2x` (12412)
- indexvec in group-by (12371)
- reduce allocations in hash join (12368)
- change concurrency parameters (12321)
- improve join performance through radix partitioned join (12270)
- remove extra multiplication in hash\_to\_partition (12233)
- allow non-power-of-two partitions (12225)
- Reduce compute in error message for failed datetime parsing (12147)
- improve parquet downloading (12061)
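
The list above mentions switching cov/corr to an online algorithm. As a rough illustration of what "online" means here, the following pure-Python sketch computes a sample covariance in a single pass with a Welford-style update. It is an assumption-laden sketch, not the Polars kernel; `online_cov` is a hypothetical name.

```python
import math


def online_cov(xs, ys):
    """Single-pass sample covariance using a Welford-style running update."""
    n = 0
    mean_x = 0.0
    mean_y = 0.0
    co_moment = 0.0  # running sum of (x - mean_x) * (y - mean_y)
    for x, y in zip(xs, ys):
        n += 1
        dx = x - mean_x          # deviation from the *previous* mean of x
        mean_x += dx / n
        mean_y += (y - mean_y) / n
        co_moment += dx * (y - mean_y)  # deviation from the *updated* mean of y
    if n < 2:
        return math.nan          # sample covariance is undefined for n < 2
    return co_moment / (n - 1)


cov = online_cov([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Each value is visited once and only a few running totals are kept, so no second pass over the data is needed to subtract the means.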

✨ Enhancements

- Add dedicated horizontal aggregation methods to `DataFrame` (12492)
- support http scan\_parquet (12517)
- Add support for UTF-8 BOM option in `write_csv` and `sink_csv` (12253)
- remove lexical (replace with atoi\_simd, ryu, and itoa) (12512)
- Allow comparison of two local categories with the same hash (12503)
- more changes for versioned plugins (12504)
- plugins add version and context (12433)
- include i128 in more primitive functions (12413)
- write rolling functions as private expressions. (12379)
- Add `round_sig_figs` expression for rounding to significant figures (11959)
- change concurrency parameters (12321)
- deprecate `_saturating` in duration string language, make it the default (12301)
- auto infer `ambiguous` for truncate and round (12204)
- Rename `is_signed` to `is_signed_integer` (12220)
- New `Config` options for numeric formatting: digit grouping and thousands/decimal separator (12099)
- allow non-aggregation predicate in ternary groupby (12286)
- Add `name=` in `.write_avro` to set schema name (12255)
- Add support for reading zstd compressed files (no-options) in read\_csv (12214)
- start prefetching all files immediately (12201)
- Add `.list.to_array` expression (12192)
- consolidate \& improve all casting failure error messages (12168)
- tunable concurrency (12171)
- support reverse sort in streaming (12169)
- Add `.arr.to_list` expression (12136)
- add concurrency budget (12117)
- Introduce ignore\_nulls for str.concat (12108)
- casting utf8 to temporal (12072)
- Add supertype for `List`/`Array` (12016)
- enable eq and neq for array dtype (12020)
- Expressify n of shift (12004)
- add dedicated `name` namespace for operations that affect expression names (11973)

🐞 Bug fixes

- fix incorrect ternary agg states (12538)
- fix and improve ternary evaluation on groups (12529)
- saturating sub in debug msg (12525)
- fix panic when writing `Decimal` type to parquet (12532)
- prefetch struct columns in async projection pd (12514)
- rechunk cross join output in streaming (12511)
- fix as\_list logical types (12507)
- fix streaming cross join on empty df (12491)
- don't overflow when calculating date range over very long periods (12479)
- Allow append/zip\_with/extend on local categoricals (12369)
- Do not panic if time is invalid (12466)
- empty csv no-raise (12434)
- Fix `scan_csv` error type (12355)
- binary operations in aggregation context on literals (12430)
- update groups state after binary aggregation (12415)
- Remove extra `\n` when reading file-like object wi… (12333)
- revert ternary special broadcast, ensure broadcast is always to max height (12395)
- ensure first/last return null if empty (12401)
- Do not cast lit if has same dtype (12342)
- Fix index column name of rolling/dynamic group by (12365)
- ternary broadcasting with empty truthy or falsy and agg predicate (12357)
- uint64 should be correctly extracted from python object (12338)
- expr\_output\_name include literal (12335)
- Fix Decimal dtype table repr (12318)
- Fix behavior of month intervals in `date_range` (12317)
- scan empty csv miss row\_count (12316)
- zip\_with also broadcast mask (12309)
- respect hive\_partitioning flag when dealing with multiple files (12315)
- parquet, add row\_count to empty file materialization (12310)
- fix download ranges in parquet (12313)
- object store path derivation for local URL (12308)
- don't move right endpoint of windows in rolling in default `offset==-period` case (12267)
- Raise more informative error on invalid `reshape` input (12288)
- incorrect super type for literals in nested binary exprs (12238)
- Update `null_count` after arithmetic (12280)
- fix ambiguous aggregation type (12269)
- Consistently propagate nulls for `numpy` ufuncs (12212)
- respect return\_scalar of list scalars (12251)
- potential overflow (12206)
- always start a new thread if the thread is already blocking (12202)
- with\_row\_count should block predicate push down for lazy csv (12187)
- rechunk failed-list series before iterate (12189)
- Raise if \*\_horizontal without inputs (12106)
- fix incorrect desc sort behavior (12141)
- `take` should block predicate pushdown (12130)
- use null type when read from unknown row (12128)
- boundary predicate to block all accumulated predicates in push down (12105)
- make python `schema_overrides` information available to the rust-side inference code when initialising from records/dicts (12045)
- fix panic when initializing Series with array of list dtype (12148)
- Fix schema of arr.min/max (12127)
- ensure filter predicate inputs exist in schema (12089)
- str.concat on empty list (12066)
- binary agg should be group-aware if literal is not a scalar (12043)
- Use Arrow schema for file readers (12048)
- Error on duplicates in hive partitioning (12040)
- display fmt for str split (12039)
- sum\_horizontal should not always cast to int (12031)
- fix apply\_to\_inner's dtype (12010)
- Fix padding for non-ASCII strings (12008)
- inline parts of unstable unicode module for stable (12003)
- fix dot visualization of anonymous scans (12002)
- SQL table aliases (11988)

🛠️ Other improvements

- Rename cumulative functions `cumsum -> cum_sum` and similar (12513)
- fix and improve ternary evaluation on groups (12529)
- Rename `take` to `gather` (12528)
- Add dedicated horizontal aggregation methods to `DataFrame` (12492)
- Rename `take_every` to `gather_every` (12531)
- Add `polars-ds` to list of community plugins (12527)
- add schema test (12523)
- remove lexical (replace with atoi\_simd, ryu, and itoa) (12512)
- add test for previous commit (12510)
- Support Python 3.12 (12094)
- Fix some typos (12485)
- Deprecate `parse_int` in favor of `to_integer` (12464)
- update rustc (12468)
- rename the `DataType` in the polars-arrow crate to `ArrowDataType` for clarity, preventing conflation with our own/native `DataType` (12459)
- Replace outdated dev dependency `tempdir` (12462)
- move cov/corr to polars-ops (12411)
- use unwrap\_or\_else and get\_unchecked\_release in rolling kernels (12405)
- dprint/markdown link checker minor updates (12409)
- replace as\_u64 with dirty\_hash (12327)
- Fix ruff linting invocation (12350)
- Rename `write_csv` parameter `has_header` to `include_header` (12351)
- Build and verify Rust examples in docs (12334)
- Fix some feature flags (12325)
- Organize Cargo.toml (12323)
- remove fxhash (12322)
- Run rustfmt on doc examples (12319)
- Consolidate "getting started" and "user guide" sections (12246)
- deprecate `_saturating` in duration string language, make it the default (12301)
- simplify expr checking in predicate push down (12287)
- Replace dev dependency `avro-rs` with `apache-avro` (12295)
- Run `clippy` on all targets (12293)
- Add top-level `make clippy`, simplify Rust linting workflows (12290)
- ensure we git-ignore ALL `.venv` dirs (12289)
- incorrect super type for literals in nested binary exprs (12238)
- remove unwrap from group\_by (12263)
- update object\_store (12006) (12273)
- Remove recommended setting from IDE docs (12275)
- Add feature flag for `list.eval` (12254)
- factor out some shared code in `truncate_impl` (12229)
- update Cargo.lock (12226)
- Make all functions in string namespace non-anonymous (12215)
- Rename `dt.seconds` to `dt.total_seconds` (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (12179)
- use enum for Ambiguous (12193)
- Standardize project name formatting across docs (12185)
- Update `sqlparser` to `0.39` (12173)
- pin ring (12176)
- Refactor `FunctionExpr` module (12162)
- Fix tests for pyarrow 14 (12170)
- Fix triggers for docs deployment (12159)
- Make all functions in binary namespace non-anonymous (12126)
- Consolidate contributing info (12109)
- Fix typo in user-guide/expressions/plugins.md (12115)
- Update CODEOWNERS (12107)
- visualize plugin directory layout in user guide (12092)
- Minor improvements to the docs website (12084)
- reshape and repeat\_by non-anonymous (12064)
- upgrade zstd to 0.13 in `polars-parquet` (12062)
- Direct CONTRIBUTING to the docs website (12042)
- inline parquet2 (12026)
- remove parquet logic from `polars-arrow` and consolidate logic in `polars-parquet` crate. (12022)
- move abs to ops (12005)
- Rename `ljust`/`rjust` to `pad_end`/`pad_start` (11975)
- Disable type checking for `dataframe_api_compat` dependency (11997)

Thank you to all our contributors for making this release possible!
JulianCologne, MarcoGorelli, Priyansh121096, abstractqqq, alexander-beedie, braaannigan, brayanjuls, c-peters, cmdlineluser, daviskirk, dependabot, dependabot[bot], dgilman, hirohira9119, ion-elgreco, jerome3o, jrycw, mcrumiller, messense, moritzwilksch, nameexhaustion, orlp, owrior, rancomp, reswqa, ritchie46, rob-sil, stefmolin, stinodego, uchiiii, universalmind303 and wsyxbcl


py-0.19.14
🏆 Highlights

- Support Python 3.12 (12094)
- make 1D numpy to polars conversion zero-copy for numeric data (12403)

⚠️ Deprecations

- Rename DataFrame column index methods (12542)
- Rename `Series.set_at_idx` to `scatter` (12540)
- Deprecate `Series.view` (12539)
- Rename cumulative functions `cumsum -> cum_sum` and similar (12513)
- Rename `take` to `gather` (12528)
- Add dedicated horizontal aggregation methods to `DataFrame` (12492)
- Rename `take_every` to `gather_every` (12531)
- Deprecate `Series.inner_dtype` property (12494)
- Deprecate `parse_int` in favor of `to_integer` (12464)
- Deprecate DataType method `is_not` (12458)
- Deprecate Series methods `is_boolean` and `is_utf8` (12457)
- Add `DataType.is_integer` and other dtype groups (12200)

🚀 Performance improvements

- speed up parquet download of streaming engine (12544)
- speed up cov/corr with SIMD + strength-reduction `~3x 0.19.13/ ~2x numpy` (12471)
- apply predicates and statistics of parquet files in streaming mode (12439)
- use online algorithm for cov/corr `~2x` (12412)
- make 1D numpy to polars conversion zero-copy for numeric data (12403)

✨ Enhancements

- Add dedicated horizontal aggregation methods to `DataFrame` (12492)
- support http scan\_parquet (12517)
- Add support for UTF-8 BOM option in `write_csv` and `sink_csv` (12253)
- remove lexical (replace with atoi\_simd, ryu, and itoa) (12512)
- more changes for versioned plugins (12504)
- plugins add version and context (12433)
- Add `DataType.is_integer` and other dtype groups (12200)
- include i128 in more primitive functions (12413)
- write rolling functions as private expressions. (12379)

🐞 Bug fixes

- fix incorrect ternary agg states (12538)
- fix and improve ternary evaluation on groups (12529)
- saturating sub in debug msg (12525)
- fix panic when writing `Decimal` type to parquet (12532)
- prefetch struct columns in async projection pd (12514)
- rechunk cross join output in streaming (12511)
- Ensure behaviour of `Series` comparison with `timedelta` matches that of other types (12497)
- fix as\_list logical types (12507)
- fix streaming cross join on empty df (12491)
- don't overflow when calculating date range over very long periods (12479)
- Allow append/zip\_with/extend on local categoricals (12369)
- Do not panic if time is invalid (12466)
- ensure explicit "return\_dtype" is respected by `map_dicts` (12436)
- empty csv no-raise (12434)
- Fix `scan_csv` error type (12355)
- binary operations in aggregation context on literals (12430)
- raw HTML output alignment was incorrect for dtype in header (12422)
- update groups state after binary aggregation (12415)
- Remove extra `\n` when reading file-like object wi… (12333)
- Issue correct `PolarsInefficientMapWarning` for lshift/rshift operations (12385)
- revert ternary special broadcast, ensure broadcast is always to max height (12395)
- ensure first/last return null if empty (12401)

🛠️ Other improvements

- fix and improve ternary evaluation on groups (12529)
- Add `polars-ds` to list of community plugins (12527)
- Future-proof consortium standard test (12524)
- add schema test (12523)
- remove lexical (replace with atoi\_simd, ryu, and itoa) (12512)
- add test for previous commit (12510)
- Update `polars-hash` reference (12505)
- Add note on hash stability and mention `polars-hash` (12496)
- Support Python 3.12 (12094)
- Improved `import polars` timing test; now much more consistent/reliable (12478)
- Use `.with_columns()` in all `.list` namespace examples (12475)
- update rustc (12468)
- Fix docs trigger (12449)
- Update for new maturin release (12437)
- Remove 'experimental' tag for auto-structify setting (12435)
- make "DataFrame" and "Series" case more consistent across docs/comments/errors (12428)
- dprint/markdown link checker minor updates (12409)
- Use `manylinux_2_17` for building `x86-64` wheel (12408)
- Use manylinux 2.24 instead of 2.28 for compatibility reasons (12397)
- use with\_columns in is\_in example, and fix some bullet points not rendering (12383)

Thank you to all our contributors for making this release possible!
MarcoGorelli, abstractqqq, alexander-beedie, c-peters, cmdlineluser, hirohira9119, ion-elgreco, jerome3o, nameexhaustion, reswqa, ritchie46, stinodego and uchiiii


py-0.19.13
🏆 Highlights

- improve join performance through radix partitioned join (12270)

⚠️ Deprecations

- Rename `write_csv` parameter `has_header` to `include_header` (12351)
- Deprecate `_saturating` in duration string language, make it the default (12301)
- Switch args for `Decimal` and set default `scale=0` (12224)
- Rename `dt.seconds` to `dt.total_seconds` (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (12179)
- Deprecate `DataFrame.as_dict` positional input (12131)

🚀 Performance improvements

- indexvec in group-by (12371)
- Reduce allocations in hash join (12368)
- Change concurrency parameters (12321)
- Improve join performance through radix partitioned join (12270)
- Remove extra multiplication in hash\_to\_partition (12233)
- Allow non-power-of-two partitions (12225)
- Reduce compute in error message for failed datetime parsing (12147)

✨ Enhancements

- Updated `BytecodeParser` for Python 3.12 (12348)
- Add `round_sig_figs` expression for rounding to significant figures (11959)
- Change concurrency parameters (12321)
- Deprecate `_saturating` in duration string language, make it the default (12301)
- Auto-infer `ambiguous` for truncate and round (12204)
- Allow construction of `Datetime` series from `datetime.date` array (12175)
- New `Config` options for numeric formatting: digit grouping and thousands/decimal separator (12099)
- Allow non-aggregation predicate in ternary groupby (12286)
- Add `name=` in `.write_avro` to set schema name (12255)
- Update `write_delta` to write large arrow types without casting (12260)
- Add support for reading zstd compressed files (no-options) in read\_csv (12214)
- Start prefetching all files immediately (12201)
- Expose more options to plugin registration (12197)
- Add `.list.to_array` expression (12192)
- Consolidate \& improve all casting failure error messages (12168)
- Add Binary dtype to hypothesis tests (12140)
- Tunable concurrency (12171)
- Support reverse sort in streaming (12169)
- Add `.arr.to_list` expression (12136)
- Support decimals in assert utils (12119)
- Add concurrency budget (12117)
- Improved support for use of file-like objects with `DataFrame` "write" methods (12113)
- Introduce ignore\_nulls for str.concat (12108)
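
The `round_sig_figs` expression added in this release rounds to a number of significant figures rather than decimal places. The pure-Python sketch below shows the general idea on a single float. It borrows the name from the changelog entry but is a standalone helper written for illustration, and its handling of zero, infinities, and NaN is an assumption rather than documented Polars behavior.

```python
import math


def round_sig_figs(x: float, digits: int) -> float:
    """Round x to the given number of significant figures."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    if x == 0 or not math.isfinite(x):
        # Zero, infinities, and NaN have no leading digit; return unchanged.
        return x
    # Position of the leading digit: 3 for 1234.5, -2 for 0.012.
    magnitude = math.floor(math.log10(abs(x)))
    # Shift the rounding position so `digits` figures survive.
    return round(x, digits - 1 - magnitude)


large = round_sig_figs(1234.5678, 2)   # keeps "12", zero-fills the rest
small = round_sig_figs(0.012345, 2)    # keeps "12" after the leading zeros
```

Because the result is still a binary float, values very close to a rounding boundary can round differently from their decimal spelling.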

🐞 Bug fixes

- Do not cast lit if has same dtype (12342)
- Fix index column name of rolling/dynamic group by (12365)
- Ternary broadcasting with empty truthy or falsy and agg predicate (12357)
- `UInt64` should be correctly extracted from python object (12338)
- Ignore IDE-mediated DeprecationWarning when debugging tests under 3.12 (12343)
- expr\_output\_name include literal (12335)
- Fix Decimal dtype table repr (12318)
- Fix behavior of month intervals in `date_range` (12317)
- Scan empty csv miss row\_count (12316)
- zip\_with also broadcast mask (12309)
- respect hive\_partitioning flag when dealing with multiple files (12315)
- parquet, add row\_count to empty file materialization (12310)
- Fix invalid DeprecationWarning generated from `date_range` defined with 'saturating' interval (12311)
- fix download ranges in parquet (12313)
- object store path derivation for local URL (12308)
- don't move right endpoint of windows in rolling in default `offset==-period` case (12267)
- Raise more informative error on invalid `reshape` input (12288)
- incorrect super type for literals in nested binary exprs (12238)
- typo in exception message (12278)
- fix ambiguous aggregation type (12269)
- return frames from `read_excel` in the originally specified order (12243)
- Consistently propagate nulls for `numpy` ufuncs (12212)
- respect return\_scalar of list scalars (12251)
- fix plugins system on Windows (12230)
- potential overflow (12206)
- always start a new thread if the thread is already blocking (12202)
- with\_row\_count should block predicate push down for lazy csv (12187)
- rechunk failed-list series before iterate (12189)
- Fix interchange protocol boolean buffer size (12177)
- fix incorrect desc sort behavior (12141)
- `take` should block predicate pushdown (12130)
- use null type when read from unknown row (12128)
- boundary predicate to block all accumulated predicates in push down (12105)
- make python `schema_overrides` information available to the rust-side inference code when initialising from records/dicts (12045)
- fix panic when initializing Series with array of list dtype (12148)
- Fix schema of arr.min/max (12127)
- ensure filter predicate inputs exist in schema (12089)
- Update `null_count` after arithmetic (12280)

🛠️ Other improvements

- Workaround for maturin issue (12370)
- Fix incorrect boundary column name in `group_by_dynamic` docstrings (12366)
- Fix typo in `rolling_*` docstrings (12362)
- Fix ruff linting invocation (12350)
- Clean up conversion utils (11789)
- Organize Cargo.toml (12323)
- Consolidate "getting started" and "user guide" sections (12246)
- Minor updates to prepare for Python 3.12 support (12314)
- Move script for testing map warning (12306)
- simplify expr checking in predicate push down (12287)
- Remove external link (12223)
- Fix rebase issue breaking CI (12296)
- Add top-level `make clippy`, simplify Rust linting workflows (12290)
- ensure we git-ignore ALL `.venv` dirs (12289)
- incorrect super type for literals in nested binary exprs (12238)
- Remove recommended setting from IDE docs (12275)
- Clean up Python test workflow (12261)
- clarify contains selector (12265)
- Add `py-polars` to Cargo workspace (12256)
- Use `.with_columns` in some docstrings (12250)
- Add test for `scan_csv` plus `slice` (12239)
- Fix emphasis formatting in docstring (12240)
- Fix emphasis formatting in docstring (12237)
- add deprecation notices to the docs for expressions moved into the new `name` namespace (12236)
- update Cargo.lock (12226)
- make sort test work with unstable sort (12221)
- Build Python wheels on `manylinux_2_28` (12211)
- Include `rust-toolchain.toml` with sdist/wheels (12184)
- Standardize project name formatting across docs (12185)
- Update `sqlparser` to `0.39` (12173)
- pin ring (12176)
- Improve `strip_{prefix, suffix}` \& `strip_chars_{start, end}` (12161)
- Fix tests for pyarrow 14 (12170)
- Fix rendering of note in `DataFrame.fold` (12164)
- Fix triggers for docs deployment (12159)
- Refactor some tests (12121)
- Consolidate contributing info (12109)
- Fix typo in user-guide/expressions/plugins.md (12115)
- Render docstring text in single backticks as code (12096)
- use more ergonomic syntax in select/with\_columns where possible (12101)
- Update CODEOWNERS (12107)
- visualize plugin directory layout in user guide (12092)
- Minor tweak in code example in section Expressions/Aggregation (12033)
- Minor tweak in code example in section Expressions/Missing data (12080)
- Minor improvements to the docs website (12084)

Thank you to all our contributors for making this release possible!
JulianCologne, MarcoGorelli, Priyansh121096, alexander-beedie, cmdlineluser, daviskirk, dependabot, dependabot[bot], dgilman, hirohira9119, ion-elgreco, jrycw, mcrumiller, moritzwilksch, nameexhaustion, orlp, owrior, rancomp, reswqa, ritchie46, rob-sil, stefmolin, stinodego and wsyxbcl


py-0.19.13-rc.1
⚠️ Deprecations

- Deprecate `DataFrame.as_dict` positional input (12131)

🚀 Performance improvements

- Reduce compute in error message for failed datetime parsing (12147)

✨ Enhancements

- tunable concurrency (12171)
- support reverse sort in streaming (12169)
- Add `.arr.to_list` expression (12136)
- Support decimals in assert utils (12119)
- add concurrency budget (12117)
- improved support for use of file-like objects with `DataFrame` "write" methods (12113)
- Introduce ignore\_nulls for str.concat (12108)

🐞 Bug fixes

- fix incorrect desc sort behavior (12141)
- `take` should block predicate pushdown (12130)
- use null type when read from unknown row (12128)
- boundary predicate to block all accumulated predicates in push down (12105)
- make python `schema_overrides` information available to the rust-side inference code when initialising from records/dicts (12045)
- fix panic when initializing Series with array of list dtype (12148)
- Fix schema of arr.min/max (12127)
- ensure filter predicate inputs exist in schema (12089)

🛠️ Other improvements

- pin ring (12176)
- Improve `strip_{prefix, suffix}` \& `strip_chars_{start, end}` (12161)
- Fix tests for pyarrow 14 (12170)
- Fix rendering of note in `DataFrame.fold` (12164)
- Fix triggers for docs deployment (12159)
- Refactor some tests (12121)
- Consolidate contributing info (12109)
- Fix typo in user-guide/expressions/plugins.md (12115)
- Render docstring text in single backticks as code (12096)
- use more ergonomic syntax in select/with\_columns where possible (12101)
- Update CODEOWNERS (12107)
- visualize plugin directory layout in user guide (12092)
- Minor tweak in code example in section Expressions/Aggregation (12033)
- Minor tweak in code example in section Expressions/Missing data (12080)
- Minor improvements to the docs website (12084)

Thank you to all our contributors for making this release possible!
MarcoGorelli, Priyansh121096, alexander-beedie, dependabot, dependabot[bot], jrycw, moritzwilksch, nameexhaustion, reswqa, ritchie46, stefmolin and stinodego


py-0.19.12
⚠️ Deprecations

- Deprecate `nans_compare_equal` parameter in assert utils (12019)
- Rename `ljust`/`rjust` to `pad_end`/`pad_start` (11975)
- Deprecate `shift_and_fill` in favor of `shift` (11955)
- Deprecate `clip_min`/`clip_max` in favor of `clip` (11961)

🚀 Performance improvements

- improve parquet downloading (12061)
- fix regression non-null asof join (11984)
- drastically improve performance of limit on async parquet datasets (11965)

✨ Enhancements

- Add supertype for `List`/`Array` (12016)
- enable eq and neq for array dtype (12020)
- Expressify n of shift (12004)
- add dedicated `name` namespace for operations that affect expression names (11973)
- optimize asof\_join and allow null/string keys (11712)
- limit concurrent downloads in async parquet (11971)
- sample fraction can take an expr (11943)
- Add `infer_schema_length` to `pl.read_json` (11724)

🐞 Bug fixes

- Fix `get_index`/iteration for `Array` types (12047)
- improved xlsx2csv defaults for `read_excel` (12081)
- str.concat on empty list (12066)
- fix issue with invalid `Mapping` objects used as schema being silently ignored (12027)
- improve ingest from `numpy` scalar values (12025)
- binary agg should group aware if literal not a scalar (12043)
- Use Arrow schema for file readers (12048)
- Error on duplicates in hive partitioning (12040)
- display fmt for str split (12039)
- sum\_horizontal should not always cast to int (12031)
- fix apply\_to\_inner's dtype (12010)
- Allow inexact checking of nested integers (12037)
- Fix padding for non-ASCII strings (12008)
- fix dot visualization of anonymous scans (12002)
- SQL table aliases (11988)
- fix streaming multi-column/multi-dtype sort (11981)
- ensure streaming parquet datasets deal with limits (11977)
- implement proper hash for identifier in cse (11960)
- fix take return dtype in group context. (11949)
- fix panic in format of anonymous scans (11951)
- sql In should work without specific ops (11947)
- construct list series from any values subject to dtype (11944)

🛠️ Other improvements

- minor updates to lint-related dependencies (12073)
- Add Excel page to user guide (12055)
- Direct CONTRIBUTING to the docs website (12042)
- Replace `black` by `ruff format` (11996)
- Further assert utils refactor (12015)
- Remove stacklevels checker utility script (11962)
- Disable type checking for `dataframe_api_compat` dependency (11997)
- Fix release tag (11994)
- optimize asof\_join and allow null/string keys (11712)
- Add `Development` and `Releases` sections to the documentation (11932)
- include the "build" dir when running `make clean` for docs (11970)
- make cloning `PyExpr` consistent (11956)
- fix take return dtype in group context. (11949)
- warn about scan\_pyarrow\_dataset's limitations and suggest scan\_parquet instead (if possible) (11952)
- Add `set_fmt_table_cell_list_len` to API docs (11942)

Thank you to all our contributors for making this release possible!
JulianCologne, MarcoGorelli, Rohxn16, alexander-beedie, braaannigan, brayanjuls, messense, nameexhaustion, orlp, reswqa, ritchie46, squnit, stinodego and universalmind303


rs-0.34.0
🏆 Highlights

- postfix `rolling` expression as a special case of window functions. (11445)
- support 'hive partitioning' aware readers (11284)

💥 Breaking changes

- Rename `.list.lengths` and `.str.lengths` (11613)
- Rename `write_csv` parameter `quote` to `quote_char` (11583)
- Add `disable_string_cache` (11020)

🚀 Performance improvements

- fix regression non-null asof join (11984)
- drastically improve performance of limit on async parquet datasets (11965)
- support multiple files in a single scan parquet node. (11922)
- fix accidental quadratic behavior; cache null\_count (11889)
- fix quadratic behavior in append sorted check (11893)
- properly push down slice before left/asof join (11854)
- Improve performance of `cot` (cotangent) (11717)
- rechunk before grouping on multiple keys (11711)
- process parquet statistics before downloading row-group (11709)
- push down predicates that refer to group\_by keys (11687)
- slightly faster float equality (11652)
- actually use projection information in async parquet reader (11637)
- improve performance and fix panic in async parquet reader (11607)
- use try\_binary\_elementwise over try\_binary\_elementwise\_values (11596)
- skip empty chunks in concat (11565)
- improve sparse sample performance (11544)
- early return in replace\_time\_zone if target and source time zones match (11478)
- greatly improve parquet cloud reading (11479)
- ensure we download row-groups concurrently. (11464)
- don't load N metadata files when globbing N files (11422)
- remove double memcopy (11365)
- address perf regression (11354)
- improve dynamic\_groupby\_iter (11341)
- improve and fix rolling windows by linear scanning (11326)
- improve outer join materialization (11241)
- use ryu and itoa for primitive serialization (11193)
- use try-binary-elementwise instead of try-binary-elementwise-values in dt\_truncate (11189)
- Using cache for str.contains regex compilation (11183)

✨ Enhancements

- optimize asof\_join and allow null/string keys (11712)
- limit concurrent downloads in async parquet (11971)
- sample fraction can take an expr (11943)
- Add `infer_schema_length` to `pl.read_json` (11724)
- improve error handling in scan\_parquet and deal with file limits (11938)
- support multiple files in a single scan parquet node. (11922)
- error instead of panic in unsupported sinks (11915)
- Introduce list.sample (11845)
- don't require empty config for cloud scan\_parquet (11819)
- Expressify pct\_change and move to ops (11786)
- add `DATE` function for SQL (11541)
- right-align numeric columns (7475)
- Add config setting to control how many List items are printed (11409)
- allow specifying schema in `pl.scan_ndjson` (10963)
- easier arrow2/arrow-rs conversion (11666)
- support multiple sources in scan\_file (11661)
- allow coalesce in streaming (11633)
- Implement `schema`, `schema_override` for `pl.read_json` with array-like input (11492)
- add SQL support for `UNION [ALL] BY NAME`, add "diagonal\_relaxed" strategy for `pl.concat` (11597)
- improve performance and fix panic in async parquet reader (11607)
- add time\_unit argument to duration, default to "us" (11586)
- elide overflow checks on i64 (11563)
- add `INITCAP` string function for SQL (9884)
- Use IPC for (un)pickling dataframes/series (11507)
- support left and right anti/semi joins from the SQL interface (11501)
- expressify peak\_min/peak\_max (11482)
- `IN(subquery)` and SQL Subquery Infrastructure (11218)
- Format null arrays in Series (11289)
- postfix `rolling` expression as a special case of window functions. (11445)
- allow for "by" column to be of dtype Date in rolling\_\* functions (11004)
- support 'abfss' for azure (11413)
- multi-threaded async runtime (11411)
- async parquet. (11403)
- fail fast when invalid cloud settings; introduce retries arg (11380)
- modernize CPU features (11351)
- introduce 'label' instead of 'truncate' in group\_by\_dynamic, which can take `label='right'` (11337)
- Expressify list.shift (11320)
- add gather\_skip\_nulls implementation (11329)
- top\_k and bottom\_k supports pass an expr (11344)
- support 'hive partitioning' aware readers (11284)
- str.strip\_chars supports take an expr argument (11313)
- sample n can take an expr (11257)
- Add `disable_string_cache` (11020)
- clip supports expr arguments and physical numeric dtype (11288)
- Introduce list.drop\_nulls (11272)
- str.splitn and split\_exact can take an expr argument by (11275)
- introduce ambiguous option for dt.round (11269)
- improve binary helper so we don't need to rechunk. (11247)
- Adds `NULLIF` and `COALESCE` SQL functions (11124)
- better `tree-formatting` representation (11176)
- Support `duration + date` (11190)
- binary search and rechunk in chunked gather (11199)
- Expressify str.strip\_prefix \& suffix (11197)
- sql udfs (10957)
- run cloud parquet reader in default engine (11196)
- list.join's separator can be expression (11167)
- argument every of datetime.truncate can be expression (11155)

🐞 Bug fixes

- fix streaming multi-column/multi-dtype sort (11981)
- ensure streaming parquet datasets deal with limits (11977)
- implement proper hash for identifier in cse (11960)
- fix take return dtype in group context. (11949)
- sql In should work without specific ops (11947)
- construct list series from any values subject to dtype (11944)
- avoid integer overflow in offsets\_to\_groups when bigidx is enabled (11901)
- `read_csv` for empty lines (11924)
- predicate push-down remove predicate refers to alias for more branch (11887)
- use physical append (11894)
- recursively apply `cast_unchecked` in lists (11884)
- recursively check allowed streaming dtypes (11879)
- fix project pushdown for double projection contains count (11843)
- series.to\_numpy fails with dtype=Null (11858)
- panic on hive scan from cloud (11847)
- Propagate validity when cast primitive to list (11846)
- Edge cases for list count formatting (11780)
- remove flag inconsistency 'map\_many' (11817)
- ensure projections containing only hive columns are projected (11803)
- patch broken aHash AES intrinsics on ARM (11801)
- fix key in object-store cache (11790)
- handle logical types in plugins (11788)
- make `PyLazyGroupby` reusable (11769)
- only exclude final output names of group\_by key expressions (11768)
- fix ambiguity wrt list aggregation states (11758)
- Correctly process subseconds in `pl.duration` (11748)
- LazyFrame.drop\_columns overflow issue when columns.len()>schema.len() (11716)
- index\_to\_chunked\_index's fast path is not correct (11710)
- use actual number of read rows for hive materialization (11690)
- return float dtype in interpolate (for method="linear") for numeric dtypes (11624)
- fix seg fault in concat\_str of empty series (11704)
- Fix match on last item for `join_asof` with `strategy="nearest"` (11673)
- fix display str for peak\_max and top\_k (11657)
- Fix input replacement logic for slice (11631)
- slice expr can be taken in cse (11628)
- ensure nested logical types are converted to physical (11621)
- correctly convert nullability of nested parquet fields to arrow (11619)
- improve performance and fix panic in async parquet reader (11607)
- expand all literals before group\_by (11590)
- mark take\_group\_last function as unsafe (11587)
- handle unary operators applied to numbers used in SQL `IN` clauses (11574)
- Align new\_columns argument for `scan_csv` and `read_csv` (11575)
- don't conflate supported UNION ops in the SQL parser with (currently) unsupported UNION "BY NAME" variations (11576)
- incomplete reading of list types from parquet (11578)
- respect identity in horizontal sum (11559)
- bug in BitMask::get\_u32 (11560)
- take slice into account in parallel unions (11558)
- correct schema empty df in hive partitioning read (11557)
- ensure ListChunked::full\_null uses physical types (11554)
- respect 'hive\_partitioning' argument in parquet (11551)
- fix parquet deserialization Overflow error by using i64 offset types when promoting Arrow Lists to LargeLists (11549)
- streamline `is_in` handling of mismatched dtypes and fix a minor regression (11533)
- catch use of non equi-joins in SQL interface and raise appropriate error (11526)
- rework SQL join constraint processing to properly account for all `USING` columns (11518)
- literal hash (11508)
- Fix lazy schema for `cut`/`qcut` when `allow_breaks=True` (11287)
- correct output schema of hive partition and projection at scan (11499)
- correct projection pushdown in hive partitioned read (11486)
- fix for `write_csv` when using non-default "quote" char (11474)
- fix deserialization of parquets with large string list columns causing stack overflow (11471)
- Fix SQL `ANY` and `ALL` behaviour (10879)
- address multiple issues caused by implicit casting of `is_in` values to the column dtype being searched (11427)
- raise on invalid sort\_by group lengths (11423)
- fix outer join on bools (11417)
- fix categorical collect (11414)
- Free bitmap when slicing into a non-null array (11405)
- async parquet. (11403)
- Fix edge-case where the Array dtype could (internally) be considered numeric (11398)
- Fix empty check when building a list (11378)
- more cloud urls (11361)
- ensure cloud globbing can deal with spaces (11360)
- recognize more cloud urls (11357)
- Fix `Series.__contains__` for None values and implement `is_in` for null Series (11345)
- don't panic on multi-nodes in streaming conversion (11343)
- ensure trailing quote is written for temporal data when CSV `quote_style` is non-numeric (11328)
- fix empty Series construction edge-case with Struct dtype (11301)
- add missing feature flags on tests (11305)
- set partitions independent of thread pool (11304)
- parse sign for decimal properly (11302)
- consume duplicates in rolling\_by window (11261)
- handle url encoded paths in objectpath creation (11240)
- use POOL when writing csv (11222)
- is\_in for bool evaluate has\_false incorrectly (11217)
- fix nullable filter mask in group\_by (11207)
- replace n-th in filter (11206)
- fix translation of Series-nested datetime/date values for `scan_pyarrow` predicates (11195)
- impl hash for more function expr (11182)
- list.join's separator can be expression (11167)
- Add some missing expr type hint for series (11171)
- Make pl.struct serializable (11169)
- Fix rust test for logical plan optimizer for categoricals (11135)
- propagate null value for str/binary starts/ends\_with and contains (11141)

🛠️ Other improvements

- optimize asof\_join and allow null/string keys (11712)
- Add `Development` and `Releases` sections to the documentation (11932)
- use ahash from crates.io release (11964)
- move unique\_counts to ops (11963)
- fix take return dtype in group context. (11949)
- move moment to ops (11941)
- fix some typos and add polars-business to curated plugin list (11916)
- prepare for multiple files in a node (11918)
- load 40x40 avatar from github and add loading=lazy attribute. (11886)
- Fix Cargo warning for parquet2 dependency (11882)
- Allow manual trigger for docs deployment (11881)
- rename new\_from\_owned\_with\_null\_bitmap (11828)
- add section about plugins (11855)
- fix incorrect example of valid time zones (11873)
- Bump docs dependencies (11852)
- add missing polars-ops tests to CI (11859)
- Update doc comments for with\_column to reflect that columns can be updated (11840)
- Move round to ops (11838)
- arrow: remove unused arithmetic code and remove doctests (11820)
- Move diff to polars-ops (11818)
- remove redundant if branch in nested parquet (11814)
- Move ewma to polars-ops (11794)
- Make some functions in dsl::mod non-anonymous (11799)
- Move cum\_agg to polars-ops (11770)
- more granular polars-ops imports (11760)
- Make all emw function expr non-anonymous (11638)
- clarify polars-arrow \<=> arrow2 license (11755)
- Version `polars-arrow` with the other crates (11738)
- fill missing fill\_null strategies (11751)
- Minor fix in code example in section Coming from Pandas (11745)
- Update group\_by\_dynamic example (11737)
- merge nano-arrow/polars-arrow (11719)
- Improving the documentation of the SQL expressions (11708)
- \*\_horizontal dependent on reduce\_expr to expression architecture (11685)
- update document of folds (11705)
- update rustc and fix future (11696)
- better align `help` command output following addition of some longer options (11681)
- sum\_horizontal to expression architecture (11659)
- Cleanup the match block for date inference (11677)
- Adding feature annotation (11671)
- add note about use of `polars-lts-cpu` for macOS x86-64/rosetta (11660)
- improve rank implementation, especially around nulls (11651)
- Bring cloud monikers in line with the ones in `is_cloud_url` (11629)
- Rename `.list.lengths` and `.str.lengths` (11613)
- Make backwardfill and forwardfill function expr non-anonymous (11630)
- Make all expr in dt namespace non-anonymous (11627)
- Fix changelog for language-specific breaking changes (11617)
- avoid nightly rust for case conversion (11610)
- Make value\_counts and unique\_counts function expr non-anonymous (11601)
- Make arg\_min(max), diff in list namespace non-anonymous (11602)
- Rename `write_csv` parameter `quote` to `quote_char` (11583)
- use a generic consistent total ordering, also for floats (11468)
- Move mode operation from core to ops crate (11543)
- fix lints (11555)
- use single threaded take under certain values size (11539)
- fix some features (11529)
- move (hor\_)str\_concat to polars-ops (11488)
- minor changes in peak-min/max (11491)
- align cloud url regex in rust and python (11481)
- move AnonymousScan into Scan node (11502)
- move `repeat_by` to polars-ops (11461)
- upgrade to nightly-10-02 (11460)
- Update contributing guide to include memory requirement (11458)
- remove unused order\_by attribute (11434)
- cleanup sort\_by expression impl (11431)
- large windows runner for release (11370)
- Fix error message reference to `infer_schema_length` (11358)
- move rank to polars-ops (11349)
- unify display for namespaced function expr (11342)
- Fix some cargo manifest warnings (11327)
- Use `GITHUB_TOKEN` to get contributor information for docs (11321)
- Add `disable_string_cache` (11020)
- remove default auto-explode for map\_many\_private (11270)
- Add API links for Rust user guide examples (11294)
- update a few dependencies (11283)
- move scan helpers to separate module (11279)
- update sponsors (11271)
- bump chrono to 0.4.31 (11258)
- bind all remaining method in StringNameSpace to function expr (11229)
- Make some list function expr non-anonymous (11230)
- remove lz4\_flex feature (11253)
- remove unnecessary transmute (11250)
- move (almost) all join related code from polars-core to polars-ops. (11228)
- Mention the `performant` feature only once (11223)
- remove unneeded indirection (11233)
- remove unneeded mutex around object-store (11224)
- bind struct.rename\_fields to function expr (11215)
- fix un-compilable rust example in user guide. (11214)
- add various missing expression doc-comments (11213)
- Fix user\_guide of str.split (11185)
- New take implementation (11138)
- Fix rust test for logical plan optimizer for categoricals (11135)

Thank you to all our contributors for making this release possible!
ByteNybbler, Cheukting, Fokko, Hofer-Julian, JulianCologne, LaurynasMiksys, MarcoGorelli, Rohxn16, SeanTroyUWO, TheDataScientistNL, Walnut356, aberres, alexander-beedie, alicja-januszkiewicz, andysham, billylanchantin, bowlofeggs, c-peters, cmdlineluser, dannyvankooten, dependabot, dependabot[bot], ewoolsey, jhorstmann, jonashaag, jrycw, mcrumiller, messense, nameexhaustion, orlp, petrosbar, ptiza, rancomp, reswqa, ritchie46, rjthoen, romanovacca, sd2k, shenker, squnit, stinodego, svaningelgem, thomasjpfan, uchiiii, universalmind303 and Romano Vacca


py-0.19.12-rc.1
⚠️ Deprecations

- Deprecate `shift_and_fill` in favor of `shift` (11955)
- Deprecate `clip_min`/`clip_max` in favor of `clip` (11961)

🚀 Performance improvements

- fix regression non-null asof join (11984)
- drastically improve performance of limit on async parquet datasets (11965)

✨ Enhancements

- optimize asof\_join and allow null/string keys (11712)
- limit concurrent downloads in async parquet (11971)
- sample fraction can take an expr (11943)
- Add `infer_schema_length` to `pl.read_json` (11724)

🐞 Bug fixes

- fix streaming multi-column/multi-dtype sort (11981)
- ensure streaming parquet datasets deal with limits (11977)
- implement proper hash for identifier in cse (11960)
- fix take return dtype in group context. (11949)
- fix panic in format of anonymous scans (11951)
- sql In should work without specific ops (11947)
- construct list series from any values subject to dtype (11944)

🛠️ Other improvements

- optimize asof\_join and allow null/string keys (11712)
- Add `Development` and `Releases` sections to the documentation (11932)
- include the "build" dir when running `make clean` for docs (11970)
- make cloning `PyExpr` consistent (11956)
- fix take return dtype in group context. (11949)
- warn about scan\_pyarrow\_dataset's limitations and suggest scan\_parquet instead (if possible) (11952)
- Add `set_fmt_table_cell_list_len` to API docs (11942)

Thank you to all our contributors for making this release possible!
MarcoGorelli, Rohxn16, alexander-beedie, messense, orlp, reswqa, ritchie46, squnit and stinodego


py-0.19.11
⚠️ Deprecations

- Rename `shift` parameter from `periods` to `n` (11923)
- Fix `Array` data type initialization (11907)

🚀 Performance improvements

- support multiple files in a single scan parquet node. (11922)

✨ Enhancements

- improve error handling in scan\_parquet and deal with file limits (11938)
- support multiple files in a single scan parquet node. (11922)
- error instead of panic in unsupported sinks (11915)
- upcast int->float and date->datetime for certain Series comparisons (11779)

🐞 Bug fixes

- avoid integer overflow in offsets\_to\_groups when bigidx is enabled (11901)
- `read_csv` for empty lines (11924)
- raise suitable error on invalid predicates passed to `filter` method (11928)
- Fix `Array` data type initialization (11907)
- set null\_count on categorical append (11914)
- predicate push-down remove predicate refers to alias for more branch (11887)
- address DataFrame construction error with lists of `numpy` arrays (11905)
- address issue with inadvertently shared options dict in `read_excel` (11908)
- raise a suitable error from `read_excel` and/or `read_ods` when target sheet does not exist (11906)

🛠️ Other improvements

- Fix typo in `read_excel` docstring (11934)
- Fix docstring for `diff` methods (11921)
- fix some typos and add polars-business to curated plugin list (11916)
- add missing 'diagonal\_relaxed' to `pl.concat` "how" param docstring signature (11909)

Thank you to all our contributors for making this release possible!
LaurynasMiksys, alexander-beedie, mcrumiller, reswqa, ritchie46, romanovacca, shenker, stinodego and uchiiii


py-0.19.10
⚠️ Deprecations

- Deprecate `DataType.is_nested` (11844)

🚀 Performance improvements

- fix accidental quadratic behavior; cache null\_count (11889)
- fix quadratic behavior in append sorted check (11893)
- optimise `read_database` Databricks queries made using SQLAlchemy connections (11885)
- properly push down slice before left/asof join (11854)

✨ Enhancements

- Introduce list.sample (11845)
- don't require empty config for cloud scan\_parquet (11819)

🐞 Bug fixes

- use physical append (11894)
- Add `include_nulls` parameter to `update` (11830)
- recursively apply `cast_unchecked` in lists (11884)
- recursively check allowed streaming dtypes (11879)
- Frame slicing single column (11825)
- fix project pushdown for double projection contains count (11843)
- Propagate validity when cast primitive to list (11846)
- Edge cases for list count formatting (11780)

🛠️ Other improvements

- Further assert utils refactor (11888)
- load 40x40 avatar from github and add loading=lazy attribute. (11886)
- Fix Cargo warning for parquet2 dependency (11882)
- Allow manual trigger for docs deployment (11881)
- add section about plugins (11855)
- fix incorrect example of valid time zones (11873)
- fix typo in code example in section Expressions - Basic operators (11848)
- Bump docs dependencies (11852)
- add missing polars-ops tests to CI (11859)
- Assert utils refactor (11813)

Thank you to all our contributors for making this release possible!
Walnut356, alexander-beedie, dannyvankooten, dependabot, dependabot[bot], ewoolsey, jrycw, mcrumiller, nameexhaustion, orlp, reswqa, ritchie46, rjthoen, romanovacca and stinodego


py-0.19.9
🏆 Highlights

- extend `filter` capabilities with new support for `*args` predicates, `**kwargs` constraints, and chained boolean masks (11740)

⚠️ Deprecations

- Deprecate non-keyword args for `ewm` methods (11804)
- Deprecate `use_pyarrow` param for `Series.to_list` (11784)
- Rename `group_by_rolling` to `rolling` (11761)

🚀 Performance improvements

- Improve `DataFrame.get_column` performance by ~35% (11783)
- rechunk before grouping on multiple keys (11711)
- process parquet statistics before downloading row-group (11709)
- push down predicates that refer to group\_by keys (11687)
- slightly faster float equality (11652)

✨ Enhancements

- Expressify pct\_change and move to ops (11786)
- primitive kwargs in plugins (11268)
- add `DATE` function for SQL (11541)
- Add config setting to control how many List items are printed (11409)
- Use `OrderedDict` for schemas (11742)
- allow specifying schema in `pl.scan_ndjson` (10963)
- add support for "outer" mode to frame `update` method (11688)
- transparently support "qmark" parameterisation of SQLAlchemy queries in `read_database` (11700)
- support multiple sources in scan\_file (11661)
- support batched frame iteration over `read_database` queries (11664)
- column selector support for `DataFrame.melt` and `LazyFrame.unnest` (11662)

🐞 Bug fixes

- ensure projections containing only hive columns are projected (11803)
- patch broken aHash AES intrinsics on ARM (11801)
- fix key in object-store cache (11790)
- handle logical types in plugins (11788)
- Fix values printed by `assert_*_equal` AssertionError when `exact=False` (11781)
- make `PyLazyGroupby` reusable (11769)
- only exclude final output names of group\_by key expressions (11768)
- Fix subsecond parsing in timedelta conversions (11759)
- fix ambiguity wrt list aggregation states (11758)
- Correctly process subseconds in `pl.duration` (11748)
- use actual number of read rows for hive materialization (11690)
- return float dtype in interpolate (for method="linear") for numeric dtypes (11624)
- fix seg fault in concat\_str of empty series (11704)
- fix sort\_by regression (11679)
- Fix match on last item for `join_asof` with `strategy="nearest"` (11673)

🛠️ Other improvements

- Bump lint dependencies (11802)
- Minor updates to assertion utils and docstrings (11798)
- Remove unused `_to_rust_syntax` util (11795)
- Minor tweak in code example in section Coming from Pandas (11764)
- Fix Exception module paths (11785)
- Rename `IntegralType` to `IntegerType` (11773)
- more granular polars-ops imports (11760)
- Link to `expand_selector` in user guide (11722)
- Add parametric test for `df.to_dict`/`series.to_list` (11757)
- Minor fix in code example in section Coming from Pandas (11745)
- Move tests for `group_by_dynamic` into one module (11741)
- Update group\_by\_dynamic example (11737)
- reorder pl.duration arguments (11641)
- remove default features from some crates (11680)
- \*\_horizontal dependent on reduce\_expr to expression architecture (11685)
- clarify that median is equivalent to the 50% percentile shown in `describe` metrics (11694)
- update rustc and fix future (11696)
- Publish release after uploading assets (11686)
- upgrade pyo3 to 0.20.0 (11683)
- better align `help` command output following addition of some longer options (11681)
- sum\_horizontal to expression architecture (11659)
- add note about use of `polars-lts-cpu` for macOS x86-64/rosetta (11660)
- improve rank implementation, especially around nulls (11651)

Thank you to all our contributors for making this release possible!
JulianCologne, MarcoGorelli, Walnut356, aberres, alexander-beedie, alicja-januszkiewicz, cmdlineluser, jrycw, mcrumiller, messense, nameexhaustion, orlp, petrosbar, rancomp, reswqa, ritchie46, romanovacca, sd2k, stinodego, svaningelgem and thomasjpfan


py-0.19.8
🏆 Highlights

- Enable additional flags for x86-64 wheels (11487)

⚠️ Deprecations

- Rename `.list.lengths` and `.str.lengths` (11613)
- Deprecate default value for `radix` in `parse_int` (11615)
- Rename `write_csv` parameter `quote` to `quote_char` (11583)

🚀 Performance improvements

- actually use projection information in async parquet reader (11637)
- improve performance and fix panic in async parquet reader (11607)
- use try\_binary\_elementwise over try\_binary\_elementwise\_values (11596)
- skip empty chunks in concat (11565)
- improve sparse sample performance (11544)

✨ Enhancements

- Standardize error message format (11598)
- allow coalesce in streaming (11633)
- Implement `schema`, `schema_override` for `pl.read_json` with array-like input (11492)
- add SQL support for `UNION [ALL] BY NAME`, add "diagonal\_relaxed" strategy for `pl.concat` (11597)
- improve performance and fix panic in async parquet reader (11607)
- add time\_unit argument to duration, default to "us" (11586)
- support `read_database` options passthrough to the underlying connection's `execute` method (enables parameterised SQL queries, etc) (11562)
- elide overflow checks on i64 (11563)
- add `INITCAP` string function for SQL (9884)

🐞 Bug fixes

- Fix input replacement logic for slice (11631)
- slice expr can be taken in cse (11628)
- ensure nested logical types are converted to physical (11621)
- correctly convert nullability of nested parquet fields to arrow (11619)
- improve performance and fix panic in async parquet reader (11607)
- normalize filepath in sink\_parquet (11605)
- parse time unit properly in pl.lit (11573)
- expand all literals before group\_by (11590)
- fix as\_dict with include\_key=False for partition\_by (9865)
- mark take\_group\_last function as unsafe (11587)
- handle unary operators applied to numbers used in SQL `IN` clauses (11574)
- Align new\_columns argument for `scan_csv` and `read_csv` (11575)
- Add initialization support for python Timedeltas (11566)
- incomplete reading of list types from parquet (11578)
- respect identity in horizontal sum (11559)
- bug in BitMask::get\_u32 (11560)
- take slice into account in parallel unions (11558)
- correct schema empty df in hive partitioning read (11557)
- ensure ListChunked::full\_null uses physical types (11554)
- respect 'hive\_partitioning' argument in parquet (11551)
- fix parquet deserialization Overflow error by using i64 offset types when promoting Arrow Lists to LargeLists (11549)
- streamline `is_in` handling of mismatched dtypes and fix a minor regression (11533)
- fix comparing tz-aware series with stdlib datetime (11480)
- catch use of non equi-joins in SQL interface and raise appropriate error (11526)
- rework SQL join constraint processing to properly account for all `USING` columns (11518)

🛠️ Other improvements

- Improved user guide for cloud functionality (11646)
- Improve some docstrings (11644)
- Disable clippy lint "too many arguments" for `py-polars` (11616)
- Make backwardfill and forwardfill function expr non-anonymous (11630)
- Make all expr in dt namespace non-anonymous (11627)
- Fix changelog for language-specific breaking changes (11617)
- Make value\_counts and unique\_counts function expr non-anonymous (11601)
- Make arg\_min(max), diff in list namespace non-anonymous (11602)
- Rename `write_csv` parameter `quote` to `quote_char` (11583)
- improve struct documentation (11585)
- Remove `**kwargs` from `LazyFrame.collect()` (11567)
- use a generic consistent total ordering, also for floats (11468)
- fix lints (11555)
- Remove toolchain specification workaround (11552)
- Trigger Python release from Actions workflow dispatch (11538)
- Enable additional flags for x86-64 wheels (11487)

Thank you to all our contributors for making this release possible!
ByteNybbler, MarcoGorelli, TheDataScientistNL, alexander-beedie, andysham, c-peters, jhorstmann, mcrumiller, nameexhaustion, orlp, reswqa, ritchie46, romanovacca, stinodego and svaningelgem


py-0.19.7
🏆 Highlights

- Postfix `rolling` expression as a special case of window functions. (11445)
- Use IPC for (un)pickling dataframes/series (11507)

🚀 Performance improvements

- early return in replace\_time\_zone if target and source time zones match (11478)
- greatly improve parquet cloud reading (11479)
- ensure we download row-groups concurrently. (11464)

✨ Enhancements

- support left and right anti/semi joins from the SQL interface (11501)
- Add `left_on` and `right_on` parameters to `df.update` (11277)
- expressify peak\_min/peak\_max (11482)
- `IN(subquery)` and SQL Subquery Infrastructure (11218)
- add ODBC connection string support to `read_database` (11448)
- postfix `rolling` expression as a special case of window functions. (11445)
- allow for "by" column to be of dtype Date in rolling\_\* functions (11004)
- rework `ColumnFactory` to additionally support tab-complete for `col` in IPython (11435)

🐞 Bug fixes

- literal hash (11508)
- Fix lazy schema for `cut`/`qcut` when `allow_breaks=True` (11287)
- correct output schema of hive partition and projection at scan (11499)
- correct projection pushdown in hive partitioned read (11486)
- fix for `write_csv` when using non-default "quote" char (11474)
- fix deserialization of parquets with large string list columns causing stack overflow (11471)
- enable `read_database` fallback for Snowflake warehouses/connections that don't support Arrow resultsets (11447)
- Fix SQL `ANY` and `ALL` behaviour (10879)
- partially address some PyCharm tooltip/signature issues with decorated methods (11428)
- address multiple issues caused by implicit casting of `is_in` values to the column dtype being searched (11427)

🛠️ Other improvements

- minor changes in peak-min/max (11491)
- align cloud url regex in rust and python (11481)
- Test sdist before releasing (11494)
- Unpin maturin version, fix release workflow (11483)
- More release workflow refactor (11472)
- Set some env vars for release (11463)
- move `repeat_by` to polars-ops (11461)
- upgrade to nightly-10-02 (11460)
- Update contributing guide to include memory requirement (11458)
- add missing docs entry for rolling (11456)
- use with\_columns in shift examples (11453)
- Add wheels as assets to GitHub release (11452)
- Build more wheels for `polars-lts-cpu`/`polars-u64-idx` (11430)

Thank you to all our contributors for making this release possible!
ByteNybbler, MarcoGorelli, SeanTroyUWO, alexander-beedie, c-peters, dependabot, dependabot[bot], mcrumiller, orlp, ritchie46, romanovacca, stinodego, svaningelgem and Romano Vacca


py-0.19.6
🚀 Performance improvements

- don't load N metadata files when globbing N files (11422)

🐞 Bug fixes

- raise on invalid sort\_by group lengths (11423)
- fix outer join on bools (11417)
- fix categorical collect (11414)
- fix opaque python reader schema (11412)
- async parquet. (11403)
- Fix edge-case where the Array dtype could (internally) be considered numeric (11398)
- handle ambiguous datetimes in pl.lit (11386)
- fix panic in hive read of booleans (11376)

🛠️ Other improvements

- Split Python release into build / release jobs (11421)
- Refactor Python release workflow (11382)
- clarify use of "batch\_size" for `read_database` (11377)
- large windows runner for release (11370)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, bowlofeggs, c-peters, jonashaag, orlp, ritchie46 and stinodego


py-0.19.5
🚀 Performance improvements

- remove double memcopy (11365)
- address perf regression (11354)

🐞 Bug fixes

- revert invalid runtime check (11363)
- more cloud urls (11361)
- ensure cloud globbing can deal with spaces (11360)
- recognize more cloud urls (11357)

🛠️ Other improvements

- Disable version warning banner for now (11359)
- Fix error message reference to `infer_schema_length` (11358)
- Mark some tests as slow (11350)
- improve parametric tests for group\_by\_rolling by skipping overflowing cases (11286)

Thank you to all our contributors for making this release possible!
MarcoGorelli, jonashaag, orlp, ritchie46 and stinodego


py-0.19.4
🏆 Highlights

- support 'hive partitioning' aware readers (11284)
- natively support reading parquet for aws, gcp and azure (11210)
- Add support for Iceberg (10375)
- The great expressification by reswqa (11320, 11344, 11313, 11257, 11288, 11275, 11197, 11167, 11155)
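
For context on the "hive partitioning" entries above: in a Hive-style layout, column values are encoded in directory names as `key=value` segments. The snippet below is a minimal plain-Python sketch of that convention only, not how Polars reads such datasets; the helper name `parse_hive_partitions` and the example path are hypothetical, and values are left as strings (any dtype inference a real reader performs is not modelled here).

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path: str) -> dict[str, str]:
    """Collect `key=value` directory segments from a Hive-style path."""
    partitions: dict[str, str] = {}
    # Only directory components are inspected; the file name itself is skipped.
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep and key:  # keep only segments of the form key=value
            partitions[key] = value
    return partitions


# Hypothetical dataset layout used purely for illustration.
print(parse_hive_partitions("data/year=2023/month=10/part-0.parquet"))
```

A partition-aware reader can use these values both to add the partition columns to the result and to skip directories that cannot match a filter.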

⚠️ Deprecations

- Add `disable_string_cache` (11020)

🚀 Performance improvements

- improve dynamic\_groupby\_iter (11341)
- improve and fix rolling windows by linear scanning (11326)
- faster init from `pydantic` models that have a small number of fields, and support direct init from SQLModel data (often used with FastAPI) (11263)
- improve outer join materialization (11241)
- use ryu and itoa for primitive serialization (11193)
- use try-binary-elementwise instead of try-binary-elementwise-values in dt\_truncate (11189)
- Using cache for str.contains regex compilation (11183)

✨ Enhancements

- introduce 'label' instead of 'truncate' in group\_by\_dynamic, which can take `label='right'` (11337)
- Expressify list.shift (11320)
- top\_k and bottom\_k support passing an expr (11344)
- add "pyxlsb" engine support to `read_excel` (for excel binary workbook files) (11248)
- support 'hive partitioning' aware readers (11284)
- str.strip\_chars supports taking an expr argument (11313)
- sample n can take an expr (11257)
- Add `disable_string_cache` (11020)
- clip supports expr arguments and physical numeric dtype (11288)
- Introduce list.drop\_nulls (11272)
- str.splitn and split\_exact can take an expr argument (11275)
- introduce ambiguous option for dt.round (11269)
- Adds `NULLIF` and `COALESCE` SQL functions (11124)
- better `tree-formatting` representation (11176)
- natively support reading parquet for aws, gcp and azure (11210)
- Expressify str.strip\_prefix \& suffix (11197)
- Add support for Iceberg (10375)
- list.join's separator can be expression (11167)
- argument every of datetime.truncate can be expression (11155)

🐞 Bug fixes

- Fix `Series.__contains__` for None values and implement `is_in` for null Series (11345)
- don't panic on multi-nodes in streaming conversion (11343)
- ensure trailing quote is written for temporal data when CSV `quote_style` is non-numeric (11328)
- clarify `has_validity` docstring and fix several cases where the presence of a bitmask was used to incorrectly infer the existence of `null` values (11319)
- fix empty Series construction edge-case with Struct dtype (11301)
- DataFrame init from `collections.namedtuple` values (11314)
- Exclude functools wrapper frames in `find_stacklevel` (11292)
- set partitions independent of thread pool (11304)
- address VSCode issue with autocomplete on `selector` expressions in editor/console (11235)
- consume duplicates in rolling\_by window (11261)
- handle url encoded paths in objectpath creation (11240)
- use POOL when writing csv (11222)
- don't conflate saved `Config` JSON string with file path (11098)
- is\_in for bool evaluate has\_false incorrectly (11217)
- improve handling of database drivers that can return arrow data (11201)
- fix nullable filter mask in group\_by (11207)
- replace n-th in filter (11206)
- fix translation of Series-nested datetime/date values for `scan_pyarrow` predicates (11195)
- address unexpected expression name from use of unary `-` or `+` operators (11158)
- impl hash for more function expr (11182)
- list.join's separator can be expression (11167)
- Add some missing expr type hint for series (11171)
- consistently use negative every as the default for offset in group\_by\_dynamic (11164)
- Make pl.struct serializable (11169)
- only raise on actual parameter collision when "dtypes" specified in `read_excel` "read\_csv\_options" (11162)
- propagate null value for str/binary starts/ends\_with and contains (11141)

🛠️ Other improvements

- simplify/clarify group\_by\_dynamic examples (11335)
- tighten `assert_frame_equal` for LazyFrames (don't collect until after the schema has been checked) (11331)
- unify display for namespaced function expr (11342)
- add lazy pivot example (11325)
- Use `GITHUB_TOKEN` to get contributor information for docs (11321)
- Enable version warning banner (11322)
- cross-reference `null_count` from `has_validity` (clarifies the correct way to check for nulls) (11323)
- Pin pydantic in dev requirements `<2.4.0` (11312)
- remove default auto-explode for map\_many\_private (11270)
- Add type alias `IntoExprColumn` (11296)
- update a few dependencies (11283)
- Properly skip ADBC test (11282)
- Fix some minor Makefile issues (11276)
- update sponsors (11271)
- parametric tests for group\_by\_rolling (11262)
- Make some list function expr non-anonymous (11230)
- Mention the `performant` feature only once (11223)
- remove unneeded indirection (11233)
- remove unneeded mutex around object-store (11224)
- clarify every/period/offset in group\_by\_dynamic (11175)
- Fix `read_database` `batch_size` docstring (11132)

Thank you to all our contributors for making this release possible!
ByteNybbler, Cheukting, Fokko, Hofer-Julian, MarcoGorelli, SeanTroyUWO, alexander-beedie, billylanchantin, jonashaag, mcrumiller, orlp, ptiza, reswqa, ritchie46, stinodego and universalmind303


rs-0.33.0
🏆 Highlights

- implementing sink\_csv for LazyFrame (10682)

💥 Breaking changes

- empty product returns identity (10842)
- return `f64` for `rank` when `method="average"` (10734)
- Rename `groupby` to `group_by` (10654)
- Read/write support for IPC streams in DataFrames (10606)
- Change behavior of `all` - fix Kleene logic implementation for `all`/`any` (10564)
- remove fixed\_seed and add pl.set\_random\_seed (10388)
- Make `arange` an alias for `int_range` (9983)
- `date_range`/`time_range` no longer return a `List` type (10526)
- Remove various functionalities deprecated before `0.18` (10527)
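
For readers unfamiliar with the term used in the `all`/`any` entry (10564): Kleene logic is a three-valued logic in which a null means "unknown". The following is a plain-Python sketch of those rules, with `None` standing in for null. It is not Polars' implementation; the helper names `kleene_all` and `kleene_any` are hypothetical, and whether Polars applies these rules by default depends on the version and arguments in use.

```python
from typing import Iterable, Optional


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued AND: any False wins; otherwise a null makes the result unknown."""
    saw_null = False
    for v in values:
        if v is False:
            return False  # a single False decides the result, nulls or not
        if v is None:
            saw_null = True
    return None if saw_null else True


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued OR: any True wins; otherwise a null makes the result unknown."""
    saw_null = False
    for v in values:
        if v is True:
            return True  # a single True decides the result, nulls or not
        if v is None:
            saw_null = True
    return None if saw_null else False


print(kleene_all([True, None]), kleene_all([False, None]))
print(kleene_any([True, None]), kleene_any([False, None]))
```

The asymmetry is the point: a null only matters when the known values cannot settle the answer on their own.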

⚠️ Deprecations

- Rename `is_first/last` to `is_first/last_distinct` (11130)
- Rename `count_match` to `count_matches` (11028)
- Rename `strip` to `strip_chars` (10813)
- Add `datetime_range` expression function (10213)
- Rename `Series/Expr.rolling_apply` to `rolling_map` (10750)

🚀 Performance improvements

- improve performance of fast projection (10945)
- parse time zones outside of downcast\_iter() in replace\_time\_zone (10713)
- use binary abstraction for atan2 (10588)
- use binary abstraction in pow (10562)

✨ Enhancements

- Expressify str.split argument. (11117)
- Expressify argument of binary contains (11091)
- dt.offset\_by supports broadcasting lhs (11095)
- Expressify argument of binary starts\_with and ends\_with (11076)
- json\_extract supports extract static and string value to list dtype (11057)
- add quote\_style="never" option for `write_csv` (11015)
- add support for nextest (11048)
- Add `literal` for str count\_match (10996)
- More dtypes support casting to list (11025)
- ParquetCloudSink to allow streaming pipelines into remote ObjectStores (10060)
- Add `strip_prefix` and `strip_suffix` to the string namespace (10958)
- Add `datetime_range` expression function (10213)
- add proper cache for Regex compilation (10934)
- implementation of `array_to_string` (10839)
- apply left side predicate pushdown also to right side if all predicate columns are also join columns (10841)
- accept expr in `str.count_match` (10900)
- accept expressions in `.offset_by` (9967)
- implement drop as special case of `select` (10885)
- Supports is\_last operation (10760)
- activate cse for group\_by (again) (10749)
- add pairwise float sum implementation (10756)
- implementing sink\_csv for LazyFrame (10682)
- Supports series unique \& arg\_unique \& n\_unique for list (10743)
- repeat\_by should also support broadcasting of LHS (10735)
- deprecate 'use\_earliest' argument in favour of 'ambiguous', which can take expressions (10719)
- is\_first also supports numeric list type. (10727)
- improve slice pushdown in unions (10723)
- Support min and max strategy for binary \& str columns fill null (10673)
- support broadcasting in list set operations (10668)
- add `truncate_ragged_lines` (10660)
- supports cast to list (10623)
- Rename `groupby` to `group_by` (10654)
- preserve whitespace in notebook output (10644)
- Read/write support for IPC streams in DataFrames (10606)
- improve binary (arity) generics (10622)
- propagate null in `is_in` and more generic array construction (10614)
- Change behavior of `all` - fix Kleene logic implementation for `all`/`any` (10564)
- frame-level `cast` support (10504)
- Add failed column to cast exception (10507)
- Make `arange` an alias for `int_range` (9983)
- `date_range`/`time_range` no longer return a `List` type (10526)
- Remove various functionalities deprecated before `0.18` (10527)
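
The "pairwise float sum" entry (10756) names a general summation technique: rather than accumulating left to right, the input is split in half recursively and the partial sums are combined, which typically slows the growth of rounding error. The snippet below is a plain-Python sketch of that idea, not the Polars code; the function name `pairwise_sum` and the `block` threshold are illustrative assumptions.

```python
import math


def pairwise_sum(values, block=8):
    """Sum floats by recursively halving the input (pairwise summation)."""
    n = len(values)
    if n <= block:
        # Small blocks are summed with an ordinary left-to-right loop.
        total = 0.0
        for v in values:
            total += v
        return total
    mid = n // 2
    return pairwise_sum(values[:mid], block) + pairwise_sum(values[mid:], block)


data = [0.1] * 1000
# math.fsum serves as a high-accuracy reference for comparison.
print(abs(pairwise_sum(data) - math.fsum(data)))
```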

🐞 Bug fixes

- Correct hash and fmt for struct expr (11119)
- enforce sortedness of by argument in rolling\_\* functions (11002)
- Filter on empty objectChunked should not throw error (11073)
- ensure null\_count statistics accounts for null array (11070)
- toggle off cse if ext\_context is used (11051)
- Correct field dtype of string concat (11055)
- pushed-down expr should be considered when evaluating ExternalContext (11023)
- fix rolling\_\* functions when "by" has nanosecond resolution (11005)
- Don't reuse member for Selector::Add (11026)
- fix the construction of List\<Null> (10969)
- allow singular null in regex pattern (10948)
- compute length of null array in explode (10946)
- Allow exactly one value in start/end for `int_range` (10914)
- count was falsely tagged as cse in group by (10917)
- Retain original dtype when deserializing an empty list (10893)
- CSE don't accept opaque functions (10905)
- Make `int_range(s)` exclusive on the upper bound when step is negative (10898)
- fix conversion from decimal to float (10776)
- Add broadcasting for list comparisons (10857)
- don't overflow length before checking limit (10883)
- fix bug where datetimes were not parsed in read\_csv when pattern had no hour or minute (10877)
- tag amortized iter unsafe and add safe alternatives (10881)
- use pool in dataframe arithmetic (10864)
- remove debug `println!` from datetime fn (10862)
- repair polars\_err string interpolation (10863)
- make count\_match docs and extract\_all docs/impl consistent around zero matches (10854)
- empty product returns identity (10842)
- never panic in hash/equality doesn't hold in cse (10836)
- Improve bound checks on temporal ranges (10837)
- var/std behavior around few elements (10828)
- Fix divide-by-zero error when reading an empty csv in streaming mode (10819)
- fix equality of quantile aggregation node (10816)
- Reading an only-header csv file in streaming mode should not panic (10810)
- get\_single\_leaf can't handle Expr::Count (10790)
- string to decimal parsing (10712)
- support groupby literal in streaming (10771)
- `ORDER BY` on unselected columns (10752)
- Fix is\_in cannot cast list type for float (10769)
- fix unicode truncation in json parsing (10761)
- Error message of list unique should not display inner type (10748)
- create `chunks_mut` entry in vtable (10745)
- Prevent panic on sample\_n with replacement from empty df (10731)
- only preserve sortedness flag in replace\_time\_zone when safe (10738)
- Error on `value_counts` on column named `"counts"` (10737)
- Build Series from empty Series vector (10558)
- return `f64` for `rank` when `method="average"` (10734)
- Keep min/max and arg\_min/arg\_max consistent. (10716)
- Fix bug when providing custom labels and opting for duplicates in qcut (10686)
- Cast small int type when scan csv in streaming mode. (10679)
- Reused input series in rolling\_apply should not be orderly (10694)
- re-sort buffer when update window swap the whole buffer (10696)
- Set the correct fast\_explode flag for ListUtf8ChunkedBuilder (10684)
- Sorted Utf8Chunked max\_str and min\_str should consider null value (10675)
- `AllHorizontal` format string (10658)
- List\<null> chunked builder should take care of series name (10642)
- respect 'ignore\_errors=False' in csv parser (10641)
- fix rename + projection pushdown (10624)
- fix int/float downcast in `is_in` (10620)
- Change behavior of `all` - fix Kleene logic implementation for `all`/`any` (10564)
- Fix serialization for categorical chunked. (10609)
- join\_asof missing `tolerance` implementation, address edge-cases (10482)
- Take input\_schema to create physical expr for Selection (10571)
- fix serialization of empty lists (10563)
- Clear window cache after evaluate predication expr (10505)
- Parsing regex col in Expr::Columns (10551)
- sanitize column naming in boolean ops (10531)
- fix build for wasm (10536)
- remove fixed\_seed and add pl.set\_random\_seed (10388)
- fix build for wasm (9502)
- rollback cse in groupby: python 0.18.15 (10491)

🛠️ Other improvements

- Removed duplicated example (11109)
- Add CODEOWNERS for docs folder (11107)
- Refactor starts\_with and ends\_with for string (11085)
- Integrate user guide (11089)
- remove feature gate join/groupby in polars-core (10965)
- Add Documentation issue type (11042)
- complete intra-docs in api documentation (11007)
- genericize take implementation (10976)
- genericize PolarsDataType (10952)
- enhance internal crates readme with reference to main crate (10928)
- Add `Duration` method for checking full days (10850)
- apply with\_name in more places (10899)
- never compare opaque functions (10906)
- eliminate repetition in utf8 datetime functions (10860)
- Fix issue templates for bug reports (10896)
- remove `LocalProjection` (10886)
- request verbose logging output of minimal reproducible examples (10882)
- Reorganize `range` expression module (10871)
- introduce with\_name for Series/ChunkedArray (10859)
- Further refactor temporal range functions (10844)
- Refactor `range` related functions (10830)
- Fix the un-compile Black box function parts in polars lazy cookbook (10809)
- Fix some broken links / formatting (10772)
- Improve docs for `polars-lazy` (10729)
- update rustc nightly\_2023-08-26 (10467)
- default to rust native flate2 lib (10733)
- Clear GitHub Actions caches weekly (10715)
- move 'is\_in' to polars-ops (10645)
- Clean up schema calculation for `date_range` (10653)
- remove unused apply functions and add fallible generic apply functions (10621)
- Enforce up-to-date `Cargo.lock` (10555)
- make binary chunkedarray functions DRY (10607)
- bump MSRV to 1.65 (10568)
- genericize chunk implementation (10506)
- use ChunkArray::(try\_)from\_chunk\_iter (10497)
- add VSCode rust-analyzer settings (10498)
- Update URLs for dev documentation (10495)
- Update features for latest `flate2` release (10492)

Thank you to all our contributors for making this release possible!
Barsik-sus, I8dNLo, JulianCologne, KacpiW, MarcoGorelli, Object905, OndrejSlamecka, Qqwy, SeanTroyUWO, TNieuwdorp, VasanthakumarV, alexander-beedie, aminalaee, antoniocali, braaannigan, bvanelli, c-peters, cjackal, cmdlineluser, dependabot, dependabot[bot], drgif, henrikig, ion-elgreco, jakob-keller, jeroenjanssens, jonashaag, lorepozo, marki259, mcrumiller, messense, mrogowski11, nameexhaustion, orlp, owrior, rben01, reswqa, ritchie46, s-banach, sdamashek, stinodego, svaningelgem, thomasjpfan, titoeb, trueb2, washcycle, wdoppenberg and zundertj


py-0.19.3
🏆 Highlights

- Polars plugins (10924)

⚠️ Deprecations

- Rename `is_first/last` to `is_first/last_distinct` (11130)
- Rename `count_match` to `count_matches` (11028)
- Rename `strip` to `strip_chars` (10813)
- Add `datetime_range` expression function (10213)

🚀 Performance improvements

- optimize `_unpack_schema()` (11080)
- optimize `polars.utils._post_apply_columns()` (11086)
- optimize `polars.utils._post_apply_columns()` (11041)
- optimize `_unpack_schema()` (10960)
- improve performance of fast projection (10945)

✨ Enhancements

- Expressify str.split argument. (11117)
- Polars plugins (10924)
- better async\_collect (10912)
- Expressify argument of binary contains (11091)
- dt.offset\_by supports broadcasting lhs (11095)
- Expressify argument of binary starts\_with and ends\_with (11076)
- add OpenOffice spreadsheet support via new `pl.read_ods` function (11011)
- json\_extract supports extract static and string value to list dtype (11057)
- add quote\_style="never" option for `write_csv` (11015)
- Add `literal` for str count\_match (10996)
- More dtypes support casting to list (11025)
- Add `strip_prefix` and `strip_suffix` to the string namespace (10958)
- improve `read_excel` table data identification (10953)
- Add `from_dataframe` fast path and improve typing (10979)
- add `openpyxl` as a new/optional engine for `read_excel` (6183)
- Add `datetime_range` expression function (10213)

🐞 Bug fixes

- Correct hash and fmt for struct expr (11119)
- enforce sortedness of by argument in rolling\_\* functions (11002)
- Make `Series.__getitem__` raise an IndexError (11061)
- Filter on empty objectChunked should not throw error (11073)
- ensure null\_count statistics accounts for null array (11070)
- toggle off cse if ext\_context is used (11051)
- Correct field dtype of string concat (11055)
- fix partial schema init with `read_dicts` and reduce latency of small-frame creation (11047)
- pushed-down expr should be considered when evaluating ExternalContext (11023)
- fix rolling\_\* functions when "by" has nanosecond resolution (11005)
- Don't reuse member for Selector::Add (11026)
- ensure `series_equal` properly accounts for dtypes when strict=True (11012)
- fix the construction of List\<Null> (10969)
- write\_excel "hidden\_columns" parameter fails when taking a selector (10987)
- allow singular null in regex pattern (10948)
- compute length of null array in explode (10946)

🛠️ Other improvements

- remove low contrast coloring from visited links (11133)
- Ignore matplotlib warning (11129)
- Do not run user guide examples by default (11128)
- Ignore matplotlib mypy warnings (11126)
- Add deprecation message in groupby docs (11121)
- Removed duplicated example (11109)
- Add CODEOWNERS for docs folder (11107)
- Refactor starts\_with and ends\_with for string (11085)
- Integrate user guide (11089)
- remove mentions of the deprecated random module (11087)
- simplify `SchemaDefinition` type alias (11077)
- put `fetch` explanation in a "notes" block to better highlight it in the docs (11058)
- remove feature gate join/groupby in polars-core (10965)
- Add Documentation issue type (11042)
- warn that "by" argument must be sorted for results to be correct in rolling\_\* functions (11013)
- Adds missing method refs in LazyDataFrame API docs (11027)
- Add lint for boolean trap (11010)
- Add private LazyFrame method for setting sink optimizations (10988)
- Enable a few more ruff lints (10998)
- document polars string duration language in temporal range functions (10978)
- Additional tests for interchange `get_data_buffer` (10966)
- genericize PolarsDataType (10952)
- Document that filter, drop\_nulls, left join preserve order (10955)
- add note about adbc flight sql driver (10949)
- Revert `pydantic >= 2.0.0` requirement (10944)
- note that pl.duration represents fixed durations, point to offset\_by for non-fixed (10927)
- Test S3 functionality using moto server (10164)

Thank you to all our contributors for making this release possible!
I8dNLo, KacpiW, MarcoGorelli, Object905, Qqwy, TNieuwdorp, alexander-beedie, antoniocali, bvanelli, cjackal, henrikig, jakob-keller, mrogowski11, nameexhaustion, orlp, reswqa, ritchie46, s-banach, stinodego, svaningelgem and thomasjpfan


py-0.19.2
🏆 Highlights

- Add syntactic sugar for `col("foo")` -> `col.foo` (10874)

⚠️ Deprecations

- Rename `Expr.is_not()` to `not_()` (10838)

✨ Enhancements

- allow individual `Config` options to be easily reset to their default value (10922)
- accept expr in `str.count_match` (10900)
- allow additional `glimpse` customisation, fix strings repr (10895)
- accept expressions in `.offset_by` (9967)
- support schema overrides for frames created from databases (10884)
- Add syntactic sugar for `col("foo")` -> `col.foo` (10874)
- support negative indexing in set\_at\_idx (10891)
- implement drop as special case of `select` (10885)
- raise a more helpful error when non-query statements passed to `read_database` (10851)

🐞 Bug fixes

- Allow exactly one value in start/end for `int_range` (10914)
- raise error when function didn't receive any inputs (8635)
- count was falsely tagged as cse in group by (10917)
- CSE don't accept opaque functions (10905)
- Make `int_range(s)` exclusive on the upper bound when step is negative (10898)
- don't overflow length before checking limit (10883)
- fix bug where datetimes were not parsed in read\_csv when pattern had no hour or minute (10877)
- use pool in dataframe arithmetic (10864)
- repair polars\_err string interpolation (10863)
- make count\_match docs and extract\_all docs/impl consistent around zero matches (10854)

🛠️ Other improvements

- Set minimum version for pydantic to `2.0.0` (10923)
- fix and clarify docs for `Expr.map_elements` (10647)
- fix rendering of bullet points in dt.round (10911)
- add test for 10875 (10913)
- apply with\_name in more places (10899)
- never compare opaque functions (10906)
- eliminate repetition in utf8 datetime functions (10860)
- Fix issue templates for bug reports (10896)
- request verbose logging output of minimal reproducible examples (10882)
- add a note about `read_database` connection/cursor behaviour (10873)
- introduce with\_name for Series/ChunkedArray (10859)

Thank you to all our contributors for making this release possible!
Barsik-sus, MarcoGorelli, alexander-beedie, c-peters, cmdlineluser, dependabot, dependabot[bot], drgif, jeroenjanssens, orlp, ritchie46, stinodego and wdoppenberg


py-0.19.1
💥 Breaking changes
- empty product returns identity and product ignores nulls (10842)
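
The semantics described in this entry can be modelled in plain Python. This is a sketch, with `None` standing in for a null value and `product_ignoring_nulls` being a hypothetical helper name; it is not Polars code.

```python
import math
from typing import Iterable, Optional


def product_ignoring_nulls(values: Iterable[Optional[float]]) -> float:
    """Multiply the non-null values; an empty input yields the identity, 1."""
    return math.prod(v for v in values if v is not None)


print(product_ignoring_nulls([]))            # 1
print(product_ignoring_nulls([2, None, 3]))  # 6
```

Returning the multiplicative identity for an empty input keeps the product of a concatenation equal to the product of the parts.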

✨ Enhancements

- add `binary`, `boolean`, `categorical`, `date`, `object`, and `time` selectors (10806)
- Supports is\_last operation (10760)
- minor typing improvement for DataFrame.\_\_iter\_\_ (10825)
- Add custom error for `allow_copy=False` (10822)

🐞 Bug fixes

- empty product returns identity (10842)
- never panic if hash/equality doesn't hold in cse (10836)
- Improve bound checks on temporal ranges (10837)
- var/std behavior around few elements (10828)
- Fix division-by-zero error when reading an empty csv in streaming mode (10819)
- behaviour of `reversed(df)` (10823)
- fix equality of quantile aggregation node (10816)
- Reading an only-header csv file in streaming mode should not panic (10810)

🛠️ Other improvements

- Refactor `range` related functions (10830)
- map-related docstring updates (10779)
- Move sink tests to streaming module (10821)

Thank you to all our contributors for making this release possible!
alexander-beedie, orlp, reswqa, ritchie46 and stinodego


py-0.19.0
An upgrade guide is available [on our website](https://www.pola.rs/posts/polars-0-19-upgrade-guide/).

🏆 Highlights

- implementing sink\_csv for LazyFrame (10682)
- Support `DataFrame` init from queries against users' existing database connections (10649)
- Rename `groupby` to `group_by` (10656)

💥 Breaking changes

- return `f64` for `rank` when `method="average"` (10734)
- Update a lot of error types (10637)
- Remove deprecated behavior from vertical aggregations (10602)
- Read/write support for IPC streams in DataFrames (10606)
- Change behavior of `all` - fix Kleene logic implementation for `all`/`any` (10564)
- Improve consistency of parsing expression input (9512)
- allow `from_arrow` to take a generator of RecordBatches, change error type to `TypeError` (10529)
- remove fixed\_seed and add pl.set\_random\_seed (10388)
- Make `arange` an alias for `int_range` (9983)
- `date_range`/`time_range` no longer return a `List` type (10526)
- Remove various functionalities deprecated before `0.18` (10527)
- Improve some error types and messages (10470)
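
Kleene (three-valued) logic, mentioned in the `all`/`any` entry above (10564), treats a null as "unknown": the result is null only when the known values cannot decide the answer. The sketch below models those semantics in plain Python, with `None` as null and hypothetical function names; how Polars exposes the behaviour is not stated in the entry.

```python
from typing import Iterable, Optional


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """True if any value is True; None if nulls leave the answer undecided."""
    saw_null = False
    for v in values:
        if v is True:
            return True
        if v is None:
            saw_null = True
    return None if saw_null else False


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """False if any value is False; None if nulls leave the answer undecided."""
    saw_null = False
    for v in values:
        if v is False:
            return False
        if v is None:
            saw_null = True
    return None if saw_null else True


print(kleene_any([False, None]))  # None
print(kleene_all([False, None]))  # False
```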

⚠️ Deprecations

- Rename `map` to `map_batches` (10801)
- Rename `GroupBy.apply` to `map_groups` (10799)
- Rename `DataFrame.apply` to `map_rows` (10797)
- Rename `Series/Expr.rolling_apply` to `rolling_map` (10750)
- Rename `Series/Expr.apply` to `map_elements` (10678)
- Rename `groupby` to `group_by` (10656)
- Deprecate some parameters of `cut`/`qcut` (10484)
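
Renames like these are typically shipped by keeping the old name as a thin wrapper that warns and forwards to the new one. The following is a generic sketch of that pattern using only the standard library; `deprecated_alias` and the list-based `map_elements` are hypothetical stand-ins, not the Polars utilities.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name: str):
    """Return a wrapper that warns, then forwards to `new_func`."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"`{old_name}` is deprecated; use `{new_func.__name__}` instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return new_func(*args, **kwargs)

    return wrapper


def map_elements(values, func):
    """Hypothetical stand-in: apply `func` to each element of a list."""
    return [func(v) for v in values]


# The old name keeps working but emits a DeprecationWarning.
apply = deprecated_alias(map_elements, "apply")
```

Callers can surface such warnings during testing with `python -W error::DeprecationWarning`, which turns them into exceptions.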

🚀 Performance improvements

- parse time zones outside of downcast\_iter() in replace\_time\_zone (10713)
- use binary abstraction for atan2 (10588)
- use binary abstraction in pow (10562)

✨ Enhancements

- activate cse for group\_by (again) (10749)
- implementing sink\_csv for LazyFrame (10682)
- Supports series unique \& arg\_unique \& n\_unique for list (10743)
- repeat\_by should also support broadcasting of LHS (10735)
- deprecate 'use\_earliest' argument in favour of 'ambiguous', which can take expressions (10719)
- is\_first also supports numeric list type. (10727)
- improve slice pushdown in unions (10723)
- Explicitly implement `Protocol` for interchange classes (10688)
- Support min and max strategy for binary \& str columns fill null (10673)
- support broadcasting in list set operations (10668)
- csv: add schema argument (10665)
- Support `DataFrame` init from queries against users' existing database connections (10649)
- add `truncate_ragged_lines` (10660)
- supports cast to list (10623)
- Update a lot of error types (10637)
- preserve whitespace in notebook output (10644)
- Remove deprecated behavior from vertical aggregations (10602)
- support selector usage in `write_excel` arguments (10589)
- Add `LazyFrame.collect_async` and `pl.collect_all_async` (10616)
- Read/write support for IPC streams in DataFrames (10606)
- propagate nulls in `is_in` and more generic array construction (10614)
- Change behavior of `all` - fix Kleene logic implementation for `all`/`any` (10564)
- frame-level `cast` support (10504)
- Improve consistency of parsing expression input (9512)
- Add failed column to cast exception (10507)
- allow `from_arrow` to take a generator of RecordBatches, change error type to `TypeError` (10529)
- Remove deprecated `get_idx_type` - use `get_index_type` instead (10556)
- Make `arange` an alias for `int_range` (9983)
- `date_range`/`time_range` no longer return a `List` type (10526)
- Remove various functionalities deprecated before `0.18` (10527)
- Improve some error types and messages (10470)
- suggest str.to\_datetime instead of apply and stdlib strptime (10266)

🐞 Bug fixes

- get\_single\_leaf can't handle Expr::Count (10790)
- support groupby literal in streaming (10771)
- `ORDER BY` on unselected columns (10752)
- Fix is\_in cannot cast list type for float (10769)
- whitespace CSS in Notebook HTML updated to use `pre-wrap` instead of `pre` (10739)
- only preserve sortedness flag in replace\_time\_zone when safe (10738)
- Error on `value_counts` on column named `"counts"` (10737)
- return `f64` for `rank` when `method="average"` (10734)
- Keep min/max and arg\_min/arg\_max consistent. (10716)
- use time zone from dtype to overwrite output time zone when initialising Series (10689)
- Cast small int type when scan csv in streaming mode. (10679)
- raise exception with invalid `on` arg type for join\_asof (10690)
- Reused input series in rolling\_apply should not be orderly (10694)
- re-sort buffer when update window swap the whole buffer (10696)
- Set the correct fast\_explode flag for ListUtf8ChunkedBuilder (10684)
- Sorted Utf8Chunked max\_str and min\_str should consider null value (10675)
- Correctly handle time zones in `write_delta` (10633)
- fix apply for empty series in threading mode (10651)
- respect 'ignore\_errors=False' in csv parser (10641)
- fix rename + projection pushdown (10624)
- fix int/float downcast in `is_in` (10620)
- Change behavior of `all` - fix Kleene logic implementation for `all`/`any` (10564)
- Fix serialization for categorical chunked. (10609)
- Take input\_schema to create physical expr for Selection (10571)
- Clear window cache after evaluate predication expr (10505)
- Parsing regex col in Expr::Columns (10551)
- sanitize column naming in boolean ops (10531)
- Fix `write_delta` with schema in `delta_write_options` (10541)
- remove fixed\_seed and add pl.set\_random\_seed (10388)
- respect `pl.Config` options relating to shape, column names, and types when rendering HTML (10449)
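
The `rank` entry above (10734) is easier to follow with a worked example: under the "average" method, tied values share the mean of the positions they occupy, so ranks can be fractional and need a float type. The function below is a plain-Python sketch of that method with a hypothetical name, not the Polars kernel.

```python
from typing import List, Sequence


def average_rank(values: Sequence[float]) -> List[float]:
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for i in order[start : end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks


print(average_rank([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```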

🛠️ Other improvements

- update cargo.lock (10800)
- Create `.venv` in repo root (10789)
- refactored `write_database` unit tests to properly separate concerns (10773)
- Fix some broken links / formatting (10772)
- Document chained when-then behaviour more prominently (10759)
- Fix test failing due to new `adbc` release (10763)
- Unpin `connectorx` and bump other Python dependencies (10753)
- add note to `testing` docs about module import (10741)
- Clear GitHub Actions caches weekly (10715)
- Update for new pyarrow `13.0.0` behavior (10691)
- Fix minor issue with `sink_parquet` docs (10669)
- Remove `deprecate_renamed_methods` util (10537)
- add "see also" entries to ne/eq\_missing and update related examples (10667)
- fix potential memory leak from usage of `inspect.currentframe` (10630)
- give more relevant example for polars.apply (10631)
- Bump ruff and enable new setting (10626)
- Add docstrings for `Expr.meta` namespace (10617)
- Enforce up-to-date `Cargo.lock` (10555)
- deprecate DataFrame.replace (10600)
- ensure that `make requirements` fully refreshes unpinned packages/deps (10591)
- fix out-of-date explain default parameter (10566)
- Fix `expr_dispatch` decorator to work on methods with decorators (10549)
- Fix link to source code (10542)
- Add title to index page (10539)
- Disable SIM108 lint (10519)
- Keep versioned docs (10500)
- switch to `pyo3/maturin-action` (10503)
- Update URLs for dev documentation (10495)
- Skip failing test (10496)
- Add version switcher to API reference (10488)

Thank you to all our contributors for making this release possible!
JulianCologne, MarcoGorelli, Object905, OndrejSlamecka, SeanTroyUWO, VasanthakumarV, alexander-beedie, aminalaee, braaannigan, c-peters, ion-elgreco, lorepozo, marki259, mcrumiller, messense, orlp, owrior, rben01, reswqa, ritchie46, sdamashek, stinodego, svaningelgem, titoeb, trueb2, washcycle and zundertj


py-0.18.15
🐞 Bug fixes

- rollback cse in groupby: python 0.18.15 (10491)

🛠️ Other improvements

- Mark import timing check as slow (10487)
- Gather all streaming tests (10485)
- Bump `maturin` to version 1.2.1 (10479)

Thank you to all our contributors for making this release possible!
ritchie46 and stinodego


rs-0.32.0
🏆 Highlights

- common subexpression elimination (9632)
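
As a rough illustration of the idea, a query optimizer can detect subexpressions that occur more than once so that each is computed a single time. The sketch below represents expressions as nested tuples and relies on their structural hashing; the representation and function names are assumptions made for the example and do not reflect the Polars optimizer's internals.

```python
from collections import Counter
from typing import Tuple, Union

# A leaf is a column name or literal (as a string); a node is (op, *operands).
Expr = Union[str, Tuple]


def count_subexpressions(expr: Expr, counts: Counter) -> None:
    """Count every non-leaf subtree; tuples hash and compare structurally."""
    if isinstance(expr, tuple):
        counts[expr] += 1
        for operand in expr[1:]:
            count_subexpressions(operand, counts)


def common_subexpressions(*exprs: Expr) -> list:
    """Return the subtrees that occur more than once across the expressions."""
    counts: Counter = Counter()
    for expr in exprs:
        count_subexpressions(expr, counts)
    return [sub for sub, n in counts.items() if n > 1]


# (a * b) + 1 and (a * b) - c share the subtree (a * b).
shared = common_subexpressions(
    ("add", ("mul", "a", "b"), "1"),
    ("sub", ("mul", "a", "b"), "c"),
)
print(shared)  # [('mul', 'a', 'b')]
```

An optimizer would then evaluate each shared subtree once, store the result in a temporary column, and rewrite the parent expressions to refer to it.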

💥 Breaking changes

- remove deprecated tz\_localize, rename CastTimezone to ReplaceTimeZone (10070)

⚠️ Deprecations

- renaming `approx_unique` as `approx_n_unique` (10290)
- remove/deprecate cache and its logic (10066)
- Add `date_ranges`/`time_ranges` expression functions (10005)

🚀 Performance improvements

- pre-alloc int\_ranges (10399)
- use hash as CSE Identifier (10385)
- re-use regex capture allocation (10302) (10335)
- don't parallelize literal expressions (10321)
- fix O(n^2) in sorted check during append (10241)
- speedup mode on sorted data (10084)
- speedup boolean apply (10073)
- shrink alp/lp `~2.5x` (10039)
- Remove fused arithmetic from expressions with literals (10011)

✨ Enhancements

- quote style option for csv writer (10422)
- add "raise\_if\_empty" flag to `read_excel`, `read_csv`, `scan_csv`, and `read_csv_batched` (10409)
- be more permissive on predicate pushdown to left side of left join (10442)
- add `use_earliest` to `to_datetime` / `strptime` (10426)
- {any/all}\_horizontal to expression architecture (10412)
- serialize flags (10140)
- allow unaligned pointers in arrow FFI (10403)
- add line\_terminator option to write\_csv (10373)
- Add `is_local` and `to_local` to categorical namespace (10372)
- cse for groupby.agg and reduced cse collisions (10381)
- re-use regex capture allocation (10302) (10335)
- Add `Series.cat.uses_lexical_ordering` (10325)
- improve datetime parsing error message (10332)
- allow sequential runners in select/with\_columns (10322)
- improve err msg parsing `time`, `date`, `datetime` (10298)
- Add `str.extract_groups` (10179)
- add extra build profiles (10268)
- Extend `datetime` expression function with time zone/time unit parameters (10235)
- added gcs to gcp cloud schema in polars-core::cloud 10206. (10207)
- support writing duration type in json (10112)
- inline `lit(Series).cast(..)` to -> `lit(Series.cast(..))` (10092)
- Move transpose naming to Rust (10009)
- cse in groupby's (10062)
- Adds sql `CASE` statement expressions (10065)
- Add `date_ranges`/`time_ranges` expression functions (10005)
- comm\_subexpr\_elim in streaming 'select/with\_columns' (10050)
- common subexpression elimination (9632)
- Let qcut create evenly spaced probabilities (9960)
- sorted flag on singletons (9933)
- maintain sorted flag after partition\_by (9944)
- keep sorted flag in streaming left join (9932)
- Add cloudpickle for serializing python UDFs (9921)

🐞 Bug fixes

- Fix incorrect handling of VisitRecursion::Skip. (10452)
- fix negative decimal parsing (10444)
- ensure sorted\_sink hash equals the default path (10464)
- fix sum agg (10459)
- ensure last aggregation deals with default chunk (10453)
- fix cse input schema (10450)
- fix list groupby of array dtype (10408)
- correct AnyValue::hash (10391)
- finalize cast in partitioned groupby (10359)
- fix oob in 'last' (10329)
- fix categorical lexical sort (10318)
- Fix join validation (10257)
- Set correct dtype for `.extract_groups()` (10306)
- clear window cache and run windows on proper runners (10303)
- fix sorted fast path in streaming groupby wrt nulls (10289)
- fix nan aggregation in groupby (10287)
- check dtypes of single-column 'by' parameter in asof-join (10284)
- fix pyo3 link errors on macos (10256)
- fix empty streaming parquet file (10252)
- fix logical columns of streaming multi-column sort (10250)
- fix date/datetime parsing for short inputs with exact=False (10231)
- correct agg\_sum for ChunkedArray. (10243)
- don't panic in wildcard apply (10240)
- fix cse profile (10239)
- correct struct null counts (10142)
- no cse in groupby until fixed (10216)
- fix `is_in` on empty series (10195)
- fix cse windows (10197)
- block predicate pushdown is\_in and null producing … (10194)
- prevent re-ordering of dict keys inside `.apply` (10172)
- initialize fixed null values (10192)
- ensure window function run partitioned when cse is hit (10170)
- adjust for null values in str.replace fast path (10132)
- clear bit settings in list iteration (10131)
- use row-encoded for struct::is\_sorted (10129)
- don't run file-caching in streaming mode (10117)
- Allow initialization of pl.Array in DataFrame using schema alone (10100)
- don't panic if masked out values are invalid in temporal kernels (10114)
- Fix struct get field by index out of bounds error. (10097)
- fix ub in simd-json (10093)
- fix invalid access when groupby rolling produces empty sets (10109)
- respect `null_on_oob=False` in `list.take` when pa… (10105)
- fix is\_sorted for structs (10099)
- add file path to io error in scan\_csv (10076)
- fix false positive in parquet stats evaluation (10087)
- fix error message from cast-timezone to replace-time-zone (10089)
- Address `.col(regex).exclude()` operations not executing. (10025)
- fix Boolean::isin(null values) (10074)
- predicate pushdown 10058 (10071)
- Fix weighted quantile for 0 weights (10051)
- fix incorrect state in projection pushdown with joins (9987)
- don't pass predicates referring to renamed literal… (9965)
- fix regression in regex expansion (9952)
- potential SO in csv infer schema (9950)
- raise on unsupported transpose and object types (9946)
- Fix as-of join when `by` groups are interleaved (9938)

🛠️ Other improvements

- fix and run polars-plan tests (10465)
- Simplify flag methods (10429)
- match\_block\_trailing\_comma (10414)
- implement ChunkArray::(try\_)from\_chunk\_iter (10395)
- add test for 10401 (10405)
- Bump some dependencies (10396)
- Move dependency version info to workspace level (10295)
- patch reedline until fix released (10382)
- remove wasm-timer dependency (10347)
- write down invariants of ChunkedArray (10334)
- fix typo in lib.rs (10313)
- Exclude examples from workspace default (10309)
- Update CODEOWNERS (10261)
- avoid outputting docs of dependencies (10292)
- Do not keep history in `gh-pages` branch (10282)
- Use workspace package info / organize dependencies section (10279)
- fix dead links in Rust documentation (10251)
- Fix `make pre-commit` command (10205)
- Fix `make integration-tests` command (10202)
- Replace "question" issues with link to Stack Overflow (10230)
- Update dependabot config (10222)
- Fix LICENSE symlink for moved crates (10150)
- Re-organize folder structure for Rust crates (10141)
- update to rustc nightly-2023-07-27 (10139)
- temporarily turn off fail-fast so that ubuntu tests run (10133)
- Refactor `when`/`then`/`otherwise` internals (9922)
- move replace\_time\_zone to polars-ops (10078)
- remove unneeded branch (10082)
- remove deprecated tz\_localize, rename CastTimezone to ReplaceTimeZone (10070)
- fix typo in contribution example (10038)
- correct example in API reference (10032)
- add developer contribution examples (10013)
- Update autolabeler again (9984)
- fix docs build and add to CI (9904)
- Minor makeover for Rust Makefile (9874)

Thank you to all our contributors for making this release possible!
0xbe7a, CanglongCl, JulianCologne, MarcoGorelli, OndrejSlamecka, OneRaynyDay, SeanTroyUWO, StefanBRas, TLouf, alexander-beedie, c-peters, cjackal, cmdlineluser, dependabot, dependabot[bot], drgif, duvenagep, eltociear, fsimkovic, ion-elgreco, jonashaag, lfn3, magarick, mcrumiller, orlp, potzenhotz, rea1bacon, reswqa, rikkaka, ritchie46, stinodego, thomasaarholt, varunmittal91 and zundertj


py-0.18.14
🏆 Highlights

- Native implementation of dataframe interchange protocol (10267)

⚠️ Deprecations

- Deprecate behavior of list/tuple inputs for `lit` (10461)

🚀 Performance improvements

- optimise retrieval of values from `df.item` (~4-5x speedup) (10411)
- pre-alloc int\_ranges (10399)
- use hash as CSE Identifier (10385)

✨ Enhancements

- quote style option for csv writer (10422)
- add "raise\_if\_empty" flag to `read_excel`, `read_csv`, `scan_csv`, and `read_csv_batched` (10409)
- add `use_earliest` to `to_datetime` / `strptime` (10426)
- add new "header\_format" option for `write_excel` (10392)
- {any/all}\_horizontal to expression architecture (10412)
- Native implementation of dataframe interchange protocol (10267)
- allow unaligned pointers in arrow FFI (10403)
- add line\_terminator option to write\_csv (10373)
- add explicit `selector` variants for signed/unsigned integers (10384)
- Add `is_local` and `to_local` to categorical namespace (10372)
- enhance `selectors` expansion function, so it can operate on a schema as well as a frame (10341)
- Order percentiles in `describe` (10378)
- cse for groupby.agg and reduced cse collisions (10381)
- improve take\_every(0) exception (10352)
- add offset and length to get\_ptr (10361)

🐞 Bug fixes

- fix pyarrow write\_to\_dataset wrt check\_not\_directory parameter (10471)
- fix negative decimal parsing (10444)
- ensure sorted\_sink hash equals the default path (10464)
- address inconsistency in init from square numpy arrays with/without an explicit schema (10445)
- ensure last aggregation deals with default chunk (10453)
- fix cse input schema (10450)
- Fix by argument handling in join\_asof (10447)
- fix potential `OverflowError` in testing asserts with huge `UInt64` diffs (10437)
- Create delta compatible schema during writing (10165)
- fix list groupby of array dtype (10408)
- correct AnyValue::hash (10391)
- finalize cast in partitioned groupby (10359)

🛠️ Other improvements

- add `vertical_relaxed` example for `pl.concat` (10472)
- Run all streaming tests on the same test runner (10469)
- Organize OOC tests (10463)
- add test for 10417 (10420)
- Clean up some `Sphinx` settings (10400)
- add test for 10401 (10405)
- Address Ruff per file ignores (10258)
- Small improvement for PySeries.get\_buffer (10363)

Thank you to all our contributors for making this release possible!
MarcoGorelli, OndrejSlamecka, alexander-beedie, c-peters, cmdlineluser, drgif, ion-elgreco, lfn3, orlp, potzenhotz, rea1bacon, reswqa, ritchie46, stinodego and zundertj


py-0.18.13
⚠️ Deprecations

- Rename `LazyFrame.read/write_json` to `de/serialize` (10238)
- Add `categorical_as_str` parameter to testing utils (10350)

🚀 Performance improvements

- don't parallelize literal expressions (10321)

✨ Enhancements

- support `selectors` in additional frame methods (10255)
- Add `Series.cat.uses_lexical_ordering` (10325)
- utility to get buffers and pointers (10331)
- improve datetime parsing error message (10332)
- add ptr for small integer types (10330)
- add offsets utility (10328)
- allow sequential runners in select/with\_columns (10322)
- warn about inefficient apply json.loads if json is local import (10310)
- improve err msg parsing `time`, `date`, `datetime` (10298)
- Add `categorical_as_str` parameter to testing utils

🐞 Bug fixes

- fix oob in 'last' (10329)
- show inefficient apply warning in ipython (10312)
- add cse to no\_optimization in profile (10317)
- fix categorical lexical sort (10318)
- Fix join validation (10257)
- Set correct dtype for `.extract_groups()` (10306)

Thank you to all our contributors for making this release possible!
CanglongCl, JulianCologne, MarcoGorelli, alexander-beedie, cmdlineluser, eltociear, orlp, ritchie46 and stinodego


py-0.18.12
⚠️ Deprecations

- renaming `approx_unique` as `approx_n_unique` (10290)
- Rename first `qcut` parameter to `quantiles` (10253)
- Deprecate `avg` alias for `mean` (10236)

🚀 Performance improvements

- fix O(n^2) in sorted check during append (10241)

✨ Enhancements

- Add `str.extract_groups` (10179)
- raise `TypeError` for all LazyFrame comparison operators (10275)
- support bytecode translation to `map_dict` where the lookup key is an expression (10265)
- add entry point to the Consortium DataFrame API (10244)
- Extend `datetime` expression function with time zone/time unit parameters (10235)
- add "batch\_size" to `scan_pyarrow_dataset` parameters (10249)
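
For readers unfamiliar with group extraction, as in the `str.extract_groups` entry above (10179), the idea is to return every capture group of a regex match at once rather than a single group. The standard-library sketch below shows the concept for one string; the function is a hypothetical analogue, and returning `None` for every group when nothing matches is a choice made for the sketch.

```python
import re
from typing import Dict, Optional


def extract_groups(text: str, pattern: str) -> Dict[str, Optional[str]]:
    """Return all named capture groups; each is None when nothing matches."""
    compiled = re.compile(pattern)
    match = compiled.search(text)
    if match is None:
        return {name: None for name in compiled.groupindex}
    return match.groupdict()


print(extract_groups("id=42&user=ann", r"id=(?P<id>\d+)&user=(?P<user>\w+)"))
# {'id': '42', 'user': 'ann'}
```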

🐞 Bug fixes

- clear window cache and run windows on proper runners (10303)
- fix sorted fast path in streaming groupby wrt nulls (10289)
- Fix interchange protocol allowing copy even when `allow_copy` was set to False (10262)
- fix nan aggregation in groupby (10287)
- don't panic on cse if function hasn't implemented \_\_eq\_\_ (10286)
- fix empty streaming parquet file (10252)
- fix logical columns of streaming multi-column sort (10250)
- fix date/datetime parsing for short inputs with exact=False (10231)
- don't panic in wildcard apply (10240)
- fix cse profile (10239)

🛠️ Other improvements

- Update CODEOWNERS (10261)
- add note about pyarrow partitioning (10297)
- Do not keep history in `gh-pages` branch (10282)
- make an explicit note in `read_parquet` and `scan_parquet` about hive-style partitioning (point to `scan_pyarrow_dataset` instead) (10277)
- Fix typo in error message (10281)
- Replace "question" issues with link to Stack Overflow (10230)
- Use sphinx' `maximum_signature_line_length` (10228)
- add warning about parallel eval of `.then(..)` branches (10229)
- Update Sphinx to 7.1.1 and bump related dependencies (10221)
- Update dependabot config (10222)

Thank you to all our contributors for making this release possible!
0xbe7a, MarcoGorelli, TLouf, alexander-beedie, cmdlineluser, dependabot, dependabot[bot], duvenagep, mcrumiller, orlp, reswqa, ritchie46 and stinodego


py-0.18.11
🐞 Bug fixes

- correct struct null counts (10142)
- no cse in groupby until fixed (10216)
- avoid false positives from multiple `RETURN_VALUE` ops when checking `apply` lambdas/functions (10211)

🛠️ Other improvements

- Improve deprecation utils (10167)

Thank you to all our contributors for making this release possible!
alexander-beedie, magarick, ritchie46, stinodego and varunmittal91


py-0.18.10
✨ Enhancements

- raise a better error message from `read_database` if not passed a string URI (10191)
- Add pyarrow write\_to\_dataset to write\_parquet function (9835)

🐞 Bug fixes

- fix `is_in` on empty series (10195)
- fix cse windows (10197)
- block predicate pushdown is\_in and null producing … (10194)
- prevent re-ordering of dict keys inside `.apply` (10172)
- initialize fixed null values (10192)
- Don't pickle `_scan_impl` (10175)
- ensure window function run partitioned when cse is hit (10170)

🛠️ Other improvements

- prepend set\_ to set operations on lists (10182)
- Track version in deprecation utils (10147)
- Add a simple util `issue_deprecation_warning` (10146)
- more precise checks for inefficient apply warnings (10135)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, cjackal, cmdlineluser, potzenhotz, ritchie46 and stinodego


py-0.18.9
🏆 Highlights

- common subexpression elimination (9632)

⚠️ Deprecations

- Deprecate parsing string inputs as literals for `when-then-otherwise` (10122)
- deprecate "connection\_uri" → "connection" param in read/write database methods (10134)
- remove/deprecate cache and its logic (10066)
- Add `date_ranges`/`time_ranges` expression functions (10005)

🚀 Performance improvements

- speedup mode on sorted data (10084)
- speedup boolean apply (10073)
- shrink alp/lp `~2.5x` (10039)

✨ Enhancements

- suggest map\_dict instead of lambda x: DICT[x] (10123)
- enable "inefficient apply" warnings from `Series` (10104)
- support writing duration type in json (10112)
- BytecodeParser can now handle mixed/nested `and/or` control flow (10085)
- inline `lit(Series).cast(..)` to -> `lit(Series.cast(..))` (10092)
- Add ArcTan2 to `SQLContext` (9571)
- cse in groupby's (10062)
- Adds sql `CASE` statement expressions (10065)
- Add `date_ranges`/`time_ranges` expression functions (10005)
- comm\_subexpr\_elim in streaming 'select/with\_columns' (10050)
- add dataframe.flags property (10037)
- common subexpression elimination (9632)
- detect and warn about usage of str/int/float python-based casts with `apply` (10026)
- detect and warn about usage of `json.loads` in conjunction with `apply` (10023)
- detect and warn about bare `numpy` functions passed to `apply` (10021)
- support bytecode identification/mapping of python string-case functions in UDFs (10007)
- support bytecode identification of `numpy` functions in UDFs that we can map to native expressions (10003)
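
Several entries above describe inspecting the bytecode of a user-defined function to spot patterns such as `json.loads`. A minimal sketch of that technique with the standard `dis` module follows; the helper names and the simple name-based heuristic are assumptions for illustration and are far cruder than the `BytecodeParser` these entries refer to.

```python
import dis
import json

# Opcodes that load a global or attribute name (names vary by Python version).
NAME_OPS = {"LOAD_GLOBAL", "LOAD_NAME", "LOAD_ATTR", "LOAD_METHOD"}


def referenced_names(func) -> set:
    """Collect global and attribute names loaded by the function's bytecode."""
    return {
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname in NAME_OPS
    }


def looks_like_json_loads(func) -> bool:
    """Heuristic: does the function refer to both `json` and `loads`?"""
    return {"json", "loads"} <= referenced_names(func)


print(looks_like_json_loads(lambda s: json.loads(s)))  # True
print(looks_like_json_loads(lambda s: s.upper()))      # False
```

A real detector would also check the order and structure of the instructions, since a function can mention both names without calling `json.loads` on its argument.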

🐞 Bug fixes

- adjust for null values in str.replace fast path (10132)
- clear bit settings in list iteration (10131)
- use row-encoded for struct::is\_sorted (10129)
- don't run file-caching in streaming mode (10117)
- Allow initialization of pl.Array in DataFrame using schema alone (10100)
- silence Series.apply inefficient apply warning when calling Expr.apply (10116)
- don't panic if masked out values are invalid in temporal kernels (10114)
- Fix struct get field by index out of bounds error. (10097)
- fix ub in simd-json (10093)
- fix invalid access when groupby rolling produces empty sets (10109)
- respect `null_on_oob=False` in `list.take` when pa… (10105)
- undo regression in scan\_parquet from s3 (10098)
- fix is\_sorted for structs (10099)
- add file path to io error in scan\_csv (10076)
- fix false positive in parquet stats evaluation (10087)
- Address `.col(regex).exclude()` operations not executing. (10025)
- address an inadvertent shallow-copy issue on underlying PySeries (10086)
- fix Boolean::isin(null values) (10074)
- predicate pushdown 10058 (10071)
- map 'postgres' URI prefix to ADBC 'postgresql' module (10018)
- Fix weighted quantile for 0 weights (10051)
- eager `time_range`/`date_range` dimensions fix (9996)

🛠️ Other improvements

- get test\_udfs running on all python versions again (10136)
- temporarily turn off fail-fast so that ubuntu tests run (10133)
- clarify "clones data" in to\_numpy (10095)
- Refactor `when`/`then`/`otherwise` internals (9922)
- Properly format `Returns` sections of docstrings (10064)
- much-improved `Instruction` matching for `BytecodeParser` (10040)
- add pure-python tests and CI for bytecodeparser (10027)
- split-out expression translation and instruction-rewrite logic from `BytecodeParser` (10012)
- cleans api sections in docs (10004)
- Bump some dependencies (9997)
- Add patchelf extra to maturin (9995)
- restructure all UDF parsing/translation methods into a new `BytecodeParser` class (9993)
- Clean up `date_range`/`time_range` (9985)

Thank you to all our contributors for making this release possible!
MarcoGorelli, SeanTroyUWO, alexander-beedie, c-peters, cmdlineluser, jonashaag, magarick, mcrumiller, rikkaka, ritchie46 and stinodego


py-0.18.8
⚠️ Deprecations

- Add `Series.extend` (9901)
- Deprecate functions series input (9878)

🚀 Performance improvements

- Rolling min/max for partially sorted data (9819)
- Use `pyo3::intern` to avoid needlessly recreating PyString (9853)

✨ Enhancements

- Name transpose from column (9846)
- adds `SQRT`, `CBRT`, `PI` functions to `SQLContext` (9936)
- Let qcut create evenly spaced probabilities (9960)
- add freeze\_panes option to write\_excel (9974)
- initial support for parsing the set of `jump` bytecode instructions required to reconstruct `and/or` logic (9972)
- suggest more efficient expression if user passes simple lambda to Expr.apply or DataFrame.apply (9918)
- sorted flag on singletons (9933)
- maintain sorted flag after partition\_by (9944)
- keep sorted flag in streaming left join (9932)
- Add cloudpickle for serializing python UDFs (9921)
- Optional three-valued logic for any/all (9848)
- Add `Series.extend` (9901)
- pass through unknown schema in unnest (9896)
- convenience support for parsing a list of SQL strings with `sql_expr` (9881)
- respect and allow more options in eager json parsing (9882)
- allow set\_sorted in streaming (9876)
- Expr.cat.get\_categories expression (9869)
- add `LENGTH` and `OCTET_LENGTH` string functions for SQL (9860)
- `polars_warn!` macro (9868)
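
The `qcut` entry above (9960) mentions evenly spaced probabilities. Assuming this refers to deriving the cut points from a requested number of bins, the arithmetic is as sketched below; the function name is hypothetical and the snippet is not taken from Polars.

```python
from typing import List


def evenly_spaced_probabilities(n_bins: int) -> List[float]:
    """Return the n_bins - 1 interior probabilities for n_bins equal-mass bins."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    return [i / n_bins for i in range(1, n_bins)]


print(evenly_spaced_probabilities(4))  # [0.25, 0.5, 0.75]
```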

🐞 Bug fixes

- fix incorrect state in projection pushdown with joins (9987)
- don't pass predicates referring to renamed literal… (9965)
- fix regression in regex expansion (9952)
- potential SO in csv infer schema (9950)
- raise on unsupported transpose and object types (9946)
- Fix as-of join when `by` groups are interleaved (9938)
- Handle `DataFrame.extend` extending by itself (9897)
- don't SO on align\_frames (9911)
- respect original series dtype when constructing `LitIter` (9886)
- Handle `DataFrame.vstack` stacking itself (9895)
- sum aggregation empty set is 0, not null (9894)
- preserve expression aliases when parsing SQL with `pl.sql_expr` (9875)
- fmt unknown dtype (9872)

🛠️ Other improvements

- Update autolabeler again (9984)
- use param\_name more in udfs for greater defensiveness (9969)
- fix or/and docstrings to say bitwise, not logical (9964)
- minor fix for `apply` docstring example text (9953)
- add note that `collect_all` returns result frames in the same order as input (9951)
- Improve docstrings for renaming operations (9942)
- Move `sink_*` methods to IO chapter (9939)
- Add 'nearest' in Expr.interpolation docstring with an example (9935)
- fix hyperlinks to pandas (9937)
- Address ignored Ruff doc rules (9919)
- improve `weekday`, `day`, `ordinal_day` examples (9926)
- deprecate `bins` argument and rename to `breaks` in `Series.cut` (9913)
- Use Pathlib everywhere (9914)
- Add various unit tests (9903)
- add big warnings about using apply (9906)
- Update autolabeler (9885)
- Workaround for PyCharm deprecation warning (9907)
- Mention func\_horizontal on deprecated func docstrings (9863)
- note ordering guarantee for groupby (9879)
- add logo `link` entry to sphinx conf and factor-out website root paths (9864)

Thank you to all our contributors for making this release possible!
0xbe7a, JulianCologne, MarcoGorelli, OneRaynyDay, SeanTroyUWO, StefanBRas, alexander-beedie, c-peters, fsimkovic, ion-elgreco, magarick, mcrumiller, messense, ritchie46, sorhawell, stinodego, thomasaarholt and zundertj


rs-0.31.1
🚀 Performance improvements

- Rolling min/max for partially sorted data (9819)
- use hash set in drop\_many (9807)
- Faster is\_sorted when no flag set (9777)
- optimize n\_unique for integers (9568)
- remove sort columns on multiple-key OOC sort (9545)
- don't needlessly trigger bitcount (9561)
- don't initialize memory before row-encoding (9435)
- reduce page faults in q1 `~-30%` (9423)
- reduce rayon/idle time in streaming (9416)
- use row format in streaming join `~15%` (9379)
- row encode buffer reuse (9371)
- bytes row format for streaming groupby/unique keys `>3.5x` (9346)
- push slices down map functions (9350)
- increase streaming groupby spill size from 256 to 10\_000 (9312)
- Improve rolling min and max when there are no nulls (9277)
- slightly improve n\_unique performance (9286)
- speed up write\_csv for time-zone-aware columns (9093)
- parallelize rolling\_window group materialization (9095)

✨ Enhancements

- pass through unknown schema in unnest (9896)
- access `OptState` in `LazyFrame` to unit-test optimization toggle methods. (9883)
- respect and allow more options in eager json parsing (9882)
- allow set\_sorted in streaming (9876)
- Expr.cat.get\_categories expression (9869)
- add `LENGTH` and `OCTET_LENGTH` string functions for SQL (9860)
- `polars_warn!` macro (9868)
- Add Run-length Encoding functions (9826)
- add `include_key` parameter to `partition_by` (9750)
- add `LEFT` string function for SQL (9836)
- add `REGEXP_LIKE` function for SQL (both two and three parameter version) (9838)
- add `maintain_order` argument to `sort`/`top_k`/`bottom_k` (9672)
- add drop\_many\_amortized (9814)
- Dedicated horizontal aggregation functions (9752)
- implement with\_row\_count as private function (9810)
- add support for SQL `SUBSTR` function (9803)
- add SQL support for binary data and expand recognised SQL dtype strings (9802)
- reworked comfy-table layout constraints, improving table wrapping/repr (9744)
- allow qcut in window expressions (9745)
- Improve cut and allow use in expressions (9580)
- clearer message when stringcache-related errors occur (9715)
- improve expression formatting (9704)
- set string cache in window functions (9705)
- raise on both sides of datetime/str comparison (9692)
- support deserializing struct json into df (9688)
- add tree formatter for expressions (9684)
- add `.list.any()` and `.list.all()` (9573)
- extend dtype/selector matching for `Datetime` with a "\*" wildcard for timezones (9641)
- add polars::VERSION (9660)
- add symmetric difference to list set operations (9655)
- add dt.base\_utc\_offset (9636)
- add dt.dst\_offset feature (9629)
- allow to specify index order in `to_numpy` (9592)
- accept expressions in `repeat` (9614)
- set operations for list (9599)
- add drop\_first parameter for to\_dummies (issue 8246) (9143)
- raise if window size in rolling functions isn't strictly positive (9465)
- add infer schema len to json\_extract (9478)
- Adds (Most) Remaining Trig Functions to `SQLContext` (9453)
- update error handling msg for sql functions (9474)
- add str.titlecase (9457)
- raise if period is negative in groupby\_rolling (9445)
- add SQL `round` support (9330)
- don't error for time-zone-aware parsing if time zone is UTC (9414)
- support all numeric dtypes in serde (9393)
- ensure part of the plan is streaming if aggregati… (9387)
- add relaxed concatenation (9382)
- add sql DROP TABLE (9355)
- support ternary expressions in streaming (9343)
- add decoding support for row format (9339)
- add SQL support for null-aware equality checks (9332)
- add SQL support for regular expression operators (`~`, `!~`, `~*`, and `!~*`) (9327)
- support `//` integer floordiv operator in the SQL engine (9324)
- serde for 'to\_physical' expr (9294)
- add join cardinality validation (9278)
- keep sorted flag after Expr::truncate (9275)
- add "sql\_expr" function (9248)
- rewrite correlation functions to expression architecture (9258)
- keep sorted flag on `offset_by` (9253)
- add intersection primitive for selector API (9240)
- building blocks for expression expansion sets (9231)
- Add ddof option to rolling\_var and rolling\_std (8957)
- immediately flatten nested unions (9220)
- support float expression on integers (9210)
- add binary to list\<u8> cast (9161)
- add arr.unique expression (9159)
- implement explode for DataType::Array (9157)
- `Decimal` type: `sum`, `min`, `max` aggregations in `select` and `agg` context. (9135)
- Decimal arithmetic (9123)
- support decimals as cast types in csv parser (9121)
- Improve error handling for `repeat` (9117)
- conversion from `Utf8` to `Decimal`. (9090)

🐞 Bug fixes

- respect original series dtype when constructing `LitIter` (9886)
- sum aggregation empty set is 0, not null (9894)
- Allow None as exponent (9880)
- preserve expression aliases when parsing SQL with `pl.sql_expr` (9875)
- fmt unknown dtype (9872)
- fix row-encode of 32 byte payloads (9843)
- shrink\_type on all-null columns (9811)
- don't go into streaming engine when groupby by list (9834)
- fix regex + exclude (9827)
- potential integer overflow in drop\_many\_amortized (9829)
- add `maintain_order` argument to `sort`/`top_k`/`bottom_k` (9672)
- fix array concat and Series::fill\_null (9825)
- don't preserve sortedness in offset\_by for tz-aware non-constant durations (9818)
- Remove stray `arr.eval` references (9821)
- fix row-encode of null data (9813)
- allow +00:00 when loading from arrow (9747)
- fix row-count schema (9797)
- fix supertype detection (9787)
- merge rev-maps when building list arrays of categoricals. (9742)
- Loosen restrictions on cut expressions and add docs (9730)
- Fix list symmetric difference (9732)
- Fix list intersection (9735)
- don't clear rev\_map when categorical series is cle… (9720)
- improve glob pattern testing (9721)
- don't run hstack checks when using cached names (9709)
- fix result dtype in date\_range(..., eager=True) if duration contains "1s1d" (9670)
- increment seed between samples (9694)
- fix cse\_plan invalid projection removal (9700)
- fix ne\_missing for booleans vs lit (9693)
- raise if to\_datetime would have parsed input incorrectly (9675)
- respect time\_zone in lazy date\_range (8591)
- redo weighted rolling var (9609)
- Correct weighted rolling quantile definition (9608)
- clear hashes buffer in generic streaming joins (9612)
- stable list namespace output when all elements are … (9610)
- validate time zone in cast and from\_arrow operations (9598)
- make json feature depend on "dtype-struct" feature (9589)
- fix join suffix collision (9579)
- fix sum consistency (9576)
- fix take of array dtype (9575)
- fix predicate pushdown case before sort (9574)
- fix lazy schema of temporal\_range functions when no alias is provided (9543)
- change the path parameter from to (9531)
- fix join validation when swapped (9534)
- fix race condition in out-of-core sort (9521)
- unset sortedness for local date and local datetime (9515)
- maintain sortedness flags on append/extend (9496)
- fix serde for small integer dtypes (9495)
- raise if window size in rolling functions isn't strictly positive (9465)
- groupby rolling with negative offset (9428)
- date\_range with unit microseconds was producing incorrect results (9413)
- read\_csv was parsing dates incorrectly when the dtype was overridden (9420)
- Compute Spearman rank correlations using average ra… (9415)
- Fix rolling min/max when window is empty (9406)
- fix compilation of other rustc versions (9392)
- list zip with (9367)
- parquet + categorical (9363)
- respect startby in groupby\_dynamic when every is greater than 1d (9362)
- raise groupby apply on empty frame (9360)
- raise more informative error on string arguments (9352)
- correct assertion (9320)
- fix rolling weighted mean (9292)
- raise on invalid sort\_by (9262)
- correct ne/e\_missing schema (9257)
- fix cached reproject offsets (9254)
- delay opening files in streaming engine (9251)
- ensure agg(F(lit)) == lit (9222)
- don't SO on concat(expressions) (9214)
- clip window\_size to length in rolling\_apply (9209)
- rolling\_apply window\_size == len (9181)
- respect time zone in strptime/to\_datetime when exact=False (9171)
- make null chunking behavior equal to other dtypes (9176)
- return single numpy array in Array dtype -> numpy (9164)
- fix regression in boolean nulls comparison (9142)
- fix struct null\_count if fields are null arrays (9151)
- categorical construction from null values (9145)
- let `apply` caller determine if length needs to be checked. (9140)
- struct `is_in` should upcast numeric types (9110)
- json\_extract on empty series (9126)
- bubble up dtype when converting from arrow (9120)
- groupby\_rolling was returning incorrect results when offset was positive (9082)

🛠️ Other improvements

- Rolling quantile and median use DynArgs (9867)
- Clean up workspace definition (9861)
- Fix all clippy warnings in the test suite (9839)
- Refactor failing test (9823)
- Remove stray `arr.eval` references (9821)
- fix cut features (9808)
- cluster file scans in one node (9799)
- Remove old cut/qcut (9763)
- Small updates to issue templates (9789)
- unswap from\_tz and to\_tz in replace\_timezone (9768)
- More cleanup around `arange` (9769)
- More cleanup for `arange` (9681)
- Fix small typo (9714)
- refactor `arange` and add `int_range`/`int_ranges` (9666)
- clean up inconsistencies in duration string language (9551)
- ensure date-range integration test runs in CI (9554)
- remove some redundancies in sort (9541)
- Fix some doc examples (9405)
- Remove outdated badges from README (9532)
- don't pickle pyarrow dataset (9523)
- Remove StdWindow in rolling (9486)
- remove unreachable code (9463)
- note that weekday is actually ISO weekday (9440)
- Add some documentation on the CI workflows (9404)
- fix typo in polars-lazy docs (9354)
- Utilize caching in test job (9301)
- Caching for benchmark workflow (9267)
- Further CI cleanup for Rust lints (9260)
- Separate workflow for Rust lints (9245)
- Fix itoap dependency specification (9239)
- Fix more broken links (9230)
- Fix some doc links (9227)
- Fix unused import warning in release build (9224)
- split up dsl::functions module (9213)
- update object\_store requirement from 0.5.3 to 0.6.0 (9154)
- simplify slow datetime parser (9183)
- remove outdated struct, improve naming (9172)
- change decimal inference and argument order (9133)
- Include license file in polars-json crate (9113)
- Remove dbg statement from CoreJsonReader (9114)
- use concrete type for time zones (9076)

Thank you to all our contributors for making this release possible!
0xbe7a, AnatolyBuga, CloseChoice, DeflateAwning, EdmundsEcho, MarcoGorelli, SeanTroyUWO, alexander-beedie, ankane, avimallu, baggiponte, bfeif, borchero, braaannigan, c-peters, datapythonista, dependabot, dependabot[bot], dkrako, durandtibo, eitsupi, guanqun, jeroenjanssens, jonashaag, jorisSchaller, josh, kljensen, lorentzenchr, magarick, mcrumiller, messense, mgperry, mishpat, moritzwilksch, ritchie46, sorhawell, stinodego, tarrafil, thomascamminady, ttencate, universalmind303 and zundertj


py-0.18.7
🚀 Performance improvements

- speed up python object to AnyValue construction (9840)
- use hash set in drop\_many (9807)
- speed up `in series` 10x (9794)
- Faster is\_sorted when no flag set (9777)

✨ Enhancements

- Add Run-length Encoding functions (9826)
- add `include_key` parameter to `partition_by` (9750)
- add `LEFT` string function for SQL (9836)
- add `REGEXP_LIKE` function for SQL (both two and three parameter version) (9838)
- add `maintain_order` argument to `sort`/`top_k`/`bottom_k` (9672)
- Dedicated horizontal aggregation functions (9752)
- support numpy datetime64 units (from 'ns' to 'D') in polars.from\_numpy (9783)
- implement with\_row\_count as private function (9810)
- add support for SQL `SUBSTR` function (9803)
- add SQL support for binary data and expand recognised SQL dtype strings (9802)
- add new `duration` selector and improve selector typing (9772)
- reworked comfy-table layout constraints, improving table wrapping/repr (9744)
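
For readers unfamiliar with the technique behind the run-length encoding entry (9826), the following is a minimal plain-Python sketch of the idea: consecutive repeats collapse into (length, value) pairs. It does not use Polars; the helper names `run_length_encode` and `run_length_decode` and the tuple layout are illustrative assumptions, not the library's API.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal items into (run_length, value) pairs."""
    # groupby yields one group per run of adjacent equal items.
    return [(len(list(group)), key) for key, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (run_length, value) pairs back into the original sequence."""
    return [value for length, value in pairs for _ in range(length)]


data = ["a", "a", "b", "c", "c", "c"]
encoded = run_length_encode(data)
print(encoded)  # [(2, 'a'), (1, 'b'), (3, 'c')]
print(run_length_decode(encoded) == data)  # True
```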

🐞 Bug fixes

- fix row-encode of 32 byte payloads (9843)
- shrink\_type on all-null columns (9811)
- don't go into streaming engine when groupby by list (9834)
- fix regex + exclude (9827)
- add `maintain_order` argument to `sort`/`top_k`/`bottom_k` (9672)
- fix array concat and Series::fill\_null (9825)
- don't preserve sortedness in offset\_by for tz-aware non-constant durations (9818)
- Remove stray `arr.eval` references (9821)
- fix row-encode of null data (9813)
- allow +00:00 when loading from arrow (9747)
- improve/fix `write_database` handling of db schema and quoted table names (9788)
- fix row-count schema (9797)
- fix supertype detection (9787)
- fix import error when writing parquet with pyarrow (9760)

🛠️ Other improvements

- Refactor failing test (9823)
- Remove stray `arr.eval` references (9821)
- Remove old cut/qcut (9763)
- improve note about the behaviour when converting from ns-precision temporal values to python-native types (9798)
- Small updates to issue templates (9789)
- More cleanup around `arange` (9769)
- add missing `last` entry (9782)
- Add `rows_by_key` docs (9766)

Thank you to all our contributors for making this release possible!
CloseChoice, MarcoGorelli, alexander-beedie, avimallu, jonashaag, magarick, mcrumiller, ritchie46 and stinodego


py-0.18.6
✨ Enhancements

- allow qcut in window expressions (9745)

🐞 Bug fixes

- merge rev-maps when building list arrays of categoricals. (9742)
- Loosen restrictions on cut expressions and add docs (9730)
- Fix list symmetric difference (9732)
- Fix list intersection (9735)

Thank you to all our contributors for making this release possible!
magarick and ritchie46


py-0.18.5
🏆 Highlights

- drop Python 3.7 support (9679)

🚀 Performance improvements

- optimize n\_unique for integers (9568)
- remove sort columns on multiple-key OOC sort (9545)
- don't needlessly trigger bitcount (9561)
- optimize `_datetime_to_pl_timestamp` (9533)

✨ Enhancements

- Improve cut and allow use in expressions (9580)
- clearer message when stringcache-related errors occur (9715)
- improve expression formatting (9704)
- set string cache in window functions (9705)
- raise on both sides of datetime/str comparison (9692)
- support deserializing struct json into df (9688)
- add tree formatter for expressions (9684)
- streamline `adbc` connectivity, adding snowflake support (9600)
- improve `selector` utility functions with better docstrings/examples (9683)
- add `.list.any()` and `.list.all()` (9573)
- extend dtype/selector matching for `Datetime` with a "\*" wildcard for timezones (9641)
- add symmetric difference to list set operations (9655)
- Pass through stdin/stderr buffer in to\_csv (9624)
- add dt.base\_utc\_offset (9636)
- add dt.dst\_offset feature (9629)
- allow to specify index order in `to_numpy` (9592)
- accept expressions in `repeat` (9614)
- set operations for list (9599)
- make LazyFrame.map pickle (9597)
- add a new `rows_by_key` method, returning a keyed-dictionary of row data (9567)
- implement apply object -> struct (9578)

🐞 Bug fixes

- don't clear rev\_map when categorical series is cle… (9720)
- improve glob pattern testing (9721)
- don't run hstack checks when using cached names (9709)
- fix result dtype in date\_range(..., eager=True) if duration contains "1s1d" (9670)
- increment seed between samples (9694)
- fix cse\_plan invalid projection removal (9700)
- fix ne\_missing for booleans vs lit (9693)
- raise if to\_datetime would have parsed input incorrectly (9675)
- respect time\_zone in lazy date\_range (8591)
- Align dependency versions (9661)
- redo weighted rolling var (9609)
- Correct weighted rolling quantile definition (9608)
- clear hashes buffer in generic streaming joins (9612)
- stable list namespace output when all elements are … (9610)
- address schema edge-case with scalar-expanded data that resolves to an empty frame (9593)
- handle dictionary init with unsized iterators that also hits the scalar-expansion fast path (9594)
- validate time zone in cast and from\_arrow operations (9598)
- ensure `from_dicts` drops columns explicitly omitted from schema (9581)
- fix join suffix collision (9579)
- fix sum consistency (9576)
- fix take of array dtype (9575)
- fix predicate pushdown case before sort (9574)
- fix lazy schema of temporal\_range functions when no alias is provided (9543)
- fix join validation when swapped (9534)

🛠️ Other improvements

- More cleanup for `arange` (9681)
- Fix some more type hints (9716)
- Added trivial examples for the aggregation of columns in groupby (9708)
- Fix some type hints (9695)
- additional ADBC examples and docstring information for `read_database` (inc snowflake) (9686)
- drop Python 3.7 support (9679)
- improve `selector` utility functions with better docstrings/examples (9683)
- refactor `arange` and add `int_range`/`int_ranges` (9666)
- Clarify Dataframe.corr operates on columns (9678)
- remove false "eager=True" from date\_range tests (9663)
- Add examples to .merge\_sorted (9664)
- bump maturin from 1.0.1 to 1.1.0 in /py-polars (9646)
- remove deprecation warning of already-enforced valid timezones change (9639)
- fix failing ci test (9638)
- fix inconsistency in `.list.difference()` example (9615)
- Clean up doctests for rolling (9626)
- fix faulty test of `to_numpy` (9619)
- examples for `.list.union()`, `.list.difference()`, `.list.intersection()` (9602)
- fix see also broken links (9607)
- clarify sortedness condition of groupby\_dynamic and groupby\_rolling (9606)
- clean up inconsistencies in duration string language (9551)
- Adding examples to binary functions (9553)
- Minor cleanup of `arange` (9544)
- Remove outdated badges from README (9532)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, borchero, datapythonista, dependabot, dependabot[bot], eitsupi, guanqun, jeroenjanssens, jorisSchaller, kljensen, magarick, mcrumiller, messense, mishpat, moritzwilksch, ritchie46, stinodego, ttencate, universalmind303 and zundertj


py-0.18.4
🚀 Performance improvements

- don't initialize memory before row-encoding (9435)
- further optimize datetime conversion (9452)
- speedup datetime conversion (9432)
- reduce page faults in q1 `~-30%` (9423)
- reduce rayon/idle time in streaming (9416)

✨ Enhancements

- add drop\_first parameter for to\_dummies (issue 8246) (9143)
- raise if window size in rolling functions isn't strictly positive (9465)
- serializable python functions in expressions (9462)
- add infer schema len to json\_extract (9478)
- Adds (Most) Remaining Trig Functions to `SQLContext` (9453)
- update error handling msg for sql functions (9474)
- Update LazyFrame.\_\_repr\_\_ (9460)
- support inversion of `first` \& `last` selectors, additional minor repr improvements (9456)
- add str.titlecase (9457)
- raise if period is negative in groupby\_rolling (9445)
- enhanced `polars.selectors` repr and implicit application of `as_expr` when broadcasting (9450)
- add SQL `round` support (9330)
- don't error for time-zone-aware parsing if time zone is UTC (9414)

🐞 Bug fixes

- ensure that trying to use a string as a dtype raises a consistent error on both DataFrame and Series init (9493)
- fix race condition in out-of-core sort (9521)
- unset sortedness for local date and local datetime (9515)
- maintain sortedness flags on append/extend (9496)
- fix serde for small integer dtypes (9495)
- raise if window size in rolling functions isn't strictly positive (9465)
- Fix empty list or Series selections on DF or Series (8660)
- groupby rolling with negative offset (9428)
- pl.lit with datetime was producing slightly incorrect results (9438)
- read\_csv was parsing dates incorrectly when the dtype was overridden (9420)
- Compute Spearman rank correlations using average ra… (9415)
- Fix rolling min/max when window is empty (9406)

🛠️ Other improvements

- don't pickle pyarrow dataset (9523)
- fix rendering of examples (9482)
- Warn for future change of closed default value in rolling functions (9470)
- Document aggregate\_function=None in pivot (9473)
- Docstrings for expressions and dtypes (9351)
- fix typo in rolling\_\* docstrings (9449)
- Deprecate some expr input parsing behavior (9455)
- improve date-range docs (9451)
- Improve docstrings rolling functions (9215)
- Remove \_tempdir module references (9427)
- fix typo in `Series.qcut` (9421)
- Add some documentation on the CI workflows (9404)

Thank you to all our contributors for making this release possible!
EdmundsEcho, MarcoGorelli, SeanTroyUWO, alexander-beedie, baggiponte, braaannigan, datapythonista, magarick, mcrumiller, messense, mgperry, mishpat, ritchie46, stinodego, tarrafil, universalmind303 and zundertj


py-0.18.3
🚀 Performance improvements

- use row format in streaming join `~15%` (9379)
- row encode buffer reuse (9371)
- bytes row format for streaming groupby/unique keys `>3.5x` (9346)
- push slices down map functions (9350)

✨ Enhancements

- support all numeric dtypes in serde (9393)
- allow easy load/save of polars `Config` options to/from file (9391)
- ensure part of the plan is streaming if aggregati… (9387)
- add relaxed concatenation (9382)
- add sql DROP TABLE (9355)
- support ternary expressions in streaming (9343)
- add SQL support for null-aware equality checks (9332)
- add SQL support for regular expression operators (`~`, `!~`, `~*`, and `!~*`) (9327)
- support `//` integer floordiv operator in the SQL engine (9324)

🐞 Bug fixes

- fix bug when comparing series (9359)
- list zip with (9367)
- parquet + categorical (9363)
- respect startby in groupby\_dynamic when every is greater than 1d (9362)
- raise groupby apply on empty frame (9360)
- raise more informative error on string arguments (9352)
- Allow for tolerance when comparing nested dtype columns (9272)
- avoid `is_in` TypeError with sets of values containing 'None' (9323)

🛠️ Other improvements

- add top-k test for 9385 (9388)
- document apply 'return\_dtype' requirement (9361)
- clarify when day of week takes effect in groupby\_dynamic (9342)
- add "if you're coming from pandas" tip to groupby\_dynamic (9336)
- fix string language formatting (9341)
- add doc entries for `eq_missing` and `ne_missing` expressions (9331)
- fixup options for `validate` arg in `join` (9319)

Thank you to all our contributors for making this release possible!
0xbe7a, AnatolyBuga, MarcoGorelli, alexander-beedie, dkrako, durandtibo, ritchie46 and universalmind303


py-0.18.2
🚀 Performance improvements

- increase streaming groupby spill size from 256 to 10\_000 (9312)
- Improve rolling min and max when there are no nulls (9277)

✨ Enhancements

- allow use of `StringCache` object as a function decorator (9309)
- allow use of `Config` object as a function decorator (9307)
- serde for 'to\_physical' expr (9294)

🐞 Bug fixes

- fix rolling weighted mean (9292)
- fix overly-broad string matching in selectors (9303)
- fix when loading model data from upcoming `pydantic` 2.x release (9296)

🛠️ Other improvements

- fix extraneous indent in examples block (9297)
- Fix typo in Selectors documentation (9295)

Thank you to all our contributors for making this release possible!
alexander-beedie, magarick, ritchie46, stinodego and thomascamminady


py-0.18.1
🏆 Highlights

- add dedicated `selectors` module, consolidating/expanding existing selector capabilities (9204)

🚀 Performance improvements

- slightly improve n\_unique performance (9286)
- use ciborium in Expression pickling (9235)

✨ Enhancements

- add join cardinality validation (9278)
- implement set operations for selector API (9276)
- keep sorted flag after Expr::truncate (9275)
- add "sql\_expr" function (9248)
- rewrite correlation functions to expression architecture (9258)
- keep sorted flag on `offset_by` (9253)
- add expression json serde (9236)
- add intersection primitive for selector API (9240)
- building blocks for expression expansion sets (9231)
- Add ddof option to rolling\_var and rolling\_std (8957)
- immediately flatten nested unions (9220)
- Allow empty `select`/`with_columns`/`groupby` (9205)
- add a `datetime` selector (9212)
- support float expression on integers (9210)
- add dedicated `selectors` module, consolidating/expanding existing selector capabilities (9204)
- add binary to list\<u8> cast (9161)
- groupby\_dynamic by quarter. (6842)
- add arr.unique expression (9159)
- implement explode for DataType::Array (9157)
- `Decimal` type: `sum`, `min`, `max` aggregations in `select` and `agg` context. (9135)
- Decimal arithmetic (9123)
- support decimals as cast types in csv parser (9121)
- Improve error handling for `repeat` (9117)

🐞 Bug fixes

- fix pyarrow dataset literal filter (9274)
- raise on invalid sort\_by (9262)
- match missing Array and Struct classes in FromPyObject (9271)
- correct ne/e\_missing schema (9257)
- fix cached reproject offsets (9254)
- delay opening files in streaming engine (9251)
- ensure agg(F(lit)) == lit (9222)
- don't SO on concat(expressions) (9214)
- df.apply first rechunk (9211)
- clip window\_size to length in rolling\_apply (9209)
- raise error on invalid df.apply return (9207)
- Handle edge cases of named `select` input (9198)
- rolling\_apply window\_size == len (9181)
- respect time zone in strptime/to\_datetime when exact=False (9171)
- make null chunking behavior equal to other dtypes (9176)
- return single numpy array in Array dtype -> numpy (9164)
- fix regression in boolean nulls comparison (9142)
- fix struct null\_count if fields are null arrays (9151)
- Fix DataFrame.to\_arrow() for 0x0 dataframes (9144)
- categorical construction from null values (9145)
- let `apply` caller determine if length needs to be checked. (9140)
- struct `is_in` should upcast numeric types (9110)
- Restore functionality of `name` arg for `date_range` (9107)
- bubble up dtype when converting from arrow (9120)

🛠️ Other improvements

- Fix grammar and add periods in `Expr.over` docs (9244)
- Update linting for `py-polars` crate (9242)
- Deprecate `exprs=...` input for `select`/`with_columns`/`agg`/`struct` (9219)
- Enable parallelization in Python Windows tests (9232)
- Use pytest `tmp_path` (9206)
- Build docs in parallel (9229)
- Unify Python docs workflows (9228)
- add docstring to \_\_array\_\_ methods (8055)
- Update expr parsing util to return `PyExpr` (9166)
- update pyo3 requirement from 0.18 to 0.19 (9155)
- clarify how the windows are formed in the rolling\_\* functions (9192)
- stabilise polars importtime check (9196)
- fix "to\_decimal" docstring (9197)
- note that `exact=False` is a performance footgun (9186)
- change decimal inference and argument order (9133)
- Cache Rust build on main branch (9130)
- Improve df.clear() docs (8809)
- Bump `maturin` to `1.0.1` (9115)
- Bump lint dependency versions (9116)

Thank you to all our contributors for making this release possible!
DeflateAwning, MarcoGorelli, alexander-beedie, ankane, avimallu, bfeif, dependabot, dependabot[bot], jonashaag, josh, lorentzenchr, magarick, ritchie46, stinodego, universalmind303 and zundertj


py-0.18.0
🏆 Highlights

- Rename list namespace accessor from `.arr` to `.list` (8999)

⚠️ Breaking changes

- propagate null in equality comparisons (9053)
- formalize implode -> explode relation (9038)
- Drop subclassing support for `DataFrame`/`LazyFrame` (9008)
- consistently return list of date/datetime from lazy date\_range (8513)
- Default `date_range`/`ones`/`zeros` to `eager=False` (9007)
- Rename list namespace accessor from `.arr` to `.list` (8999)
- disallow time zones other than those in zoneinfo.available\_timezones() (8993)
- remove window expression magic (8992)
- raise error when sorted flag not set (8994)
- Drop subclassing support for GroupBy (7746)
- in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (8881)
- parse offset-naive date time strings as Timestamp(time\_unit), offset-aware datetime strings as Timestamp(time\_unit, "UTC"), and remove the utc argument (8714)
- Remove deprecated tz\_aware argument (8696)
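
As a rough illustration of what "propagate null in equality comparisons" (9053) means in practice, here is a plain-Python sketch that models null as `None`. It does not call Polars; the helper `eq` is hypothetical, and `eq_missing` only borrows the name of the null-aware expression mentioned elsewhere in this changelog to show the assumed contrast between the two behaviours.

```python
from typing import Optional


def eq(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """Null-propagating equality: any null operand makes the result null."""
    if a is None or b is None:
        return None
    return a == b


def eq_missing(a: Optional[int], b: Optional[int]) -> bool:
    """Null-aware equality: null equals null, and null never equals a value."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


left = [1, None, 3, None]
right = [1, 2, None, None]

propagated = [eq(a, b) for a, b in zip(left, right)]
null_aware = [eq_missing(a, b) for a, b in zip(left, right)]

print(propagated)  # [True, None, None, None]
print(null_aware)  # [True, False, False, True]
```

Code that relied on a null comparison producing a plain boolean may need an explicit null-aware comparison after upgrading.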

🚀 Performance improvements

- speed up write\_csv for time-zone-aware columns (9093)
- parallelize rolling\_window group materialization (9095)
- elide hot loop in hash joins (9075)

✨ Enhancements

- conversion from `Utf8` to `Decimal`. (9090)
- default to checking sortedness in groupby\_rolling… (9063)
- propagate null in equality comparisons (9053)
- warn if constructing Series with time-zone-aware datetimes (9058)
- implement apply for rolling/dynamic\_groupby (9049)
- Support more data types in lazy `repeat` (9046)
- implement strategy=nearest for join\_asof (9024)
- arr.sum expression (9041)
- formalize implode -> explode relation (9038)
- add array namespace and min/max expression (9032)
- improve error message on row-wise overflow (9021)
- properly apply slice at UNION level (9018)
- consistently return list of date/datetime from lazy date\_range (8513)
- Default `date_range`/`ones`/`zeros` to `eager=False` (9007)
- disallow time zones other than those in zoneinfo.available\_timezones() (8993)
- raise error when sorted flag not set (8994)
- in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (8881)
- parse offset-naive date time strings as Timestamp(time\_unit), offset-aware datetime strings as Timestamp(time\_unit, "UTC"), and remove the utc argument (8714)

🐞 Bug fixes

- groupby\_rolling was returning incorrect results when offset was positive (9082)
- don't underflow on list.tail (9089)
- fix null/empty in List::take\_unchecked (9074)
- repeat by (9023)
- raise in to\_datetime/strptime if format contains hour but not minute directive (9044)
- Order of pl.Array arguments in docstring (9059)
- propagate nulls in broadcasting of order comparisons (9050)
- Improve read\_parquet missing column error message (8961)
- fix apply with passed date/datetime return\_dtype (9035)
- respect inner type in Array construction (9020)
- raise error on invalid aggregation (9013)
- fix fused arithmetic in window functions (9012)
- don't allow silent init of `Series` declared as int/temporal with floating point values (9004)
- deprecate `time_unit` property from `Series` (8990)

🛠️ Other improvements

- Improve expression parsing utils (9094)
- Refactor expression input parsing util (9085)
- Organize "as\_datatype" functions (9080)
- Change eager path for `repeat` (9048)
- Clean up `arange`/`date_range`/`time_range` (9027)
- Drop subclassing support for `DataFrame`/`LazyFrame` (9008)
- minor `SQLContext` docstring cleanups (9005)
- Rename list namespace accessor from `.arr` to `.list` (8999)
- remove window expression magic (8992)
- Drop subclassing support for GroupBy (7746)
- Remove old deprecated functionality (8995)
- Remove deprecated tz\_aware argument (8696)

Thank you to all our contributors for making this release possible!
CloseChoice, MarcoGorelli, alexander-beedie, charliegallop, jonashaag, mcrumiller, raymead, ritchie46, sorhawell, stinodego, tim-habitat and universalmind303


rs-0.30.0
🏆 Highlights

- Rename list namespace accessor from `.arr` to `.list` (8999)
- `Array` (backed by `arrow::FixedSizeList`) datatype (8943)

⚠️ Breaking changes

- propagate null in equality comparisons (9053)
- formalize implode -> explode relation (9038)
- consistently return list of date/datetime from lazy date\_range (8513)
- Rename list namespace accessor from `.arr` to `.list` (8999)
- disallow time zones other than those in zoneinfo.available\_timezones() (8993)
- remove window expression magic (8992)
- raise error when sorted flag not set (8994)
- in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (8881)
- parse offset-naive date time strings as Timestamp(time\_unit), offset-aware datetime strings as Timestamp(time\_unit, "UTC"), and remove the utc argument (8714)
- Remove deprecated tz\_aware argument (8696)

🚀 Performance improvements

- speed up write\_csv for time-zone-aware columns (9093)
- parallelize rolling\_window group materialization (9095)
- elide hot loop in hash joins (9075)
- improve list explode perf (8974)
- Improve explodes: `offsets_to_indexes` performance (8964)
- avoid quadratic `exclude` behaviour when selecting against dtypes and/or wildcards (8953)
- use simd-json for all json parsing (8922)
- improve `json_extract` (8858)
- add optimizer passes and change initial order (8811)
- fused multiply sub / sub multiply (8799)
- improve parallel work distribution of sort expression `~4x` (8775)
- change default row-group size (8758)

✨ Enhancements

- conversion from `Utf8` to `Decimal`. (9090)
- default to checking sortedness in groupby\_rolling… (9063)
- propagate null in equality comparisons (9053)
- implement apply for rolling/dynamic\_groupby (9049)
- implement strategy=nearest for join\_asof (9024)
- arr.sum expression (9041)
- formalize implode -> explode relation (9038)
- add array namespace and min/max expression (9032)
- improve error message on row-wise overflow (9021)
- properly apply slice at UNION level (9018)
- consistently return list of date/datetime from lazy date\_range (8513)
- disallow time zones other than those in zoneinfo.available\_timezones() (8993)
- raise error when sorted flag not set (8994)
- in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (8881)
- parse offset-naive date time strings as Timestamp(time\_unit), offset-aware datetime strings as Timestamp(time\_unit, "UTC"), and remove the utc argument (8714)
- error on invalid sortby expr (8986)
- Pushdown `is_in` to pyarrow dataset (8930)
- `Array` (backed by `arrow::FixedSizeList`) datatype (8943)
- multiple enhancements for `SQLContext` (8944)
- add sql UNION, UNION ALL \& UNION DISTINCT (8936)
- add sql compound identifiers (8934)
- add sql EXCLUDE (8913)
- add sql CASE (8911)
- add sql EXPLAIN (8897)
- improve `json_extract` (8858)
- add support for sql DISTINCT ON (8824)
- add LazyFrame `null_count` (8837)
- check categorical cache on transpose (8836)
- add support for `OFFSET` keyword in SQL queries (8833)
- add a new `time_range` utility function (8776)
- Add hint to use \_saturating on overflow (8805)
- support boolean addition (8778)
- improved detail in several error messages (8747)

🐞 Bug fixes

- groupby\_rolling was returning incorrect results when offset was positive (9082)
- fix null/empty in List::take\_unchecked (9074)
- repeat by (9023)
- raise in to\_datetime/strptime if format contains hour but not minute directive (9044)
- propagate nulls in broadcasting of order comparisons (9050)
- fix apply with passed date/datetime return\_dtype (9035)
- raise error on invalid aggregation (9013)
- fix fused arithmetic in window functions (9012)
- JoinBuilder::force\_parallel is modifying allow\_parallel (8617)
- Fix erroneous warning in `hist` (8982)
- respect rechunk in parquet (8935)
- Simplify offsets\_to\_indexes, fix empty offsets edge cases (8920)
- sql qualified wildcards (8916)
- don't check sortedness in asof by (8906)
- check for object type in csv writer (8894)
- window function with filtered groups (8880)
- parse offset-aware strings as UTC in read\_csv when try\_parse\_dates=True (8864)
- free buffer, but not its contents (8848)
- improve agg expr field types (8834)
- sql `BETWEEN` bounds should be inclusive (8818)
- sort cached window groups (8813)
- check null data before take (8812)
- fix broadcasting on integer bitwise (8798)
- correct aggregation of overlapping groups (8794)
- modify join error (8768)
- don't parallelize sort within rayon job (8774)
- fix deadlock in cache and improve parallelism/work… (8765)
- check offset before doing owned mutation (8760)
- validate data on successful deserialization (8757)
- improve supertype coercion of functions (8755)

🛠️ Other improvements

- use concrete type for time zones (9076)
- factor add\_month out of add\_impl\_month\_week\_or\_day (9066)
- remove unnecessary timezone trait usage, use concrete type (9065)
- Fix broken links (9072)
- bump sqlparser version (9043)
- move list namespace functions to separate module (9040)
- Clean up `arange`/`date_range`/`time_range` (9027)
- Rename list namespace accessor from `.arr` to `.list` (8999)
- replace pattern match with unwrap (9000)
- remove window expression magic (8992)
- Remove deprecated tz\_aware argument (8696)
- simplify `take_every` (8971)
- add readmes to all sub crates (8770)
- refactor(rust); improve arithmetic reuse and don't allocate on binary… (8781)
- accumulate windows flag during translation (8773)

Thank you to all our contributors for making this release possible!
CloseChoice, MarcoGorelli, alexander-beedie, avimallu, cbowdon, charliegallop, chitralverma, jonashaag, kpberry, mcrumiller, petar-savov, raymead, ritchie46, sorhawell, stinodego, tim-habitat, uchiiii and universalmind303


py-0.17.15
🏆 Highlights

- `Array` (backed by `arrow::FixedSizeList`) datatype (8943)
- Write dataframes as delta tables (7616)

🚀 Performance improvements

- improve list explode perf (8974)
- Improve explodes: `offsets_to_indexes` performance (8964)
- avoid quadratic `exclude` behaviour when selecting against dtypes and/or wildcards (8953)
- use simd-json for all json parsing (8922)
- improve performance of `align_frames`, and add new alignment option (8899)

✨ Enhancements

- error on invalid sortby expr (8986)
- Pushdown `is_in` to pyarrow dataset (8930)
- allow set column list input to 'drop' and 'drop\_nulls' (8962)
- `Array` (backed by `arrow::FixedSizeList`) datatype (8943)
- Add `dtype` argument for `repeat` (8946)
- Use schema keys to define the columns if only the schema is provided to `pl.struct` (8952)
- multiple enhancements for `SQLContext` (8944)
- add sql UNION, UNION ALL \& UNION DISTINCT (8936)
- add sql compound identifiers (8934)
- add sql EXCLUDE (8913)
- add sql CASE (8911)
- add sql EXPLAIN (8897)
- Write dataframes as delta tables (7616)
- improve performance of `align_frames`, and add new alignment option (8899)
- improved inference from type annotations (8895)

🐞 Bug fixes

- Fix erroneous warning in `hist` (8982)
- don't modify `Series` with empty names in-place on `DataFrame` init (8956)
- respect rechunk in parquet (8935)
- Add hint on PyArrow to ADBC import error (8898)
- Simplify offsets\_to\_indexes, fix empty offsets edge cases (8920)
- sql qualified wildcards (8916)
- address edge cases with in-place modification of `Series` objects (8915)
- don't check sortedness in asof by (8906)
- check for object type in csv writer (8894)
- improve performance of `align_frames`, and add new alignment option (8899)
- window function with filtered groups (8880)

🛠️ Other improvements

- deprecate `rename` "in\_place" parameter (8960)
- Clean up tests for `repeat` (8979)
- Deprecate `name` argument for `repeat` (8977)
- simplify `take_every` (8971)
- Clean up `repeat`/`ones`/`zeros` (8963)
- further enhance `SQLContext` docstrings (8948)
- docs(python) Fix typo in `lazygroupby.rs` error message (8937)
- fix docstring for `time()` (8939)
- refactor tzinfo-related tests (8883)

Thank you to all our contributors for making this release possible!
CloseChoice, MarcoGorelli, alexander-beedie, avimallu, cbowdon, chitralverma, jonashaag, kpberry, mcrumiller, petar-savov, ritchie46, stinodego and universalmind303


py-0.17.14
🚀 Performance improvements

- optimise `align_frames` and properly handle the case where the alignment key has duplicate values (8825)

✨ Enhancements

- add an `align` option to `pl.concat` (8835)
- add support for sql DISTINCT ON (8824)
- add LazyFrame `null_count` (8837)
- check categorical cache on transpose (8836)
- add support for `OFFSET` keyword in SQL queries (8833)
- optimise `align_frames` and properly handle the case where the alignment key has duplicate values (8825)

🐞 Bug fixes

- parse offset-aware strings as UTC in read\_csv when try\_parse\_dates=True (8864)
- handle `InitVar` typing declarations on `dataclass` objects (8856)
- free buffer, but not its contents (8848)
- improve agg expr field types (8834)
- optimise `align_frames` and properly handle the case where the alignment key has duplicate values (8825)
- sql `BETWEEN` bounds should be inclusive (8818)

🛠️ Other improvements

- add examples for `Config` "set\_tbl\_formatting" and "set\_fmt\_str\_lengths" methods (8859)
- Convert between Vec of Series/Pyseries using trait (8846)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, ritchie46, stinodego and universalmind303


py-0.17.13
🚀 Performance improvements

- add optimizer passes and change initial order (8811)
- fused multiply sub / sub multiply (8799)
- improve parallel work distribution of sort expression `~4x` (8775)
- change default row-group size (8758)
- elide function calls in AnyValue::eq (8725)

✨ Enhancements

- add a new `time_range` utility function (8776)
- Add hint to use \_saturating on overflow (8805)
- add a "restore\_defaults" kwarg to `Config` init (8797)
- add lazy `time` expression (8785)
- support boolean addition (8778)
- support `SQLContext` registration of `DataFrames` (8762)
- support automatic `SQLContext` frame/table registration from local variables (8749)
- improved detail in several error messages (8747)
- support frame registration at `SQLContext` init time, and add an "unregister" method (8744)
- support repeat for all types (8741)
- add support for `DISTINCT` keyword in SQL select clauses (8740)
- support any day of the week in 'start\_by' in groupby\_dynamic (8720)
- add support for `USING` clause in SQL join operations (8731)
- add unit tests for `extend_constant` Expr (8734)
- add clean multi-frame registration to `SQLContext` (8724)
- add support for `HAVING` clause to SQL `GROUP BY` operations (8704)
- improved `numpy` string interop (8703)

🐞 Bug fixes

- sort cached window groups (8813)
- check null data before take (8812)
- fix broadcasting on integer bitwise (8798)
- Fix incorrect type hint for `arange` (8796)
- correct aggregation of overlapping groups (8794)
- don't parallelize sort within rayon job (8774)
- fix deadlock in cache and improve parallelism/work… (8765)
- check offset before doing owned mutation (8760)
- don't persist temporary column in disjoint calls to `update` (8763)
- validate data on successful deserialization (8757)
- improve supertype coercion of functions (8755)
- groupby\_dynamic was unnecessarily failing on ambiguous local datetime (8737)
- ensure count aggregation has proper length when spilling (8735)
- fix return value of std for single-element sequence with ddof=1 (8730)
- don't take logical plan during streaming fmt (8711)
- Don't upcast in round() for f32 when decimal is 0 (8706)

🛠️ Other improvements

- add entry for lazy `time` func (8786)
- add unit tests for `extend_constant` Expr (8734)
- add rounding coverage for 32/64 bit floats (8715)
- Add warning to count methods on null (8698)

Thank you to all our contributors for making this release possible!
DeflateAwning, MarcoGorelli, alexander-beedie, mcrumiller, ritchie46, stinodego, uchiiii, universalmind303 and zundertj


rs-0.29.0
🏆 Highlights

- Out-of-core unique (8573)

⚠️ Breaking changes

- Rename `concat_lst` to `concat_list` (8597)
- Schema improvements (8286)
- don't create duplicate pivot names (8002)
- rename `toggle_string_cache` to `enable_string_cache` (7970)
- change top\_k(descending) -> bottom\_k (7969)
- in `sort`, `top_k`, `sort_by`, and `arg_sort_by`, raise if `descending` is a sequence and its length doesn't match the number of columns to sort by (7957)

🚀 Performance improvements

- elide function calls in AnyValue::eq (8725)
- add fused multiply add optimization for expressions (8690)
- use expression for dot product (8686)
- improve nested grouptuples related code (8618)
- buffer spill partitions in ooc sort. `~10/20%` (8616)
- improve OOC sort performance during partition phase (8590)
- remove some unnecessary calls and matches (8490)
- less naive count (8473)
- parallelize almost all flattens (8468)
- optimize horizontal min/max (8463)
- reinstate old behavior in numeric group-tuples (8445)
- remove false sharing in perfect hash table `>2x` (8432)
- further optimised conversions to python date/datetime (8417)
- optimize join inner materialization of single keys (8405)
- parallelize sorted group tuple materialization (8387)
- improve materialization of huge cardinality group tuples (8382)
- improve group\_tuples materialization (8375)
- use online variance kernel for aggregation (8306)
- add specialized boolean aggregation for min/max (8294)
- fail fast on non-inferable strings in strptime if no `fmt` is provided (8111)
- make chunks search more resilient (8229)
- SIMD accelerated `arg_min`/`arg_max` (via `argminmax`) (8074)
- speed up csv parsing for slower datetimes formats (8213)
- `arr.eval` run on groupby expression engine when possible (8199)
- `FromParalleIter<Option<str>> for Utf8Chunked` `~1.9x` (8058)
- speed up from\_par\_iter Option\<bool> `~2.5x` (8057)
- parallelize numeric ChunkedArray materialization `~2x`. (8053)
- parallelize `into_groups` materialization ~`-25%` (8036)
- use a trusted anyvalue builder (8001)
- numeric grouptuples with nulls hash in single pass `~25%` (7980)
- use perfect hash table for categoricals (7951)
- improve group\_tuples of high cardinality data `~10%` (7938)
- use streaming instead of partitioned groupby (7907)
- don't auto-stream groupby (7906)
- rechunk before aggs (7903)
- don't re-allocate groups in sorted to\_dummies (7897)

✨ Enhancements

- add support for `DISTINCT` keyword in SQL select clauses (8740)
- support any day of the week in 'start\_by' in groupby\_dynamic (8720)
- add support for `USING` clause in SQL join operations (8731)
- add support for `HAVING` clause to SQL `GROUP BY` operations (8704)
- streaming unions (8676)
- expression cache (8674)
- rolling covariance and correlation (8671)
- Add `dt.to_string` alias for `dt.strftime` (8290)
- use temp dir for ooc spills (8614)
- make ooc-sort resilient against chunk\_size (8588)
- Set `strptime` default `strict/exact=true` (8587)
- Out-of-core unique (8573)
- Add `to_date`, `to_datetime`, `to_time` to String namespace (8579)
- more detailed error message on failure to cast `List` dtype (8583)
- don't trigger unreachable code if no dtype is set (8532)
- accept expressions in `groupby_dynamic/rolling` (8528)
- expose quantile/mean for duration (8491)
- require explicitly sorted flag for upsample (8488)
- allow for \_saturating suffix in duration strings (8479)
- let duration string accept "1mo\_saturating" (8469)
- add dt.month\_start and dt.month\_end (8435)
- add SQL support for cumulative functions (8457)
- add `str_slice` method to `StringNameSpace` (8427)
- allow negative 'arange' expression (8413)
- warn if argument is not explicitly sorted (8409)
- Schema improvements (8286)
- add support for SQL "IN" expr (8396)
- cli output mode \& sql read\_json (8336)
- rename 'csv-file' to 'csv' (8101)
- preserve time zone in combine (8263)
- add `use_earliest` argument to `replace_time_zone` for dealing with ambiguous datetimes (8087)
- SQL CTEs (8208)
- add duration cumsum and remainder (8219)
- better algorithm for streaming unique (8003)
- Add approx distinct count via `approx_unique()` (7937)
- adopt `FunctionExpr` for `cat` namespace (8173)
- `DatetimeArgs` ergonomics (8133)
- Remove Seek constraint from IpcStreamReader and SerReader (8166)
- implement `FunctionExpr` for bound and round methods (8172)
- display skipped row if same number of rows (8170)
- move all boolean expressions into `BooleanFunction` enum (8132)
- rewrite log expressions to make them serializable (8126)
- make unique expr serde and cmp (8153)
- adopt `FunctionExpr` for `abs` to allow for serialization (8129)
- adopt `FunctionExpr` for `cum*` functions (8130)
- support negative index in `pct_change` (8137)
- add `log1p` to list of mathematical functions (8102)
- expand list of tz-aware formats which can be auto-inferred (8085)
- clearer error message if strptime without a fmt specified fails (8086)
- infer tz-aware formats with try\_parse\_dates in read\_csv (8084)
- feat(python, rust)! make 'mo' interval raise if the target date does not exist (8078)
- auto-infer fmt for tz-aware date strings (7405)
- multiple sql contexts \& optional sql highlighting in cli (8072)
- implement arg\_sort for struct dtype (8051)
- support struct in df.unique (7976)
- change top\_k(descending) -> bottom\_k (7969)
- optimize away nested unions in lp (7861)
- Add seed argument to rank for random (7913)
- auto-infer detecting time-zone-awareness of fmt argument in strptime; deprecate tz\_aware argument (7886)
- deal with null values in cut/qcut (7878)
- support datetime/date subclasses (e.g. FreezeGun) (7819)

🐞 Bug fixes

- groupby\_dynamic was unnecessarily failing on ambiguous local datetime (8737)
- ensure count aggregation has proper length when spilling (8735)
- fix return value of std for single-element sequence with ddof=1 (8730)
- don't take logical plan during streaming fmt (8711)
- Don't upcast in round() for f32 when decimal is 0 (8706)
- block predicate containing shifts and windows after sort (8670)
- ensure perfect hash table processes the nulls (8668)
- Reading more tiny CSVs than workers in parallel will deadlock (8441)
- respect maintain\_order in partitioned groupby (8653)
- fix explode null series (8654)
- fix categorical agg type (8645)
- allow list\<null> -> list\<cat> (8636)
- maintain sorted info on top-k and empty sort (8615)
- maintain sortedness in date -> datetime cast (8606)
- fix determining of supertype for tz-aware and tz-naive datetimes (8585)
- fix csv reader with new line in header (8580)
- correct for nested offsets in json serialization (8584)
- fix wrong dtype init in streaming groupby (8574)
- fix categorical/string\_cache fill\_null panic (8562)
- fix window function contention in binary expression (8544)
- fix StructChunked `not_equal` comparator/operator (8547)
- fix struct pyarrow ffi (8543)
- don't trigger unreachable code if no dtype is set (8532)
- keep sorted info on agg\_first and simple singleton… (8526)
- unset fast\_unique coming from arrow (8521)
- correct sign-reversed scale on DecimalChunked to Python Decimal conversion (fixes 8423) (8508)
- don't error on cast if column is not projected (8495)
- ensure window function succeeds on empty frame (8492)
- don't set verbose on union (8487)
- check literal/group length before claiming agg sta… (8486)
- fix error message of offset\_by if offsetting by negative number of months (8464)
- fix sorted warning (8462)
- fix features serde and dtype-struct not compiling together (8439)
- respect dtype in anonymous list builder in case of… (8428)
- infer supertype in json serde (8411)
- duration on empty df (8403)
- don't inadvertently set `Series` initialised with nested tuple data as `Object` dtype (8401)
- use physical in streaming unique global table (8390)
- recursively bubble up all dtypes in list cast (8386)
- is\_in struct logical types (8378)
- fix nested null parquet read (8372)
- fix logical type in ListChunked::new\_from\_index (8367)
- bubble up logical type in recursive list cast (8356)
- implement clone\_inner for all series (8357)
- fix fill\_null for categorical (8353)
- time.cast(str) as strftime (8351)
- fix logical dtypes in parallel list collection (8349)
- improve logical types of explode operation (8348)
- logical type in anonymous list builders (8346)
- escape csv header names if they contain special chars (8331)
- nested struct/list/categorical logical/physical (8334)
- fix deserialize empty list (8326)
- fix coalesce schema (8324)
- don't do null propagation (8322)
- ensure invalid list eval raises (8317)
- pass name to struct construction in aggregation (8299)
- Use three slashes for doc comments (8284)
- improve nested list construction (8278)
- Fix DataFrame.sum returning empty column names (8283)
- always sort in `top_k` fast path (8275)
- don't use fast paths for sorted join if there are … (8272)
- fix boolean par materialization (8257)
- improve null/empty list construction (8255)
- fix offsets in parallel utf8 materialization (8254)
- nested struct logical type consistency (8249)
- keep literal state if elementwise function is applied (8195)
- decimal ensure backed arrow arrays have correct dtype (8193)
- ensure cached nodes are initialized once (8103)
- validate `map` lengths (8147)
- fix row-wise init of `UInt64` values that exceed `Int64` upper bound (8146)
- implement list\<null> constructor (8143)
- add all primitives to av\_buffer builder (8140)
- struct `is_in` (8139)
- fix wrong display name of binary expressions (8131)
- lazy: fix boolean sum schema (8108)
- don't exponentially grow error messages (partial fix). (8081)
- check element count in multi-column explode (8050)
- set lower limit for chunk\_size (8048)
- impl to\_static for struct (8037)
- all/any empty sets (8012)
- struct null\_count, cast string, transpose and describe (8009)
- fix pivot and transpose of struct data (8005)
- don't create duplicate pivot names (8002)
- fix chunked literals in expression engine (7973)
- in `sort`, `top_k`, `sort_by`, and `arg_sort_by`, raise if `descending` is a sequence and its length doesn't match the number of columns to sort by (7957)
- concat object types (7958)
- fix decimal conversion alignment (7954)
- Fix lazy encode schema (7912)
- respect skip\_nulls in apply for temporal types (7908)
- fix lit agg (7904)
- disable ooc groupby (7901)
- fix abs logical type (7895)
- fix boolean min/max output type and null handling (7894)
- validate groupby\_dynamic inputs (7876)
- correct for chunks in arg\_where (7873)
- fix nested logical/physical list (7872)
- fix arbitrary nested logical types (7869)
- don't use fxhash in sink\_sorted fast path (7849)
- parquet stats \& all kernel (7846)

🛠️ Other improvements

- remove unnecessary feature flag requirement for start\_by=monday in groupby\_dynamic (8716)
- remove some branches (8688)
- streaming pipeline creation (8656)
- simplify replace\_time\_zone (8644)
- make slice attribute in UnionOptions consistent with … (8639)
- document the dispatcher (8637)
- Rename `concat_lst` to `concat_list` (8597)
- remove unreachable/duplicated code in get\_supertype (8592)
- change partition strategy (8561)
- remove some unnecessary calls and matches (8490)
- improve sorted warning/ fix tests (8484)
- bubble up time\_iter errors (8467)
- Minor update to `strptime` (8345)
- use `concat_owned_array_unchecked` when possible (8274)
- Rename `strptime`/`strftime` args (8221)
- change sampling ratio for groupby strategy (8223)
- Rename `Expr.list` to `implode` (8165)
- introduce `FieldsMapper` utility class for obtaining `FunctionExpr` schema (8175)
- don't panic on err in offset\_by (8210)
- remove unused list\_construction (8197)
- split dsl paragraph header (8162)
- feature flag guards (8117)
- use `map_private` where applicable to reduce code duplication (8128)
- remove unnecessary to\_string (8083)
- docs(rust) Add note about `-1` to show all rows. (8080)
- Fixed a bunch of clippy warnings (7967)
- rename `toggle_string_cache` to `enable_string_cache` (7970)
- Include license files in polars-error and polars-row crates (7930)
- quantile typo in qcut (7936)
- Improve `Duration::parse` docs (7918)
- improve shift and fill performance in case of periods >= ca.len() (7843)

Thank you to all our contributors for making this release possible!
DeflateAwning, JoonHong-Kim, LdRoW, MarcoGorelli, Newtoniano, StefanBRas, alexander-beedie, alonme, ankane, avimallu, ayemjay, borchero, cgevans, chitralverma, clickingbuttons, dependabot, dependabot[bot], ghuls, grantmcdermott, jonashaag, josh, jvdd, lorentzenchr, mcrumiller, mzjp2, n8henrie, pgimalac, rben01, ritchie46, stinodego, uchiiii, universalmind303, utkarshgupta137, zaynetro and zundertj


py-0.17.12
🚀 Performance improvements

- add fused multiply add optimization for expressions (8690)
- use expression for dot product (8686)

✨ Enhancements

- streaming unions (8676)
- allow `arr.to_struct` to take a list of field names, fix it for `Series`, improve related docstrings (8673)
- expression cache (8674)
- rolling covariance and correlation (8671)
- .to\_physical() for List(Categorical) (8499)
- allow `from_repr` to handle parsing of table reprs with no dtype row (8640)
- Add `dt.to_string` alias for `dt.strftime` (8290)
- support `DataFrame` export to `numpy` structured/record arrays (8628)
- support transparent `DataFrame` init from `numpy` structured/record arrays. (8620)
- Prettify show\_versions (8627)

🐞 Bug fixes

- allow `arr.to_struct` to take a list of field names, fix it for `Series`, improve related docstrings (8673)
- block predicate containing shifts and windows after sort (8670)
- ensure perfect hash table processes the nulls (8668)
- Reading more tiny CSVs than workers in parallel will deadlock (8441)
- respect maintain\_order in partitioned groupby (8653)
- fix explode null series (8654)
- fix categorical agg type (8645)
- allow list\<null> -> list\<cat> (8636)

🛠️ Other improvements

- add notes/examples on use of inline regex flags to `replace` docstrings (8685)
- Add "See Also" sections for alias, map\_alias, prefix, s… (8682)
- add notes/examples on use of inline regex flags to `extract_all` docstrings (8675)
- allow `arr.to_struct` to take a list of field names, fix it for `Series`, improve related docstrings (8673)
- add notes on the use of inline regex flags to `extract` docstrings (8669)
- Add missing `implode` to internal functions (8667)
- Clean up type checking imports (8666)
- Organize PySeries `impl` blocks (8665)
- clean-up some examples, extend `pipe` docstring (8658)
- add notes on the use of inline regex flags to `contains` docstrings (8657)
- fix/improve `from_repr` example/doctest (8642)
- Improve some bindings imports (8630)
- Move functions in Rust bindings to `functions` module (8629)
- only require `typing_extensions` before Python 3.8 (8623)
- Set up separate modules for lazy classes (8624)
- Remove duplicate util in Rust bindings (8622)
- Move Python version to env in release workflow (8621)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, dependabot, dependabot[bot], ghuls, jonashaag, josh, mcrumiller, ritchie46 and stinodego


py-0.17.11
🚀 Performance improvements

- improve nested grouptuples related code (8618)
- buffer spill partitions in ooc sort. `~10/20%` (8616)
- avoid potentially redundant casts on `Series` init (8613)

✨ Enhancements

- add `Expr.meta` namespace `eq` and `ne` methods (8599)
- avoid potentially redundant casts on `Series` init (8613)
- use temp dir for ooc spills (8614)
- add strict dtype equality comparison methods (`is_` and `is_not`) (8600)
- automatically convert `series <op> expr` to `pl.lit(series) <op> expr` (8549)

🐞 Bug fixes

- maintain sorted info on top-k and empty sort (8615)
- fix ooc sort regression; don't take IO-thread before init (8607)
- maintain sortedness in date -> datetime cast (8606)

🛠️ Other improvements

- document sortedness of return value of upsample (8612)
- Set up `functions` module in Rust bindings (8598)
- Split PyExpr `impl` block into modules (8596)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, dependabot, dependabot[bot], mcrumiller, ritchie46 and stinodego


py-0.17.10
🏆 Highlights

- Out-of-core unique (8573)

🚀 Performance improvements

- improve OOC sort performance during partition phase (8590)
- significant speedup for python iteration over `Series` data (8501)

✨ Enhancements

- make ooc-sort resilient against chunk\_size (8588)
- Out-of-core unique (8573)
- Add `to_date`, `to_datetime`, `to_time` to String namespace (8579)
- enhance parametric strategy retrieval, enable `List` strategy by default (8571)
- Add default value for `round` (8566)
- don't trigger unreachable code if no dtype is set (8532)
- Ergonomic inputs for `all`, `any`, `sum`, and `cumsum` (8541)
- accept expressions in `groupby_dynamic/rolling` (8528)
- add `is_nested` property to dtypes (8514)

🐞 Bug fixes

- fix determining of supertype for tz-aware and tz-naive datetimes (8585)
- correct for nested offsets in json serialization (8584)
- fix wrong dtype init in streaming groupby (8574)
- fix edge-case with `NamedTuple` input that contains unhashable field data (8578)
- temporarily disable `List` dtype in parametric tests (8581)
- fix categorical/string\_cache fill\_null panic (8562)
- fix testing asserts for `NaN` values in `Struct` data (8557)
- fix window function contention in binary expression (8544)
- fix struct pyarrow ffi (8543)
- don't trigger unreachable code if no dtype is set (8532)
- fix testing asserts for `NaN` values in `List` data (8537)
- keep sorted info on agg\_first and simple singleton… (8526)
- don't downcast `Decimal` to `Float64` in truediv (8523)
- unset fast\_unique coming from arrow (8521)
- correct sign-reversed scale on DecimalChunked to Python Decimal conversion (fixes 8423) (8508)
- Clarify and fix behaviour in `pl.min/max` (8509)

🛠️ Other improvements

- warn about changing date\_range default from lazy=False to eager=False (8593)
- Rename `internals` module to `_reexport` (8554)
- change partition strategy (8561)
- fix testing asserts for `NaN` values in `Struct` data (8557)
- note sortedness of results from groupby ops (8540)
- better type signature for set\_sorted (8529)
- add test for categorical input that is not fast\_unique (8527)
- Improvements to the Python release workflow (8121)
- Update docs requirements (8200)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, cgevans, ritchie46, stinodego and uchiiii


py-0.17.9
Migration guide.

Operations that require columns to be sorted will now give a warning if those columns are not explicitly sorted or tagged as sorted.

```python
# 1. inform polars that a column is sorted on the DataFrame / LazyFrame.
(
    df.set_sorted("foo")
    .groupby_dynamic(..)
)

# 2. inform polars inline via the `set_sorted` expression
df.join_asof(df2, on=pl.col("foo").set_sorted())

# 3. explicitly sort first
# this is expensive if the data is already sorted
df.sort("foo")
```

✨ Enhancements

- expose quantile/mean for duration (8491)
- require explicitly sorted flag for upsample (8488)
- allow for \_saturating suffix in duration strings (8479)

🐞 Bug fixes

- don't error on cast if column is not projected (8495)
- ensure window function succeeds on empty frame (8492)
- don't set verbose on union (8487)
- check literal/group length before claiming agg sta… (8486)

🛠️ Other improvements

- Remove unneeded operation in `strptime` (8496)
- additional parametric testing docs/examples (8485)
- improve sorted warning/ fix tests (8484)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, ritchie46 and stinodego


py-0.17.8
🚀 Performance improvements

- less naive count (8473)
- parallelise dataframe `describe` method (8465)
- parallelize almost all flattens (8468)
- optimize horizontal min/max (8463)
- reinstate old behavior in numeric group-tuples (8445)

✨ Enhancements

- apply thousand-separators to "shape" html output, consi… (8472)
- let duration string accept "1mo\_saturating" (8469)
- add dt.month\_start and dt.month\_end (8435)
- add SQL support for cumulative functions (8457)
- improve utility of dtype groups (8453)
- improved parametric `Decimal` strategy (8444)
- improved hypothesis/parametric testing profile registration (8433)

🐞 Bug fixes

- fix error message of offset\_by if offsetting by negative number of months (8464)
- fix sorted warning (8462)
- improve utility of dtype groups (8453)

🛠️ Other improvements

- bubble up time\_iter errors (8467)
- additional test coverage for dtype groups (8458)
- integrate live refresh/reload facility while writing docs (8452)
- add a series of parametric/hypothesis example tests to the main testing docs page (8454)
- parametric testing docs improvements (8447)
- improved parametric `Decimal` strategy (8444)
- improved hypothesis/parametric testing profile registration (8433)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, ritchie46, universalmind303 and utkarshgupta137


py-0.17.7
🚀 Performance improvements

- remove false sharing in perfect hash table `>2x` (8432)
- further optimised conversions to python date/datetime (8417)

✨ Enhancements

- initial parametric/hypothesis `Decimal` dtype testing strategy (note: disabled by default) (8430)
- add `Series` support to `pl.from_repr` (8429)
- Allow `%f` in `strptime` format strings (8404)

🐞 Bug fixes

- raise upon invalid use of zero\_copy\_only (8418)
- respect dtype in anonymous list builder in case of… (8428)
- `str.strptime` error message: utf -> utc (8422)

🛠️ Other improvements

- initial parametric/hypothesis `Decimal` dtype testing strategy (note: disabled by default) (8430)

Thank you to all our contributors for making this release possible!
alexander-beedie, ayemjay, jonashaag, mzjp2, pgimalac, ritchie46 and stinodego


py-0.17.6
🚀 Performance improvements

- optimize join inner materialization of single keys (8405)
- parallelize sorted group tuple materialization (8387)
- improve materialization of huge cardinality group tuples (8382)
- improve group\_tuples materialization (8375)
- conversion speedups from polars int64 timestamps to python temporal types:
  - ~35% faster → python date (8339)
  - ~15% faster → python time (8352)
  - ~10% faster → python datetime (8339)

✨ Enhancements

- allow existing `item` method to optionally take row/col indices (8412)
- allow negative 'arange' expression (8413)
- warn if argument is not explicitly sorted (8409)
- .to\_numpy(use\_pyarrow=False) for Object and Boolean (8397)
- new hypothesis strategy that can generate data for `List` dtypes (8400)
- offer cleaner usage pattern for `Config` object in context-manager context (8394)
- add support for SQL "IN" expr (8396)
- add a "signed" param to `Series.is_integer` (8383)
- add is\_integer (8373)
- raise error on invalid dict aggregation (8371)
- cli output mode \& sql read\_json (8336)
- more informative keyerror on invalid getitem (8320)

🐞 Bug fixes

- infer supertype in json serde (8411)
- duration on empty df (8403)
- don't inadvertently set `Series` initialised with nested tuple data as `Object` dtype (8401)
- use physical in streaming unique global table (8390)
- recursively bubble up all dtypes in list cast (8386)
- is\_in struct logical types (8378)
- fix nested null parquet read (8372)
- fix logical type in ListChunked::new\_from\_index (8367)
- fix unintentional loading of hypothesis profile (8362)
- bubble up logical type in recursive list cast (8356)
- ensure that `iter_rows` doesn't return nested `Timestamp` values (8359)
- implement clone\_inner for all series (8357)
- add missing `__hash__` support to `Field`, include "time\_zone" in `Datetime` hash, fix `Struct` hash (8354)
- fix fill\_null for categorical (8353)
- time.cast(str) as strftime (8351)
- fix logical dtypes in parallel list collection (8349)
- improve logical types of explode operation (8348)
- logical type in anonymous list builders (8346)
- address potential error caused by float division on time\_unit scaling (8337)
- escape csv header names if they contain special chars (8331)
- nested struct/list/categorical logical/physical (8334)
- fix struct schema argument (8327)
- fix precision issue when converting pl.Datetime("ms") to Python datetime (8332)
- fix deserialize empty list (8326)
- List\<Null> consistency (8325)
- fix coalesce schema (8324)
- don't do null propagation (8322)
- validate `window_size` user input in rolling\_expr (8318)
- ensure invalid list eval raises (8317)
- fix typing overloads of `read_excel` (8300)

🛠️ Other improvements

- new hypothesis strategy that can generate data for `List` dtypes (8400)
- update `duration` docstring/example (8392)
- Upgrade ruff (8380)
- enhanced parametric testing for temporal dtypes (8347)
- Minor update to `strptime` (8345)
- adjust pytest config so as not to inadvertently prevent test debugging in IPython consoles (8308)
- add newline in pl.DataFrame.pivot docs (8307)

Thank you to all our contributors for making this release possible!
JoonHong-Kim, MarcoGorelli, StefanBRas, alexander-beedie, avimallu, grantmcdermott, jonashaag, rben01, ritchie46, stinodego and universalmind303


py-0.17.5
🚀 Performance improvements

- use online variance kernel for aggregation (8306)

Thank you to all our contributors for making this release possible!
ritchie46

py-0.17.4
🚀 Performance improvements

- add specialized boolean aggregation for min/max (8294)

✨ Enhancements

- preserve time zone in combine (8263)

🐞 Bug fixes

- pass name to struct construction in aggregation (8299)
- improve nested list construction (8278)
- Truncate long column name in glimpse (8281)
- Fix DataFrame.sum returning empty column names (8283)
- always sort in `top_k` fast path (8275)
- don't use fast paths for sorted join if there are … (8272)

🛠️ Other improvements

- use `concat_owned_array_unchecked` when possible (8274)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, ritchie46, stinodego, zaynetro and zundertj


py-0.17.3
🏆 Highlights

- support `DataFrame` init from `pydantic` model data (8178)

🚀 Performance improvements

- fail fast on non-inferable strings in strptime if no `fmt` is provided (8111)
- make chunks search more resilient (8229)
- SIMD accelerated `arg_min`/`arg_max` (via `argminmax`) (8074)
- speed up csv parsing for slower datetimes formats (8213)
- improve datetime interpret perf (8209)
- `arr.eval` run on groupby expression engine when possible (8199)
- ~2-3x speedup for `DataFrame` init from `pydantic` models (8181)

✨ Enhancements

- add `use_earliest` argument to `replace_time_zone` for dealing with ambiguous datetimes (8087)
- fail loudly on .%f directive, as it differs from the Python standard library (8237)
- SQL CTEs (8208)
- automatically convert `series OP expr` -> `pl.lit(series) OP expr` where OP is arithmetic (8225)
- add pickle support for `LazyFrame` (8220)
- add duration cumsum and remainder (8219)
- support `DataFrame` init from nested `dataclass`, `pydantic`, and `NamedTuple` objects (8185)
- better algorithm for streaming unique (8003)
- Add approx distinct count via `approx_unique()` (7937)
- add percentiles to `describe` methods (8169)
- support `DataFrame` init from `pydantic` model data (8178)
- display skipped row if same number of rows (8170)

🐞 Bug fixes

- add special numpy float branch in anyvalue conversion (8259)
- fix boolean par materialization (8257)
- improve null/empty list construction (8255)
- fix offsets in parallel utf8 materialization (8254)
- nested struct logical type consistency (8249)
- keep literal state if elementwise function is applied (8195)
- decimal ensure backed arrow arrays have correct dtype (8193)

🛠️ Other improvements

- parametric/hypothesis testing code cleanups (8253)
- Rename `strptime`/`strftime` args (8221)
- change sampling ratio for groupby strategy (8223)
- Rename `Expr.list` to `implode` (8165)
- don't panic on err in offset\_by (8210)
- re-enable test parallelization for Windows tests (8214)
- Fix small typo: "im memory" -> "in memory" (8187)
- remove unused dtype\_to\_arrow\_type (8177)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, avimallu, borchero, chitralverma, clickingbuttons, ghuls, josh, jvdd, rben01, ritchie46, stinodego and universalmind303


py-0.17.2
✨ Enhancements

- make unique expr serde and cmp (8153)
- Enhanced parametric testing `DataFrame` generation (8149)
- support negative index in `pct_change` (8137)
- add `log1p` to list of mathematical functions (8102)

🐞 Bug fixes

- object conversion in anyvalue (8155)
- Address a ~15% regression in `import polars` speed (8151)
- validate `map` lengths (8147)
- fix row-wise init of `UInt64` values that exceed `Int64` upper bound (8146)
- implement list\<null> constructor (8143)
- add all primitives to av\_buffer builder (8140)
- struct `is_in` (8139)
- fix wrong display name of binary expressions (8131)

🛠️ Other improvements

- Enhanced parametric testing `DataFrame` generation (8149)

Thank you to all our contributors for making this release possible!
alexander-beedie, borchero, dependabot, dependabot[bot], jonashaag, ritchie46 and stinodego


py-0.17.1
✨ Enhancements

- Add median stat to Series.describe (8118)
- Support `n` expression passed to Expr.head/tail (8098)
- expand list of tz-aware formats which can be auto-inferred (8085)
- clearer error message if strptime without a fmt specified fails (8086)
- infer tz-aware formats with try\_parse\_dates in read\_csv (8084)
- make 'mo' interval raise if the target date does not exist (8078)
- auto-infer fmt for tz-aware date strings (7405)
- multiple sql contexts \& optional sql highlighting in cli (8072)

🐞 Bug fixes

- fix detection of default integer indexes on win32 when loading from pandas frames (8110)
- fix stacklevel of some deprecation warnings (8089)
- lazy: fix boolean sum schema (8108)
- Expr.str.decode returns binary dtype (8099)
- Fix `show_versions` util (8096)
- don't exponentially grow error messages (partial fix). (8081)
- Fix regression with `scan_parquet/ipc` and `fsspec` (8071)

🛠️ Other improvements

- Improve some tests (8043)
- Do not parallelize Windows tests (8097)
- Fail doctest on deprecation warnings (8091)
- Fix Expr.apply docstring for return\_dtype parameter (8069)

Thank you to all our contributors for making this release possible!
MarcoGorelli, StefanBRas, alexander-beedie, josh, n8henrie, rben01, ritchie46, stinodego, universalmind303 and zundertj


py-0.17.0
⚠️ Breaking changes

- rename some function arguments (8017)
- don't create duplicate pivot names (8002)
- Remove deprecated behaviour (7978)
- rename `toggle_string_cache` to `enable_string_cache` (7970)
- change top\_k(descending) -> bottom\_k (7969)
- in `sort`, `top_k`, `sort_by`, and `arg_sort_by`, raise if `descending` is a sequence and its length doesn't match the number of columns to sort by (7957)
- Use RowsError instead of RowsException as recommended … (6009)
- Use `time_unit`/`time_zone` instead of `tu`/`tz` (7910)
- More ergonomic args for `struct`, `concat_str`, and `arg_sort_by` (7308)
- swap arguments of `shift_and_fill` and add default… (7192)
- set maintain\_order=False for df/lf.unique (7468)
- Rename pipe arg `func` to `function` (7139)
- Set some args for `Series`/`Expr` methods to keyword-only (7860)

🚀 Performance improvements

- `FromParalleIter<Option<str>> for Utf8Chunked` `~1.9x` (8058)
- speed up from\_par\_iter Option\<bool> `~2.5x` (8057)
- parallelize numeric ChunkedArray materialization `~2x`. (8053)
- parallelize `into_groups` materialization `~-25%` (8036)
- use a trusted anyvalue builder (8001)
- numeric grouptuples with nulls hash in single pass `~25%` (7980)
- ensure primitives are parsed first in anyvalue conversion (7955)
- use perfect hash table for categoricals (7951)

✨ Enhancements

- multiple sql contexts \& optional sql highlighting in cli (8072)
- implement arg\_sort for struct dtype (8051)
- Support `DataFrame` init from pyarrow `RecordBatch` objects, and improve init from `Array` (8011)
- allow `write_ipc` to take `file=None` (returning `BytesIO`) (7997)
- Add \_\_array\_\_ method to DataFrame (7979)
- support struct in df.unique (7976)
- change top\_k(descending) -> bottom\_k (7969)
- basic sanity-checks for some `Config` methods, reference POLARS\_MAX\_THREADS in `threadpool_size` docstring (7965)
- optimize away nested unions in lp (7861)
- Use RowsError instead of RowsException as recommended … (6009)
- More ergonomic args for `struct`, `concat_str`, and `arg_sort_by` (7308)

🐞 Bug fixes

- check element count in multi-column explode (8050)
- set lower limit for chunk\_size (8048)
- impl to\_static for struct (8037)
- create Series with list of only None with Float32 dtype (8015)
- version gate pyarrow version for `to\_pandas=(use\_pyarrow… (8026)
- Only allow correct type for get\_column and to\_series arg… (7983)
- Output correct dtype for values of remapping dict in map… (8013)
- all/any empty sets (8012)
- struct null\_count, cast string, transpose and describe (8009)
- fix pivot and transpose of struct data (8005)
- don't create duplicate pivot names (8002)
- Fix test\_literal\_group\_agg\_chunked\_7968 test (7991)
- fix chunked literals in expression engine (7973)
- in `sort`, `top_k`, `sort_by`, and `arg_sort_by`, raise if `descending` is a sequence and its length doesn't match the number of columns to sort by (7957)
- pandas 2.0 compat (7962)
- concat object types (7958)
- fix decimal conversion alignment (7954)

🛠️ Other improvements

- Fix Expr.apply docstring for return\_dtype parameter (8069)
- rename some function arguments (8017)
- Remove deprecated behaviour (7978)
- Add docstring examples for top\_k and bottom\_k (7987)
- rename `toggle_string_cache` to `enable_string_cache` (7970)
- add remaining operator-equivalent method docstrings and a related html/docs entry (7953)
- Use `time_unit`/`time_zone` instead of `tu`/`tz` (7910)
- swap arguments of `shift_and_fill` and add default… (7192)
- set maintain\_order=False for df/lf.unique (7468)
- Rename pipe arg `func` to `function` (7139)
- Set some args for `Series`/`Expr` methods to keyword-only (7860)

Thank you to all our contributors for making this release possible!
MarcoGorelli, StefanBRas, alexander-beedie, ghuls, rben01, ritchie46, stinodego and universalmind303


py-0.16.18
🚀 Performance improvements

- improve group\_tuples of high cardinality data `~10%` (7938)

✨ Enhancements

- Add seed argument to rank for random (7913)
- Support Numpy ufunc with more than one expression (7924)

🐞 Bug fixes

- Fix lazy encode schema (7912)
- respect skip\_nulls in apply for temporal types (7908)

🛠️ Other improvements

- Rename argument `f` to `function` in reduce docstring (7925)
- improve docstrings for numeric/math operator-equivalent methods (7942)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, alonme, ankane, dependabot, dependabot[bot], lorentzenchr, rben01, ritchie46 and zundertj


py-0.16.17
🚀 Performance improvements

- use streaming instead of partitioned groupby (7907)
- don't auto-stream groupby (7906)
- rechunk before aggs (7903)
- don't re-allocate groups in sorted to\_dummies (7897)
- fix hashing regression (7833)
- rechunk dataframe before unique computation (7814)
- improve hash quality (7813)
- always take sorted fast path group\_tuples (7787)

✨ Enhancements

- auto-infer detecting time-zone-awareness of fmt argument in strptime; deprecate tz\_aware argument (7886)
- Add `Series.pow()` (7898)
- deal with null values in cut/qcut (7878)
- allow list/tuple `lit` values (7879)
- Support writing dynamic/live formula columns via `write_excel` (7871)
- support datetime/date subclasses (e.g. FreezeGun) (7819)
- support mode for floats and categoricals (7827)
- allow Series init with `Unknown` dtype to proceed as if dtype is `None`, to allow inference (7830)
- support sort by 'struct' type (7822)
- add `to_repr` methods to DataFrame and Series (7802)
- thousand separators in shape of repr `DataFrame` (7775)
- Improve automatic output dtype setting for `map_dict`. (7797)
- new utility `from_repr` function that reconstructs a DataFrame from its table repr (7781)
- deprecate default value of `aggregation_function` being `'first'` in `pivot`. In a future version, it will default to `None` (7784)

🐞 Bug fixes

- fix lit agg (7904)
- disable ooc groupby (7901)
- Use `check_exact` for temporal types in `assert_series_equal` (7896)
- fix abs logical type (7895)
- fix boolean min/max output type and null handling (7894)
- Cast compound types to their simple string representation on export to Excel (7887)
- ensure `_repr_html_` escapes column names in addition to data/body elements (7877)
- validate groupby\_dynamic inputs (7876)
- correct for chunks in arg\_where (7873)
- fix nested logical/physical list (7872)
- fix arbitrary nested logical types (7869)
- Relax type hints for when/then (7857)
- don't use fxhash in sink\_sorted fast path (7849)
- parquet stats \& all kernel (7846)
- Add missing type hint for `is_between` (7835)
- fill null list (7836)
- fix explode list[null] (7832)
- fix unicode lower/uppercase (7826)
- raise error on invalid series concat strategy (7823)
- don't use naive name in partitioned agg (7810)
- Ensure CsvReader always respects the n\_rows parameter (7789)

🛠️ Other improvements

- Fix read\_csv docstring formatting (7875)
- update concat docstring for how parameter (7834)
- don't run hash stability test on arm64 (7825)
- Improve pl.when documentation (7793)
- add description of ddof (7811)
- Rename `venv` folder to `.venv` (7790)
- add a `make requirements` option to install/refresh dependencies without having to recreate the `venv` (7792)
- fixup stacklevels (7796)
- Drop `ruff` target version (7791)

Thank you to all our contributors for making this release possible!
LdRoW, MarcoGorelli, Newtoniano, advoet, alexander-beedie, duskmoon314, foxcroftjn, ghuls, jonashaag, ritchie46, stinodego and zundertj


py-0.16.16
🐞 Bug fixes

- ensure k is lower than height (7779)
- raise error on invalid categorical cast (7686)
- raise error on attempt to set invalid `Datetime` or `Duration` dtype timeunit (7768)

🛠️ Other improvements

- Add "typos" as spell checking lint (7759)
- fix typos (7756)

Thank you to all our contributors for making this release possible!
alexander-beedie, ghuls, ritchie46 and universalmind303


py-0.16.15
🚀 Performance improvements

- change top\_k algorithm (7718)
- runtime SIMD target detection for `min/max/sum` and impl SIMD `mean` `~2-5x` (7702)
- implement top-k optimization (7678)
- ooc-sort dump in thread local if IO-thread is full. (7668)
- use perfect hash table for ooc partitioning (7653)

✨ Enhancements

- add dt.datetime, dt.date, dt.time (7735)
- new "row\_totals" parameter for `write_excel` that adds a row-wise total column using structured references (7751)
- More ergonomic args for `min/max` (7742)
- More ergonomic args for `concat_list` (7745)
- add `Series.hist` (7727)
- add `qcut` (7724)
- add `maintain_order` option to `Series.cut` (7723)
- create series with only none list with specific dtype (7722)
- add `maintain_order` in `arr.unique` (7721)
- `DataFrame.top_k/ LazyFrame.top_k` (7720)
- clearer error message when replace\_time\_zone encounters ambiguous or non-existent datetimes (7685)
- include `set_fmt_float` value in `Config` load/save state (7696)
- raise on descending date\_range arguments (7671)
- include `add` operator-equivalent expression (7667)
- add expression method equivalents for existing math/logical operators (7660)
- add `is_leap_year` to temporal expressions (7618)
- full out-of core support for streaming groupby (7630)
- clearer error message when creating duration string without integer (7648)
- allow `scan_csv` to take a list of column names in a `new_columns` param (7642)
- out-of-core `groupby/unique` of groupby on integer keys (7604)
- allow set and/or frozenset as input to `is_in` expressions (7613)

🐞 Bug fixes

- make zip\_with\_same\_type obligatory (7761)
- fix melt projection pushdown node (7752)
- fix predicate pushdown for 'unique' first/last (7749)
- fix null propagation (7748)
- fix init from pandas Series that has no dtype and is empty (or contains only null values) (7716)
- avoid ambiguous time error when passing python Datetime to DataFrame constructor (7711)
- Fix inferring CSV schema when skip\_rows\_after\_heade… (7701)
- fix race condition in null handling of window fast… (7695)
- address `Series` init regression from list of `np.arange` objects (7692)
- improve error message if unavailable lazy module is queried for `__version__` attribute (7680)
- fix reversed non-existent file error msg (7657) (7673)
- respect time zone in groupby\_rolling with negative offset (7664)
- fix empty case str.replace (7662)
- allow for list of datetimes with timezone(timedelta!=0) in Series constructor (7645)
- respect time zone in rolling\_\* functions (7643)
- fix schema of decimal type reads (7652)
- detect deltalake version in show\_versions (7622)
- respect time zone in offset\_by (7626)
- fix boolean `Series` init with integer 1/0 values (7619)
- respect time zone in dt.round (7611)

🛠️ Other improvements

- Display full argument names in \_\_repr\_\_ for Datetime a… (7736)
- add `Expr.pipe` API docs link (7734)
- Add sort\_by example taking one row per group (7712)
- Clean up a few type hints/imports (7687)
- Move `wrap_x` utils to `utils` module (7672)
- Reduce number of polars.internals imports (7628)
- Remove duplicate column from Expr.sort example (7684)
- Move `expr` parsing to utils (7661)
- Eliminate function re-exports through `internals` (7650)
- Move last functionality out of `internals` (7649)
- More internals cleanup (7638)
- Update lockfile (7637)
- fix and improve type hints and function names (7609)
- remove additional logic from scan delta (7605)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, borchero, chitralverma, didriksg, ghuls, jakob-keller, minimav, ritchie46, stinodego, universalmind303 and zundertj


py-0.16.14
🚀 Performance improvements

- optimize string kernels (elide redundant allocs) (7602)
- even faster polars module import (~15%) (7584)
- optimize `str_replace` for same length replacements `~2x` (7580)
- reinstate fast module import and optimise `DataFrame` init by implementing dynamic `singledispatch` registration (7559)
- improve perf of `str.replace_n` and add `n` argument `~10x` (7575)
- speedup `replace_literal_all` of single byte replacements `~15x`. (7565)
- set sorted flags (7558)
- extend ultrafast constant-value frame init to temporal types (over 1,000x speedup) (7527)

✨ Enhancements

- slightly more space-efficient table output (use ellipsis char, not three periods) (7599)
- implement decimal -> dtype cast (7600)
- use head on pyarrow datasets (7570)
- overwrite streaming chunk size (7543)

🐞 Bug fixes

- remove index columns in pandas to\_sql() (7596)
- add decimal chunk\_lengths (7589)
- fix ooc sort. the fast path was invalid (7588)
- Fix regression throwing AmbiguousTimeError in groupby\_dynamic (7454)
- activate dtype-duration for polars-ops (7582)
- distinct project whole schema if not a subset (7581)
- reinstate fast module import and optimise `DataFrame` init by implementing dynamic `singledispatch` registration (7559)
- sql window functions (7458)
- respect time zone in upsample (7563)
- fix rolling windows for windows that shrink from lhs (7556)
- remove pyarrow from construction and dispatch to rust (7551)
- fix negative indexing for `head`/`tail` (7554)
- Remove `BatchedCsvReader` from public API (7546)
- fix logical/list getitem (7545)
- pushdown key in merge sorted projection pd (7542)
- don't upcast column to string in 'is\_in' operation (7538)

🛠️ Other improvements

- Move more code out of `internals` (7597)
- add a performance hint about use of `lru_cache` to the `apply` docstrings (7593)
- Avoid `pli` in type hints (part 2) (7587)
- Avoid `pli` in type hints (part 1) (7586)
- Move core objects to top level (7576)
- Bump ruff (7567)
- Rename namespace Array -> List in docs (7541)
- Move `fmt` tests to `test_fmt` (7555)
- Rename `sep` arg to `separator` (7533)
- Minor Series cleanup (7531)

Thank you to all our contributors for making this release possible!
MarcoGorelli, Vincenthays, alexander-beedie, ritchie46, stinodego, universalmind303 and vincev


py-0.16.13
🚀 Performance improvements

- use atoi in favor of lexical in strptime `-25%` (7501)
- [csv] faster utf8 validation `~20%` (7500)
- [csv] SIMD accelerate SplitFields `-40%` (7498)
- (csv) don't use memchr for splitfields `-~0.15%` (7494)
- csv-file use fast-float for csv float parsing (7492)

✨ Enhancements

- literal support for binary (7519)

🐞 Bug fixes

- respect time zone in date\_range (7503)
- transparently integrate externally-registered Excel formats (7520)
- use physical types in sort-by args (7518)
- keep series name in arithmetic (7513)
- Initialize with `Decimal` dtype (7511)
- fix projection pushdown of asof\_joins (7487)

🛠️ Other improvements

- update `show_versions` with xlsxwriter (and add as optional dependency) (7507)
- Use new `LazyFrame` init in docs (7508)
- Bump some linting versions (7505)

Thank you to all our contributors for making this release possible!
CloseChoice, MarcoGorelli, alexander-beedie, ecashin, ritchie46 and stinodego


py-0.16.12
🚀 Performance improvements

- speed up comparison of sorted arrays `~3.85x`. (7478)
- improve performance for datetime parsing with %Z (7369)

✨ Enhancements

- slice pushdown in `LazyFrame.unique` (7470)
- streaming `LazyFrame.unique` (7466)
- automatically infer iso8601-like dates (7457)
- push down temporal predicates to pyarrow scanner (7421)
- slice pushdown in scan\_arrow\_ds (7449)
- convert decimal 256 to 128 on entry (7448)
- provide option to set individual `row_heights` on Excel export (7447)
- optimise `Excel` export when all data in a multi-column conditional format is contiguous (7427)
- dynamically change chunk\_size in streaming `explo… (7415)
- support setting multiple conditional formats on the same `Excel` table column/range (7411)
- add unary +,-,! to sql (7399)
- disallow converting key values to null in map\_dict due … (7393)
- use IO backed reader when `low_memory=True`. (7394)
- The big error revamp (7362)
- parse year-month-day as Datetime in slow-path (7373)
- support applying one conditional format to multiple columns on `Excel` export (allows for heatmaps) (7379)
- Proper superclass for Decimal (7384)
- tweak default Date and Time format strings for `Excel` export (7380)
- make melt streamable (7364)
- don't rechunk before writing to csv (7365)

🐞 Bug fixes

- handle an unusual edge-case introspecting dataclass type hints (7476)
- raise error on categorical by arguments if not fro… (7464)
- fix and test df.corr (7463)
- make `DataFrame` rendering compatible with quarto and pandoc (7455)
- sql floor \& ceil (7456)
- fix `DataFrame` table rendering issue in some Jupyter environments (7450)
- allow for hourly date\_range to cross DST (7430)
- respect lexical/physical in multi-column categoric… (7417)
- fix null\_dtype slice (7414)
- sort\_by logical types (7412)
- parse single-digit months and dates when code would have gone down fastpath (7391)
- creating empty struct series with some unit fields (7383)
- minor `Excel` export improvements/fixes (7363)

🛠️ Other improvements

- Rename `read_x` functions arg `file` to `source` (7460)
- Refactor `utils` module (7435)
- Rename functions that clash with builtins (7424)
- Showcase new ergonomic syntax in README (7419)
- Rename Decimal `prec` to `precision` (7401)
- Remove `_base_type` util (7410)
- Rename first arg of `from_x` to `data` (7407)
- use exc as variable name for all captured exceptions (7403)
- Remove redundant `schema` keyword description from `pl.… (7400)
- Rename `cfg` module to `config` (7385)
- Add test for groupby referencing the same column twice (7340)
- Split up `datatypes` module (7357)
- Clean up type checking lints (7358)

Thank you to all our contributors for making this release possible!
Hofer-Julian, MarcoGorelli, SauravMaheshkar, aldanor, alexander-beedie, cjackal, ghuls, josh, juba, nrebena, rben01, ritchie46, stinodego and universalmind303


py-0.16.11
🚀 Performance improvements

- optimize str.replace\_all (7353)
- optimize str.replace `~2x` improvement (7347)
- ensure utf8 apply preallocates memory (7345)

✨ Enhancements

- make `LazyFrame.explode` streamable. (7341)
- allow import of dtype groups from the top-level to improve discovery (7339)

🐞 Bug fixes

- make decimal types opt-in (7348)
- fix chunk\_sizes in threading apply (7351)
- don't panic when writing `NullArray` values to python row tuple (7346)

🛠️ Other improvements

- add `write_excel` API docs link (7338)

Thank you to all our contributors for making this release possible!
alexander-beedie, ritchie46 and s-banach


py-0.16.10
🏆 Highlights

- Excel export support via new `write_excel` IO method (7251)
- out of core sort on multiple columns (7244)

🚀 Performance improvements

- improve batched csv readers perf and memory perf (7329)
- use inlined strings for field and schema (7272)
- reuse groups in binary expressions (7202)

✨ Enhancements

- support creation of sparklines when exporting `Excel` tables (7333)
- support sqlalchemy/pandas backed `write_database` (7322)
- add adbc database reader and writer (`DataFrame.write_database`) (7318)
- make `expr.apply` streamable in selection context (7316)
- More ergonomic `unnest` args (7310)
- initial working version of Decimal Series (7220)
- Support explicit Binary dtype in constructor (7305)
- implement serde for literal datetime and series (7301)
- improve error message if mmap fails in ipc (7300)
- add multi-threaded apply (7277)
- add support for serializing categoricals to json (7276)
- Add Expr.arg\_true (7056)
- don't require pyarrow for initialising Series with Python datetimes (7273)
- Excel export support via new `write_excel` IO method (7251)
- deprecate `describe_(optimized)_plan` in favor of `explain` (7264)
- enable min-max skipping for binary in parquet, enable min-max skipping for `is_in` exprs (7169)
- out of core sort on multiple columns (7244)
- support nulls\_last for multi-column sort (7242)
- allow optimizations flags in describe\_plan (7233)
- implement row encoding for boolean and binary (7218)
- allow passing utc=True when parsing time-zone-naive date strings (7203)
- Add `**named_exprs` input for `struct` (7208)
- add sql "ARRAY\_AGG" (7204)

🐞 Bug fixes

- fix offset in threading apply (7330)
- fix projection pushdown on join with unused join key (7326)
- raise error on time -> datetime cast (7325)
- raise error if output of 'apply' cannot be determined (7317)
- make `pl.struct` mappable (7299)
- err on duplicate with\_column names (7296)
- don't panic on `str.parse_int` (7072)
- improve concat\_list with empty list error message (7236)
- fix groupby\_dynamic's binning when index\_column is time-zone-aware (7278)
- fix preservation of microseconds when converting Python datetime (7271)
- fix us precision of datetime to anyvalue conversion (7268)
- no panic on empty cross join (7266)
- raise error on ambiguous filter predicates (7265)
- handle concat\_list with first lit value (7235)
- respect schema in DataFrame initialisation for time-zone-aware datetime (7240)
- ensure `every` type is properly normalised (for `groupby_dynamic` and `groupby_rolling`) (7238)
- add test of median function in lazy mode (7224)
- don't lose precision in pl.date\_range due to floating point arithmetic (7229)
- Conversion of negative timedelta to polars duration (7209)
- ensure parametric testing `cols=int` definition respects `allowed_dtypes` (7213)

🛠️ Other improvements

- Fix `read/write_database` tests (7327)
- Rename `scan_ds` to `scan_pyarrow_dataset` (7320)
- don't run tests that write to disk by default (7321)
- rename `read_sql` to `read_database` (7315)
- Address `git2` vulnerability (7309)
- Correctly deprecate `DataFrame.pearson_corr` (7307)
- Skip `write_excel` doctests (7306)
- Run `pytest-xdist` with worksteal (7304)
- Rename pearson\_corr \& spearman\_rank\_corr (7014)
- Split `io` module per type (7295)
- Move `_html` module to dataframe module (7256)
- Enable `strict` for ruff `TCH` lints (7234)
- better select on map\_dict dtype (7217)
- add warning of mmap to ipc docstring (7216)
- exit non-zero on fix from ruff (7215)
- ensure that `DataFrame` and `LazyFrame` init params don't diverge (7214)

Thank you to all our contributors for making this release possible!
MarcoGorelli, aldanor, alexander-beedie, coinflip112, csko, dependabot, dependabot[bot], ghuls, josemasar, josh, mslapek, nrebena, ozgrakkurt, papparapa, ptiza, rben01, ritchie46, sorhawell, stinodego, universalmind303, xyning and zundertj


py-0.16.9
🚀 Performance improvements

- improve perf of multi-args exprs in groupby context (7186)
- optimize `sequence_to_pydf` (7044)
- improve single argument elementwise expression pe… (7180)

✨ Enhancements

- show column name if read\_csv errors (7177)
- support direct `LazyFrame` init (same params as `DataFrame`) (7122)
- add a `base_type` method to `DataType` (7166)
- add explode for binary (7159)
- add binary apply (7160)
- Allow pl.Int32 Series as output in eager repeat. (7152)
- improve error message when read\_csv fails (7150)
- Improve usability of Null type. (7136)
- add glob support to scan\_ndjson (7143)
- add Expr.pipe (7134)
- streaming: scale chunk\_size on table width (7119)
- additional read functions (7102)
- More ergonomic `explode` args (7115)

🐞 Bug fixes

- make list function 'map' and refactor multi-arg ex… (7185)
- Fix Series.argsort (7183)
- validate trees before inserting streaming node (7179)
- Raise ValueError for getitem when column indexes are out… (7167)
- fix list take logical types (7163)
- fix null cmp fast paths (7157)
- fix df division dispatch (7155)
- don't panic on unsupported arithmetic type (7154)
- don't let a cast unset agg\_state and keep logical … (7151)
- expose sort expressions to stack-optimizer (7148)
- improve error message when read\_csv fails (7150)
- make cast unknown a no-op (7147)
- fix panic on cum\_prod (7141)
- respect f32 schema in deep expressions (7146)
- fix return type of `_unpack_schema` to prevent potential TypeError (7128)
- fix docstring in set\_tbl\_cols() (7121)
- fix deadlock in scan\_csv()->sink\_parquet() (7118)
- subtracting Series from date has wrong sign (7107)
- fix scan\_ipc receiving storage\_options (7085)
- nested sql exprs (7112)

🛠️ Other improvements

- ensure binary branches are executed in parall… (7193)
- Deprecate pl.get\_dummies (7055)
- Add Series.cut, deprecate pl.cut (7058)
- examples of functional programming (7135)
- fix docstring in set\_tbl\_cols() (7121)
- Build versioned API reference (7114)

Thank you to all our contributors for making this release possible!
MarcoGorelli, Trippy3, alexander-beedie, foxcroftjn, ghuls, iamsmkr, jakob-keller, josh, mslapek, papparapa, ritchie46, romanovacca, stinodego, universalmind303 and zundertj


py-0.16.8
🚀 Performance improvements

- optimize arr.sum for list array with inner nulls (7053)
- optimize arr.min/arr.max (7050)
- optimize arr.mean (7048)
- optimize arr.sum (7047)
- optimize 'arg\_where' (7039)
- More efficient handling of \*args/\*\*kwargs (7026)

✨ Enhancements

- allow for simple creation of n-row empty frame/series via `clear` (7095)
- Make polars not copy data when importing from arrow (7084)
- More ergonomic `drop` args (7063)
- More ergonomic `partition_by` args (7065)
- More ergonomic `exclude` args (7082)
- allow inline expressions in asof\_join (7088)
- add 'use\_statistics' option to parquet readers (7087)

🐞 Bug fixes

- allow map\_dict on categorical dtype (7097)
- fix logical types in arr.get (7094)
- allow fill\_null in eager if type now known (7092)
- do projection just before concat to ensure same sizes (7089)
- fix 'filter' in groupby context when expression is… (7041)
- fix type hint of 'when->then->otherwise' (7040)
- accept more types in `from_records` (7033)

🛠️ Other improvements

- Rename pivot `aggregate_fn` to `aggregate_function` (7059)
- Add `TYPE_CHECKING` lints (7070)
- Deprecate more non-keyword arguments (7030)
- Rename kwarg reverse to descending (6914)
- Rename args `f`/`func` to `function` (7032)
- let read\_csv take Sequence as columns, remove several `type: ignore` (7028)
- add example for `arr.count_match()` (7029)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, coinflip112, datapythonista, jakob-keller, moritzwilksch, ritchie46, stinodego, universalmind303 and zundertj


py-0.16.7
🚀 Performance improvements

- add `arr.count_match` expression and optimize `arr.sum` for `List<Boolean>` (7023)
- optimize `selection_to_pyexpr_list` (7020)
- avoid unnecessary function calls in `LazyFrame.with_columns()` (7019)
- remove O(n^2) behavior in melt (7003)
- Improve performance of `expr_to_lit_or_expr` for arguments of type `Expr` by ~80% (6967)
- improve vec\_hash perf for boolean and utf8 (6963)
- don't pack utf8 columns in grouptuples `~5-15%` (6959)
- don't pack integer keys when determining group tuples `~8-18%` (6956)
- use fxhash for all integers (6954)

✨ Enhancements

- add `arr.count_match` expression and optimize `arr.sum` for `List<Boolean>` (7023)
- add sort for struct dtype (7021)
- More ergonomic `coalesce` args (6989)
- raise informative error if invalid datetime\_format passed to write\_csv (7005)
- Improve Series \& Numpy arithmetic (6983)
- More ergonomic `agg` args (6982)
- rename parse\_dates => try\_parse\_dates (6987)
- remove `packaging` and/or `distutils` dependency with a minimal version parser utility (6972)
- More ergonomic `over` args (6986)
- add `upper_bound` and `lower_bound` methods to `Series` (6990)
- More ergonomic `col` args (6996)
- More ergonomic `sort` args (6896)
- Make groupby agg shortcuts available in lazy (6944)
- add `map_dict` method for Series (6946)

🐞 Bug fixes

- reflect time zone conversion in lazy dataframe schema (7022)
- ensure set\_sorted never panics (7013)
- fix struct append 0 sliced (7012)
- fix dtype of diff for uint8 (7010)
- fix coalesce supertype (7000)
- if given, respect dtype time zone when instantiating pl.lit value (6999)
- fix fill\_null for categoricals (6998)
- dtype of pow function (6985)
- fix is\_duplicated for utf8 dtype (6997)
- Remove check for path to be non-directory if use\_pyarrow (6994)
- if given, respect dtype timeunit when instantiating `pl.lit` value (6991)
- Add packaging to runtime dependencies (6962)
- fix temporal logical types in pivot (6957)
- typo in mean unit test - changed `median` -> `mean` (6960)
- ensure literals are expanded in streaming (6952)
- str.contains strict=False had no effect (6950)

🛠️ Other improvements

- date-time unit tests refactor (7002)
- test lit series arithmetic order (7015)
- More test restructure (6961)
- Properly deprecate `.struct.to_frame` (6958)
- Properly deprecate GroupBy.agg\_list (6943)

Thank you to all our contributors for making this release possible!
MarcoGorelli, MatveyF, alexander-beedie, jakob-keller, mslapek, ozgrakkurt, papparapa, ritchie46, sorhawell, stinodego, xhochy and zundertj


py-0.16.6
✨ Enhancements

- add is\_duplicated/is\_unique for struct dtype (6940)
- add `is_between` method for Series (6933)
- support nested fixedsizebinary conversion (6923)
- raise error on invalid aggregation expressions (6921)
- provide better errors when failing to read CSV data from buffers that have advanced their read position (6920)
- truncate file path on error msg (6917)
- Parse JSON data in `Utf8` to polars dtype (6885)
- More ergonomic `groupby` args (6872)

🐞 Bug fixes

- object to\_dict (6931)
- respect maintain\_order in groupby.apply (6926)
- add special fast path for elementwise expression o… (6924)
- fix anonymous list builder (6916)
- reject multithreading on excessive ',\\n' fields (6906)
- fix regression with `date` => `object` typing in `to_pandas` method (6902)
- dispatch suffix to asof\_join by (6899)
- improve recursive casting of nested data (6897)
- don't fast explode on null introducing take (6890)
- prevent external modules found on `PYTHONPATH` from bleeding into polars `venv` (6888)
- prevent conflation of `unit.io` tests directory with python `io` module (6889)

🛠️ Other improvements

- Bump ruff version (6936)
- add more nested construction tests (6912)
- Update Cargo.lock (6893)
- unify constructor logic when initialising from a sequence of dicts (6887)
- prevent conflation of `unit.io` tests directory with python `io` module (6889)
- refactor `datelike` as `temporal`, and support Time dtype in `Series.to_numpy` (6881)
- Consistently parse column name inputs (6879)
- Use `Self` type more consistently (6882)

Thank you to all our contributors for making this release possible!
MarcoGorelli, adamgreg, alexander-beedie, josh, jvdd, ritchie46 and stinodego


py-0.16.5
🚀 Performance improvements

- speedup quantile/median `~2x` (6861)
- remove unneeded series allocations in groupby aggs (6855)

✨ Enhancements

- restore dataframe class (6869)
- add `include_index` option on init from pandas frames (6847)
- properly implement null array (6817)
- avoid panic error in strftime with invalid format (6810)

🐞 Bug fixes

- fix crash in write\_csv when mixed tz-naive and tz-aware datetimes are present (6828)
- accept more types in groupby.agg (6709)
- Fix pl.from\_dataframe() as pyarrow.interchange was not i… (6844)
- fix schema of functions: (6845)
- stabilize integer operation to minimal required dtype (6841)
- use explicit type-arg for PythonDataType (6481)
- fix numpy/datetime regression (6835)
- implement to\_list for null dtype (6834)
- Raise ValueError on passing multiple expressions Numpy ufunc (6821)
- respect schema in ndjson (6819)

🛠️ Other improvements

- Fail tests on warning (6868)
- further improve struct expr docstrings (6852)
- Deprecate non-keyword args for some functions (6851)
- un-skip passing test (6854)
- parenthesise `col` type signature to improve hint interaction with PyCharm (6850)
- Deprecate positional `join` args (6826)
- Rename argsort/argsort\_by to arg\_sort/arg\_sort\_by (6829)
- Update dprint config excludes (6822)
- Fix some broken noqa comments (6823)
- Run mypy as part of the lint workflow (6820)
- various minor docstring rendering fixes (6818)
- fix lazy `groupby` docstring/rendering (6816)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, ghuls, josh, kngwyu, oysols, ritchie46, stinodego and zundertj


py-0.16.4
🚀 Performance improvements

- remove PySequence downcast (6803)
- optimize `arg_min/arg_max` (6799)

✨ Enhancements

- boolean `Series` broadcast comparison (eq/neq) against scalar True/False (6797)

🐞 Bug fixes

- ensure join frame types are consistent (6798)
- enable empty DataFrame (and Series) init from List of Structs/Lists schema/dtype (6795)

Thank you to all our contributors for making this release possible!
alexander-beedie, igmriegel and ritchie46


rs-0.27.0
🏆 Highlights

- Formalize list aggregation difference between groupbys, selection and window functions (6487)

⚠️ Breaking changes

- error on string \<-> date cmp (6498)
- Formalize list aggregation difference between groupbys, selection and window functions (6487)
- show where error messages originated (6482)
- `str.strip` with multiple chars (5929)

🚀 Performance improvements

- update string replacement codepaths following new benchmarking (6777)
- improve dynamic groupby performance on sorted keys (6599)
- faster frame-init from list of dicts (when omitting fields), and ensure fields are read according to the declared schema (6472)
- Improve rechunk check (6268)
- reuse allocated scratches in ipc writer (6287)
- use dedicated writer thread for sink\_parquet (6285)
- first check rev-map on categorical equality check (6085)
- ensure set\_at\_idx is O(1) (5977)
- use iterator instead of loop `polars_io::csv::parser::skip_condition` (5157)

✨ Enhancements

- accept separator for pivot and to\_dummies (6780)
- rename 'tz' to 'time\_zone' in convert\_time\_zone and replace\_time\_zone (6784)
- rename with\_time\_zone to convert\_time\_zone and cast\_time\_zone to replace\_time\_zone (6768)
- support timezone in csv writer (6722)
- implement series abstractions for `Int128Type` (6679)
- parse timezone from Datetime (6766)
- formally support duration division (6758)
- add argmin/max for utf8 data (6746)
- Support an ignore\_nulls param for EWM calculations. (5749) (6742)
- deprecate tz\_localize (6693)
- guarantee schema-stable `col(dtype)` selection (6674)
- better-characterise `NotFound` exceptions (6670)
- disallow with\_time\_zone from/to tz-naive (6659)
- let cast\_time\_zone work on tz-naive and deprecate tz-localize (6649)
- implement fill\_null for list data (6635)
- expression functions should be nullable (6629)
- add streamable udfs (6614)
- is\_first for struct dtype (6595)
- Added from\_str\_radix method to StringNameSpace that allows parsing strings in any base to i32 (6570)
- improve predicate pushdown (6579)
- raise error on invalid binary cmp (6564)
- let cast\_time\_zone accept None (6539)
- add utc parameter to strptime (6496)
- add meta 'has\_multiple\_outputs', 'is\_regex\_projec… (6500)
- error on string \<-> date cmp (6498)
- show where error messages originated (6482)
- faster frame-init from list of dicts (when omitting fields), and ensure fields are read according to the declared schema (6472)
- allow expr in str.contains (6443)
- add float formatting option (6432)
- allow expressions as arguments in `str.ends_with` (6361)
- accept expr in `str.starts_with` (6355)
- add strict parameter to decoding expressions (6342)
- allow unordered struct creating from anyvalues (6321)
- parse abbrev month name (6314)
- add `dt.combine` for combining date and time components (6121)
- add sink\_ipc (6286)
- ensure ooc sort works ooc with all-constant values (6235)
- The 1 billion row sort (6156)
- optionally treat missing UTF8 values as the empty string at CSV parse-time (6203)
- When moving error out of `LogicalPlan`, leave behind String with error message instead of `None` (6199)
- generalize the cloud storage builders (5972)
- Implement `DataFrame.unique(keep="none")` (6169)
- add `arr.take` expression (6116)
- allow `extend_constant` to work with date literals (6114)
- allow nested categorical cast (6113)
- add a `rounded_corners` modifier to `pl.Config.set_tbl_formatting` (6108)
- Get polars to compile to wasm target (6050)
- add search\_sorted for arrays and utf8 dtype (6083)
- improve error message when writing nested data to… (6040)
- updated default table format from "UTF8\_FULL" to "UTF8\_FULL\_CONDENSED" (5967)
- `str.strip` with multiple chars (5929)
- support glob in parquet object\_storage (5928)
- read decimal as f64 (5938)
- improve query plan scan formatting (5937)
- allow all null cast (5933)
- truncate by calendar weeks (5759)
- merge sorted dataframes (5817)
- impl hex and base64 for binary (5892)
- streaming parquet from object\_stores (5871)

🐞 Bug fixes

- always rechunk if n\_chunks > n\_rows (6786)
- fix ndjson empty array parsing (6785)
- make some list expressions aware of groupby context (6776)
- use explicit drop function node (6769)
- don't set sorted flag if we reverse sort the left … (6772)
- handle edge-case with string-literal replacement when the replace value looks like a capture group (6765)
- respect skip\_rows in glob parsing csv (6754)
- Improve error message in DataFrame constructor (6715)
- arrow map dtype conversion (6732)
- dedicated `rename` implementation. (6688)
- return correct display/repr names for NaN-related expressions (6721)
- strftime with time zone directive (6673)
- improve error message in date\_range with invalid units (6671)
- remove uses of rayon global thread pool (6682)
- true-divide output type (6665)
- cast to and from fixed offsets (6602)
- raise error on string numeric arithmetic (6601)
- partially assert sortedness in groupby dynamic (6593)
- raise oob if negative index given to take (6590)
- fix predicate pushdown key check (6577)
- fix schema of apply with many inputs on empty df (6571)
- let lhs determine struct order in supertype (6572)
- validate utc, fmt, and tz-aware in strptime (6550)
- add strptime to filter boundary (6560)
- list eval all null array (6545)
- implement ser/de for BinaryChunked (6543)
- raise if tz\_localize called on UTC-aware (6526)
- make concat\_list group aware (6527)
- error on invalid expanding expression (6521)
- create from dicts directly as struct categorical (6520)
- fix oob in arr.get by expressions (6519)
- fix cse schema (6518)
- panic when max\_len -1 is reached (6494)
- Formalize list aggregation difference between groupbys, selection and window functions (6487)
- validate tz in with\_time\_zone (6417)
- faster frame-init from list of dicts (when omitting fields), and ensure fields are read according to the declared schema (6472)
- use consistent floor division for floats/ints (6460)
- split semi/anti join optimization (6459)
- fix doc comment in ParallelStrategy (6444)
- fix projection pushdown on double semi join (6440)
- cumulative\_eval ensure output dtype is respected (6435)
- auto-detect %+ as tz-aware (6434)
- correct error message in cast\_time\_zone (6411)
- only use float simd on specific alignment (6427)
- no early escape when window is equal to len in rolling\_float (6408)
- raise error on invalid sort\_by argument (6382)
- take offset into account with str.explode (6384)
- Return empty batch for pl.read\_csv\_batched().next\_… (6381)
- implement ser/de for StructChunked (6359)
- series of empty structs (6347)
- don't cast nulls before trying normal cast (6339)
- expand all nested wildcards in functions (6334)
- fix groupby rolling by\_key if groups are empty (6333)
- parse abbrev month name (6314)
- disallow alias in inline join expressions (6312)
- feature flag "get\_sink" ipc (6306)
- block proj-pd and pred-pd on swapping rename (6303)
- convert nested dictionary with i64 keys (6299)
- fix panic dynamic\_groupby on empty dataframe (6294)
- Parse negative dates with polars parser (6256)
- Add list inner dtype when printing Series (6233)
- fix when then otherwise with arity and aggregation… (6224)
- pass name to value counts in aggregation (6221)
- don't set fast\_explode on list of structs (6220)
- explode of empty nullable list (6190)
- fix empty streaming joins (6149)
- fix streaming joins where the join order has been … (6143)
- write tz-aware datetimes to csv (6135)
- Print error message on mmap IPC file only in verbose mode (6098)
- fix invalid dtype in chunked array after struct cast (6093)
- don't run cse cache\_states if no projections found (6087)
- Update `read_csv` error message (6082)
- propagate nulls in binary arithmetic/aggregation (6076)
- deal with unnest schema expansion in projection pd (6063)
- correct output dtype for cummin/cumsum/cummax (6062)
- block streaming on literal series/range (6058)
- ndjson struct inference (6049)
- deal with empty structs (6039)
- fix aggregation that filters out all data (6036)
- fix diff overflow (6033)
- keep column names in is\_null/is\_not\_null (6032)
- keep name when sorting categorical in lexical order (6029)
- properly set null anyvalue if categorical is neste… (6025)
- make weekday tz-aware (5989)
- fix categorical in struct anyvalue issue (5987)
- fix invalid boolean simplification (5976)
- allow empty sort on any dtype (5975)
- properly deal with categoricals in streaming queries (5974)
- don't panic on ignored context (5958)
- don't allow named expression in arr.eval (5957)
- fix panic in join expressions (5954)
- block ordered predicates before explode (5951)
- adhere to schema in arr.eval of empty list (5947)
- fix arrow nested null conversion (5946)
- allow None in arr.slice length (5934)
- fix time to duration cast (5932)
- error on addition with datetime/time (5931)
- don't create categoricals in streaming (5926)
- object filter should keep single chunk (5913)
- csv, read escaped "" as missing (5912)
- fix pivot of signed integers (5909)
- fix latest oob in streaming conversion (5902)
- fix `date + duration` offsets outside of nanosecond datetime bounds (5889)
- adapt k to len in topk (5888)

🛠️ Other improvements

- propagate error in date\_range with invalid time zone (6759)
- update arrow to 0.16 (6748)
- remove unreachable path in write\_anyvalue (6727)
- add groupby\_dynamic to docs (6725)
- disallow chunked datetime with\_time\_zone on tznaive, remove unnecessary with\_time\_zone (6681)
- update required Rust version from 1.58 to 1.62 (6680)
- add test for raising error in apply (6664)
- Minor documentation fix (6657)
- Add release flow info to contributing guide (6480)
- address todo and use regex in tz\_aware check (6479)
- Address chrono deprecation warnings (6478)
- fix doc comment in ParallelStrategy (6444)
- move binary to polars-ops (6401)
- fix a typo in csv read example (6389)
- remove roundtrip to builder (6383)
- update rustc to 2023-01-19 (6341)
- run cse optimization only if joins and caches… (6337)
- update base64 requirement from 0.13 to 0.21 (6249)
- Remove benches and criterion dependency (6273)
- update chrono-tz requirement from 0.6 to 0.8 (6255)
- Enable Dependabot (5036)
- Add missing feature attributes for csv-file (6229)
- don't set aggregated flag on null propagated aggregation. (6191)
- Revert "Use auto\_doc\_cfg" (6164)
- remove vertical take (6112)
- add single threaded sort internally (6103)
- mark `from_chunks` as unsafe (6094)
- replace exact instances of Option/Result combinators (6088)
- ensure reverse indices exist in global string cache (5970)
- refactored describe (5922)
- don't decode into utf8 (5910)
- remove unused deps (5903)

Thank you to all our contributors for making this release possible!
2-5, AnatolyBuga, ChayimFriedman2, MarceColl, MarcoGorelli, MatveyF, abalkin, alexander-beedie, c-peters, cannero, chitralverma, cojmeister, dannyvankooten, dependabot, dependabot[bot], flowlight0, gab23r, gam-phon, ghuls, gitkwr, huitseeker, jgmartin, jjerphan, johngunerli, josh, jvanbuel, n8henrie, ozgrakkurt, papparapa, phaile2, plaflamme, rben01, ritchie46, romanovacca, ropoctl, sorhawell, stinodego, universalmind303, winding-lines, yuntai and zundertj


py-0.16.3
✨ Enhancements

- add update method to ldf/df (6787)
- accept separator for pivot and to\_dummies (6780)
- rename 'tz' to 'time\_zone' in convert\_time\_zone and replace\_time\_zone (6784)
- Allow other expressions for default arg in `map_dict` (6781)
- minor ergonomic affordance; allow `pl.concat` from generator expression (6779)
- rename with\_time\_zone to convert\_time\_zone and cast\_time\_zone to replace\_time\_zone (6768)
- Add `map_dict` expression. (5899)
- support timezone in csv writer (6722)
- default to 1d interval in date\_range (6771)
- parse timezone from Datetime (6766)
- Add option to use PyArrow backed-extension arrays when … (6756)
- formally support duration division (6758)
- add argmin/max for utf8 data (6746)
- Improve numpy support: conversion of numpy arrays with … (6738)
- Improved assert equal messages (6737)
- Support an ignore\_nulls param for EWM calculations. (5749) (6742)
- scan\_ds predicate pushdown for string cmp (6734)
- don't require pyarrow for utf8 -> numpy conversion (6733)
- More ergonomic `with_columns` args (6686)
- Add return\_as\_string arg to DF.glimpse; default=False (6678)
- better-characterise `NotFound` exceptions (6670)
- disallow with\_time\_zone from/to tz-naive (6659)
- More ergonomic `select` args (6667)
- let cast\_time\_zone work on tz-naive and deprecate tz-localize (6649)
- improved exceptions on attempt to use invalid schema/dtypes (6653)

🐞 Bug fixes

- always rechunk if n\_chunks > n\_rows (6786)
- fix ndjson empty array parsing (6785)
- make some list expressions aware of groupby context (6776)
- use explicit drop function node (6769)
- don't set sorted flag if we reverse sort the left … (6772)
- handle edge-case with string-literal replacement when the replace value looks like a capture group (6765)
- respect skip\_rows in glob parsing csv (6754)
- Improve error message in DataFrame constructor (6715)
- arrow map dtype conversion (6732)
- respect 'None' in from\_dicts (6726)
- dedicated `rename` implementation. (6688)
- return correct display/repr names for NaN-related expressions (6721)
- strftime with time zone directive (6673)
- typing for `Series` methods that can return `None` (6690)
- ensure that `iter_rows` always returns all values from all chunks/batches in accelerated codepath (6708)
- Support numpy ufunc when expression not first arg (6675)
- Raise ValueError on adding float to Series of dtype date (6677)
- remove uses of rayon global thread pool (6682)
- true-divide output type (6665)
- improve behaviour of dict-expansion (scalars) when mixed with numpy arrays (6663)
- Preserve Expr name in `is_between` (6661)
- Tiny improvement of `Field` repr (6640)

🛠️ Other improvements

- Update `mypy` to version `1.0.0` (6744)
- integrate `ignore_nulls` into EWM parametric tests (6751)
- redirect tz\_localize (6749)
- Reorganize benchmark test folder (6695)
- Split long test modules (namespaces) (6668)
- Use pytest marker for slow tests (6642)
- unify `nan_to_null` and `nan_to_none` parameter names, expose to DataFrame init, add test coverage (6637)
- update `extend_constant` docs/typing (and test coverage) (6646)

Thank you to all our contributors for making this release possible!
AnatolyBuga, MarcoGorelli, MatveyF, alexander-beedie, ghuls, jgmartin, phaile2, plaflamme, ritchie46, sorhawell, stinodego, yuntai and zundertj


py-0.16.2
🚀 Performance improvements

- improve dynamic groupby performance on sorted keys (6599)

✨ Enhancements

- implement fill\_null for list data (6635)
- expression functions should be nullable (6629)
- Implement unary plus operation on exprs and series (6517)
- add streamable udfs (6614)
- is\_first for struct dtype (6595)
- Added from\_str\_radix method to StringNameSpace that allows parsing strings in any base to i32 (6570)
- Implement DataFrame Interchange Protocol through `pyarrow` (6581)
- improve predicate pushdown (6579)
- raise error on invalid binary cmp (6564)

🐞 Bug fixes

- make string\_repr private (6636)
- treat literal values consistently in `select` context, improve related typing (6628)
- Fix \_repr\_html\_ double-height rows (5645) (6534)
- cast to and from fixed offsets (6602)
- raise error on string numeric arithmetic (6601)
- don't convert "ns"-precision temporal types via `pyarrow` (6592)
- partially assert sortedness in groupby dynamic (6593)
- raise oob if negative index given to take (6590)
- fix predicate pushdown key check (6577)
- fix schema of apply with many inputs on empty df (6571)
- let lhs determine struct order in supertype (6572)
- ensure consistent handling of 1D numpy arrays with respect to other sequences (6569)
- validate utc, fmt, and tz-aware in strptime (6550)
- add strptime to filter boundary (6560)

🛠️ Other improvements

- make string\_repr private (6636)
- add example of using `is_between` with string bounds, and extend test coverage for the same (6627)
- provide additional examples for `diff` methods (6630)
- Consistent handling of env vars (6626)
- make `structify` behaviour experimental, while also extending it to aliased expressions (6615)
- Disallow clippy borrow deref ref (6605)
- Update `ruff` version and some settings (6588)
- Add release flow info to contributing guide (6480)
- Use `assert_series_equal` instead of `s.series_equal(...)` (6582)
- cleanup last vestiges of experimental kwargs setting (6568)
- Use `assert_frame_equal` instead of `assert df.frame_equal(...)` (6553)
- Update to PyO3 to 0.18.0 (6531)

Thank you to all our contributors for making this release possible!
2-5, MarcoGorelli, abalkin, alexander-beedie, cojmeister, dependabot, dependabot[bot], jjerphan, plaflamme, ritchie46 and stinodego


py-0.16.1
🏆 Highlights

- Formalize list aggregation difference between groupbys, selection and window functions (6487)
- automagically upconvert `with_columns` kwarg expressions with multiple output names to struct; extend `**named_kwargs` support to `select` (6497)

⚠️ Breaking changes

- error on string \<-> date cmp (6498)
- Formalize list aggregation difference between groupbys, selection and window functions (6487)
- show where error messages originated (6482)
- Remove deprecated paths from `Series.__getitem__` (6048)
- change behaviour of named rows (6302)
- Remove deprecated `read/write_json` arguments (5990)
- make `schema`, `schema_overrides`, and `orient` consistent on all user-facing interfaces (6387)
- Groupby iteration now returns tuples of (name, data) (6350)
- Remove `Groupby.pivot` (6016)
- Remove deprecated argument aliases (5993)
- Change `Series.shuffle` default behaviour (5991)
- Change `Expr.is_between` default behaviour (5985)
- Restrict certain function parameters to be keyword-only (6464)

✨ Enhancements

- let cast\_time\_zone accept None (6539)
- automagically upconvert `with_columns` kwarg expressions with multiple output names to struct; extend `**named_kwargs` support to `select` (6497)
- add some missing type annotation in `series` dispatch methods (6523)
- better errors in get\_ptr and a probability on a boolean… (6522)
- add utc parameter to strptime (6496)
- add meta 'has\_multiple\_outputs', 'is\_regex\_projec… (6500)
- error on string \<-> date cmp (6498)
- ~30% faster `iter_rows(named=True)` and `to_dicts()`, if pyarrow available (6493)
- show where error messages originated (6482)
- Remove deprecated paths from `Series.__getitem__` (6048)
- change behaviour of named rows (6302)
- Remove deprecated `read/write_json` arguments (5990)
- Groupby iteration now returns tuples of (name, data) (6350)
- Remove `Groupby.pivot` (6016)
- Remove deprecated argument aliases (5993)
- Change `Series.shuffle` default behaviour (5991)
- Change `Expr.is_between` default behaviour (5985)
- Restrict certain function parameters to be keyword-only (6464)

🐞 Bug fixes

- implement ser/de for BinaryChunked (6543)
- on frame-init from generator, initial `chunk_size` cannot be smaller than `infer_schema_length` (6541)
- raise if tz\_localize called on UTC-aware (6526)
- make concat\_list group aware (6527)
- error on invalid expanding expression (6521)
- create from dicts directly as struct categorical (6520)
- fix oob in arr.get by expressions (6519)
- fix cse schema (6518)
- panic when max\_len -1 is reached (6494)
- Formalize list aggregation difference between groupbys, selection and window functions (6487)
- validate tz in with\_time\_zone (6417)

🛠️ Other improvements

- Remove `verify_series_and_expr_api` util (6524)
- Disable some tests for Windows (6532)
- Remove unnecessary brackets in doc examples (6332)
- Enable some tests for Windows (6511)
- Fix test issue with tmp directory (6508)
- Fix some deprecation warnings (6495)
- added all missing examples for temporal expressions (6488)
- Utilize pytest-xdist for faster unittests (6483)
- I/O test improvements (6475)
- make `schema`, `schema_overrides`, and `orient` consistent on all user-facing interfaces (6387)
- improved error message from Expr on incorrect usage in boolean context (6473)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, gab23r, papparapa, ritchie46, romanovacca, stinodego and zundertj


py-0.15.18
✨ Enhancements

- More precise pipe type annotation (6457)

🐞 Bug fixes

- use consistent floor division for floats/ints (6460)
- split semi/anti join optimization (6459)

🛠️ Other improvements

- Specify deltalake minimum version (6363)
- deprecate `iterrows` in favour of `iter_rows`, add new `redirect` class decorator (6461)
- Improve IO test structure (6453)

Thank you to all our contributors for making this release possible!
alexander-beedie, josh, ritchie46 and stinodego


py-0.15.17
✨ Enhancements

- allow expr in str.contains (6443)
- Deprecate `with_column` (6128)
- expose efficient iterator over `DataFrame` slices (6414)
- add float formatting option (6432)
- 10% speedup for `to_dicts` method (6415)
- add datetime/duration dtype selector groups covering the different timeunits (6425)
- allow internal api to get pointer to values buffer (6385)
- infer ISO8601 datetimes (6357)
- minor improvement to auto-detection of ambiguous data orientation (6376)
- allow expressions as arguments in `str.ends_with` (6361)
- Make groupby rolling/dynamic iterable (6372)
- accept expr in `str.starts_with` (6355)
- Move `explode` to namespaces (6351)
- Rename `Series.struct.to_frame` to `.struct.unnest` (6352)
- auto-detect %+ as tz-aware (6434)

🐞 Bug fixes

- fix projection pushdown on double semi join (6440)
- ensure column-exclusion works with the new `dtype` groups, and improve some related typing (6442)
- ensure `from_dicts` and `DataFrame` init from list of dicts behave consistently, update/improve related docstrings (6431)
- cumulative\_eval ensure output dtype is respected (6435)
- allow from pandas null structs (6430)
- fixed interaction of `schema_overrides` with frame-init from list of dicts (6424)
- only use float simd on specific alignment (6427)
- no early escape when window is equal to len in rolling\_float (6408)
- `is_between` typing with time in start and end (6393)
- don't incorrectly infer Zulu time (6378)
- raise error on invalid sort\_by argument (6382)
- take offset into account with str.explode (6384)
- Return empty batch for pl.read\_csv\_batched().next\_… (6381)
- ensure pyarrow.compute module is loaded (6353)
- implement ser/de for StructChunked (6359)
- series of empty structs (6347)

🛠️ Other improvements

- add explicit note about use of `Config` as a context manager (6439)
- ensure `from_dicts` and `DataFrame` init from list of dicts behave consistently, update/improve related docstrings (6431)
- Fix docstring of series.interpolate (6399)
- Remove duplicate test (6390)
- deprecate `columns` param for `DataFrame` init; transitioning to `schema` (6366)
- Add docs and tests to `Expr.flatten` (6370)
- Example of filtering partitioned delta tables (6365)
- Uppercase project URL refs (6362)

Thank you to all our contributors for making this release possible!
ChayimFriedman2, MarcoGorelli, alexander-beedie, c-peters, flowlight0, gab23r, gam-phon, ghuls, jgmartin, josh, ritchie46, romanovacca, stinodego, universalmind303 and zundertj


py-0.15.16
🚀 Performance improvements

- Improve rechunk check (6268)
- reuse allocated scratches in ipc writer (6287)
- use dedicated writer thread for sink\_parquet (6285)

✨ Enhancements

- add strict parameter to decoding expressions (6342)
- allow unordered struct creating from anyvalues (6321)
- allow pass\_name in aggregation apply (6318)
- parse abbrev month name (6314)
- Add warning for new behaviour of named rows (6300)
- add `dt.combine` for combining date and time components (6121)
- improvements to dtype-based column selection (6295)
- add sink\_ipc (6286)
- additional `schema_overrides` param for more ergonomic `DataFrame` init (6230)

🐞 Bug fixes

- don't cast nulls before trying normal cast (6339)
- properly dispatch categorical string comparison (6336)
- expand all nested wildcards in functions (6334)
- fix groupby rolling by\_key if groups are empty (6333)
- Fix some type hints and bugs for groupby (6329)
- Reject `None` input for `head`/`tail` (6326)
- parse abbrev month name (6314)
- default to pyarrow for writing parquet (6313)
- disallow alias in inline join expressions (6312)
- block proj-pd and pred-pd on swapping rename (6303)
- convert nested dictionary with i64 keys (6299)
- fix(python) Print instantiated dtypes in glimpse (6298)
- infer y-m-d datetime even if single element (6297)
- fix panic dynamic\_groupby on empty dataframe (6294)
- implement missing DataFrame `__floordiv__` op (6280)
- Allow low and high in date\_range to be str (6275)
- allow integer-compatible row indexes that are not strictly typed as `int` (6266)
- Parse negative dates with polars parser (6256)

🛠️ Other improvements

- run cse optimization only if joins and caches… (6337)
- Fix wrong description for variable\_name argument in melt (6331)
- Fix random groupby test failure (6327)
- fixup test names, adjust test\_struct (6317)
- simplify `_from_pandas` constructor (6310)
- Ignore hash doctests (6304)
- Fix docstring formatting for truncate (6291)
- Move package metadata to `pyproject.toml` (6271)
- Move `io` tests to the same folder (6277)
- Enable Dependabot (5036)

Thank you to all our contributors for making this release possible!
MarcoGorelli, alexander-beedie, c-peters, dependabot, dependabot[bot], ghuls, n8henrie, ritchie46, stinodego and universalmind303


py-0.15.15
✨ Enhancements

- ensure ooc sort works ooc with all-constant values (6235)
- The 1 billion row sort (6156)
- optionally treat missing UTF8 values as the empty string at CSV parse-time (6203)
- check file target is not an existing directory (6187)
- support -ve indexing for DataFrame `head` and `tail` methods (6173)
- Implement `DataFrame.unique(keep="none")` (6169)
- support use of explicit `Struct` dtypes on DataFrame/Series init (6145)

🐞 Bug fixes

- Add list inner dtype when printing Series (6233)
- strptime now respects pl.Datetime's time\_unit (6231)
- fix when then otherwise with arity and aggregation… (6224)
- collect now uses the storage\_options given to scan\_parquet (6223)
- set\_sorted keep schema (6222)
- pass name to value counts in aggregation (6221)
- don't set fast\_explode on list of structs (6220)
- address a frame init/construction error, and expose `infer_schema_length` to frame init (6210)
- explode of empty nullable list (6190)
- fix oob arr.take (6189)
- Make `with_columns` in `with_columns_kwargs` mode compatible with more data types (6126)
- Update docstring `with_columns` to reflect a new dataframe is being returned (6122)
- fix empty streaming joins (6149)
- fix streaming joins where the join order has been … (6143)
- write tz-aware datetimes to csv (6135)
- add null behavior for oob indices (6133)

🛠️ Other improvements

- Create `DataFrame` from schema (6225)
- don't set aggregated flag on null propagated aggregation. (6191)
- undo cargo.toml change (6219)
- Improve drop\_nulls docstrings (6127)
- Clarify docstrings for `closed` argument (6198)
- minor docs and typing updates (plus additional test coverage for related areas) (6182)
- explain n\_field\_strategy (6158)

Thank you to all our contributors for making this release possible!
MarceColl, MarcoGorelli, alexander-beedie, gab23r, ghuls, jvanbuel, n8henrie, rben01, ritchie46, ropoctl, sorhawell, stinodego, winding-lines and zundertj


py-0.15.14
🚀 Performance improvements

- first check rev-map on categorical equality check (6085)

✨ Enhancements

- add `arr.take` expression (6116)
- allow `extend_constant` to work with date literals (6114)
- allow nested categorical cast (6113)
- add a `rounded_corners` modifier to `pl.Config.set_tbl_formatting` (6108)
- huge speedup of scalar-to-array expansion on frame init from dict (6111)
- extend existing fast range->Series init to lists of ranges in a Series (6099)
- additional (opt-in) options for assert\_frame\_equal (6096)
- add search\_sorted for arrays and utf8 dtype (6083)

🐞 Bug fixes

- ensure multi-line type hints are parenthesised (6100)
- fix invalid dtype in chunked array after struct cast (6093)
- don't run cse cache\_states if no projections found (6087)
- Support all datatypes in glimpse and align with head/tail (6091)
- Update `read_csv` error message (6082)
- propagate nulls in binary arithmetic/aggregation (6076)

🛠️ Other improvements

- Fix docstring with\_context (6118)
- Use Dataframe.item internally and in tests (6109)
- Assert deprecation warning on check\_column\_names (6110)
- enable `unused import` autofix via ruff (6102)

Thank you to all our contributors for making this release possible!
alexander-beedie, gitkwr, huitseeker, ritchie46, stinodego and zundertj


py-0.15.13
✨ Enhancements

- Improve iterating over `GroupBy` (6051)
- much faster lazy type-checks (6064)
- support array-expansion of scalars on frame init from dict (6034)
- improve error message when writing nested data to… (6040)

🐞 Bug fixes

- bound complex type from 3.8 to 3.11 (6071)
- deal with unnest schema expansion in projection pd (6063)
- correct output dtype for cummin/cumsum/cummax (6062)
- block streaming on literal series/range (6058)
- improve handling of dict-type "columns" param on frame-init (6045)
- Fix typing for `DataFrame.select` (6047)
- ndjson struct inference (6049)
- fix stringcache. latest refactor introduced a hashing error (6056)
- allow mixed field order and availability in apply that r… (6041)
- deal with empty structs (6039)
- fix aggregation that filters out all data (6036)
- fix diff overflow (6033)
- keep column names in is\_null/is\_not\_null (6032)
- keep name when sorting categorical in lexical order (6029)
- tweaked property/accessor behaviour (6021)
- properly set null anyvalue if categorical is neste… (6025)
- Fix `from_epoch` function signature (6024)
- Validate `estimated_size` parameter (6018)

🛠️ Other improvements

- suggest forward fill in cumsum/cummax (6061)
- Fix SIM105 issues. (6042)
- Remove trailing spaces in glimpse output (6037)
- Remove unnecessary noqa's (6035)
- Fix flake8-pytest-style errors in tests. (6031)
- update `read_sql` and `row` docstrings (6028)
- Enable the `isort`-style import autofix via `ruff` (6020)
- Update py-polars/Cargo.lock (6013)
- Refactor pivot tests (6012)
- Use ruff instead of isort, flake8 and pyupgrade (5916)
- Properly deprecate `groupby.pivot` (6000)

Thank you to all our contributors for making this release possible!
alexander-beedie, ghuls, ritchie46, stinodego and universalmind303


py-0.15.11
🚀 Performance improvements

- ensure set\_at\_idx is O(1) (5977)

✨ Enhancements

- allow eq,ne,lt etc (5995)
- Improve `Expr.is_between` API (5981)
- large speedup for `df.iterrows` (~200-400%) (5979)
- updated default table format from "UTF8\_FULL" to "UTF8\_FULL\_CONDENSED" (5967)
- Access rows as namedtuples (5966)
- Improve `assert_frame_equal` messages (5962)

🐞 Bug fixes

- make weekday tz-aware (5989)
- fix categorical in struct anyvalue issue (5987)
- fix invalid boolean simplification (5976)
- allow empty sort on any dtype (5975)
- properly deal with categoricals in streaming queries (5974)

Thank you to all our contributors for making this release possible!
alexander-beedie, ritchie46 and stinodego


py-0.15.9
🚀 Performance improvements

- improve reducing window function performance ~33% (5878)

✨ Enhancements

- `str.strip` with multiple chars (5929)
- add iterrows (5945)
- read decimal as f64 (5938)
- improve query plan scan formatting (5937)
- allow all null cast (5933)
- allow objects in struct types (5925)
- handle Series init from python sequence of numpy arrays (5918)
- merge sorted dataframes (5817)
- impl hex and base64 for binary (5892)
- Add datatype hierarchy (5901)
- Add .item() on DataFrame and Series (5893)
- make get\_any\_value fallible (5877)
- Add string representation for data types (5861)
- directly push all operator result into sink, prev… (5856)

🐞 Bug fixes

- don't panic on ignored context (5958)
- don't allow named expression in arr.eval (5957)
- error on invalid dtype (5956)
- fix panic in join expressions (5954)
- block ordered predicates before explode (5951)
- adhere to schema in arr.eval of empty list (5947)
- fix from\_dict schema\_inference=0 (5948)
- fix arrow nested null conversion (5946)
- allow None in arr.slice length (5934)
- fix time to duration cast (5932)
- error on addition with datetime/time (5931)
- don't create categoricals in streaming (5926)
- object filter should keep single chunk (5913)
- csv, read escaped "" as missing (5912)
- fix pivot of signed integers (5909)
- don't allow duplicate columns in read\_csv arg (5908)
- fix latest oob in streaming conversion (5902)
- adapt k to len in topk (5888)
- fix lazy swapping rename (5884)
- fix window function with nullable values; regression due… (5874)
- improve equality consistency between types (5873)
- evaluate whole branch expression to determine if r… (5864)
- fix top\_k on empty (5865)
- fix slice in streaming (5854)
- Fix type hint for IO `*_options` arguments (5852)

🛠️ Other improvements

- Fix docs for sink\_parquet (5952)
- Fix misspelling in LazyFrame docstring (5917)
- add bin, series.is\_sorted and merge\_sorted (5914)

Thank you to all our contributors for making this release possible!
AnatolyBuga, alexander-beedie, cannero, chitralverma, dannyvankooten, johngunerli, ozgrakkurt, ritchie46, stinodego, winding-lines and zundertj


rs-0.26.0
⚠️ Breaking changes

- remove Series::append\_array (5681)
- iso weekday (5598)

🚀 Performance improvements

- improve reducing window function performance ~33% (5878)
- improve performance reducing window functions with numeric output `~-14%` (5841)
- set\_sorted flag when creating from literal (5728)
- use sorted fast path in streaming groupby (5727)
- ensure fast\_explode propagates (5676)
- fix quadratic time complexity of groupby in stream… (5614)
- Aggregate projection pushdown (5556)
- improve streaming primitive groupby (5575)
- vectorize integer vec-hash by using very simple, … (5572)
- specialized utf8 groupby in streaming (5535)

✨ Enhancements

- make get\_any\_value fallible (5877)
- directly push all operator result into sink, prev… (5856)
- add sink\_parquet (5480)
- Support parsing more float string representations. (5824)
- implement mean aggregation for duration (5807)
- implement sensible boolean aggregates (5806)
- allow expression as quantile input (5751)
- accept expression in str.extract\_all (5742)
- tz-aware strptime (5736)
- Add "fmt\_no\_tty" feature for formatting support without r… (5725)
- lazy diagonal concat. (5647)
- to\_struct add upper\_bound (5714)
- inversely scale chunk\_size with thread count in s… (5699)
- add streaming minmax (5693)
- improve dynamic inference of anyvalues and structs (5690)
- support is\_in for boolean dtype (5682)
- add a cache to strptime (5628)
- add nearest interpolation strategy (5626)
- make cast recursive (5596)
- add arg\_min/arg\_max for series of dtype boolean (5592)
- prefer streaming groupby if partitionable (5580)
- make map\_alias fallible (5532)
- pl.min \& pl.max accept wildcard similar to pl.sum (5511)
- add predicate pushdown to anonymous\_scan (5467)
- make streaming work with multiple sinks in a sing… (5474)
- add streaming slice operation (5466)
- run partial streaming queries (5464)
- streaming left joins (5456)
- file statistics so we only (try to) keep smallest table in memory (5454)
- streaming inner joins. (5400)
- build\_info() provides detailed information how polars was built (5423)
- add missing `width` property to `LazyFrame` (5431)
- allow regex and wildcard in groupby (5425)
- Streaming joins architecture and Cross join implementation. (5339)
- add support for am/pm notation in parse\_dates read\_csv (5373)
- add reduce/cumreduce expression as an easier fold (5364)

🐞 Bug fixes

- fix lazy swapping rename (5884)
- improve equality consistency between types (5873)
- evaluate whole branch expression to determine if r… (5864)
- fix top\_k on empty (5865)
- fix slice in streaming (5854)
- correct invalid type in struct anyvalue access (5844)
- don't set fast\_explode if null values in list (5838)
- duration formatting (5837)
- respect fetch in union (5836)
- keep f32 dtype in fill\_null by int (5834)
- err on epoch on time dtype (5831)
- fix panic in hmean (5808)
- asof join by logical groups (5805)
- fix parquet regression upstream in arrow2 (5797)
- Fix lazy cumsum and cumprod result types (5792)
- fix nested writer (5777)
- fix(rust, python) Summation on empty series evaluates to `Some(0)` (5773)
- empty concat utf8 (5768)
- projection pushdown with union and asof join (5763)
- check null values in asof\_join + groupby (5756)
- fix generic streaming groupby on logical types (5752)
- fix date\_range on expressions (5750)
- fix dtypes in join\_asof\_by (5746)
- fix group order in binary aggregation (5744)
- implement min/max aggregation for utf8 in groupby (5737)
- fix all\_null/sorted into\_groups panic (5733)
- asof join 'by', 'forward' combination (5720)
- fix pivot on floating point indexes (5704)
- fix arange with column/literal input (5703)
- fix double projection that leads to uneven union d… (5700)
- Fix a bug in floating regex handling used in CSV type inference (5695)
- fix asof join schema (5686)
- fix owned arithmetic schema (5685)
- take glob into account in scan\_csv 'with\_schema\_mo… (5683)
- fix boolean schema in agg\_max/min (5678)
- fix boolean arg-max if all equal (5680)
- early error on duplicate names in streaming groupby (5638)
- fix streaming groupby aggregate types (5636)
- convert panic to err in concat\_list (5637)
- fix dot diagram of single nodes (5624)
- fix dynamic struct inference (5619)
- keep dtype when eval on empty list (5597)
- fix ternary with list output on empty frame (5595)
- fix tz-awareness of truncate (5591)
- check chunks before doing chunked\_id join optimiza… (5589)
- invert cast\_time\_zone conversion (5587)
- asof join ensure join column is not dropped when '… (5585)
- fix ub due to invalid dtype on splitting dfs (5579)
- fix(rust, python); fix projection pushdown in asof joins (5542)
- streaming hstack allow duplicates (5538)
- fix streaming empty join panic (5534)
- fix duplicate caches in cse and prevent quadratic … (5528)
- allow appending categoricals that are all null (5526)
- tz-aware strftime (5525)
- make 'truncate' tz-aware (5522)
- fix coalesce expression expansion (5521)
- fix nested aggregation in when then and window expr… (5520)
- fix sort\_by expression if groups already aggregated (5518)
- fix bug in batched parquet reader that dropped dfs… (5506)
- fix bugs in skew and kurtosis (5484)
- compute correct offset for streaming join on multi… (5479)
- return error on invalid sortby expression (5478)
- add missing `AnyValueBuffer` specialisation for `Duration` dtype (5436)
- fix freeze/stall when writing more than 2^31 string values to parquet (5366)
- properly handle json with unclosed strings (5427)
- fix null poisoning in rank operation (5417)
- correct expr::diff dtype for temporal columns (5416)
- fix cse for nested caches (5412)
- don't set sorted flag in argsort (5410)
- explicit nan comparison in min/max agg (5403)
- Correct CSV row indexing (5385)

🛠️ Other improvements

- Update rustc and fix clippy (5880)
- update arrow (5862)
- move join dispatch to polars-ops (5809)
- Remove dbg statement from union (5791)
- Continue removing compilation warnings (5778)
- shrink anyvalue size (5770)
- update arrow (5766)
- chore(rust,python) Change allow\_streaming to streaming (5747)
- remove rev-map from ChunkedArray (5721)
- simplify fast projection by schema (5716)
- Reindent df! docs code (5698)
- remove Series::append\_array (5681)
- Remove unused symbols and unneeded `mut` qualifier (5672)
- Include license files in Rust crates (5675)
- Use `NaiveTime::from_hms_opt` instead of `NaiveTime::from_hms` (5664)
- use xxhash3 for string types (5617)
- iso weekday (5598)
- Improve contributing guide (5558)
- streaming improvements (5541)
- Refer to DataFrame::unique instead of `distinct` (5482)
- don't panic if part of query cannot run strea… (5458)
- make generic join builder more dry (5439)
- use IdHash for streaming groupby generic (5435)
- fix freeze/stall when writing more than 2^31 string values to parquet (5366)

Thank you to all our contributors for making this release possible!
AnatolyBuga, CalOmnie, Kuhlwein, MarcoGorelli, OneRaynyDay, YuRiTan, alexander-beedie, andrewpollack, ankane, braaannigan, chitralverma, dannyvankooten, ghais, ghuls, jjerphan, matteosantama, messense, owrior, pickfire, ritchie46, s1ck, sa-, slonik-az, sorhawell, stinodego, universalmind303 and zundertj


py-0.15.7
🚀 Performance improvements

- improve performance reducing window functions with numeric output `~-14%` (5841)

✨ Enhancements

- allow more pyarrow literals (5842)
- add sink\_parquet (5480)
- release GIL when writing (5830)
- Support parsing more float string representations. (5824)
- implement mean aggregation for duration (5807)
- implement sensible boolean aggregates (5806)

🐞 Bug fixes

- correct invalid type in struct anyvalue access (5844)
- don't set fast\_explode if null values in list (5838)
- duration formatting (5837)
- respect fetch in union (5836)
- keep f32 dtype in fill\_null by int (5834)
- fix(python): fix delta issues (5802)
- err on epoch on time dtype (5831)
- fix panic in hmean (5808)
- asof join by logical groups (5805)

🛠️ Other improvements

- lazily import connectorx (5835)

Thank you to all our contributors for making this release possible!
chitralverma, ghuls and ritchie46


py-0.15.6
🐞 Bug fixes

- fix struct dataset (5798)
- fix parquet regression upstream in arrow2 (5797)

🛠️ Other improvements

- remove unused `cmake-rs` patch (5794)

Thank you to all our contributors for making this release possible!
OneRaynyDay, messense, ritchie46 and universalmind303


py-0.15.3
🚀 Performance improvements

- set\_sorted flag when creating from literal (5728)
- use sorted fast path in streaming groupby (5727)

✨ Enhancements

- push down predicates to pyarrow datasets (5780)
- Support for reading delta lake tables (5761)
- Add DataFrame.glimpse() (5622)
- allow expression as quantile input (5751)
- accept expression in str.extract\_all (5742)
- tz-aware strptime (5736)
- lazy diagonal concat. (5647)
- to\_struct add upper\_bound (5714)

🐞 Bug fixes

- fix(rust, python) Summation on empty series evaluates to `Some(0)` (5773)
- empty concat utf8 (5768)
- projection pushdown with union and asof join (5763)
- check null values in asof\_join + groupby (5756)
- fix generic streaming groupby on logical types (5752)
- fix date\_range on expressions (5750)
- fix dtypes in join\_asof\_by (5746)
- fix group order in binary aggregation (5744)
- implement min/max aggregation for utf8 in groupby (5737)
- fix all\_null/sorted into\_groups panic (5733)
- address several edge-cases found when asserting NaN equality (5732)
- asof join 'by', 'forward' combination (5720)

🛠️ Other improvements

- add DataFrame.pearson\_corr to reference (5772)
- Parse fixed timezone offsets without pytz (5769)
- chore(rust,python) Change allow\_streaming to streaming (5747)
- Remove pyarrow nightlies requirement. (5719)
- fix incorrect accepted type in df.write\_csv (5715)

Thank you to all our contributors for making this release possible!
AnatolyBuga, MarcoGorelli, alexander-beedie, andrewpollack, braaannigan, chitralverma, ghuls, ritchie46, sa- and zundertj


py-0.15.2
🚀 Performance improvements

- ensure fast\_explode propagates (5676)

✨ Enhancements

- Series.get\_chunks (5701)
- inversely scale chunk\_size with thread count in s… (5699)
- add streaming minmax (5693)
- Support large page sizes on aarch64 linux builds (5694)
- improve dynamic inference of anyvalues and structs (5690)
- support is\_in for boolean dtype (5682)
- add notebook html repr for Series (5653)

🐞 Bug fixes

- fix pivot on floating point indexes (5704)
- fix arange with column/literal input (5703)
- fix double projection that leads to uneven union d… (5700)
- Fix Series -> Expr dispatch for property methods (5689)
- fix asof join schema (5686)
- fix owned arithmetic schema (5685)
- take glob into account in scan\_csv 'with\_schema\_mo… (5683)
- fix boolean schema in agg\_max/min (5678)
- fix boolean arg-max if all equal (5680)
- respect python objects read method even if filename is f… (5677)
- Fix `DataFrame.n_chunks` return type (5650)

🛠️ Other improvements

- Parametrize `test_parquet_datetime` (5696)
- Function and lazy function docstrings (5657)
- Fix formatting (5658)

Thank you to all our contributors for making this release possible!
alexander-beedie, ankane, braaannigan, ghais, ghuls, jjerphan, pickfire, ritchie46, stinodego and zundertj


py-0.15.1
⚠️ Breaking changes

- Update `Expr.sample` signature and change random seeding (4648)
- rollup breaking changes (5602)
- iso weekday (5598)
- Change `null_equal` default to `True` for `Series.series_equal` (5051)

🚀 Performance improvements

- fix quadratic time complexity of groupby in stream… (5614)
- Improve performance of indexing operations on Series. (5610)
- Aggregate projection pushdown (5556)

✨ Enhancements

- add a cache to strptime (5628)
- add nearest interpolation strategy (5626)
- Update `Expr.sample` signature and change random seeding (4648)
- Change `null_equal` default to `True` for `Series.series_equal` (5051)
- make cast recursive (5596)
- add arg\_min/arg\_max for series of dtype boolean (5592)

🐞 Bug fixes

- early error on duplicate names in streaming groupby (5638)
- fix streaming groupby aggregate types (5636)
- convert panic to err in concat\_list (5637)
- fix dot diagram of single nodes (5624)
- fix dynamic struct inference (5619)
- tz-aware filtering (5603)
- keep dtype when eval on empty list (5597)
- fix ternary with list output on empty frame (5595)
- fix tz-awareness of truncate (5591)
- check chunks before doing chunked\_id join optimiza… (5589)
- invert cast\_time\_zone conversion (5587)
- asof join ensure join column is not dropped when '… (5585)

🛠️ Other improvements

- Remaining docstring examples for frame and lazyframe (5630)
- use xxhash3 for string types (5617)
- only trigger build.rs file if that file itself has cha… (5618)
- iso weekday (5598)
- Merge release workflows (5564)
- Fix broken lint workflow (5584)

Thank you to all our contributors for making this release possible!
Kuhlwein, braaannigan, ghuls, matteosantama, ritchie46 and stinodego


py-0.14.31
🚀 Performance improvements

- improve streaming primitive groupby (5575)
- vectorize integer vec-hash by using very simple, … (5572)

✨ Enhancements

- prefer streaming groupby if partitionable (5580)

🐞 Bug fixes

- fix ub due to invalid dtype on splitting dfs (5579)

🛠️ Other improvements

- Remove old Python changelog file (5577)
- namespace registration docs update (5565)
- Improve contributing guide (5558)

Thank you to all our contributors for making this release possible!
alexander-beedie, ghuls, ritchie46 and stinodego


py-0.14.29
🚀 Performance improvements

- specialized utf8 groupby in streaming (5535)

✨ Enhancements

- add dataframe.pearson\_corr (5533)
- support namespace registration (5531)
- make map\_alias fallible (5532)
- pl.min \& pl.max accept wildcard similar to pl.sum (5511)
- additional support for using `timedelta` with duration-type arguments (5487)

🐞 Bug fixes

- fix(rust, python); fix projection pushdown in asof joins (5542)
- streaming hstack allow duplicates (5538)
- fix streaming empty join panic (5534)
- fix duplicate caches in cse and prevent quadratic … (5528)
- allow appending categoricals that are all null (5526)
- tz-aware strftime (5525)
- make 'truncate' tz-aware (5522)
- fix coalesce expression expansion (5521)
- fix nested aggregation in when then and window expr… (5520)
- fix sort\_by expression if groups already aggregated (5518)
- fix bug in batched parquet reader that dropped dfs… (5506)
- preserve `Series` name when exporting to `pandas` (5498)
- Refactor is\_between (5491)
- fix bugs in skew and kurtosis (5484)

🛠️ Other improvements

- support tabbed panels in sphinx, add namespace docs (5540)
- Update dev dependencies (5517)

Thank you to all our contributors for making this release possible!
alexander-beedie, braaannigan, ghuls, ritchie46, sorhawell and zundertj


py-0.14.27
✨ Enhancements

- additional autocomplete affordances for `IPython` users (5477)
- make streaming work with multiple sinks in a sing… (5474)
- add streaming slice operation (5466)
- run partial streaming queries (5464)
- streaming left joins (5456)
- file statistics so we only (try to) keep smallest table in memory (5454)
- streaming inner joins. (5400)

🐞 Bug fixes

- compute correct offset for streaming join on multi… (5479)
- return error on invalid sortby expression (5478)
- use json for expr pickle (5476)
- improved namespace/accessor behaviour (resolves VSCode autocomplete issue) (5469)
- further improved lazy loading (5459)
- fix for categorical inserts from row-oriented data (5462)
- use of `fill_null` with temporal literals (5440)

🛠️ Other improvements

- don't panic if part of query cannot run strea… (5458)
- add build\_info() to the API doc (5442)
- Improved structure for `DataFrame` and `LazyFrame` API docs, misc design improvements (5433)

Thank you to all our contributors for making this release possible!
alexander-beedie, dannyvankooten, ritchie46, s1ck, slonik-az, stinodego and universalmind303


py-0.14.26
✨ Enhancements

- build\_info() provides detailed information how polars was built (5423)
- add missing `width` property to `LazyFrame` (5431)
- enhanced `Series.dot` method and related interop (5428)
- allow regex and wildcard in groupby (5425)
- support `DataFrame` init from generators (5424)
- support `Series` init from generator (5411)

🐞 Bug fixes

- fix freeze/stall when writing more than 2^31 string values to parquet (5366)
- properly handle json with unclosed strings (5427)
- fix null poisoning in rank operation (5417)
- correct expr::diff dtype for temporal columns (5416)
- fix cse for nested caches (5412)
- don't set sorted flag in argsort (5410)

🛠️ Other improvements

- Fix dependencies on memory allocator (5426)
- Better docstring for keep\_name (5378) (5421)

Thank you to all our contributors for making this release possible!
CalOmnie, alexander-beedie, ghuls, ritchie46, slonik-az, stinodego and universalmind303


py-0.14.25
✨ Enhancements

- 30x speedup initialising `Series` from python `range` object (5397)
- r-associative support for commutative `DataFrame` operators (5394)
- pl.from\_epoch function (5330)
- Streaming joins architecture and Cross join implementation. (5339)
- enable frame init from sequence of pandas series, and improve lazy typechecks (handle subclasses) (5383)
- add support for am/pm notation in parse\_dates read\_csv (5373)
- add reduce/cumreduce expression as an easier fold (5364)

🐞 Bug fixes

- explicit nan comparison in min/max agg (5403)
- lazy proxy module does not require global registration (5390)
- Correct CSV row indexing (5385)

🛠️ Other improvements

- Docstrings for frame, lazyframe and time series (5398)
- add integrated support for copying API examples, and auto-parallelise docs build (5393)
- improve rendering of API docs type signatures, mark PivotOps as deprecated, misc tidy-ups (5388)
- Expression docstrings (5377)
- minor navbar improvements; adds discord and twitter links, fixes github icon (5379)
- improve structure of sphinx-generated API docs (5376)
- Add with\_time\_zone to reference guide (5369)

Thank you to all our contributors for making this release possible!
YuRiTan, alexander-beedie, braaannigan, owrior, ritchie46 and zundertj


rs-0.25.0
The most notable change in this release is the start of **Out Of Core** support in polars, meaning we can process larger-than-RAM datasets. This is currently supported for the parts of a query that read from `csv` or `parquet`, and is limited to `select`, `filter`, and `groupby` operations. Many more operations will follow in upcoming releases.

See https://github.com/pola-rs/polars/pull/5139#issuecomment-1274687634 where we were able to process an **80GB** dataset on a laptop with only 16GB RAM.
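
To make the out-of-core idea concrete, here is a minimal conceptual sketch in plain Python using only the standard library. It is not polars code and does not reflect how polars implements streaming; the function name `streaming_groupby_sum`, its parameters, and the sample data are all hypothetical. It only illustrates the general principle: read the input in small batches, apply the filter, and fold each batch into a running per-group aggregate so the full dataset never has to sit in memory.

```python
import csv
import io
from collections import defaultdict


def streaming_groupby_sum(lines, key, value, predicate, batch_size=2):
    """Sum `value` per `key` over rows passing `predicate`.

    Hypothetical helper: at most `batch_size` rows are buffered at once,
    so memory use depends on the batch size and the number of groups,
    not on the total number of rows.
    """
    reader = csv.DictReader(lines)
    totals = defaultdict(float)
    batch = []

    def flush():
        # Filter the buffered rows, fold them into the running totals,
        # then release the buffer.
        for row in batch:
            if predicate(row):
                totals[row[key]] += float(row[value])
        batch.clear()

    for row in reader:
        batch.append(row)
        if len(batch) >= batch_size:
            flush()
    flush()  # handle the final, possibly partial, batch
    return dict(totals)


# Made-up sample input; a real workload would pass an open file handle.
data = io.StringIO("city,amount\nA,1\nB,2\nA,3\nB,-4\nA,5\n")
result = streaming_groupby_sum(
    data, "city", "amount", lambda r: float(r["amount"]) > 0
)
print(result)
```

The same pattern works with a file object opened on a CSV larger than available RAM, since `csv.DictReader` pulls rows lazily from any iterable of lines.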

Thanks to everyone who contributed to another release! :raised_hands:

⚠️ Breaking changes

- rename expand\_at\_index -> new\_from\_index (5259)

🚀 Performance improvements

- lower contention in out of core filter (5311)
- improve pivot performance by using faster series… (5172)
- improve streaming performance (~15%) (5170)
- don't block projection pushdown on unnest (5123)
- more conservative JIT sort settings (5080)
- sort and unsort join key if other side is sorted (5069)
- do not rechunk left joins (5066)
- Prune unneeded projections (5032)
- Improve predicate pushdown + with\_columns (5029)
- Don't execute unused with\_column expressions (5026)

✨ Enhancements

- shrink\_type expression (5351)
- tz\_localize expression (5340)
- accept expr in arr.get (5337)
- Implement forward strategy in groupby join\_asof (5335)
- improve dynamic inference of struct types (5297)
- Add newline to Aggregate..FROM describe\_optimization\_plan (5253)
- date\_range expression (5267)
- show expression where error originated if raised … (5263)
- improve error msg if window expressions length do… (5262)
- Add round for date and datetime (5153)
- new `n_chars` functionality for utf8 strings (5252)
- added new `Config` formatting option `set_tbl_column_data_type_inline`, fixed reading of env vars, improved interaction between formatting options (5243)
- make date\_range timezone aware (5234)
- Rust functions for typed JsonPath implementation (5140)
- allow polars Config options to be serialised/shared, and more easily unset (5219)
- batched csv reader (5212)
- accept expressions in arr.slice (5191)
- is\_sorted aggregation fast path for Utf8Chunked (5184)
- hybrid streaming query engine (5139)
- add binary dtype (5122)
- improve function expansion (5110)
- add struct arithmetics (5107)
- add cumfold/cumsum expression (5103)
- error on invalid asof join inputs (5100)
- small plan and profile chart improvements (5067)
- Initial implementation of histogram algorithm (4752)

🐞 Bug fixes

- unnest only pushdown column if there are projections (5360)
- block is\_null predicate in asof join (5358)
- ensure that no-projection is seen as select all in… (5356)
- resolve duplicated column names in pivot (5349)
- fix serde of expression (pickle) (5333)
- don't set auto-explode in apply\_multiple (5265)
- export anonymousscan in lazy prelude (5295)
- fix explicit list + sort aggregation in groupby co… (5317)
- fix sort-merge dispatch of utf8 (5315)
- properly interpret FMT\_MAX\_ROWS - remove arbitrary minimum, fix Series formatting (5281)
- don't block non matching groups in binary expression (5273)
- fix logical type of nested take (5271)
- tag IntoSeries trait as unsafe (5258)
- include single null value in global cat builder (5254)
- include slice in sort fast path (5247)
- determine supertype of datetimes with timezones an… (5240)
- fix groupby dynamic truncate for > days resolution (5235)
- set timezone on groupby\_dynamic boundaries (5233)
- fix incorrect duration dtype (5226)
- set string cache if lazy schema contains categorical (5225)
- fix pipeline dtypes (5224)
- fix asof\_join schema (5213)
- fix single thread loop if schema length is off by 1 (5210)
- improve numeric stability of rolling\_variance (5207)
- fix overflow in partitioned groupby mean of int32/… (5204)
- don't allow categorical append that is not under s… (5195)
- include offset in arr.get (5193)
- fix rolling\_float in case closure returns None (5180)
- Implement missing `extract` conversion for `Time` datatype (5161)
- implement missing conversion to python `time` object (5152)
- microsecond noise on `date` >> `time` cast (add `00:00:00` fast-path) (5149)
- wrong operator mapped for LtEq (5120)
- unique include null (5112)
- don't recurse assign unions as it SO > 5k files (5098)
- block projection pushdown on unnest (5093)
- projection\_node always do projection locally if no… (5090)
- fix iso\_year for Date dtype (5074)
- fix bug in unneeded projection pruning (5071)
- Improve printing controls of DataFrame and Series (5047)
- Double projections should be checked on input schema (5058)
- Apply flat overlapping row groups when possible (5039)
- Ensure all predicates use same key function when inserting… (5034)
- Only consider dt series equal if they have the same tz (5025)
- Special-case `ewm_mean(alpha=1)` (5019)
- Time zone conversion bug (NY -> UTC works, UTC -> NY doesn't) (5014)
- Fix timezone cast (5016)

🛠️ Other improvements

- update to rustc to nightly-2022-10-24 (5312)
- update ahash and add nightly features of hashbrown (5310)
- Update comfy-table and memchr. (5276)
- rename expand\_at\_index -> new\_from\_index (5259)
- ensure streaming groupby take slice into account (5178)
- move polars-sql under polars folder (5176)
- remove aggregate pushdown optimization (5173)
- relax sync requirement on Executor trait impls (5142)
- Get rid of unnecessary check in SplitLines iterator (5141)
- Constant instead of literal (5088)
- Use `release-drafter` to draft releases with changelogs (5033)
- Fix docs by activating docfg feature (5028)
- Split up `polars-lazy` crate. (5020)

Thank you to all our contributors for making this release possible!
AlecZorab, YuRiTan, alexander-beedie, cjermain, dannyvankooten, dpatton-gr, egorchakov, ghuls, hpux735, matteosantama, mcrumiller, owrior, ritchie46, slonik-az, sorhawell, stinodego, thatlittleboy, universalmind303 and zundertj


py-0.14.24
✨ Enhancements

- shrink\_type expression (5351)
- don't raise error but print a warning if mp fork method… (5342)
- tz\_localize expression (5340)
- accept expr in arr.get (5337)
- Implement forward strategy in groupby join\_asof (5335)

🐞 Bug fixes

- unnest only pushdown column if there are projections (5360)
- block is\_null predicate in asof join (5358)
- ensure that no-projection is seen as select all in… (5356)
- resolve duplicated column names in pivot (5349)
- remove unused branch in getitem (5348)
- nested dicts / list generation (5336)
- fix serde of expression (pickle) (5333)
- handle old-style module loaders such that we can still lazy load them (5331)
- explicit output type in apply (5328)

🛠️ Other improvements

- remove multiprocessing check, and leave it to the user (5347)
- Update dev, lint and docs dependencies (5338)
- lazy module proxy (obviate attribute access guards for missing modules) (5320)

Thank you to all our contributors for making this release possible!
AlecZorab, alexander-beedie, ghuls and ritchie46


py-0.14.23
🐞 Bug fixes

- fix explicit list + sort aggregation in groupby co… (5317)
- fix sort-merge dispatch of utf8 (5315)
- close multi-threading pool in df creation (5309)
- fix and check all uninstalled imports in ci (5304)

🛠️ Other improvements

- Add "import polars.testing" to testing docstrings (5316) (5318)
- streamline lazy imports (5302)
- Catch deprecation warnings in unit tests (5306)
- fix and check all uninstalled imports in ci (5304)

Thank you to all our contributors for making this release possible!
alexander-beedie, ghuls, ritchie46, thatlittleboy, universalmind303 and zundertj


py-0.14.22
🚀 Performance improvements

- Make all expensive imports lazy `- ~85%` (5287)
- remove pandas imports (5286)
- never import hypothesis in user code (5282)

✨ Enhancements

- expose to\_struct to series list namespace (5298)
- improve dynamic inference of struct types (5297)
- don't panic in failing apply (5294)
- improve error message in struct apply (5291)
- accept schema in read\_dicts (5290)
- Do not import polars.testing by default (5284)
- Pass more options to pyarrow in write\_parquet (5278) (5280)
- date\_range expression (5267)
- allow implicit None branch in when then otherwise (5264)
- show expression where error originated if raised … (5263)
- improve error msg if window expressions length do… (5262)
- pl.ones, pl.zeros and Series.new\_from\_index functions (5260)
- Add round for date and datetime (5153)
- new `n_chars` functionality for utf8 strings (5252)
- added new `Config` formatting option `set_tbl_column_data_type_inline`, fixed reading of env vars, improved interaction between formatting options (5243)

🐞 Bug fixes

- throw error on invalid lazy concat strategy (5292)
- fix to\_pandas edge case (5293)
- properly interpret FMT\_MAX\_ROWS - remove arbitrary minimum, fix Series formatting (5281)
- respect schema overwrite in from rows (5275)
- don't block non matching groups in binary expression (5273)
- fix logical type of nested take (5271)
- Check if `BatchedCsvReader.next_batches()` is None befor… (5256)
- include single null value in global cat builder (5254)
- Check multiprocessing start\_method on import (3144) (5237)

🛠️ Other improvements

- Add ModuleType for import functions in import\_check.py (5289)

Thank you to all our contributors for making this release possible!
alexander-beedie, ghuls, owrior and ritchie46


py-0.14.21
🐞 Bug fixes

- include slice in sort fast path (5247)
- don't use zoneinfo globally (5246)

Thank you to all our contributors for making this release possible!
ritchie46


py-0.14.20
✨ Enhancements

- make date\_range timezone aware (5234)
- infer timezone and improve display (5232)
- allow Config to be used as a context manager, and update some docs (5223)
- allow polars Config options to be serialised/shared, and more easily unset (5219)

🐞 Bug fixes

- determine supertype of datetimes with timezones an… (5240)
- fix groupby dynamic truncate for > days resolution (5235)
- ensure that `polars_type_to_constructor` works with tz-aware `Datetime` dtypes (5239)
- set timezone on groupby\_dynamic boundaries (5233)
- accept `tuple[bool, bool]` instead of `Sequence[bool]` for `Expr.is_between` (5094)
- fix incorrect duration dtype (5226)
- set string cache if lazy schema contains categorical (5225)
- fix pipeline dtypes (5224)

🛠️ Other improvements

- update lazyframe lazygroupby apply docstring (5238)
- Consistent naming for Python release workflow (5229)

Thank you to all our contributors for making this release possible!
YuRiTan, alexander-beedie, cjermain, matteosantama, ritchie46 and stinodego


py-0.14.19
🚀 Performance improvements

- improve pivot performance by using faster series… (5172)
- improve streaming performance (~15%) (5170)
- don't block projection pushdown on unnest (5123)

✨ Enhancements

- batched csv reader (5212)
- accept expressions in arr.slice (5191)
- is\_sorted aggregation fast path for Utf8Chunked (5184)
- support `DataFrame` init with Datetime dtypes that specify a timezone (5174)
- frame-level `n_unique()` that can count unique rows or col/expr subsets (5165)
- hybrid streaming query engine (5139)
- return Datetime/Duration with appropriate timeunit when inferring from pytype (5127)
- add binary dtype (5122)

🐞 Bug fixes

- fix asof\_join schema (5213)
- fix single thread loop if schema length is off by 1 (5210)
- improve numeric stability of rolling\_variance (5207)
- fix apply function over object dtype (5206)
- fix overflow in partitioned groupby mean of int32/… (5204)
- don't allow categorical append that is not under s… (5195)
- include offset in arr.get (5193)
- DataFrame.fill\_null include unsigned integers (5192)
- error on fill\_nan on non float dtype (5185)
- infer missing columns in from\_dicts (5183)
- fix rolling\_float in case closure returns None (5180)
- Implement missing `extract` conversion for `Time` datatype (5161)
- implement missing conversion to python `time` object (5152)
- Rendering long docstring lines. (5150)
- add missing \_NUMPY\_AVAILABLE check in Series.\_\_getitem\_\_ (5126)
- wrong operator mapped for LtEq (5120)

🛠️ Other improvements

- skip failing test until 5177 is resolved (5205)
- ensure streaming groupby take slice into account (5178)
- remove aggregate pushdown optimization (5173)
- Add support for ruff python linter. (5151)
- improve typing; many `list` types are better defined as `Sequence` (5164)
- Get rid of unnecessary check in SplitLines iterator (5141)

Thank you to all our contributors for making this release possible!
alexander-beedie, dannyvankooten, ghuls, ritchie46 and sorhawell


py-0.14.18
🚀 Performance improvements

- take advantage of sorted join for frame alignment (5106)

✨ Enhancements

- improve function expansion (5110)
- add struct arithmetics (5107)
- add cumfold/cumsum expression (5103)
- error on invalid asof join inputs (5100)

🐞 Bug fixes

- unique include null (5112)
- don't recurse assign unions as it SO > 5k files (5098)
- block projection pushdown on unnest (5093)
- projection\_node always do projection locally if no… (5090)

🛠️ Other improvements

- deprecate name argument in drop (5099)
- improve py-polars/Makefile (5089)

Thank you to all our contributors for making this release possible!
alexander-beedie, owrior, ritchie46 and slonik-az


py-0.14.17
🚀 Performance improvements

- more conservative JIT sort settings (5080)

Thank you to all our contributors for making this release possible!
mcrumiller, ritchie46 and zundertj


py-0.14.16
🚀 Performance improvements

- sort and unsort join key if other side is sorted (5069)
- do not rechunk left joins (5066)

✨ Enhancements

- deprecate boolean mask for Series indexing (5075)
- small plan and profile chart improvements (5067)
- add gantt chart plot to LazyFrame::profile (5063)
- Support `Series` init as struct from `dataclass` and annotated `NamedTuple` (5057)

🐞 Bug fixes

- fix iso\_year for Date dtype (5074)
- tz-aware get\_idx (5072)
- Fix empty method detection when PYTHONOPTIMIZE=2 (5043)
- fix bug in unneeded projection pruning (5071)
- remove overloads for `from_arrow` (5065)
- Improve printing controls of DataFrame and Series (5047)
- Double projections should be checked on input schema (5058)
- Add missing cse param to LazyFrame "profile" method (5054)

🛠️ Other improvements

- Default to zstd parquet compression (5060)
- Refactor `show_graph` (5059)
- Use `release-drafter` to draft releases with changelogs (5033)
- Update Makefile (5056)
- Parametric test coverage for EWM functions (5011)

Thank you to all our contributors for making this release possible!
alexander-beedie, egorchakov, matteosantama, ritchie46, slonik-az, stinodego and zundertj


rs-0.24.3
