This preview release of DuckDB is named "Pulchellus" after the [Green pygmy goose (Nettapus pulchellus)](https://en.wikipedia.org/wiki/Green_pygmy_goose), which is native to Australia, where [VLDB 2022](https://vldb.org/2022/) is starting today. Despite being called a "goose", it is actually a duck.
Binary builds are listed at the bottom of this post. Feedback is very welcome.
Note: Again, this release introduces a backwards-incompatible change to the on-disk storage format. We suggest you use the `EXPORT DATABASE` command with the old version followed by `IMPORT DATABASE` with the new version to migrate your data. See the [documentation](https://duckdb.org/docs/sql/statements/export) for details.
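As a minimal sketch of the migration described above, the two statements might be run as follows. The directory name `export_dir` is a hypothetical placeholder, and the sketch assumes the second statement is issued against a fresh, empty database file opened with the new version.

```sql
-- Step 1: run with the OLD DuckDB version, connected to the existing database.
-- 'export_dir' is a hypothetical target directory for the exported schema and data.
EXPORT DATABASE 'export_dir';

-- Step 2: run with the NEW DuckDB version, connected to a new, empty database file.
-- This reloads the schema and data written in step 1.
IMPORT DATABASE 'export_dir';
```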
Below is a list of changes in this release.
Major Changes & Features
- 4189: Implement Out-of-Core Hash Join and Re-Work Query Verification
- 4022: ART Index Storage
- 4274: Join Order Optimizer improvements
- 4420: Logical Plan Serialization
- 4137, 4347, 4293, 4190, 4178, 4177, 3954 & 4159: Scalability and performance improvements for Window operator
- 4004: Add support for extensions to the parser, and add an example of this to the loadable extension demo
- 4089: Signed Extensions
- 4097 & 4211: Filename column + Hive partitioning support for Parquet Reader
- 4501, 4511: Aarch64 Linux builds of CLI, shared library, JDBC & ODBC
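To illustrate the Parquet reader additions listed above (4097 & 4211), here is a hedged sketch of a scan over a Hive-partitioned directory. The `orders/year=.../month=...` layout and file names are assumptions made for the example; the `hive_partitioning` and `filename` parameters are the reader options these changes refer to.

```sql
-- Hypothetical directory layout:
--   orders/year=2021/month=12/data.parquet
--   orders/year=2022/month=8/data.parquet
-- hive_partitioning exposes the key=value path segments (year, month) as columns;
-- filename adds a column holding the path of the file each row came from.
SELECT *
FROM read_parquet('orders/*/*/*.parquet', hive_partitioning = true, filename = true);
```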
Minor Changes & Bug Fixes
- 4594: [Map] Fix map_extract from multiple rows
- 4585: Fix for r test instability, 4549
- 4560: Support all basic integer types in node API
- 4558: [CPP-API] Comment no longer causes crash
- 4552: [Fuzzer] Issue 4152 - Remove ToString roundtrip in query verification
- 4543: Fixing silent assertions
- 4542: Check if database is still alive when trying to connect for nodejs
- 4541: fix for issue 4533
- 4539: Parallelization non-dependent on Arrow rows
- 4524: Explicitly deleting default connection on js side
- 4522: Correct architecture name for Linux aarch64
- 4521: Adding correct substrait release tag to out-of-tree extension deployment
- 4520: Added test cases for several fixed JDBC issues
- 4516: Fix 4455, don't set default schema in transform
- 4513: Issue 4502
- 4510: [Casting] Varchar -> Decimal cast fix
- 4507: [CSV] Fixed bug related to invalidated iterators
- 4505: extension trigger event
- 4504: fix: short-circuit hash and version discovery
- 4496: [Fuzzer] Issue 4152 - Force no cross-product issue
- 4495: Build ODBC driver binary for OSX
- 4494: [Fuzzer] Issue 4152 - Analyze inexisting column
- 4493: Declare all variables for nodejs.
- 4491: Issue 4419: Range Join Swizzling
- 4488: Making the parquet extension loadable
- 4484: fix: ignore status message from output of mypy stubs check
- 4483: [Development bug] unittest result_helper.cpp triggers assertion
- 4480: Remove REST server
- 4479: Remove assertion
- 4477: Removing Substrait From DuckDB Repo
- 4474: WIP 4152
- 4472: [Python] Removed mutable default parameters
- 4470: Fix hidden merge conflict with fetchmany
- 4465: [Python] `fetchmany` implemented
- 4458: Issue 4454: VARCHAR/DATE Reversibility
- 4448: Issue 3954: Pinned Heap Blocks
- 4440: Added support for HUGEINT input type to BIT_COUNT scalar function
- 4434: Python: Add PyRelation.fetchnumpy()
- 4429: Allow indicating a format version that should be used to write/read from (De)serializer and use it for plans
- 4427: Python: Improve docstrings for DuckDBPyRelation and DuckDBPyResult
- 4418: Fix typo
- 4416: Fix several update issues
- 4413: Correctly schedule mix of union/child pipelines (again)
- 4409: Increase timeout for coverage checks
- 4405: Hybrid ART Leaf Part I
- 4404: Add support for TS_MS, TS_NS, and TS_S
- 4400: Issue 4388: DATE_TRUNC Low Precision
- 4398: fix: correct object return types for arrow functions
- 4395: Fix name of environment variable
- 4390: Support UNION BY NAME set operation
- 4383: Missing LISTs are NULL
- 4382: Include PID in test directory name
- 4380: R: Avoid `translate_duckdb()` in tests
- 4377: R: Full BLOB support
- 4372: Fix 4370: correctly handle non-flat vectors in list_sort
- 4371: [Python] Changed all RuntimeErrors thrown in the Python client
- 4368: Fixes issue 4365 - Not null constraint is no longer duplicated
- 4364: Allow extra parameters in list_aggr to be passed in, as long as they are constant and only used during the bind
- 4363: Fix for array_position with NaNs: use Equals::Operation instead of regular equality
- 4362: Allow table functions to set cardinality stats through the C API - and utilize this in Julia DataFrame scans
- 4359: Mark slow tests
- 4355: Fix typo in exception text
- 4354: R: Use preinstalled symbol
- 4353: Shell: Add missing newline in help output
- 4352: Tweak contributing guide [ci skip]
- 4345: [Substrait] Pushing-down projections and filters to read relation
- 4340: Correctly schedule pipeline dependencies when scheduling mix of UNION and FULL OUTER JOINs
- 4336: feat: add basic json support to jdbc client
- 4334: Bring ibis/substrait tests to a sane state
- 4332: Fix Julia parallelism interleaving with the garbage collector, and expose Pending Query Result in C interface
- 4328: Allow specifying a custom home directory using the SET home_directory option
- 4327: [Aggregate] DISTINCT aggregates without GROUP BY are now executed in parallel
- 4324: Fix 4309: fix for multiple foreign key constraints on the same table-table pair
- 4323: Optimizer profiling
- 4322: Print NOT operator correctly
- 4319: feat: add missing node versions to CI
- 4317: refactor: remove dead code in python client
- 4316: R: Add rlang as suggested dependency
- 4315: Column Data Collection, Arrow Result conversion rework, Cross Product performance fixes & more
- 4312: R: Install tidy CLI tool
- 4310: R: Add test for `test_all_types()`
- 4304: Improve numeric hash function to a better but slightly slower hash function
- 4301: Add unit of measurement in timer function
- 4300: Support root type on expressions 4278
- 4298: Feature/nodejs client docs
- 4297: fix: remove nodejs test focus
- 4296: Avoid infinite loop in range(NULL)
- 4294: 4276 Serializing data types on table schema in substrait
- 4289: [Python/Pandas] fix +/- inf wrongly converting to NaN (NULL)
- 4288: Fix fuzzer issue w.r.t. NULL values in generate_series
- 4286: [Python - Relation] CreateView on a filtered relation does not cause infinite loop anymore
- 4285: chore: remove cython constraint now that bug is fixed
- 4284: Pandas timezone
- 4283: Return errors from RecordBatchReader
- 4280: R: Remove nycflights13 dependency
- 4279: R: Don't export duckdb_explain()
- 4277: feat: update setup.py links
- 4272: Allow 0 as a seed parameter
- 4266: R: Only quote non-syntactic and reserved words
- 4265: Specialize LIST aggregate function implementation
- 4263: R: Avoid attaching package during tests
- 4259: Add ANY_VALUE agg function
- 4256: Schedule child pipeline correctly
- 4255: Disable ibis substrait tests for now
- 4250: C API: Report appender error in case conversion fails
- 4240: DELIM_JOIN now propagates statistics correctly
- 4237: fix: pin cython to work around bug
- 4236: Integer types now correctly increase `width` of DECIMAL type.
- 4235: Parquet writer: Write dictionary_page_offset, and distinct_count for dictionary encoded strings/enum
- 4234: Implement json_merge_patch and jsonlines output mode
- 4233: feat: fix pandas types in docstrings/python types
- 4230: Handle nulls in structs and lists
- 4225: Add Jaro Winkler
- 4215: Use right template for smallint
- 4213: feat: update instructions for installing master builds in bug report template
- 4212: Improve error message
- 4210: PARQUET: Move StringColumnWriter dictionary to use string_t to avoid allocations
- 4209: Remove unused PhysicalTypes
- 4207: Disable GC during Julia execution to avoid internal GC deadlock in DataFrame scan
- 4206: Fix 4202: in the comparison simplification optimizer, we can only shift the cast to the constant if both casts are invertible
- 4199: feat: Use pip to install and uninstall python client
- 4198: [capi] impl clear bindings for prepared stmt
- 4197: feat: port bug_report.md to bug_report.yml
- 4196: Fix RTTI issue across extension boundaries on OSX
- 4192: Correctly call SetFilePointerEx on Windows so the truncate works as expected
- 4191: Fix Expanded CI test case by adding swap space to test
- 4188: ALTER SEQUENCE IF EXISTS fix
- 4187: [Storage] FOR compression
- 4185: ISSUE 3248 Support for ALTER TABLE altering columns NOT NULL
- 4183: Julia multi-threading fix: avoid using a time-out to cancel threads in case there are no tasks
- 4179: node: add async-iterator-based streaming
- 4175: [CI] Python Build with Sanitizer
- 4172: Update stubs test
- 4168: Issue 4161: Create WindowExecutor
- 4167: node: report memory usage to the node GC
- 4166: Fix 4165: correctly fill in false_sel when performing comparison with constant null value
- 4160: node: don't crash on syntax errors
- 4154: Making date_trunc statistics handling consistent with date_part
- 4153: Support for int64 round trips in R driver using the bit64 package
- 4151: Fix orrify merge conflict
- 4143: Correctly handle query parameters in JDBC
- 4140: CI Fixes
- 4139: Remove redundant code
- 4138: Support struct.* to retrieve all struct fields in SELECT list
- 4134: Fuzzer Fixes
- 4133: Remove DUCKDB_API for deletes. (For Windows/ZIG)
- 4132: [Python] `project` now correctly inherits owning references to PyObjects
- 4131: Missing error messages
- 4125: Fix Orrify rename merge conflicts
- 4124: [Substrait] [Python] [R] Upgrade Substrait and introduce function to export query plan as a substrait - JSON
- 4117: (Hopefully) fix signing extension signing on master
- 4112: PARQUET: Add data pages encodings to their metadata
- 4111: Fix off-by-one in plan cost regression test script
- 4110: Rename Orrify -> ToUnifiedFormat, VectorData -> UnifiedVectorFormat, Normalify -> Flatten
- 4108: ODBC: fixing multicolumn parameter binding
- 4107: Refactor: rename simple aggregate to ungrouped aggregate
- 4104: Support Parquet's `RLE_DICTIONARY` encoding for string columns
- 4103: Ntile fixes
- 4101: Some follow up fixes for extension signing
- 4096: Implement ANALYZE
- 4093: Support ORDER BY and LIMIT in correlated subqueries, and add support for the ARRAY(subquery) syntax
- 4090: Fix for non varchar input for sequence functions
- 4088: Fix Issue 3813 - fixedsize PyArrow List -> DuckDB conversion
- 4083: JDBC Change getTimestamp to throw an error for wrong data types
- 4080: Several parser improvements
- 4076: Unentangle Parquet ColumnWriter and StandardColumnWriterState
- 4075: feat(breaking): improve python exceptions
- 4070: [JDBC] CachedRowSet support
- 4069: Improve error messages of extension install
- 4068: Fix bug with PhysicalStreamingWindow
- 4065: Better handling plus encoding in urls
- 4061: Fix 3991: use case_insensitive_map for headers
- 4060: Null handling unification
- 4059: Prepared Statement Verification & many prepared statement fixes
- 4058: nodejs: use less memory in each
- 4057: Fixed an error in comment
- 4053: [R] [CI] Run arrow test single threaded to avoid wrong fp comparison
- 4050: Bump sqlite scanner version
- 4049: Remove need for locks in TPC-H dbgen
- 4048: Test query profiler shouldn't output profiling info to the console
- 4045: Making delayload flags dependent on whether we are NOT doing a static…
- 4044: Issue 3593: avoid duplicate eliminating correlated columns in subqueries when they involve LIST columns
- 4039: Making memory leak sanitizer happy with DuckDB Shell
- 4035: Fix several memory-allocation related issues - use Allocator in many places, and reduce many allocations all over
- 4033: Plan cost regression tests
- 4032: Add missing python test dependencies
- 4031: Fix issue 3989
- 4012: Fix amalgamated build with multiple .cpp
- 4011: Fix amalgamation script when --splits is used
- 4009: `EXPLAIN ANALYSE` should honor profiler output format
- 4005: Fix for 3997
- 4002: fix fts/httpfs include directories
- 3999: Include guard renaming for amalgamation export
- 3996: Fix for issue 3951
- 3990: Substrait Interface in R API
- 3988: feat: implement `DuckDBConnection#getSchema` for JDBC
- 3985: Pandas->DuckDB Series of dtype='O' conversion
- 3982: Expose dbgen text buffer size as a parameter and Python Replacement Scans Leak fix
- 3978: Enhance bound parameters error message
- 3977: Adding alias part 2
- 3973: Using aggregate input data for aggregate functions
- 3971: Issue 3079: When installed system RAM cannot be determined, default to no memory limit
- 3967: Use fmt library for Value::ToString of float/double types
- 3965: Fix 3942: avoid converting + to space in httplib::decode_url
- 3964: Add support for DATEFORMAT and TIMESTAMPFORMAT to COPY TO
- 3963: Atomic extension install: use UUID in temp file
- 3961: Fix 3960: avoid returning an error when a blob contains a NULL character in duckdb_append_blob
- 3958: Fix 3955: correctly compute width/scale when combining decimal type of different width/scale
- 3957: [Java] Implement appender support for all? UTF-8 characters 😜
- 3953: Fix missing LIST type in duckdb_types
- 3952: Windows FileExists regression fix: need to use _wstati64 instead of _wstat64i32
- 3950: Atomic extension installation
- 3945: Fuzzer 55: Remove Normalify Call
- 3939: Issue 3937: Casting infinite times
- 3928: Adding alias type struct and map
- 3927: Fix failing TPC-E test
- 3925: New Julia package requires 0.4 of DuckDB_jll
- 3921: Retire `LogicalTypeId::HASH` and replace it with `LogicalTypeId::UBIGINT`
- 3919: ODBC: SingleExecuteStmt and error message
- 3918: Julia compat version
- 3917: Ignore invalid UTF8 in fuzzer scripts
- 3916: Julia Guidelines fix
- 3915: Add duckdb_extensions function
- 3914: Expanding jdbc deploy script to be able to automatically release, too
- 3912: Julia UUID and version bump
- 3911: Making universal builds of OSX Extensions
- 3910: Fix for export of current_time, current_timestamp, etc functions
- 3909: More fuzzer fixes
- 3903: Issue 3881: DATE_TRUNC statistics
- 3900: Add newlines at EOF
- 3897: feat: add extension load/install methods to python client
- 3882: Uncompressed string improvements
- 3868: Bump yyjson version
- 3867: Enable exporting macros
- 3866: Add default for function NULL handling
- 3864: [Python] Relation Explain
- 3853: Feature/struct_insert function
- 3814: Expose dbgen text buffer size as a parameter
- 3694: List lambdas
- 3618: Struct Types for Node.js UDFs
- 3600: Issue 1466: added `map_from_entries` function
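Several of the entries above add SQL-level features. The statements below are a small, non-authoritative sketch of how a few of them might be used; all table, column and alias names are made up for illustration, and the exact function names should be checked against the documentation for your version.

```sql
-- 4390: UNION BY NAME matches columns by name rather than by position;
-- columns missing on one side are filled with NULL.
SELECT 1 AS a, 2 AS b
UNION BY NAME
SELECT 3 AS b, 4 AS c;

-- 4138: struct.* expands every field of a struct into its own column.
SELECT s.*
FROM (SELECT {'x': 1, 'y': 'duck'} AS s);

-- 3694: list lambdas apply an expression to each list element.
SELECT list_transform([1, 2, 3], x -> x + 1) AS incremented,
       list_filter([1, 2, 3], x -> x > 1)    AS filtered;

-- 3600: map_from_entries builds a MAP from a list of key/value structs.
SELECT map_from_entries([{'k': 'a', 'v': 1}, {'k': 'b', 'v': 2}]) AS m;

-- 4093: ARRAY(subquery) collects a subquery result into a list,
-- and ORDER BY is now accepted inside the subquery.
SELECT ARRAY(SELECT i FROM range(3) t(i) ORDER BY i DESC) AS descending;

-- 4185: ALTER TABLE can now add a NOT NULL constraint to an existing column.
CREATE TABLE ducks (id INTEGER, name VARCHAR);
ALTER TABLE ducks ALTER COLUMN id SET NOT NULL;
```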