Pydiverse-pipedag

Latest version: v0.9.9



0.9.3

- Added `upload_table()` and `download_table()` functions to the PandasTableHook to allow for easy
customization of upload and download behavior of pandas and polars tables to/from the table store.
- Hooks are now looked up in a more robust way that is independent of import order. Subclasses of table stores
no longer copy registered hooks at the moment of declaration. When registering a hook, it is now possible to
specify the hooks that are replaced by the new registration.
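
The lookup behavior described above can be illustrated with a small, self-contained sketch. This is not pipedag's implementation: the class `TableStore`, the methods `register_hook()` and `resolve_hooks()`, and the `replace_hooks` argument are hypothetical names chosen for illustration. The sketch only shows the idea that hooks are collected at lookup time rather than copied when a subclass is declared.

```python
class TableStore:
    """Toy stand-in for a table store class (hypothetical, not pipedag's API)."""

    _own_hooks = []    # hooks registered directly on this class
    _replaced = set()  # hooks that registrations on this class replace

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # A subclass starts empty instead of copying the parent's hooks.
        cls._own_hooks = []
        cls._replaced = set()

    @classmethod
    def register_hook(cls, hook, replace_hooks=None):
        cls._own_hooks.append(hook)
        cls._replaced.update(replace_hooks or [])
        return hook

    @classmethod
    def resolve_hooks(cls):
        """Collect hooks at lookup time, most specific class first."""
        stores = [k for k in cls.__mro__ if issubclass(k, TableStore)]
        replaced = set()
        for k in stores:
            replaced |= k._replaced
        return [h for k in stores for h in k._own_hooks if h not in replaced]


class PandasHook: ...
class CustomPandasHook: ...


class SQLStore(TableStore):
    pass


# Registered on the base class *after* the subclass was declared,
# yet still visible from the subclass because lookup happens late.
TableStore.register_hook(PandasHook)
hooks_before = SQLStore.resolve_hooks()

# The new registration names the hook it replaces.
SQLStore.register_hook(CustomPandasHook, replace_hooks=[PandasHook])
hooks_after = SQLStore.resolve_hooks()
```

Because hooks are resolved when they are needed, the order in which modules register hooks and declare subclasses no longer changes the result.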

0.9.2

- The `input_stage_versions` decorator allows specifying tasks which compare tables within the current stage transaction
schema and another version of that stage. This can be the currently active stage schema of the same pipeline
instance or of another instance. See: https://pydiversepipedag.readthedocs.io/en/latest/examples.html

0.9.1

- Support Snowflake as a backend for `SQLTableStore`.
- For the mssql backend, primary keys are now added after the table has been filled completely.
- Make polars dematerialization robust against missing connectorx. Fall back to pandas if connectorx is not available.
- Fix some bugs with pandas < 2 and sqlalchemy < 2 compatibility as well as pyarrow handling.
- Use `pd.StringDtype("pyarrow")` instead of `pd.ArrowDtype(pa.string())` for dtype "string[pyarrow]".
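
The connectorx fallback mentioned above follows a common optional-dependency pattern. The following is a minimal sketch of that pattern, not pipedag's actual code; the function name `pick_read_backend()` is hypothetical.

```python
import importlib.util


def pick_read_backend() -> str:
    """Return "connectorx" if that package is importable, else "pandas"."""
    # find_spec() returns None when a top-level package is not installed,
    # so no import is attempted and no ImportError needs to be caught.
    if importlib.util.find_spec("connectorx") is not None:
        return "connectorx"
    return "pandas"


backend = pick_read_backend()
print(f"reading tables via {backend}")
```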

0.9.0

- Support imperative materialization with `tbl_ref = dag.Table(...).materialize()`. This is particularly useful for
materializing subqueries within a task. It also helps to see the task in the stack trace when materialization fails.
There is one downside of using it: when a task returns multiple tables, it is assumed that all tables depend on
previously imperatively materialized tables.
- Support group nodes with or without barrier effect on task ordering. They can either be added by `with GroupNode():`
blocks around or within `with Stage():` blocks, or they can be added in configuration via
`visualization: default: group_nodes: group_name: {label: "some_str", tasks: ["task_name"], stages: ["stage_name"]}`.
Visualization of group nodes can be controlled very flexibly with hide_box, hide_content, box_color_always, ...
- ExternalTableReference moved to a different module and is now also a member of the pydiverse.pipedag module. This is
a breaking interface change for pipedag.
- PrefectEngine moved to module pydiverse.pipedag.engine.prefect.PrefectEngine because it would otherwise import prefect
whenever it is installed in the environment, which interferes with logging library initialization. This is a breaking
interface change.
- Fixed an edge case for the mssql backend causing queries with columns named "from" to crash. The code to insert an INTO
into mssql SELECT statements is still hacky but supports open quote detection. Comments may still confuse the logic.
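
The one-line configuration path for group nodes given above is easier to read when its nesting is spelled out. The sketch below mirrors that path as a plain Python dictionary purely for illustration; in practice the setting lives in the YAML configuration, and the placeholder values `group_name`, `some_str`, `task_name`, and `stage_name` are taken verbatim from the release note.

```python
# Nested structure of the group node configuration, written as a dict.
# This only mirrors the YAML nesting; it is not passed to pipedag here.
group_nodes_config = {
    "visualization": {
        "default": {
            "group_nodes": {
                "group_name": {
                    "label": "some_str",
                    "tasks": ["task_name"],
                    "stages": ["stage_name"],
                }
            }
        }
    }
}

# Walk the same path as in the release note to reach one group definition.
group = group_nodes_config["visualization"]["default"]["group_nodes"]["group_name"]
print(group["label"], group["tasks"], group["stages"])
```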

0.8.0

- Significant refactoring of materialization is included. It splits creation of table from filling a table in many cases.
This may lead to unexpected changes in log output. For now, the `INSERT INTO SELECT` statement is only printed in
shortened version, because the creation of the table already includes the same statement in full. In the future, this
might be made configurable, so your feedback is highly welcome.
- pipedag.Table() now supports new parameters `nullable` and `non_nullable`. This allows specifying which columns are
nullable both as a positive and negative list. If both are specified, they must mention each column in the table and
have no overlap. For most dialects, non-nullable statements are issued after creating the empty table. For dialects
`mssql` and `ibm_db2`, both nullable and non-nullable column alterations are issued because constant literals create
non-nullable columns by default. If neither nullable nor non_nullable are specified, the default `CREATE TABLE as SELECT`
is kept unmodified except for primary key columns where some dialects require explicit `NOT NULL` statements.
- Refactored configuration for cache validation options. Now, there is a separate section called cache_validation configurable
per instance which includes the following options:
* mode: NORMAL, ASSERT_NO_FRESH_INPUT (protect a stable pipeline / fail if tasks with cache function are executed),
IGNORE_FRESH_INPUT (same as ignore_cache_function=True before),
FORCE_FRESH_INPUT (invalidates all tasks with cache function), FORCE_CACHE_INVALID (rerun all tasks)
* disable_cache_function: True disables the call of cache functions. Downside: next mode=NORMAL run will be cache invalid.
* ignore_task_version: Option existed before but a level higher
* REMOVED option ignore_cache_function: Use `cache_validation: mode: IGNORE_FRESH_INPUT` in pipedag.yaml or
`flow.run(cache_validation_mode=CacheValidationMode.IGNORE_FRESH_INPUT)` instead.
- Set transaction isolation level to READ UNCOMMITTED via SQLAlchemy functionality.
- Fixed unlogged tables being created as logged tables when they were copied as cache valid.
- Materialize lazy tasks when they are executed without stage context.
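
The rule for combining `nullable` and `non_nullable` can be sketched as a small validation function. This is an illustration of the rule as stated above, not pipedag's implementation. The function name `resolve_non_nullable()` is hypothetical, and the handling of a lone `nullable` list (all other columns become non-nullable) is an assumption based on the "positive and negative list" wording.

```python
from typing import Iterable, Optional, Set


def resolve_non_nullable(
    columns: Iterable[str],
    nullable: Optional[Iterable[str]] = None,
    non_nullable: Optional[Iterable[str]] = None,
) -> Optional[Set[str]]:
    """Return the columns to mark NOT NULL, or None to leave the table as is."""
    cols = set(columns)
    if nullable is None and non_nullable is None:
        return None  # keep the default CREATE TABLE AS SELECT behavior
    if nullable is not None and non_nullable is not None:
        n, nn = set(nullable), set(non_nullable)
        if n & nn:
            raise ValueError(f"columns listed in both: {sorted(n & nn)}")
        if n | nn != cols:
            raise ValueError("nullable and non_nullable must cover every column")
        return nn
    if non_nullable is not None:
        return set(non_nullable)
    # Assumption: only `nullable` given, so every other column is non-nullable.
    return cols - set(nullable)


print(resolve_non_nullable(["id", "name", "note"], nullable=["note"]))
```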

0.7.2

- Kroki links are now disabled by default. The new setting `disable_kroki=True` achieves this while `kroki_url` still
defaults to https://kroki.io. The function `create_basic_pipedag_config()` only has a `kroki_url` parameter, which
defaults to None.
- Added max_query_print_length parameter to MSSqlTableStore to limit the length of the printed SQL queries.
Default is max_query_print_length=500000 characters.
- Fix bug when creating a table with the same name as a `Table` given by `ExternalTableReference` in the same stage
- New config options for `SQLTableStore`:
* `max_concurrent_copy_operations` to limit the number of concurrent copy operations when copying tables between schemas.
* `sqlalchemy_pool_size` and `sqlalchemy_pool_timeout` to configure the pool size and timeout for the SQLAlchemy connection pool.
* The defaults fix a bug by setting sqlalchemy options to not time out when the first cache invalid task in a stage triggers
copying of cache valid tables between schemas and copying takes longer than 30s.
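
The effect of a setting like `max_query_print_length` can be sketched as follows. This is a minimal illustration of limiting logged query length, not the code used by MSSqlTableStore; the helper name `shorten_query()` and the ` [...]` truncation marker are hypothetical. Only the default of 500000 characters comes from the release note.

```python
def shorten_query(query: str, max_query_print_length: int = 500000) -> str:
    """Return the query unchanged if short enough, else a truncated copy."""
    if len(query) <= max_query_print_length:
        return query
    # Keep the beginning of the statement and mark the cut (marker is illustrative).
    return query[:max_query_print_length] + " [...]"


long_query = "SELECT " + ", ".join(f"col_{i}" for i in range(1000)) + " FROM t"
print(shorten_query(long_query, max_query_print_length=40))
```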
