Pydiverse-pipedag

0.6.1

- Create initial documentation for pipedag.
- Remove stage argument from [](RawSql) initializer.
- Add [](RawSql) to public API.
- Fix [](PrefectTwoEngine) failing on retrieval of results.
- Added [](Flow.get_stage()) and [](Stage.get_task()) methods.
- Added [](MaterializingTask.get_output_from_store()) method to allow retrieval of task output without running the Flow.
- Created [](ExternalTableReference) to simplify complex table loading operations.
- Allow for easy retrieval of tables generated by [](RawSql).
  Passing a RawSql object into a task dematerializes all tables that the RawSql generated; inside the task they can then be accessed using `raw_sql["table_name"]`.
  Alternatively, the same indexing syntax can be used during flow definition to pass only a specific table into a task (see the sketch after this list).
- Fix private method `SQLTableStore.get_stage_hash` not working for IBM DB2.
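
A minimal sketch of how these retrieval features might be used together, assuming a configured table store. The task names, SQL text, the `RawSql` constructor arguments, and the name-based lookup / `as_type` details are assumptions; only the `raw_sql["table_name"]` indexing and the method names come from the entries above.

```python
# Hedged sketch only: task names, SQL text, and the RawSql constructor call are
# assumptions; the indexing syntax and the method names come from the changelog
# entries above.
import pandas as pd
from pydiverse.pipedag import Flow, RawSql, Stage, Table, materialize


@materialize(lazy=True)
def load_raw():
    # Raw SQL that creates one or more tables in the current stage
    # (constructor arguments shown here are an assumption).
    return RawSql("CREATE TABLE customers AS SELECT 1 AS id, 'Ada' AS name")


@materialize(input_type=pd.DataFrame, version="1.0")
def row_counts(raw_sql):
    # Receiving the whole RawSql object dematerializes every table it
    # generated; individual tables are accessed by name.
    counts = pd.DataFrame({"table": ["customers"], "rows": [len(raw_sql["customers"])]})
    return Table(counts, name="row_counts")


@materialize(input_type=pd.DataFrame, version="1.0")
def clean_customers(customers):
    # Receives a single table that was already selected during flow definition.
    return Table(customers.drop_duplicates(), name="customers_clean")


with Flow("raw_sql_example") as flow:
    with Stage("raw"):
        raw = load_raw()
    with Stage("reports"):
        counts = row_counts(raw)                     # pass the whole RawSql object
        cleaned = clean_customers(raw["customers"])  # or only one generated table

# Retrieve a task's output without running the flow; the name-based lookup and
# the as_type argument are assumptions.
task = flow.get_stage("reports").get_task("clean_customers")
df = task.get_output_from_store(as_type=pd.DataFrame)
```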

0.6.0

- Added [`delete-schemas`](reference/cli:delete-schemas) command to `pipedag-manage` to help with cleaning up the database.
- Remove all support for mssql database swapping.
Instead, we now properly support schema swapping.
- Fix UNLOGGED tables not working with Postgres.
- Added `hook_args` section to `table_store` part of config file to support passing config arguments to table hooks.
- Added `dtype_backend` hook argument for `PandasTableHook` to override the default pandas dtype backend.
- Update raw sql metadata table (`SQLTableStore`).
- Remove `engine_dispatch` and replace with SQLTableStore subclasses.
- Moved local table cache from `pydiverse.pipedag.backend.table_cache` to `pydiverse.pipedag.backend.table.cache` namespace.
- Changed order in which flow / instance config gets resolved.

0.5.0

- add support for DuckDB
- add support for pyarrow-backed pandas dataframes
- support execution of subflows (see the sketch after this list)
- store the final state of each task in the flow result object
- tasks now have a `position_hash` associated with them to identify them purely based on their position (i.e. stage, name, and input wiring) inside a flow.
- breaking change to metadata: added `position_hash` to `tasks` metadata table and changed the type of hash columns from String(32) to String(20).
- `Flow`, `Subflow`, and `Result` objects now provide additional options for visualizing them
- added `unlogged_tables` flag to SQLTableStore for creating UNLOGGED tables with Postgres.
- created [`pipedag-manage`](reference/cli) command line utility with [`clear-metadata`](reference/cli:clear-metadata) command to help with migrating between different pipedag metadata versions.
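
To make the subflow execution, result state, and visualization entries above more concrete, here is a hedged sketch assuming a working `pipedag.yaml`. The subflow selection via `run()` arguments, binding the stage object with `as`, and the `visualize_url()` helper name are assumptions based on these entries, not confirmed signatures.

```python
# Hedged sketch: assumes a configured pipedag.yaml. Subflow selection via
# run() arguments and the visualize_url() helper name are assumptions.
import pandas as pd
from pydiverse.pipedag import Flow, Stage, Table, materialize


@materialize(version="1.0")
def numbers():
    return Table(pd.DataFrame({"x": [1, 2, 3]}), name="numbers")


@materialize(input_type=pd.DataFrame, version="1.0")
def doubled(df):
    return Table(df.assign(x=df["x"] * 2), name="doubled")


with Flow("subflow_example") as flow:
    with Stage("inputs") as inputs:   # binding the stage object is assumed to work
        raw = numbers()
    with Stage("features"):
        out = doubled(raw)

full_result = flow.run()        # run the complete flow
assert full_result.successful   # the final state of each task is stored on the result
sub_result = flow.run(inputs)   # run only a subflow (selection syntax assumed)
flow.visualize_url()            # visualization helper (name assumed)
```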

0.4.1

- implement [](DaskEngine): orchestration engine for running multiple tasks in parallel
- implement [](DatabaseLockManager): lock manager based on locking mechanism provided by database

0.4.0

- update public interface
- encrypt IPC communication
- remove preemptive `os.makedirs` from ParquetTableCache
- improve logging and provide structlog utilities

0.3.0

- breaking change to pipedag.yaml:
introduced `args` subsections for arguments that are passed to backend classes
- fix ibm_db_sa bug when copying dataframes from cache: uppercase table names by default
- more readable SQL queries: use automatic aliases for inputs of SQLAlchemy tasks
- implement option `ignore_task_version`: disable eager task caching for some instances to reduce overhead from task version bumping
- implement local table cache: store input/output of dataframe tasks in parquet files and allow using it as cache to avoid rereading from database
