Dagster

Latest version: v1.9.5

1.1.21

New

- Further performance improvements for `build_asset_reconciliation_sensor`.
- Dagster now allows you to backfill asset selections that include mapped partition definitions, such as a daily asset which rolls up into a weekly asset, as long as the root assets in your selection share a partition definition.
- Dagit now includes information about the cause of an asset’s staleness.
- Improved the error message for non-matching cron schedules in `TimeWindowPartitionMapping`s with offsets. (Thanks Sean Han!)
- [dagster-aws] The EcsRunLauncher now allows you to configure the `runtimePlatform` field for the task definitions of the runs that it launches, allowing it to launch runs using Windows Docker images.
- [dagster-azure] Added support for `DefaultAzureCredential` in `adls2_resource`. (Thanks Martin Picard!)
- [dagster-databricks] Added op factories to create ops for running existing Databricks jobs (`create_databricks_run_now_op`), as well as submitting one-off Databricks jobs (`create_databricks_submit_run_op`). See the [new Databricks guide](https://docs.dagster.io/master/integrations/databricks) for more details.
- [dagster-duckdb-polars] Added a dagster-duckdb-polars library that includes a `DuckDBPolarsTypeHandler` for use with `build_duckdb_io_manager`, which allows loading / storing Polars DataFrames from/to DuckDB. (Thanks Pezhman Zarabadi-Poor!)
- [dagster-gcp-pyspark] New PySpark TypeHandler for the BigQuery I/O manager. Store and load your PySpark DataFrames in BigQuery using `bigquery_pyspark_io_manager`.
- [dagster-snowflake] [dagster-duckdb] The Snowflake and DuckDB IO managers can now load multiple partitions in a single step - e.g. when a non-partitioned asset depends on a partitioned asset or a single partition of an asset depends on multiple partitions of an upstream asset. Loading occurs using a single SQL query and returns a single `DataFrame`.
- [dagster-k8s] The Helm chart now supports the full kubernetes env var spec for user code deployments. Example:

```yaml
dagster-user-deployments:
  deployments:
    - name: my-code
      env:
        - name: FOO
          valueFrom:
            fieldRef:
              fieldPath: metadata.uid
```


If `includeConfigInLaunchedRuns` is enabled, these env vars will also be applied to the containers for launched runs.
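
As a rough illustration of the daily-to-weekly rollup mentioned in the backfill item above, the sketch below lists the daily partition keys that fall inside one weekly partition. It uses only the Python standard library rather than Dagster's partition-mapping classes. The helper name `daily_keys_for_week` and the `YYYY-MM-DD` key format are assumptions made for this example.

```python
from datetime import date, timedelta
from typing import List


def daily_keys_for_week(week_start: date) -> List[str]:
    """Return the seven daily partition keys covered by the week that starts on week_start."""
    # Hypothetical helper: keys are formatted as ISO dates (YYYY-MM-DD).
    return [(week_start + timedelta(days=offset)).isoformat() for offset in range(7)]


keys = daily_keys_for_week(date(2023, 1, 1))
print(keys[0], keys[-1])  # 2023-01-01 2023-01-07
```

A weekly asset's partition would then depend on all seven of these upstream daily partitions.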

Bugfixes

- Previously, if an `AssetSelection` which matched no assets was passed into `define_asset_job`, the resulting job would target all assets in the repository. This has been fixed.
- Fixed a bug that caused the UI to show an error if you tried to preview a future schedule tick for a schedule built using `build_schedule_from_partitioned_job`.
- When a non-partitioned non-asset job has an input that comes from a partitioned SourceAsset, we now load all partitions of that asset.
- Updated the `fs_io_manager` to store multipartitioned materializations in directory levels by dimension. This resolves a bug on windows where multipartitioned materializations could not be stored with the `fs_io_manager`.
- Schedules and sensors previously timed out when attempting to yield many multipartitioned run requests. This has been fixed.
- Fixed a bug where `context.partition_key` would raise an error when executing on a partition range within a single run via Dagit.
- Fixed a bug that caused the default IO manager to incorrectly raise type errors in some situations with partitioned inputs.
- [ui] Fixed a bug where partition health would fail to display for certain time window partitions definitions with positive offsets.
- [ui] Always show the “Reload all” button on the code locations list page, to avoid an issue where the button was not available when adding a second location.
- [ui] Fixed a bug where users running multiple replicas of dagit would see repeated `Definitions reloaded` messages on fresh page loads.
- [ui] The asset graph now shows only the last path component of linked assets for better readability.
- [ui] The op metadata panel no longer capitalizes metadata keys.
- [ui] The asset partitions page, asset sidebar and materialization dialog are significantly smoother when viewing assets with a large number of partitions (100k+)
- [dagster-gcp-pandas] The Pandas TypeHandler for BigQuery now respects user provided `location` information.
- [dagster-snowflake] `ProgrammingError` was imported from the wrong library. This has been fixed. Thanks herbert-allium!
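
The `fs_io_manager` item above describes storing multipartitioned materializations "in directory levels by dimension". The changelog does not spell out the exact layout, so the following is only a sketch of the general idea: one nested directory per dimension. The function name `materialization_path` and the choice to order dimensions alphabetically are assumptions for illustration, not Dagster's actual implementation.

```python
from pathlib import PurePosixPath
from typing import Dict


def materialization_path(base_dir: str, asset_name: str, keys_by_dimension: Dict[str, str]) -> PurePosixPath:
    """Build a path with one directory level per partition dimension."""
    # Assumption: dimensions are ordered by name so the layout is deterministic.
    parts = [keys_by_dimension[dim] for dim in sorted(keys_by_dimension)]
    return PurePosixPath(base_dir, asset_name, *parts)


path = materialization_path("storage", "events", {"region": "eu", "date": "2023-01-01"})
print(path)  # storage/events/2023-01-01/eu
```

Nesting by dimension avoids having to encode every dimension key into a single file name.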

Experimental

- You can now set an explicit logical version on `Output` objects rather than using Dagster’s auto-generated versions.
- New `get_asset_provenance` method on `OpExecutionContext` allows fetching logical version provenance for an arbitrary asset key.
- [ui] You can now create dynamic partitions from the partition selection UI when materializing a dynamically partitioned asset.

Documentation

- Added an example of how to use dynamic asset partitions - in the `examples/assets_dynamic_partitions` folder
- New [tutorial](https://docs.dagster.io/master/integrations/bigquery/using-bigquery-with-dagster) for using the BigQuery I/O manager.
- New [reference page](https://docs.dagster.io/master/integrations/bigquery/reference) for BigQuery I/O manager features.
- New [automating data pipelines guide](https://legacy-versioned-docs.dagster.dagster-docs.io/1.1.21/guides/dagster/automated_pipelines)

1.1.20

New

- The new `graph_asset` and `graph_multi_asset` decorators make it more ergonomic to define graph-backed assets.
- Dagster will auto-infer dependency relationships between single-dimensionally partitioned assets and multipartitioned assets, when the single-dimensional partitions definition is a dimension of the `MultiPartitionsDefinition`.
- A new `Test sensor` / `Test schedule` button that allows you to perform a dry-run of your sensor / schedule. Check out the docs on this functionality [here](https://docs.dagster.io/master/concepts/partitions-schedules-sensors/sensors#via-dagit) for sensors and [here](https://docs.dagster.io/master/concepts/partitions-schedules-sensors/sensors#via-dagit) for schedules.
- [dagit] Added (back) tag autocompletion in the runs filter, now with improved query performance.
- [dagit] The Dagster libraries and their versions that were used when loading definitions can now be viewed in the actions menu for each code location.
- New `bigquery_pandas_io_manager` can store and load Pandas dataframes in BigQuery.
- [dagster-snowflake, dagster-duckdb] SnowflakeIOManagers and DuckDBIOManagers can now default to loading inputs as a specified type if a type annotation does not exist for the input.
- [dagster-dbt] Added the ability to use the “state:” selector.
- [dagster-k8s] The Helm chart now supports the full kubernetes env var spec for Dagit and the Daemon. E.g.

```yaml
dagit:
  env:
    - name: "FOO"
      valueFrom:
        fieldRef:
          fieldPath: metadata.uid
```
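
To make the auto-inferred dependency between single-dimension and multipartitioned assets (listed earlier in this section) more concrete, here is a small standard-library sketch. Given the dimensions of a multipartitioned asset and one key of the shared dimension, it enumerates the multi-partition keys that match. The function name `multi_keys_for_single_key` and the dictionary representation of keys are assumptions for illustration. Dagster's own mapping logic may differ.

```python
from itertools import product
from typing import Dict, List


def multi_keys_for_single_key(
    dimensions: Dict[str, List[str]], shared_dimension: str, key: str
) -> List[Dict[str, str]]:
    """Return every multi-partition key whose shared dimension equals `key`."""
    names = sorted(dimensions)
    # Pin the shared dimension to the given key and keep all keys of the other dimensions.
    choices = [[key] if name == shared_dimension else dimensions[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*choices)]


dimensions = {"date": ["2023-01-01", "2023-01-02"], "region": ["eu", "us"]}
for multi_key in multi_keys_for_single_key(dimensions, "date", "2023-01-01"):
    print(multi_key)
```

In this example a daily partition `2023-01-01` lines up with two multi-partition keys, one per region.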


Bugfixes

- Previously, graphs would fail to resolve an input with a custom type and an input manager key. This has been fixed.
- Fixes a bug where negative partition counts were displayed in the asset graph.
- Previously, when an asset sensor did not yield run requests, it returned an empty result. This has been updated to yield a meaningful message.
- Fix an issue with a non-partitioned asset downstream of a partitioned asset with self-dependencies causing a GQL error in dagit.
- [dagster-snowflake-pyspark] Fixed a bug where the PySparkTypeHandler was incorrectly loading partitioned data.
- [dagster-k8s] Fixed an issue where [run monitoring](https://docs.dagster.io/deployment/run-monitoring#run-monitoring) sometimes failed to detect that the kubernetes job for a run had stopped, leaving the run hanging.

Documentation

- Updated contributor docs to reference our new toolchain (`ruff`, `pyright`).
- (experimental) Added documentation for the dynamic partitions definition.
- [dagster-snowflake] The Snowflake I/O Manager reference page now includes information on working with partitioned assets.

1.1.19

New

- The `FreshnessPolicy` object now supports a `cron_schedule_timezone` argument.
- `AssetsDefinition.from_graph` now supports a `freshness_policies_by_output_name` parameter.
- The `asset_sensor` will now display an informative `SkipReason` when no new materializations have been created since the last sensor tick.
- `AssetsDefinition` now has a `to_source_asset` method, which returns a representation of this asset as a `SourceAsset`.
- You can now designate assets as inputs to ops within a graph or graph-based job. E.g.

```python
from dagster import asset, job, op


@asset
def emails_to_send():
    ...


@op
def send_emails(emails) -> None:
    ...


@job
def send_emails_job():
    send_emails(emails_to_send.to_source_asset())
```


- Added a `--dagit-host/-h` argument to the `dagster dev` command to allow customization of the host where Dagit runs.
- [dagster-snowflake, dagster-duckdb] Database I/O managers (Snowflake, DuckDB) now support static partitions, multi-partitions, and dynamic partitions.

Bugfixes

- Previously, if a description was provided for an op that backed a multi-asset, the op’s description would override the descriptions in Dagit for the individual assets. This has been fixed.
- Sometimes, when applying an `input_manager_key` to an asset’s input, incorrect resource config could be used when loading that input. This has been fixed.
- Previously, the backfill page errored when partitions definitions changed for assets that had been backfilled. This has been fixed.
- When displaying materialized partitions for multipartitioned assets, Dagit would error if a dimension had zero partitions. This has been fixed.
- [dagster-k8s] Fixed an issue where setting `runK8sConfig` in the Dagster Helm chart would not pass configuration through to pods launched using the `k8s_job_executor`.
- [dagster-k8s] Previously, using the `execute_k8s_job` op downstream of a dynamic output would result in k8s jobs with duplicate names being created. This has been fixed.
- [dagster-snowflake] Previously, if the schema for storing outputs didn’t exist, the Snowflake I/O manager would fail. Now it creates the schema.

Breaking Changes

- Removed the experimental, undocumented `asset_key`, `asset_partitions`, and `asset_partitions_defs` arguments on `Out`.
- `multi_asset` no longer accepts `Out` values in the dictionary passed to its `outs` argument. This was experimental and deprecated. Instead, use `AssetOut`.
- The experimental, undocumented `top_level_resources` argument to the `repository` decorator has been renamed to `_top_level_resources` to emphasize that it should not be set manually.

Community Contributions

- `load_asset_values` now accepts resource configuration (thanks Nintorac!)
- Previously, when using the `UPathIOManager`, paths with the `"."` character in them would be incorrectly truncated, which could result in multiple distinct objects being written to the same path. This has been fixed. (Thanks spenczar!)
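
The changelog does not say how paths containing `"."` ended up truncated. One plausible mechanism, assumed here purely for illustration, is replacing a path's suffix instead of appending to it: `with_suffix` in Python's `pathlib` treats everything after the last `"."` as the existing suffix. The helper names below are hypothetical.

```python
from pathlib import PurePosixPath


def replaced_suffix_path(base: str, key: str, extension: str) -> PurePosixPath:
    # with_suffix() replaces whatever follows the last "." in the final component.
    return PurePosixPath(base, key).with_suffix(extension)


def appended_suffix_path(base: str, key: str, extension: str) -> PurePosixPath:
    # Appending keeps the full name, so distinct keys stay distinct.
    path = PurePosixPath(base, key)
    return path.with_name(path.name + extension)


print(replaced_suffix_path("store", "report.v1", ".csv"))  # store/report.csv
print(replaced_suffix_path("store", "report.v2", ".csv"))  # store/report.csv
print(appended_suffix_path("store", "report.v1", ".csv"))  # store/report.v1.csv
print(appended_suffix_path("store", "report.v2", ".csv"))  # store/report.v2.csv
```

With suffix replacement, two distinct keys collapse onto the same path, which matches the symptom described in the entry above.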

Experimental

- [dagster-dbt] Added documentation to our dbt Cloud integration to cache the loading of software-defined assets from a dbt Cloud job.

Documentation

- Revamped the introduction to the Partitions concepts page to make it clear that non-time-window partitions are equally encouraged.
- In Navigation, moved the Partitions and Backfill concept pages to their own section underneath Concepts.
- Moved the Running Dagster locally guide from **Deployment** to **Guides** to reflect that OSS and Cloud users can follow it.
- Added [a new guide](https://docs.dagster.io/guides/dagster/asset-versioning-and-caching) covering asset versioning and caching.

1.1.18

New

- Assets with time-window `PartitionsDefinition`s (e.g. `HourlyPartitionsDefinition`, `DailyPartitionsDefinition`) may now have a `FreshnessPolicy`.
- [dagster-dbt] When using `load_assets_from_dbt_project` or `load_assets_from_dbt_manifest` with `dbt-core>=1.4`, `AssetMaterialization` events will be emitted as the dbt command executes, rather than waiting for dbt to complete before emitting events.
- [dagster-aws] When [run monitoring](https://docs.dagster.io/deployment/run-monitoring#run-monitoring) detects that a run unexpectedly crashed or failed to start, an error message in the run’s event log will include log messages from the ECS task for that run to help diagnose the cause of the failure.
- [dagster-airflow] Added `make_ephemeral_airflow_db_resource`, which returns a `ResourceDefinition` for a local-only Airflow database for use in migrated Airflow DAGs.
- Made some performance improvements for job run queries which can be applied by running `dagster instance migrate`.
- [dagit] System tags (code + logical versions) are now shown in the asset sidebar and on the asset details page.
- [dagit] Source assets that have never been observed are presented more clearly on the asset graph.
- [dagit] The number of materialized and missing partitions are shown on the asset graph and in the asset catalog for partitioned assets.
- [dagit] Databricks-backed assets are now shown on the asset graph with a small “Databricks” logo.

Bugfixes

- Fixed a bug where materializations of part of the asset graph did not construct required resource keys correctly.
- Fixed an issue where `observable_source_asset` incorrectly required its function to have a `context` argument.
- Fixed an issue with serialization of freshness policies, which affected cacheable assets that included these policies, such as those from `dagster-airbyte`.
- [dagster-dbt] Previously, the `dagster-dbt` integration was incompatible with `dbt-core>=1.4`. This has been fixed.
- [dagster-dbt] `load_assets_from_dbt_cloud_job` will now avoid unnecessarily generating docs when compiling a manifest for the job. Compile runs will no longer be kicked off for jobs not managed by this integration.
- Previously for multipartitioned assets, `context.asset_partition_key` returned a string instead of a `MultiPartitionKey`. This has been fixed.
- [dagster-k8s] Fixed an issue where pods launched by the `k8s_job_executor` would sometimes unexpectedly fail due to transient 401 errors in certain kubernetes clusters.
- Fix a bug with nth-weekday-of-the-month handling in cron schedules.

Breaking Changes

- [dagster-airflow] `load_assets_from_airflow_dag` no longer creates Airflow db resource definitions. You will need to provide them on `Definitions` directly.

Deprecations

- The `partitions_fn` argument of the `DynamicPartitionsDefinition` class is now deprecated and will be removed in 2.0.0.

Community Contributions

- [dagster-wandb] A new integration with [Weights & Biases](https://wandb.ai/site) allows you to orchestrate your MLOps pipelines and maintain ML assets with Dagster.
- Postgres has been updated to 14.6 for Dagster’s helm chart. Thanks [DustyShap](https://github.com/DustyShap)!
- Typo fixed in docs. Thanks [C0DK](https://github.com/C0DK)!
- You can now pass a callable directly to `asset` (rather than using `asset` in decorator form) to create an asset. Thanks [ns-finkelstein](https://github.com/nsfinkelstein)!

Documentation

- New “Asset versioning and caching” guide
- [dagster-snowflake] The Snowflake guide has been updated to include PySpark dataframes
- [dagster-snowflake] The Snowflake guide has been updated to include private key authentication
- [dagster-airflow] The Airflow migration guide has been updated to include more detailed instructions and considerations for making a migration.

1.1.17

New

- The `dagster-airflow` library has been moved to 1.x.x to denote the stability of its APIs going forward.
- [dagster-airflow] `make_schedules_and_jobs_from_airflow_dag_bag` has been added to allow for more fine grained composition of your transformed airflow DAGs into Dagster.
- [dagster-airflow] Airflow dag task `retries` and `retry_delay` configuration are now converted to op [RetryPolicies](https://docs.dagster.io/concepts/ops-jobs-graphs/op-retries#retrypolicy) with all `make_dagster_*` apis.

Bugfixes

- Fixed an issue where cron schedules using a form like `0 5 * * mon#1` to execute on a certain day of the week each month executed every week instead.
- [dagit] Fixed an issue where the asset lineage page sometimes timed out while loading large asset graphs.
- Fixed an issue where the partitions page sometimes failed to load for partitioned asset jobs.
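
The cron fix above concerns schedules that should fire on the n-th occurrence of a weekday in each month (for example, the first Monday) rather than every week. The sketch below shows the intended date arithmetic using only the standard library. It is not Dagster's or any cron library's implementation, and the function name `nth_weekday_of_month` is an assumption for this example.

```python
import calendar
from datetime import date
from typing import Optional


def nth_weekday_of_month(year: int, month: int, weekday: int, n: int) -> Optional[date]:
    """Return the n-th given weekday (0 = Monday) of a month, or None if it does not exist."""
    first_weekday, days_in_month = calendar.monthrange(year, month)
    # Days from the 1st of the month to the first occurrence of the target weekday.
    offset = (weekday - first_weekday) % 7
    day = 1 + offset + 7 * (n - 1)
    return date(year, month, day) if day <= days_in_month else None


print(nth_weekday_of_month(2023, 1, calendar.MONDAY, 1))  # 2023-01-02
print(nth_weekday_of_month(2023, 2, calendar.MONDAY, 1))  # 2023-02-06
print(nth_weekday_of_month(2023, 2, calendar.MONDAY, 5))  # None
```

A correct schedule of this kind yields one tick per month, whereas the buggy behavior described above produced a tick on every Monday.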

Breaking Changes

- [dagster-airflow] The `use_airflow_template_context`, `mock_xcom` and `use_ephemeral_airflow_db` params have been dropped. By default, all `make_dagster_*` APIs now use a run-scoped Airflow db, similar to how `use_ephemeral_airflow_db` worked.
- [dagster-airflow] `make_airflow_dag` has been removed.
- [dagster-airflow] `make_airflow_dag_for_operator` has been removed.
- [dagster-airflow] `make_airflow_dag_containerized` has been removed.
- [dagster-airflow] `airflow_operator_to_op` has been removed.
- [dagster-airflow] `make_dagster_repo_from_airflow_dags_path` has been removed.
- [dagster-airflow] `make_dagster_repo_from_airflow_dag_bag` has been removed.
- [dagster-airflow] `make_dagster_repo_from_airflow_example_dags` has been removed.
- [dagster-airflow] The naming convention for ops generated from airflow tasks has been changed to `${dag_id}__${task_id}` from `airflow_${task_id}_${unique_int}`.
- [dagster-airflow] The naming convention for jobs generated from airflow dags has been changed to `${dag_id}` from `airflow_${dag_id}`.
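
When updating references to generated op and job names after this change, it can help to compute the new names from the DAG and task IDs. The sketch below applies the two patterns quoted above with Python's `string.Template`. The helper names `op_name` and `job_name` and the sample IDs are hypothetical, and the sketch ignores any name normalization the library may apply.

```python
from string import Template

# Patterns copied from the changelog entries above.
OP_NAME = Template("${dag_id}__${task_id}")
JOB_NAME = Template("${dag_id}")


def op_name(dag_id: str, task_id: str) -> str:
    """Name of the op generated for one Airflow task under the new convention."""
    return OP_NAME.substitute(dag_id=dag_id, task_id=task_id)


def job_name(dag_id: str) -> str:
    """Name of the job generated for one Airflow DAG under the new convention."""
    return JOB_NAME.substitute(dag_id=dag_id)


print(op_name("daily_etl", "load_users"))  # daily_etl__load_users
print(job_name("daily_etl"))  # daily_etl
```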

1.1.15

New

- `Definitions` now accepts `Executor` instances in its `executor` argument, not just `ExecutorDefinition`s.
- `multi_asset_sensor` now accepts a `request_assets` parameter, which allows it to directly request that assets be materialized, instead of requesting a run of a job.
- Improved the performance of instantiating a `Definitions` when using large numbers of assets or many asset jobs.
- The job passed to `build_schedule_from_partitioned_job` no longer needs to have a `partitions_def` directly assigned to it. Instead, Dagster will infer the partitions from the assets it targets.
- `OpExecutionContext.asset_partition_keys_for_output` no longer requires an argument to specify the default output.
- The “Reload all” button on the Code Locations page in Dagit will now detect changes to a `pyproject.toml` file that were made while Dagit was running. Previously, Dagit needed to be restarted in order for such changes to be shown.
- `get_run_record_by_id` has been added to `DagsterInstance` to provide easier access to `RunRecord` objects which expose the `start_time` and `end_time` of the run.
- [dagit] In the “Materialize” modal, you can now choose to pass a range of asset partitions to a single run rather than launching a backfill.
- [dagster-docker] Added a `docker_container_op` op and `execute_docker_container_op` helper function for running ops that launch arbitrary Docker containers. See [the docs](https://docs.dagster.io/_apidocs/libraries/dagster-docker#ops) for more information.
- [dagster-snowflake-pyspark] The Snowflake I/O manager now supports PySpark DataFrames.
- [dagster-k8s] The Docker images included in the Dagster Helm chart are now built on the most recently released `python:3.x-slim` base image.

Bugfixes

- Previously, the `build_asset_reconciliation_sensor` could time out when evaluating ticks over large selections of assets, or assets with many partitions. A series of performance improvements should make this much less likely.
- Fixed a bug that caused a failure when using `run_request_for_partition` in a sensor that targeted multiple jobs created via `define_asset_job`.
- The cost of importing `dagster` has been reduced.
- Issues preventing “re-execute from failure” from working correctly with dynamic graphs have been fixed.
- [dagit] In Firefox, Dagit no longer truncates text unnecessarily in some cases.
- [dagit] Dagit’s asset graph now allows you to click “Materialize” without rendering the graph if you have too many assets to display.
- [dagit] Fixed a bug that stopped the backfill page from loading when assets that had previously been backfilled no longer had a `PartitionsDefinition`.
- [dagster-k8s] Fixed an issue where `k8s_job_op` raised an Exception when running pods with multiple containers.
- [dagster-airbyte] Loosened credentials masking for Airbyte managed ingestion, fixing the Hubspot source, thanks [joel-olazagasti](https://github.com/joel-olazagasti)!
- [dagster-airbyte] When using managed ingestion, Airbyte now pulls all source types available to the instance rather than the workspace, thanks [emilija-omnisend](https://github.com/emilija-omnisend)!
- [dagster-airbyte] Fixed an issue which arose when attaching freshness policies to Airbyte assets and using the multiprocessing executor.
- [dagster-fivetran] Added the ability to force assets to be output for all specified Fivetran tables during a sync in the case that a sync’s API outputs are missing one or more tables.

Breaking Changes

- The `asset_keys` and `asset_selection` parameters of the experimental `multi_asset_sensor` decorator have been replaced with a `monitored_assets` parameter. This helps disambiguate them from the new `request_assets` parameter.

Community Contributions

- A broken docs link in snowflake_quickstart has been fixed, thanks [clayheaton](https://github.com/clayheaton)!
- Troubleshooting help added to helm deployment guide, thanks adam-bloom!
- `StaticPartitionMapping` is now serializable, thanks [AlexanderVR](https://github.com/AlexanderVR)!
- [dagster-fivetran] `build_fivetran_assets` now supports `group_name` , thanks [toddy86](https://github.com/toddy86)!
- [dagster-azure] `AzureBlobComputeManager` now supports authentication via `DefaultAzureCredential`, thanks [mpicard](https://github.com/mpicard)!

Experimental

- [dagster-airflow] Added a new API, `load_assets_from_airflow_dag`, that creates graph-backed, partitioned assets based on the provided Airflow DAG.
