Dagster

Latest version: v1.9.5

0.7.5

Not secure
**New**

- Added the `IntSource` type, which lets integers be set from environment variables in config.
- You may now set tags on pipeline definitions. These will resolve in the following cases:

1. Loading in the playground view in Dagit will pre-populate the tag container.
2. Loading partition sets from the preset/config picker will pre-populate the tag container with
the union of pipeline tags and partition tags, with partition tags taking precedence.
3. Executing from the CLI will generate runs with the pipeline tags.
4. Executing programmatically using the `execute_pipeline` api will create a run with the union
of pipeline tags and `RunConfig` tags, with `RunConfig` tags taking precedence.
5. Scheduled runs (both launched and executed) will have the union of pipeline tags and the
schedule tags function, with the schedule tags taking precedence.

- Output materialization configs may now yield multiple Materializations, and the tutorial has
been updated to reflect this.

- We now export the `SolidExecutionContext` in the public API so that users can correctly type hint
solid compute functions.
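
The tag precedence rules in the list above amount to a dictionary union in which the more specific source wins on key conflicts. A minimal sketch, assuming tags are plain string-to-string dictionaries; `merge_tags` and the tag values are hypothetical and are not part of Dagster's API:

```python
def merge_tags(pipeline_tags, override_tags):
    """Return the union of both tag sets; override_tags win on key conflicts."""
    merged = dict(pipeline_tags)  # copy so the pipeline tags are not mutated
    merged.update(override_tags)
    return merged


# Hypothetical tag values, chosen only for illustration.
pipeline_tags = {"team": "data", "env": "staging"}
run_config_tags = {"env": "prod"}

merged = merge_tags(pipeline_tags, run_config_tags)
```

The same helper would cover the partition and schedule cases by passing partition tags or schedule tags as `override_tags`.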

**Dagit**

- Pipeline run tags are now preserved when resuming/retrying from Dagit.
- Scheduled run stats are now grouped by partition.
- A "preparing" section has been added to the execution viewer. This shows steps that are in
the process of starting execution.
- Markers emitted by the underlying execution engines are now visualized in the Dagit execution
timeline.

**Bugfix**

- Resume/retry now works as expected in the presence of solids that yield optional outputs.
- Fixed an issue where dagster-celery workers were failing to start in the presence of config
values that were `None`.
- Fixed an issue with attempting to set `threads_per_worker` on Dask distributed clusters.

**dagster-postgres**

- All postgres config may now be set using environment variables in config.
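
As a rough illustration of what setting config from environment variables means, the sketch below resolves a value that is either a literal or an `{"env": "NAME"}` reference. `resolve_config_value` and `EXAMPLE_PG_PORT` are hypothetical names; this is not Dagster's implementation:

```python
import os


def resolve_config_value(value, cast=str):
    """Resolve a literal value or an {"env": "NAME"} reference, then cast it."""
    if isinstance(value, dict) and set(value) == {"env"}:
        name = value["env"]
        if name not in os.environ:
            raise KeyError(f"environment variable {name!r} is not set")
        return cast(os.environ[name])
    return cast(value)


# EXAMPLE_PG_PORT is a made-up variable name used only for this sketch.
os.environ["EXAMPLE_PG_PORT"] = "5432"
port = resolve_config_value({"env": "EXAMPLE_PG_PORT"}, cast=int)
hostname = resolve_config_value("localhost")
```

Failing loudly on an unset variable keeps a missing secret from silently turning into an empty string.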

**dagster-aws**

- The `s3_resource` now exposes a `list_objects_v2` method corresponding to the underlying boto3
API. (Thanks, basilvetas!)
- Added the `redshift_resource` to access Redshift databases.

**dagster-k8s**

- The `K8sRunLauncher` config now includes the `load_kubeconfig` and `kubeconfig_file` options.

**Documentation**

- Fixes and improvements.

**Dependencies**

- dagster-airflow no longer pins its werkzeug dependency.

**Community**

- We've added opt-in telemetry to Dagster so we can collect usage statistics in order to inform
development priorities. Telemetry data will motivate projects such as adding features in
frequently-used parts of the CLI and adding more examples in the docs in areas where users
encounter more errors.

We will not see or store solid definitions (including generated context) or pipeline definitions
(including modes and resources). We will not see or store any data that is processed within solids
and pipelines.

If you'd like to opt in to telemetry, please add the following to `$DAGSTER_HOME/dagster.yaml`:

    telemetry:
      enabled: true

- Thanks to basilvetas and hspak for their contributions!

0.7.4

Not secure
**New**

- It is now possible to use Postgres to back schedule storage by configuring
`dagster_postgres.PostgresScheduleStorage` on the instance.
- Added the `execute_pipeline_with_mode` API to allow executing a pipeline in test with a specific
mode without having to specify `RunConfig`.
- Experimental support for retries in the Celery executor.
- It is now possible to set run-level priorities for backfills run using the Celery executor by
passing `--celery-base-priority` to `dagster pipeline backfill`.
- Added the `weekly` schedule decorator.

**Deprecations**

- The `dagster-ge` library has been removed from this release due to drift from the underlying
Great Expectations implementation.

**dagster-pandas**

- `PandasColumn` now includes an `is_optional` flag, replacing the previous
`ColumnExistsConstraint`.
- You can now pass the `ignore_missing_values` flag to `PandasColumn` in order to apply column
constraints only to the non-missing rows in a column.
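
To make the effect of `ignore_missing_values` concrete, here is a library-free sketch of applying a constraint only to non-missing rows. It uses plain Python lists rather than pandas, and `column_satisfies`, `is_missing`, and `is_positive` are hypothetical helpers, not the `dagster_pandas` API:

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def column_satisfies(values, predicate, ignore_missing_values=False):
    """Check predicate on every row, optionally skipping missing rows."""
    for value in values:
        if ignore_missing_values and is_missing(value):
            continue
        if not predicate(value):
            return False
    return True


def is_positive(value):
    # Missing values never count as positive.
    return not is_missing(value) and value > 0


ages = [31, None, 47, float("nan")]
strict = column_satisfies(ages, is_positive)
lenient = column_satisfies(ages, is_positive, ignore_missing_values=True)
```

Here `strict` fails because of the missing rows, while `lenient` passes because only `31` and `47` are checked.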

**dagster-k8s**

- The Helm chart now includes provision for an Ingress and for multiple Celery queues.

**Documentation**

- Improvements and fixes.

0.7.3

Not secure
**New**

- It is now possible to configure a Dagit instance to disable executing pipeline runs in a local
subprocess.
- Resource initialization, teardown, and associated failure states now emit structured events
visible in Dagit. Structured events for pipeline errors and multiprocess execution have been
consolidated and rationalized.
- Support Redis queue provider in `dagster-k8s` Helm chart.
- Support external postgresql in `dagster-k8s` Helm chart.

**Bugfix**

- Fixed an issue with inaccurate timings on some resource initializations.
- Fixed an issue that could cause the multiprocess engine to spin forever.
- Fixed an issue with default value resolution when a config value was set using `SourceString`.
- Fixed an issue when loading logs from a pipeline belonging to a different repository in Dagit.
- Fixed an issue where the CLI command `dagster schedule up` would fail in certain scenarios
with the `SystemCronScheduler`.

**Pandas**

- Column constraints can now be configured to permit NaN values.

**Dagstermill**

- Removed a spurious dependency on sklearn.

**Docs**

- Improvements and fixes to docs.
- Restored dagster.readthedocs.io.

**Experimental**

- An initial implementation of solid retries, throwing a `RetryRequested` exception, was added.
This API is experimental and likely to change.
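
The general pattern behind exception-driven retries can be sketched without Dagster. The `RetryRequested` class below is a local stand-in and `run_with_retries` is a hypothetical driver; neither reflects how the experimental API is actually wired up:

```python
class RetryRequested(Exception):
    """Stand-in for the exception a solid raises to ask for a retry."""


def run_with_retries(compute_fn, max_retries=2):
    """Call compute_fn, re-running it each time it raises RetryRequested."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return compute_fn(), attempts
        except RetryRequested:
            # Give up once the retry budget is spent.
            if attempts > max_retries:
                raise


calls = {"count": 0}


def flaky_compute():
    # Fails on the first two calls, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RetryRequested()
    return "done"


result, attempts = run_with_retries(flaky_compute)
```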

**Other**

- Renamed property `runtime_type` to `dagster_type` in definitions. The following are deprecated
  and will be removed in a future version:
  - `InputDefinition.runtime_type` is deprecated. Use `InputDefinition.dagster_type` instead.
  - `OutputDefinition.runtime_type` is deprecated. Use `OutputDefinition.dagster_type` instead.
  - `CompositeSolidDefinition.all_runtime_types` is deprecated. Use
    `CompositeSolidDefinition.all_dagster_types` instead.
  - `SolidDefinition.all_runtime_types` is deprecated. Use `SolidDefinition.all_dagster_types`
    instead.
  - `PipelineDefinition.has_runtime_type` is deprecated. Use `PipelineDefinition.has_dagster_type`
    instead.
  - `PipelineDefinition.runtime_type_named` is deprecated. Use
    `PipelineDefinition.dagster_type_named` instead.
  - `PipelineDefinition.all_runtime_types` is deprecated. Use
    `PipelineDefinition.all_dagster_types` instead.
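
A rename like this is commonly handled by keeping the old name as an alias that warns and delegates. The sketch below shows that pattern with the standard `warnings` module; `InputDefinitionSketch` is a hypothetical stand-in, and whether Dagster emits a warning in exactly this way is an assumption:

```python
import warnings


class InputDefinitionSketch:
    """Hypothetical stand-in, not Dagster's InputDefinition."""

    def __init__(self, dagster_type):
        self._dagster_type = dagster_type

    @property
    def dagster_type(self):
        return self._dagster_type

    @property
    def runtime_type(self):
        # Old name kept as an alias that warns before delegating.
        warnings.warn(
            "runtime_type is deprecated; use dagster_type instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.dagster_type


definition = InputDefinitionSketch(dagster_type=int)

# Record warnings so the deprecation can be observed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_value = definition.runtime_type
```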

0.7.2

Not secure
**Docs**

- New docs site at docs.dagster.io.
- dagster.readthedocs.io is currently stale due to availability issues.

**New**

- Improvements to S3 Resource. (Thanks dwallace0723!)
- Better error messages in Dagit.
- Better font/styling support in Dagit.
- Changed `OutputDefinition` to take an `is_required` rather than an `is_optional` argument. This
is to remain consistent with changes to `Field` in 0.7.1 and to avoid confusion
with Python's typing and Dagster's definition of `Optional`, which indicates None-ability,
rather than existence. `is_optional` is deprecated and will be removed in a future version.
- Added support for Flower in dagster-k8s.
- Added support for environment variable config in dagster-snowflake.

**Bugfixes**

- Improved performance in Dagit waterfall view.
- Fixed bug when executing solids downstream of a skipped solid.
- Improved navigation experience for pipelines in Dagit.
- Fixes for the dagster-aws CLI tool.
- Fixed an issue starting Dagit without `DAGSTER_HOME` set on Windows.
- Fixed pipeline subset execution in partition-based schedules.

0.7.1

Not secure
**Dagit**

- Dagit now looks up an available port on which to run when the default port is
not available. (Thanks rparrapy!)
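
One common way to implement this kind of fallback is to try binding the preferred port and, if that fails, let the operating system pick a free one. This is a sketch of the technique only; `find_available_port` is a hypothetical helper and Dagit's actual lookup may differ:

```python
import socket


def find_available_port(preferred_port, host="127.0.0.1"):
    """Return preferred_port if it can be bound, else an OS-assigned free port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, preferred_port))
            return preferred_port
        except OSError:
            pass
    # Port 0 asks the operating system to pick any free port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as fallback:
        fallback.bind((host, 0))
        return fallback.getsockname()[1]


# Occupy a port so the fallback path is exercised.
busy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
busy.bind(("127.0.0.1", 0))
busy.listen(1)
busy_port = busy.getsockname()[1]

chosen_port = find_available_port(busy_port)
busy.close()
```

Note that the probe socket is closed before the port is returned, so another process could claim the port in between; a server would normally handle a bind failure at startup as well.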

**dagster_pandas**

- Hydration and materialization are now configurable on `dagster_pandas` dataframes.

**dagster_aws**

- The `s3_resource` no longer uses an unsigned session by default.

**Bugfixes**

- Type check messages are now displayed in Dagit.
- Failure metadata is now surfaced in Dagit.
- Dagit now correctly displays the execution time of steps that error.
- Error messages now appear correctly in console logging.
- GCS storage is now more robust to transient failures.
- Fixed an issue where some event logs could be duplicated in Dagit.
- Fixed an issue when reading config from an environment variable that wasn't set.
- Fixed an issue when loading a repository or pipeline from a file target on Windows.
- Fixed an issue where deleted runs could cause the scheduler page to crash in Dagit.

**Documentation**

- Expanded and improved docs and error messages.

0.7.0

Not secure
**Breaking Changes**

There are a substantial number of breaking changes in the 0.7.0 release.
Please see `070_MIGRATION.md` for instructions regarding migrating old code.

**_Scheduler_**

- The scheduler configuration has been moved from the `schedules` decorator to `DagsterInstance`.
Existing schedules that have been running are no longer compatible with current storage. To
migrate, remove the `scheduler` argument on all `schedules` decorators:

Instead of:

    @schedules(scheduler=SystemCronScheduler)
    def define_schedules():
        ...

remove the `scheduler` argument:

    @schedules
    def define_schedules():
        ...


Next, configure the scheduler on your instance by adding the following to
`$DAGSTER_HOME/dagster.yaml`:


    scheduler:
      module: dagster_cron.cron_scheduler
      class: SystemCronScheduler


Finally, if you had any existing schedules running, delete the existing `$DAGSTER_HOME/schedules`
directory and run `dagster schedule wipe && dagster schedule up` to re-instantiate schedules in a
valid state.

- The `should_execute` and `environment_dict_fn` arguments to `ScheduleDefinition` now have a
required first argument `context`, representing the `ScheduleExecutionContext`.

**_Config System Changes_**

- In the config system, `Dict` has been renamed to `Shape`; `List` to `Array`; `Optional` to
`Noneable`; and `PermissiveDict` to `Permissive`. The motivation here is to clearly delineate
config use cases from cases where you are using types as the inputs and outputs of solids, as
well as from Python typing types (for mypy and friends). We believe this will be clearer to users
in addition to simplifying our own implementation and internal abstractions.

Our recommended fix is _not_ to use `Shape` and `Array`, but instead to use our new condensed
config specification API. This allows one to use bare dictionaries instead of `Shape`, lists with
one member instead of `Array`, bare types instead of `Field` with a single argument, and Python
primitive types (`int`, `bool`, etc.) instead of the Dagster equivalents. These result in
dramatically less verbose config specs in most cases.

So instead of


    from dagster import Shape, Field, Int, Array, String

    # ... code
    config=Shape({  # Dict prior to change
        'some_int': Field(Int),
        'some_list': Field(Array[String]),  # List prior to change
    })


one can instead write:


    config={'some_int': int, 'some_list': [str]}


No imports and much simpler, cleaner syntax.
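
To show how the condensed form lines up with the verbose one, the sketch below renders a condensed spec using the verbose type names. `describe` and `PRIMITIVE_NAMES` are hypothetical and exist only for illustration; this is not how Dagster resolves config internally:

```python
# Mapping from Python primitives to the verbose Dagster type names.
PRIMITIVE_NAMES = {int: "Int", str: "String", bool: "Bool", float: "Float"}


def describe(spec):
    """Render a condensed config spec using the verbose type names."""
    if isinstance(spec, dict):
        fields = ", ".join(
            f"{key!r}: Field({describe(value)})" for key, value in spec.items()
        )
        return "Shape({" + fields + "})"
    if isinstance(spec, list):
        if len(spec) != 1:
            raise ValueError("a list spec must have exactly one member")
        return f"Array[{describe(spec[0])}]"
    return PRIMITIVE_NAMES[spec]


verbose = describe({"some_int": int, "some_list": [str]})
```

For the example above, `verbose` is `Shape({'some_int': Field(Int), 'some_list': Field(Array[String])})`, which matches the longer spelling.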

- `config_field` is no longer a valid argument on `solid`, `SolidDefinition`, `ExecutorDefinition`,
`executor`, `LoggerDefinition`, `logger`, `ResourceDefinition`, `resource`, `system_storage`, and
`SystemStorageDefinition`. Use `config` instead.
- For composite solids, the `config_fn` no longer takes a `ConfigMappingContext`, and the context
has been deleted. To upgrade, remove the first argument to `config_fn`.

So instead of


    @composite_solid(config={}, config_fn=lambda context, config: {})

one must instead write:

    @composite_solid(config={}, config_fn=lambda config: {})


- `Field` takes an `is_required` rather than an `is_optional` argument. This is to avoid confusion
with Python's typing and Dagster's definition of `Optional`, which indicates None-ability,
rather than existence. `is_optional` is deprecated and will be removed in a future version.
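
The distinction between existence (`is_required`) and None-ability (`Noneable`) can be sketched as two independent checks. `validate_field` and the `noneable` flag below are hypothetical and only illustrate the idea; they are not Dagster's validation code:

```python
MISSING = object()  # sentinel: the key is absent from the config


def validate_field(config, key, is_required, noneable):
    """Return a list of problems for one field of a config dictionary."""
    value = config.get(key, MISSING)
    if value is MISSING:
        # Existence check: only required fields must be present.
        return [f"{key} is required"] if is_required else []
    if value is None and not noneable:
        # None-ability check: applies only when the key is present.
        return [f"{key} may not be None"]
    return []


absent_optional = validate_field({}, "limit", is_required=False, noneable=False)
absent_required = validate_field({}, "limit", is_required=True, noneable=False)
none_rejected = validate_field({"limit": None}, "limit", is_required=True, noneable=False)
none_allowed = validate_field({"limit": None}, "limit", is_required=False, noneable=True)
```

A field can therefore be optional yet reject `None`, or required yet accept `None`.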

**_Required Resources_**

- All solids, types, and config functions that use a resource must explicitly list that
resource using the argument `required_resource_keys`. This is to enable efficient
resource management during pipeline execution, especially in a multiprocessing or
remote execution environment.

- The `system_storage` decorator now requires argument `required_resource_keys`, which was
previously optional.
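
One way to see why explicit declarations help: if every solid lists its resource keys, an executor can initialize only the resources needed by the solids it is about to run. The sketch below assumes that reading; the solid names, resource keys, and `resources_to_initialize` are all hypothetical:

```python
def resources_to_initialize(solids_to_run, required_keys_by_solid):
    """Union the resource keys declared by the solids that will run."""
    needed = set()
    for solid_name in solids_to_run:
        needed |= required_keys_by_solid[solid_name]
    return needed


# Hypothetical solids and resource keys, for illustration only.
required_keys_by_solid = {
    "load_events": {"s3"},
    "train_model": {"spark", "s3"},
    "send_report": {"slack"},
}

needed = resources_to_initialize(["load_events"], required_keys_by_solid)
```

A subprocess that executes only `load_events` would then skip setting up `spark` and `slack` entirely.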

**_Dagster Type System Changes_**

- `dagster.Set` and `dagster.Tuple` can no longer be used within the config system.
- Dagster types are now instances of `DagsterType`, rather than a class that inherits from
`RuntimeType`. Instead of dynamically generating a class to create a custom runtime type, just
create an instance of a `DagsterType`. The type checking function is now an argument to the
`DagsterType`, rather than an abstract method that has to be implemented in
a subclass.
- `RuntimeType` has been renamed to `DagsterType` and is now an encouraged API for type creation.
- Core type check function of DagsterType can now return a naked `bool` in addition
to a `TypeCheck` object.
- `type_check_fn` on `DagsterType` (formerly `type_check` and `RuntimeType`, respectively) now
takes a first argument `context` of type `TypeCheckContext` in addition to the second argument of
`value`.
- `define_python_dagster_type` has been eliminated in favor of `PythonObjectDagsterType`.
- `dagster_type` has been renamed to `usable_as_dagster_type`.
- `as_dagster_type` has been removed and similar capabilities added as
`make_python_type_usable_as_dagster_type`.
- `PythonObjectDagsterType` and `usable_as_dagster_type` no longer take a `type_check` argument. If
a custom type_check is needed, use `DagsterType`.
- As a consequence of these changes, if you were previously using `dagster_pyspark` or
`dagster_pandas` and expecting Pyspark or Pandas types to work as Dagster types, e.g., in type
annotations to functions decorated with `solid` to indicate that they are input or output types
for a solid, you will need to call `make_python_type_usable_as_dagster_type` from your code in
order to map the Python types to the Dagster types, or just use the Dagster types
(`dagster_pandas.DataFrame` instead of `pandas.DataFrame`) directly.
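
The new type check shape, a function of `(context, value)` that may return either a bare `bool` or a structured result, can be sketched with local stand-ins. `TypeCheckSketch` and `run_type_check` are hypothetical; they mimic the idea described above rather than Dagster's `TypeCheck` and `DagsterType` classes:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TypeCheckSketch:
    """Hypothetical stand-in for a structured type check result."""

    success: bool
    description: Optional[str] = None


def run_type_check(type_check_fn, context, value):
    """Call type_check_fn(context, value) and normalize a bare bool result."""
    result = type_check_fn(context, value)
    if isinstance(result, bool):
        return TypeCheckSketch(success=result)
    return result


def is_even(_context, value):
    # Returning a plain bool is enough for a simple check.
    return isinstance(value, int) and value % 2 == 0


def is_short_string(_context, value):
    # Returning a structured result lets the check explain a failure.
    ok = isinstance(value, str) and len(value) <= 5
    description = None if ok else "expected at most 5 characters"
    return TypeCheckSketch(success=ok, description=description)


even_result = run_type_check(is_even, None, 4)
string_result = run_type_check(is_short_string, None, "too long")
```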

**_Other_**

- We no longer publish base Docker images. Please see the updated deployment docs for an example
Dockerfile that you can use as a starting point.
- `step_metadata_fn` has been removed from `SolidDefinition` & `solid`.
- `SolidDefinition` & `solid` now take `tags` and enforce that values are strings or
are safely encoded as JSON. `metadata` is deprecated and will be removed in a future version.
- `resource_mapper_fn` has been removed from `SolidInvocation`.
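
One plausible reading of "strings or safely encoded as JSON" is that string values pass through unchanged and other values must survive a JSON round trip. The sketch below implements that reading; `normalize_tags` is hypothetical and the exact rule Dagster applies is an assumption here:

```python
import json


def normalize_tags(tags):
    """Keep string values as-is and JSON-encode everything else."""
    normalized = {}
    for key, value in tags.items():
        if isinstance(value, str):
            normalized[key] = value
            continue
        try:
            encoded = json.dumps(value)
        except TypeError as exc:
            raise ValueError(f"tag {key!r} is not JSON serializable") from exc
        # Reject values that do not survive a round trip unchanged.
        if json.loads(encoded) != value:
            raise ValueError(f"tag {key!r} does not round-trip through JSON")
        normalized[key] = encoded
    return normalized


tags = normalize_tags({"owner": "data-eng", "retries": 3, "queues": ["a", "b"]})
```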

**New**

- Dagit now includes a much richer execution view, with a Gantt-style visualization of step
execution and a live timeline.
- Early support for Python 3.8 is now available, and Dagster/Dagit along with many of our libraries
are now tested against 3.8. Note that several of our upstream dependencies have yet to publish
wheels for 3.8 on all platforms, so running on Python 3.8 likely still involves building some
dependencies from source.
- `dagster/priority` tags can now be used to prioritize the order of execution for the built-in
in-process and multiprocess engines.
- `dagster-postgres` storages can now be configured with separate arguments and environment
variables, such as:


      run_storage:
        module: dagster_postgres.run_storage
        class: PostgresRunStorage
        config:
          postgres_db:
            username: test
            password:
              env: ENV_VAR_FOR_PG_PASSWORD
            hostname: localhost
            db_name: test


- Support for `RunLauncher`s on `DagsterInstance` allows for execution to be "launched" outside of
the Dagit/Dagster process. As one example, this is used by `dagster-k8s` to submit pipeline
execution as a Kubernetes Job.
- Added support for adding tags to runs initiated from the `Playground` view in Dagit.
- Added `monthly_schedule` decorator.
- Added `Enum.from_python_enum` helper to wrap Python enums for config. (Thanks kdungs!)
- **[dagster-bash]** The Dagster bash solid factory now passes along `kwargs` to the underlying
solid construction, and now has a single `Nothing` input by default to make it easier to create a
sequencing dependency. Also, logs are now buffered by default to make execution less noisy.
- **[dagster-aws]** We've improved our EMR support substantially in this release. The
`dagster_aws.emr` library now provides an `EmrJobRunner` with various utilities for creating EMR
clusters, submitting jobs, and waiting for jobs/logs. We also now provide a
`emr_pyspark_resource`, which together with the new `pyspark_solid` decorator makes moving
pyspark execution from your laptop to EMR as simple as changing modes.
- **[dagster-pandas]** Added `create_dagster_pandas_dataframe_type`, `PandasColumn`, and
`Constraint` APIs in order for users to create custom types which perform column validation,
dataframe validation, summary statistics emission, and dataframe serialization/deserialization.
- **[dagster-gcp]** GCS is now supported for system storage, as well as being supported with the
Dask executor. (Thanks habibutsu!) BigQuery solids have also been updated to support the new API.

**Bugfix**

- Ensured that all implementations of `RunStorage` clean up pipeline run tags when a run
is deleted. Requires a storage migration, using `dagster instance migrate`.
- The multiprocess and Celery engines now handle solid subsets correctly.
- The multiprocess and Celery engines will now correctly emit skip events for steps downstream of
failures and other skips.
- The `solid` and `lambda_solid` decorators now correctly wrap their decorated functions, in the
sense of `functools.wraps`.
- Performance improvements in Dagit when working with runs with large configurations.
- The Helm chart in `dagster_k8s` has been hardened against various failure modes and is now
compatible with Helm 2.
- SQLite run and event log storages are more robust to concurrent use.
- Improvements to error messages and to handling of user code errors in input hydration and output
materialization logic.
- Fixed an issue where the Airflow scheduler could hang when attempting to load dagster-airflow
pipelines.
- We now handle our SQLAlchemy connections in a more canonical way (thanks zzztimbo!).
- Fixed an issue using S3 system storage with certain custom serialization strategies.
- Fixed an issue leaking orphan processes from compute logging.
- Fixed an issue leaking semaphores from Dagit.
- Setting the `raise_error` flag in `execute_pipeline` now actually raises user exceptions instead
of a wrapper type.
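
For readers unfamiliar with the `functools.wraps` behavior referenced in the list above, the sketch below shows what a correctly wrapping decorator preserves. `solid_sketch` is a hypothetical plain-function decorator, not Dagster's `solid`:

```python
import functools


def solid_sketch(fn):
    """Hypothetical decorator standing in for a solid-style wrapper."""

    @functools.wraps(fn)  # copies __name__, __doc__, and sets __wrapped__
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    return wrapper


@solid_sketch
def add_one(x):
    """Return x plus one."""
    return x + 1
```

Without `functools.wraps`, `add_one.__name__` would be `"wrapper"` and the docstring would be lost, which makes introspection and generated documentation less useful.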

**Documentation**

- Our docs have been reorganized and expanded (thanks habibutsu, vatervonacht, zzztimbo). We'd
love feedback and contributions!

**Thank you**
Thank you to all of the community contributors to this release!! In alphabetical order: habibutsu,
kdungs, vatervonacht, zzztimbo.
