Kedro

Latest version: v0.19.11

Safety actively analyzes 714919 Python packages for vulnerabilities to keep your Python projects secure.

Page 19 of 23

0.17.4

Not secure

Major features and improvements
* Added the following new datasets:

| Type | Description | Location |
| ---------------------- | ----------------------------------------------------------- | ------------------------------ |
| `plotly.PlotlyDataSet` | Works with plotly graph object Figures (saves as json file) | `kedro.extras.datasets.plotly` |

Bug fixes and other changes
* Defined our set of Kedro Principles! Have a read through [our docs](https://docs.kedro.org/en/0.17.4/12_faq/03_kedro_principles.html).
* `ConfigLoader.get()` now raises a `BadConfigException`, with a more helpful error message, if a configuration file cannot be loaded (for instance due to wrong syntax or poor formatting).
* `run_id` now defaults to `save_version` when `after_catalog_created` is called, similarly to what happens during a `kedro run`.
* Fixed a bug where `kedro ipython` and `kedro jupyter notebook` didn't work if the `PYTHONPATH` was already set.
* Update the IPython extension to allow passing `env` and `extra_params` to `reload_kedro` similar to how the IPython script works.
* `kedro info` now outputs if a plugin has any `hooks` or `cli_hooks` implemented.
* `PartitionedDataSet` now supports lazily materializing data on save.
* `kedro pipeline describe` now defaults to the `__default__` pipeline when no pipeline name is provided and also shows the namespace the nodes belong to.
* Fixed an issue where spark.SparkDataSet with enabled versioning would throw a VersionNotFoundError when using databricks-connect from a remote machine and saving to dbfs filesystem.
* `EmailMessageDataSet` added to doctree.
* When node inputs do not pass validation, the error message is now shown as the most recent exception in the traceback ([Issue 761](https://github.com/kedro-org/kedro/issues/761)).
* `kedro pipeline package` now only packages the parameter file that exactly matches the pipeline name specified and the parameter files in a directory with the pipeline name.
* Extended support to newer versions of third-party dependencies ([Issue 735](https://github.com/kedro-org/kedro/issues/735)).
* Ensured consistent references to `model input` tables in accordance with our Data Engineering convention.
* Changed behaviour where `kedro pipeline package` takes the pipeline package version, rather than the kedro package version. If the pipeline package version is not present, then the package version is used.
* Launched [GitHub Discussions](https://github.com/kedro-org/kedro/discussions/) and [Kedro Discord Server](https://discord.gg/akJDeVaxnB)
* Improved error message when versioning is enabled for a dataset previously saved as non-versioned ([Issue 625](https://github.com/kedro-org/kedro/issues/625)).

Minor breaking changes to the API

0.17.3

Not secure

Major features and improvements
* Kedro plugins can now override built-in CLI commands.
* Added a `before_command_run` hook for plugins to add extra behaviour before Kedro CLI commands run.
* `pipelines` from `pipeline_registry.py` and `register_pipeline` hooks are now loaded lazily when they are first accessed, not on startup:

python
from kedro.framework.project import pipelines

print(pipelines["__default__"]) pipeline loading is only triggered here

Bug fixes and other changes
* `TemplatedConfigLoader` now correctly inserts default values when no globals are supplied.
* Fixed a bug where the `KEDRO_ENV` environment variable had no effect on instantiating the `context` variable in an iPython session or a Jupyter notebook.
* Plugins with empty CLI groups are no longer displayed in the Kedro CLI help screen.
* Duplicate commands will no longer appear twice in the Kedro CLI help screen.
* CLI commands from sources with the same name will show under one list in the help screen.
* The setup of a Kedro project, including adding src to path and configuring settings, is now handled via the `bootstrap_project` method.
* `configure_project` is invoked if a `package_name` is supplied to `KedroSession.create`. This is added for backward-compatibility purpose to support a workflow that creates `Session` manually. It will be removed in `0.18.0`.
* Stopped swallowing up all `ModuleNotFoundError` if `register_pipelines` not found, so that a more helpful error message will appear when a dependency is missing, e.g. [Issue 722](https://github.com/kedro-org/kedro/issues/722).
* When `kedro new` is invoked using a configuration yaml file, `output_dir` is no longer a required key; by default the current working directory will be used.
* When `kedro new` is invoked using a configuration yaml file, the appropriate `prompts.yml` file is now used for validating the provided configuration. Previously, validation was always performed against the kedro project template `prompts.yml` file.
* When a relative path to a starter template is provided, `kedro new` now generates user prompts to obtain configuration rather than supplying empty configuration.
* Fixed error when using starters on Windows with Python 3.7 (Issue [722](https://github.com/kedro-org/kedro/issues/722)).
* Fixed decoding error of config files that contain accented characters by opening them for reading in UTF-8.
* Fixed an issue where `after_dataset_loaded` run would finish before a dataset is actually loaded when using `--async` flag.

0.17.2

Not secure

Major features and improvements
* Added support for `compress_pickle` backend to `PickleDataSet`.
* Enabled loading pipelines without creating a `KedroContext` instance:

python
from kedro.framework.project import pipelines

print(pipelines)

* Projects generated with kedro>=0.17.2:
- should define pipelines in `pipeline_registry.py` rather than `hooks.py`.
- when run as a package, will behave the same as `kedro run`

Bug fixes and other changes
* If `settings.py` is not importable, the errors will be surfaced earlier in the process, rather than at runtime.

Minor breaking changes to the API
* `kedro pipeline list` and `kedro pipeline describe` no longer accept redundant `--env` parameter.
* `from kedro.framework.cli.cli import cli` no longer includes the `new` and `starter` commands.

0.17.1

Not secure

Major features and improvements
* Added `env` and `extra_params` to `reload_kedro()` line magic.
* Extended the `pipeline()` API to allow strings and sets of strings as `inputs` and `outputs`, to specify when a dataset name remains the same (not namespaced).
* Added the ability to add custom prompts with regexp validator for starters by repurposing `default_config.yml` as `prompts.yml`.
* Added the `env` and `extra_params` arguments to `register_config_loader` hook.
* Refactored the way `settings` are loaded. You will now be able to run:

python
from kedro.framework.project import settings

print(settings.CONF_ROOT)

* Added a check on `kedro.runner.parallel_runner.ParallelRunner` which checks datasets for the `_SINGLE_PROCESS` attribute in the `_validate_catalog` method. If this attribute is set to `True` in an instance of a dataset (e.g. `SparkDataSet`), the `ParallelRunner` will raise an `AttributeError`.
* Any user-defined dataset that should not be used with `ParallelRunner` may now have the `_SINGLE_PROCESS` attribute set to `True`.

Bug fixes and other changes
* The version of a packaged modular pipeline now defaults to the version of the project package.
* Added fix to prevent new lines being added to pandas CSV datasets.
* Fixed issue with loading a versioned `SparkDataSet` in the interactive workflow.
* Kedro CLI now checks `pyproject.toml` for a `tool.kedro` section before treating the project as a Kedro project.
* Added fix to `DataCatalog::shallow_copy` now it should copy layers.
* `kedro pipeline pull` now uses `pip download` for protocols that are not supported by `fsspec`.
* Cleaned up documentation to fix broken links and rewrite permanently redirected ones.
* Added a `jsonschema` schema definition for the Kedro 0.17 catalog.
* `kedro install` now waits on Windows until all the requirements are installed.
* Exposed `--to-outputs` option in the CLI, throughout the codebase, and as part of hooks specifications.
* Fixed a bug where `ParquetDataSet` wasn't creating parent directories on the fly.
* Updated documentation.

Breaking changes to the API
* This release has broken the `kedro ipython` and `kedro jupyter` workflows. To fix this, follow the instructions in the migration guide below.
* You will also need to upgrade `kedro-viz` to 3.10.1 if you use the `%run_viz` line magic in Jupyter Notebook.

> *Note:* If you're using the `ipython` [extension](https://docs.kedro.org/en/0.17.1/11_tools_integration/02_ipython.html#ipython-extension) instead, you will not encounter this problem.

Migration guide
You will have to update the file `<your_project>/.ipython/profile_default/startup/00-kedro-init.py` in order to make `kedro ipython` and/or `kedro jupyter` work. Add the following line before the `KedroSession` is created:

python
configure_project(metadata.package_name) to add

session = KedroSession.create(metadata.package_name, path)

Make sure that the associated import is provided in the same place as others in the file:

python
from kedro.framework.project import configure_project to add
from kedro.framework.session import KedroSession

Thanks for supporting contributions
[Mariana Silva](https://github.com/marianansilva),
[Kiyohito Kunii](https://github.com/921kiyo),
[noklam](https://github.com/noklam),
[Ivan Doroshenko](https://github.com/imdoroshenko),
[Zain Patel](https://github.com/mzjp2),
[Deepyaman Datta](https://github.com/deepyaman),
[Sam Hiscox](https://github.com/samhiscoxqb),
[Pascal Brokmeier](https://github.com/pascalwhoop)

0.17.0

Not secure

Major features and improvements

* In a significant change, [we have introduced `KedroSession`](https://docs.kedro.org/en/0.17.0/04_kedro_project_setup/03_session.html) which is responsible for managing the lifecycle of a Kedro run.
* Created a new Kedro Starter: `kedro new --starter=mini-kedro`. It is possible to [use the DataCatalog as a standalone component](https://github.com/kedro-org/kedro-starters/tree/master/mini-kedro) in a Jupyter notebook and transition into the rest of the Kedro framework.
* Added `DatasetSpecs` with Hooks to run before and after datasets are loaded from/saved to the catalog.
* Added a command: `kedro catalog create`. For a registered pipeline, it creates a `<conf_root>/<env>/catalog/<pipeline_name>.yml` configuration file with `MemoryDataSet` datasets for each dataset that is missing from `DataCatalog`.
* Added `settings.py` and `pyproject.toml` (to replace `.kedro.yml`) for project configuration, in line with Python best practice.
* `ProjectContext` is no longer needed, unless for very complex customisations. `KedroContext`, `ProjectHooks` and `settings.py` together implement sensible default behaviour. As a result `context_path` is also now an _optional_ key in `pyproject.toml`.
* Removed `ProjectContext` from `src/<package_name>/run.py`.
* `TemplatedConfigLoader` now supports [Jinja2 template syntax](https://jinja.palletsprojects.com/en/2.11.x/templates/) alongside its original syntax.
* Made [registration Hooks](https://docs.kedro.org/en/0.17.0/07_extend_kedro/02_hooks.html#registration-hooks) mandatory, as the only way to customise the `ConfigLoader` or the `DataCatalog` used in a project. If no such Hook is provided in `src/<package_name>/hooks.py`, a `KedroContextError` is raised. There are sensible defaults defined in any project generated with Kedro >= 0.16.5.

Bug fixes and other changes

* `ParallelRunner` no longer results in a run failure, when triggered from a notebook, if the run is started using `KedroSession` (`session.run()`).
* `before_node_run` can now overwrite node inputs by returning a dictionary with the corresponding updates.
* Added minimal, black-compatible flake8 configuration to the project template.
* Moved `isort` and `pytest` configuration from `<project_root>/setup.cfg` to `<project_root>/pyproject.toml`.
* Extra parameters are no longer incorrectly passed from `KedroSession` to `KedroContext`.
* Relaxed `pyspark` requirements to allow for installation of `pyspark` 3.0.
* Added a `--fs-args` option to the `kedro pipeline pull` command to specify configuration options for the `fsspec` filesystem arguments used when pulling modular pipelines from non-PyPI locations.
* Bumped maximum required `fsspec` version to 0.9.
* Bumped maximum supported `s3fs` version to 0.5 (`S3FileSystem` interface has changed since 0.4.1 version).

Deprecations
* In Kedro 0.17.0 we have deleted the deprecated `kedro.cli` and `kedro.context` modules in favour of `kedro.framework.cli` and `kedro.framework.context` respectively.

Other breaking changes to the API
* `kedro.io.DataCatalog.exists()` returns `False` when the dataset does not exist, as opposed to raising an exception.
* The pipeline-specific `catalog.yml` file is no longer automatically created for modular pipelines when running `kedro pipeline create`. Use `kedro catalog create` to replace this functionality.
* Removed `include_examples` prompt from `kedro new`. To generate boilerplate example code, you should use a Kedro starter.
* Changed the `--verbose` flag from a global command to a project-specific command flag (e.g `kedro --verbose new` becomes `kedro new --verbose`).
* Dropped support of the `dataset_credentials` key in credentials in `PartitionedDataSet`.
* `get_source_dir()` was removed from `kedro/framework/cli/utils.py`.
* Dropped support of `get_config`, `create_catalog`, `create_pipeline`, `template_version`, `project_name` and `project_path` keys by `get_project_context()` function (`kedro/framework/cli/cli.py`).
* `kedro new --starter` now defaults to fetching the starter template matching the installed Kedro version.
* Renamed `kedro_cli.py` to `cli.py` and moved it inside the Python package (`src/<package_name>/`), for a better packaging and deployment experience.
* Removed `.kedro.yml` from the project template and replaced it with `pyproject.toml`.
* Removed `KEDRO_CONFIGS` constant (previously residing in `kedro.framework.context.context`).
* Modified `kedro pipeline create` CLI command to add a boilerplate parameter config file in `conf/<env>/parameters/<pipeline_name>.yml` instead of `conf/<env>/pipelines/<pipeline_name>/parameters.yml`. CLI commands `kedro pipeline delete` / `package` / `pull` were updated accordingly.
* Removed `get_static_project_data` from `kedro.framework.context`.
* Removed `KedroContext.static_data`.
* The `KedroContext` constructor now takes `package_name` as first argument.
* Replaced `context` property on `KedroSession` with `load_context()` method.
* Renamed `_push_session` and `_pop_session` in `kedro.framework.session.session` to `_activate_session` and `_deactivate_session` respectively.
* Custom context class is set via `CONTEXT_CLASS` variable in `src/<your_project>/settings.py`.
* Removed `KedroContext.hooks` attribute. Instead, hooks should be registered in `src/<your_project>/settings.py` under the `HOOKS` key.
* Restricted names given to nodes to match the regex pattern `[\w\.-]+$`.
* Removed `KedroContext._create_config_loader()` and `KedroContext._create_data_catalog()`. They have been replaced by registration hooks, namely `register_config_loader()` and `register_catalog()` (see also [upcoming deprecations](upcoming_deprecations_for_kedro_0.18.0)).

0.16.6

Not secure

Major features and improvements

* Added documentation with a focus on single machine and distributed environment deployment; the series includes Docker, Argo, Prefect, Kubeflow, AWS Batch, AWS Sagemaker and extends our section on Databricks.
* Added [kedro-starter-spaceflights](https://github.com/kedro-org/kedro-starter-spaceflights/) alias for generating a project: `kedro new --starter spaceflights`.

Bug fixes and other changes
* Fixed `TypeError` when converting dict inputs to a node made from a wrapped `partial` function.
* `PartitionedDataSet` improvements:
- Supported passing arguments to the underlying filesystem.
* Improved handling of non-ASCII word characters in dataset names.
- For example, a dataset named `jalapeño` will be accessible as `DataCatalog.datasets.jalapeño` rather than `DataCatalog.datasets.jalape__o`.
* Fixed `kedro install` for an Anaconda environment defined in `environment.yml`.
* Fixed backwards compatibility with templates generated with older Kedro versions <0.16.5. No longer need to update `.kedro.yml` to use `kedro lint` and `kedro jupyter notebook convert`.
* Improved documentation.
* Added documentation using MinIO with Kedro.
* Improved error messages for incorrect parameters passed into a node.
* Fixed issue with saving a `TensorFlowModelDataset` in the HDF5 format with versioning enabled.
* Added missing `run_result` argument in `after_pipeline_run` Hooks spec.
* Fixed a bug in IPython script that was causing context hooks to be registered twice. To apply this fix to a project generated with an older Kedro version, apply the same changes made in [this PR](https://github.com/kedro-org/kedro-starter-pandas-iris/pull/16) to your `00-kedro-init.py` file.
* Improved documentation.

Breaking changes to the API

Thanks for supporting contributions
[Deepyaman Datta](https://github.com/deepyaman), [Bhavya Merchant](https://github.com/bnmerchant), [Lovkush Agarwal](https://github.com/Lovkush-A), [Varun Krishna S](https://github.com/vhawk19), [Sebastian Bertoli](https://github.com/sebastianbertoli), [noklam](https://github.com/noklam), [Daniel Petti](https://github.com/djpetti), [Waylon Walker](https://github.com/waylonwalker), [Saran Balaji C](https://github.com/csaranbalaji)

Page 19 of 23

Releases

Has known vulnerabilities

Previous Next

Kedro

Page 19 of 23

0.17.4

0.17.3

0.17.2

0.17.1

0.17.0

0.16.6

Page 19 of 23

Links

Releases