Kedro

Latest version: v0.19.11

Safety actively analyzes 714919 Python packages for vulnerabilities to keep your Python projects secure.

Page 20 of 23

0.16.5

Not secure

Major features and improvements
* Added the following new datasets.

| Type | Description | Location |
| --------------------------- | ------------------------------------------------------------------------------------------------------- | ----------------------------- |
| `email.EmailMessageDataSet` | Manage email messages using [the Python standard library](https://docs.python.org/3/library/email.html) | `kedro.extras.datasets.email` |

* Added support for `pyproject.toml` to configure Kedro. `pyproject.toml` is used if `.kedro.yml` doesn't exist (Kedro configuration should be under `[tool.kedro]` section).
* Projects created with this version will have no `pipeline.py`, having been replaced by `hooks.py`.
* Added a set of registration hooks, as the new way of registering library components with a Kedro project:
* `register_pipelines()`, to replace `_get_pipelines()`
* `register_config_loader()`, to replace `_create_config_loader()`
* `register_catalog()`, to replace `_create_catalog()`
These can be defined in `src/<python_package>/hooks.py` and added to `.kedro.yml` (or `pyproject.toml`). The order of execution is: plugin hooks, `.kedro.yml` hooks, hooks in `ProjectContext.hooks`.
* Added ability to disable auto-registered Hooks using `.kedro.yml` (or `pyproject.toml`) configuration file.

Bug fixes and other changes
* Added option to run asynchronously via the Kedro CLI.
* Absorbed `.isort.cfg` settings into `setup.cfg`.
* Packaging a modular pipeline raises an error if the pipeline directory is empty or non-existent.

Breaking changes to the API
* `project_name`, `project_version` and `package_name` now have to be defined in `.kedro.yml` for projects using Kedro 0.16.5+.

Migration Guide
This release has accidentally broken the usage of `kedro lint` and `kedro jupyter notebook convert` on a project template generated with previous versions of Kedro (<=0.16.4). To amend this, please either upgrade to `kedro==0.16.6` or update `.kedro.yml` within your project root directory to include the following keys:

yaml
project_name: "<your_project_name>"
project_version: "<kedro_version_of_the_project>"
package_name: "<your_package_name>"

Thanks for supporting contributions
[Deepyaman Datta](https://github.com/deepyaman), [Bas Nijholt](https://github.com/basnijholt), [Sebastian Bertoli](https://github.com/sebastianbertoli)

0.16.4

Not secure

Major features and improvements
* Fixed a bug for using `ParallelRunner` on Windows.
* Enabled auto-discovery of hooks implementations coming from installed plugins.

Bug fixes and other changes
* Fixed a bug for using `ParallelRunner` on Windows.
* Modified `GBQTableDataSet` to load customized results using customized queries from Google Big Query tables.
* Documentation improvements.

Breaking changes to the API

Thanks for supporting contributions
[Ajay Bisht](https://github.com/ajb7), [Vijay Sajjanar](https://github.com/vjkr), [Deepyaman Datta](https://github.com/deepyaman), [Sebastian Bertoli](https://github.com/sebastianbertoli), [Shahil Mawjee](https://github.com/s-mawjee), [Louis Guitton](https://github.com/louisguitton), [Emanuel Ferm](https://github.com/eferm)

0.16.3

Not secure

Major features and improvements
* Added the `kedro pipeline pull` CLI command to extract a packaged modular pipeline, and place the contents in a Kedro project.
* Added the `--version` option to `kedro pipeline package` to allow specifying alternative versions to package under.
* Added the `--starter` option to `kedro new` to create a new project from a local, remote or aliased starter template.
* Added the `kedro starter list` CLI command to list all starter templates that can be used to bootstrap a new Kedro project.
* Added the following new datasets.

| Type | Description | Location |
| ------------------ | ----------------------------------------------------------------------------------------------------- | ---------------------------- |
| `json.JSONDataSet` | Work with JSON files using [the Python standard library](https://docs.python.org/3/library/json.html) | `kedro.extras.datasets.json` |

Bug fixes and other changes
* Removed `/src/nodes` directory from the project template and made `kedro jupyter convert` create it on the fly if necessary.
* Fixed a bug in `MatplotlibWriter` which prevented saving lists and dictionaries of plots locally on Windows.
* Closed all pyplot windows after saving in `MatplotlibWriter`.
* Documentation improvements:
- Added [kedro-wings](https://github.com/tamsanh/kedro-wings) and [kedro-great](https://github.com/tamsanh/kedro-great) to the list of community plugins.
* Fixed broken versioning for Windows paths.
* Fixed `DataSet` string representation for falsy values.
* Improved the error message when duplicate nodes are passed to the `Pipeline` initializer.
* Fixed a bug where `kedro docs` would fail because the built docs were located in a different directory.
* Fixed a bug where `ParallelRunner` would fail on Windows machines whose reported CPU count exceeded 61.
* Fixed an issue with saving TensorFlow model to `h5` file on Windows.
* Added a `json` parameter to `APIDataSet` for the convenience of generating requests with JSON bodies.
* Fixed dependencies for `SparkDataSet` to include spark.

Breaking changes to the API

Thanks for supporting contributions
[Deepyaman Datta](https://github.com/deepyaman), [Tam-Sanh Nguyen](https://github.com/tamsanh), [DataEngineerOne](http://youtube.com/DataEngineerOne)

0.16.2

Not secure

Guide to apply the fix for `%run_viz` line magic in existing project

Even though this release ships a fix for project generated with `kedro==0.16.2`, after upgrading, you will still need to make a change in your existing project if it was generated with `kedro>=0.16.0,<=0.16.1` for the fix to take effect. Specifically, please change the content of your project's IPython init script located at `.ipython/profile_default/startup/00-kedro-init.py` with the content of [this file](https://github.com/kedro-org/kedro/blob/0.16.2/kedro/templates/project/%7B%7B%20cookiecutter.repo_name%20%7D%7D/.ipython/profile_default/startup/00-kedro-init.py). You will also need `kedro-viz>=3.3.1`.

Thanks for supporting contributions
[Miguel Rodriguez Gutierrez](https://github.com/MigQ2), [Joel Schwarzmann](https://github.com/datajoely), [w0rdsm1th](https://github.com/w0rdsm1th), [Deepyaman Datta](https://github.com/deepyaman), [Tam-Sanh Nguyen](https://github.com/tamsanh), [Marcus Gawronsky](https://github.com/marcusinthesky)

0.16.1

Not secure

Major features and improvements

Bug fixes and other changes
* Fixed deprecation warnings from `kedro.cli` and `kedro.context` when running `kedro jupyter notebook`.
* Fixed a bug where `catalog` and `context` were not available in Jupyter Lab and Notebook.
* Fixed a bug where `kedro build-reqs` would fail if you didn't have your project dependencies installed.

Breaking changes to the API

Thanks for supporting contributions

0.16.0

Not secure

Major features and improvements
CLI
* Added new CLI commands (only available for the projects created using Kedro 0.16.0 or later):
- `kedro catalog list` to list datasets in your catalog
- `kedro pipeline list` to list pipelines
- `kedro pipeline describe` to describe a specific pipeline
- `kedro pipeline create` to create a modular pipeline
* Improved the CLI speed by up to 50%.
* Improved error handling when making a typo on the CLI. We now suggest some of the possible commands you meant to type, in `git`-style.

Framework
* All modules in `kedro.cli` and `kedro.context` have been moved into `kedro.framework.cli` and `kedro.framework.context` respectively. `kedro.cli` and `kedro.context` will be removed in future releases.
* Added `Hooks`, which is a new mechanism for extending Kedro.
* Fixed `load_context` changing user's current working directory.
* Allowed the source directory to be configurable in `.kedro.yml`.
* Added the ability to specify nested parameter values inside your node inputs, e.g. `node(func, "params:a.b", None)`
DataSets
* Added the following new datasets.

| Type | Description | Location |
| -------------------------- | ------------------------------------------- | --------------------------------- |
| `pillow.ImageDataSet` | Work with image files using `Pillow` | `kedro.extras.datasets.pillow` |
| `geopandas.GeoJSONDataSet` | Work with geospatial data using `GeoPandas` | `kedro.extras.datasets.geopandas` |
| `api.APIDataSet` | Work with data from HTTP(S) API requests | `kedro.extras.datasets.api` |

* Added `joblib` backend support to `pickle.PickleDataSet`.
* Added versioning support to `MatplotlibWriter` dataset.
* Added the ability to install dependencies for a given dataset with more granularity, e.g. `pip install "kedro[pandas.ParquetDataSet]"`.
* Added the ability to specify extra arguments, e.g. `encoding` or `compression`, for `fsspec.spec.AbstractFileSystem.open()` calls when loading/saving a dataset. See Example 3 under [docs](https://docs.kedro.org/en/0.16.0/04_user_guide/04_data_catalog.html#use-the-data-catalog-with-the-yaml-api).

Other
* Added `namespace` property on ``Node``, related to the modular pipeline where the node belongs.
* Added an option to enable asynchronous loading inputs and saving outputs in both `SequentialRunner(is_async=True)` and `ParallelRunner(is_async=True)` class.
* Added `MemoryProfiler` transformer.
* Removed the requirement to have all dependencies for a dataset module to use only a subset of the datasets within.
* Added support for `pandas>=1.0`.
* Enabled Python 3.8 compatibility. _Please note that a Spark workflow may be unreliable for this Python version as `pyspark` is not fully-compatible with 3.8 yet._
* Renamed "features" layer to "feature" layer to be consistent with (most) other layers and the [relevant FAQ](https://docs.kedro.org/en/0.16.0/06_resources/01_faq.html#what-is-data-engineering-convention).

Bug fixes and other changes
* Fixed a bug where a new version created mid-run by an external system caused inconsistencies in the load versions used in the current run.
* Documentation improvements
* Added instruction in the documentation on how to create a custom runner).
* Updated contribution process in `CONTRIBUTING.md` - added Developer Workflow.
* Documented installation of development version of Kedro in the [FAQ section](https://docs.kedro.org/en/0.16.0/06_resources/01_faq.html#how-can-i-use-development-version-of-kedro).
* Added missing `_exists` method to `MyOwnDataSet` example in 04_user_guide/08_advanced_io.
* Fixed a bug where `PartitionedDataSet` and `IncrementalDataSet` were not working with `s3a` or `s3n` protocol.
* Added ability to read partitioned parquet file from a directory in `pandas.ParquetDataSet`.
* Replaced `functools.lru_cache` with `cachetools.cachedmethod` in `PartitionedDataSet` and `IncrementalDataSet` for per-instance cache invalidation.
* Implemented custom glob function for `SparkDataSet` when running on Databricks.
* Fixed a bug in `SparkDataSet` not allowing for loading data from DBFS in a Windows machine using Databricks-connect.
* Improved the error message for `DataSetNotFoundError` to suggest possible dataset names user meant to type.
* Added the option for contributors to run Kedro tests locally without Spark installation with `make test-no-spark`.
* Added option to lint the project without applying the formatting changes (`kedro lint --check-only`).

Breaking changes to the API
Datasets
* Deleted obsolete datasets from `kedro.io`.
* Deleted `kedro.contrib` and `extras` folders.
* Deleted obsolete `CSVBlobDataSet` and `JSONBlobDataSet` dataset types.
* Made `invalidate_cache` method on datasets private.
* `get_last_load_version` and `get_last_save_version` methods are no longer available on `AbstractDataSet`.
* `get_last_load_version` and `get_last_save_version` have been renamed to `resolve_load_version` and `resolve_save_version` on ``AbstractVersionedDataSet``, the results of which are cached.
* The `release()` method on datasets extending ``AbstractVersionedDataSet`` clears the cached load and save version. All custom datasets must call `super()._release()` inside `_release()`.
* ``TextDataSet`` no longer has `load_args` and `save_args`. These can instead be specified under `open_args_load` or `open_args_save` in `fs_args`.
* `PartitionedDataSet` and `IncrementalDataSet` method `invalidate_cache` was made private: `_invalidate_caches`.

Other
* Removed `KEDRO_ENV_VAR` from `kedro.context` to speed up the CLI run time.
* `Pipeline.name` has been removed in favour of `Pipeline.tag()`.
* Dropped `Pipeline.transform()` in favour of `kedro.pipeline.modular_pipeline.pipeline()` helper function.
* Made constant `PARAMETER_KEYWORDS` private, and moved it from `kedro.pipeline.pipeline` to `kedro.pipeline.modular_pipeline`.
* Layers are no longer part of the dataset object, as they've moved to the `DataCatalog`.
* Python 3.5 is no longer supported by the current and all future versions of Kedro.

Migration guide from Kedro 0.15.* to 0.16.*

General Migration

**reminder** [How do I upgrade Kedro](https://docs.kedro.org/en/0.16.0/06_resources/01_faq.html#how-do-i-upgrade-kedro) covers a few key things to remember when updating any kedro version.

Migration for datasets

Since all the datasets (from `kedro.io` and `kedro.contrib.io`) were moved to `kedro/extras/datasets` you must update the type of all datasets in `<project>/conf/base/catalog.yml` file.
Here how it should be changed: `type: <SomeDataSet>` -> `type: <subfolder of kedro/extras/datasets>.<SomeDataSet>` (e.g. `type: CSVDataSet` -> `type: pandas.CSVDataSet`).

In addition, all the specific datasets like `CSVLocalDataSet`, `CSVS3DataSet` etc. were deprecated. Instead, you must use generalized datasets like `CSVDataSet`.
E.g. `type: CSVS3DataSet` -> `type: pandas.CSVDataSet`.

> Note: No changes required if you are using your custom dataset.

Migration for Pipeline.transform()
`Pipeline.transform()` has been dropped in favour of the `pipeline()` constructor. The following changes apply:
- Remember to import `from kedro.pipeline import pipeline`
- The `prefix` argument has been renamed to `namespace`
- And `datasets` has been broken down into more granular arguments:
- `inputs`: Independent inputs to the pipeline
- `outputs`: Any output created in the pipeline, whether an intermediary dataset or a leaf output
- `parameters`: `params:...` or `parameters`

As an example, code that used to look like this with the `Pipeline.transform()` constructor:
python
result = my_pipeline.transform(
datasets={"input": "new_input", "output": "new_output", "params:x": "params:y"},
prefix="pre",
)

When used with the new `pipeline()` constructor, becomes:
python
from kedro.pipeline import pipeline

result = pipeline(
my_pipeline,
inputs={"input": "new_input"},
outputs={"output": "new_output"},
parameters={"params:x": "params:y"},
namespace="pre",
)

Migration for decorators, color logger, transformers etc.
Since some modules were moved to other locations you need to update import paths appropriately.
You can find the list of moved files in the [`0.15.6` release notes](https://github.com/kedro-org/kedro/releases/tag/0.15.6) under the section titled `Files with a new location`.

Migration for CLI and KEDRO_ENV environment variable
> Note: If you haven't made significant changes to your `kedro_cli.py`, it may be easier to simply copy the updated `kedro_cli.py` `.ipython/profile_default/startup/00-kedro-init.py` and from GitHub or a newly generated project into your old project.

* We've removed `KEDRO_ENV_VAR` from `kedro.context`. To get your existing project template working, you'll need to remove all instances of `KEDRO_ENV_VAR` from your project template:
- From the imports in `kedro_cli.py` and `.ipython/profile_default/startup/00-kedro-init.py`: `from kedro.context import KEDRO_ENV_VAR, load_context` -> `from kedro.framework.context import load_context`
- Remove the `envvar=KEDRO_ENV_VAR` line from the click options in `run`, `jupyter_notebook` and `jupyter_lab` in `kedro_cli.py`
- Replace `KEDRO_ENV_VAR` with `"KEDRO_ENV"` in `_build_jupyter_env`
- Replace `context = load_context(path, env=os.getenv(KEDRO_ENV_VAR))` with `context = load_context(path)` in `.ipython/profile_default/startup/00-kedro-init.py`

Migration for `kedro build-reqs`

We have upgraded `pip-tools` which is used by `kedro build-reqs` to 5.x. This `pip-tools` version requires `pip>=20.0`. To upgrade `pip`, please refer to [their documentation](https://pip.pypa.io/en/stable/installing/#upgrading-pip).

Thanks for supporting contributions
[foolsgold](https://github.com/foolsgold), [Mani Sarkar](https://github.com/neomatrix369), [Priyanka Shanbhag](https://github.com/priyanka1414), [Luis Blanche](https://github.com/LuisBlanche), [Deepyaman Datta](https://github.com/deepyaman), [Antony Milne](https://github.com/AntonyMilneQB), [Panos Psimatikas](https://github.com/ppsimatikas), [Tam-Sanh Nguyen](https://github.com/tamsanh), [Tomasz Kaczmarczyk](https://github.com/TomaszKaczmarczyk), [Kody Fischer](https://github.com/Klio-Foxtrot187), [Waylon Walker](https://github.com/waylonwalker)

Page 20 of 23

Releases

Has known vulnerabilities

Previous Next

Kedro

Page 20 of 23

0.16.5

0.16.4

0.16.3

0.16.2

0.16.1

0.16.0

Page 20 of 23

Links

Releases