## Major Features and Improvements
* TFX version 0.21.0 will be the last version of TFX supporting Python 2.
* Added experimental CLI option `template`, which can be used to scaffold a
  new pipeline from TFX templates. Currently the `taxi` template is provided,
  and more templates will be added in future versions.
* Added support for `RuntimeParameter`s, allowing users to specify templated
  values at runtime. This is currently only supported in Kubeflow Pipelines.
  Currently, only attributes in `ComponentSpec.PARAMETERS` and the URIs of
  external artifacts can be parameterized (component inputs / outputs cannot
  yet be parameterized). See
  `tfx/examples/chicago_taxi_pipeline/taxi_pipeline_runtime_parameter.py`
  for example usage, and the first sketch after this list.
* Users can access the parameterized pipeline root when defining the
  pipeline by using the `pipeline.ROOT_PARAMETER` placeholder in
  KubeflowDagRunner (see the first sketch after this list).
* Users can pass appropriately encoded Python `dict` objects to specify
  protobuf parameters in `ComponentSpec.PARAMETERS`; these will be decoded
  into the proper protobuf type. This avoids manually constructing complex
  nested protobuf messages in the component interface (see the second sketch
  after this list).
* Added support in Trainer for using other model artifacts. This enables
scenarios such as warm-starting.
* Updated trainer executor to pass through custom config to the user module.
* Artifact type-specific properties can be defined by overriding the
  `PROPERTIES` dictionary of a `types.artifact.Artifact` subclass (see the
  third sketch after this list).
* Added a new example of the chicago_taxi_pipeline on Google Cloud BigQuery ML.
* Added support for multi-core processing in the Flink and Spark Chicago Taxi
PortableRunner example.
* Added a metadata adapter in Kubeflow to support logging the Argo pod ID as
an execution property.
* Added a prototype Tuner component and an end-to-end iris example.
* Created a new generic trainer executor for non-Estimator-based models, e.g.,
  native Keras.
* Updated to support passing `tfma.EvalConfig` in evaluator when calling TFMA.
* Added an iris example with native Keras.
* Added an MNIST example with native Keras.
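
The following is a minimal sketch of the runtime-parameterization features
above, assuming the 0.21 APIs. The parameter name, default value, and pipeline
name are hypothetical; see `taxi_pipeline_runtime_parameter.py` for the
canonical wiring.

```python
# A minimal sketch, assuming TFX 0.21 APIs. The parameter name, default
# value, and pipeline name are hypothetical.
from typing import Text

from tfx.orchestration import data_types
from tfx.orchestration import pipeline
from tfx.orchestration.kubeflow import kubeflow_dag_runner

# A templated value resolved by Kubeflow Pipelines at run time; usable
# for `ComponentSpec.PARAMETERS` attributes and external artifact URIs.
data_root = data_types.RuntimeParameter(
    name='data-root',
    default='gs://my-bucket/data',  # hypothetical default location
    ptype=Text,
)

p = pipeline.Pipeline(
    pipeline_name='parameterized_pipeline',
    # `pipeline.ROOT_PARAMETER` is the placeholder for the parameterized
    # pipeline root under KubeflowDagRunner.
    pipeline_root=pipeline.ROOT_PARAMETER,
    components=[],  # components consuming `data_root` would go here
)

kubeflow_dag_runner.KubeflowDagRunner().run(p)
```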
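As an illustration of the `dict`-to-protobuf decoding, the hedged sketch below
configures CsvExampleGen's `output_config` (an `example_gen_pb2.Output`
message) with a plain `dict`; the data location and split layout are
hypothetical.

```python
# A minimal sketch, assuming TFX 0.21 APIs; the data location and split
# layout are hypothetical.
from tfx.components import CsvExampleGen
from tfx.utils.dsl_utils import external_input

example_gen = CsvExampleGen(
    input=external_input('/path/to/csv'),  # hypothetical data location
    # This dict is decoded into an `example_gen_pb2.Output` proto,
    # avoiding manual construction of the nested message.
    output_config={
        'split_config': {
            'splits': [
                {'name': 'train', 'hash_buckets': 3},
                {'name': 'eval', 'hash_buckets': 1},
            ]
        }
    },
)
```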
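A minimal sketch of a custom artifact type with type-specific properties
follows; the class and property names are illustrative.

```python
# A minimal sketch; `MyDataset` and its properties are illustrative.
from tfx.types import artifact


class MyDataset(artifact.Artifact):
  """Custom artifact type with type-specific properties."""
  TYPE_NAME = 'MyDataset'
  PROPERTIES = {
      # Each entry declares a typed property stored with the artifact
      # in ML Metadata.
      'row_count': artifact.Property(type=artifact.PropertyType.INT),
      'format': artifact.Property(type=artifact.PropertyType.STRING),
  }
```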
## Bug fixes and other changes
* Switched the default behavior of KubeflowDagRunner to not mount the GCP
  secret.
* Fixed the "invalid spec: spec.arguments.parameters[6].name 'pipeline-root'
  is not unique" error when a user includes `pipeline.ROOT_PARAMETER` and runs
  the pipeline on KFP.
* Added support for an hparams artifact as an input to Trainer in
preparation for tuner support.
* Refactored common dependencies in the TFX Dockerfile to a base image to
  improve the reliability of the image building process.
* Fixed a missing TensorBoard link in KubeflowDagRunner.
* Depends on `apache-beam[gcp]>=2.17,<2.18`.
* Depends on `ml-metadata>=0.21,<0.22`.
* Depends on `tensorflow-data-validation>=0.21,<0.22`.
* Depends on `tensorflow-model-analysis>=0.21,<0.22`.
* Depends on `tensorflow-transform>=0.21,<0.22`.
* Depends on `tfx-bsl>=0.21,<0.22`.
* Depends on `pyarrow>=0.14,<0.15`.
* Removed `tf.compat.v1` usage for iris and cifar10 examples.
* CSVExampleGen: started using the CSV decoding utilities in `tfx-bsl`
  (`tfx-bsl>=0.15.2`).
* Fixed problems with Airflow tutorial notebooks.
* Added performance improvements for the Transform Component (for statistics
generation).
* Raised exceptions when container building fails.
* Enhanced the custom Slack component by adding a Kubeflow example.
* Allowed Windows-style paths in the Transform component cache.
* Fixed a bug in the CLI (`--engine=kubeflow`) which used a hard-coded,
  obsolete image (TFX 0.14.0) as the base image.
* Fixed a bug in the CLI (`--engine=kubeflow`) which could not handle the
  Skaffold response when an already-built image is reused.
* Allowed users to specify the region to use when serving with AI Platform.
* Allowed users to give a deterministic job ID to the AI Platform Training job.
* System-managed artifact properties ("name", "state", "pipeline_name" and
"producer_component") are now stored as ML Metadata artifact custom
properties.
* Fixed loading trainer and transformation functions from Python module files
  without the `.py` extension.
* Fixed ill-formed visualizations when running on KFP.
* Removed system info from artifact properties; channels now hold the info
  used for generating MLMD queries.
* Rely on MLMD context for inter-component artifact resolution and execution
publishing.
* Added pipeline level context and component run level context.
* Included test data for examples/chicago_taxi_pipeline in package.
* Changed `BaseComponentLauncher` to require the user to pass in an ML
  Metadata connection object instead of an ML Metadata connection config.
* Capped the version of the TensorFlow runtime used in Google Cloud
  integration to 1.15.
* Updated Chicago Taxi example dependencies to Beam 2.17.0, Flink 1.9.1, Spark
2.4.4.
* Fixed an issue where `build_ephemeral_package()` used an incorrect path to
locate the `tfx` directory.
* The ImporterNode now allows specification of general artifact properties
  (see the sketch after this list).
* Added 'tfx_executor', 'tfx_version' and 'tfx_py_version' labels for CAIP,
BQML and Dataflow jobs submitted from TFX components.
* Used '_' instead of '/' in feature names of several examples to avoid a
  potential clash with the name scope separator.
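
A hedged sketch of the ImporterNode change follows; the `properties` keyword
and the values shown are assumptions about the 0.21 signature, and the URI is
hypothetical.

```python
# A minimal sketch, assuming ImporterNode accepts a `properties` dict in
# TFX 0.21; the URI and property value are hypothetical.
from tfx.components.common_nodes.importer_node import ImporterNode
from tfx.types import standard_artifacts

examples_importer = ImporterNode(
    instance_name='import_examples',
    source_uri='gs://my-bucket/examples',  # hypothetical location
    artifact_type=standard_artifacts.Examples,
    # Set the type-specific `split_names` property declared on the
    # Examples artifact type (JSON-encoded, per the breaking change
    # noted below).
    properties={'split_names': '["train", "eval"]'},
)
```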
## Deprecations
* N/A
## Breaking changes
### For pipeline authors
* Standard artifact TYPE_NAME strings were reconciled to match their class
names in `types.standard_artifacts`.
* The "split" property on multiple artifacts has been replaced with the
JSON-encoded "split_names" property on a single grouped artifact.
* The execution caching mechanism was changed to rely on ML Metadata
pipeline context. Existing cached executions will not be reused when running
on this version of TFX for the first time.
### For component authors
* Passing artifact type name strings to the `types.artifact.Artifact` and
  `types.channel.Channel` classes is no longer supported; such usage should
  be replaced with references to the artifact subclasses defined in
  `types.standard_artifacts.*` or to custom subclasses of
  `types.artifact.Artifact` (see the sketch below).
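
A minimal migration sketch, assuming the 0.21 `types` APIs; `MyArtifact` is
illustrative.

```python
# A minimal migration sketch; `MyArtifact` is illustrative.
from tfx.types import artifact
from tfx.types import channel
from tfx.types import standard_artifacts

# No longer supported:
#   channel.Channel(type_name='Examples')
# Instead, reference an artifact subclass:
examples_channel = channel.Channel(type=standard_artifacts.Examples)


# Custom types subclass `types.artifact.Artifact` directly.
class MyArtifact(artifact.Artifact):
  TYPE_NAME = 'MyArtifact'


my_channel = channel.Channel(type=MyArtifact)
```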
## Documentation updates
* N/A