tensorflow-transform

Latest version: v1.16.0

0.21.2

Major Features and Improvements
* Expanded capability for per-key analyzers to analyze larger sets of keys that
would not fit in memory, by storing the key-value pairs in vocabulary files.
This is enabled by passing a `per_key_filename` to `tft.count_per_key` and
`tft.scale_to_z_score_per_key`.
* Added `tft.TransformFeaturesLayer` and
`tft.TFTransformOutput.transform_features_layer` to allow transforming
features for a TensorFlow Keras model; see the sketch below.
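
A minimal sketch of wiring the new layer into a Keras model; the transform
output directory and the `text` feature are hypothetical placeholders:

```python
import tensorflow as tf
import tensorflow_transform as tft

# Load the outputs of a previously materialized tf.Transform run
# ('/tmp/transform_output' is a placeholder path).
tft_output = tft.TFTransformOutput('/tmp/transform_output')
transform_layer = tft_output.transform_features_layer()

# Applying the layer to a dict of raw inputs yields transformed features.
raw_inputs = {'text': tf.keras.Input(shape=(), dtype=tf.string, name='text')}
transformed = transform_layer(raw_inputs)
# Build the rest of the Keras model on `transformed`.
```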

Bug Fixes and Other Changes

* `tft.apply_buckets_with_interpolation` now handles NaN values by imputing with
the middle of the normalized range.
* Depends on `tfx-bsl>=0.21.3,<0.22`.

Breaking changes

Deprecations

0.21.0

Major Features and Improvements
* Added a new version of the census example to demonstrate usage in TF 2.0.
* New mapper `estimated_probability_density` to compute either exact
probabilities (for discrete categorical variables) or approximate density
over fixed intervals (for continuous variables).
* New analyzers `count_per_key` and `histogram` to return counts of unique
elements or of values within predefined ranges. Calling `tft.histogram` on a
non-categorical value will assign each data point to the appropriate fixed
bucket and then count the values in each bucket.
* Provided capability for per-key analyzers to analyze larger sets of keys that
would not fit in memory, by storing the key-value pairs in vocabulary files.
This is enabled by passing a `per_key_filename` to
`tft.scale_by_min_max_per_key` and `tft.scale_to_0_1_per_key`; see the sketch
below.
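
A minimal `preprocessing_fn` sketch, assuming the `per_key_filename` keyword
exactly as described above; the feature names are hypothetical:

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    # `per_key_filename` (taken from the release note above) stores per-key
    # min/max statistics in a vocabulary file rather than in memory, so very
    # large key sets can be analyzed.
    return {
        'value_scaled': tft.scale_to_0_1_per_key(
            inputs['value'],
            inputs['key'],
            per_key_filename='per_key_min_max'),
    }
```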

Bug Fixes and Other Changes
* Added beam counters to log analyzer and mapper usage.
* Cleaned up deprecated APIs used in the census and sentiment examples.
* Support Windows-style paths in `analyzer_cache`.
* `tft_beam.WriteTransformFn` and `tft_beam.WriteMetadata` have been made
idempotent to allow retrying them in case of a failure.
* `tft_beam.WriteMetadata` takes an optional argument `write_to_unique_subdir`
and returns the path to which metadata was written. If
`write_to_unique_subdir` is True, metadata is written to a unique subdirectory
under `path`, otherwise it is written to `path`.
* Support non-UTF-8 characters when reading vocabularies in
`tft.TFTransformOutput`.
* `tft.TFTransformOutput.vocabulary_by_name` now returns bytes instead of str
with Python 3; see the sketch below.
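
A minimal sketch of the bytes-returning behavior; the output directory and
vocabulary name are hypothetical:

```python
import tensorflow_transform as tft

# '/tmp/transform_output' and 'my_vocab' are placeholders.
tft_output = tft.TFTransformOutput('/tmp/transform_output')
tokens = tft_output.vocabulary_by_name('my_vocab')  # e.g. [b'foo', b'bar']
# Decode explicitly where str tokens are needed under Python 3.
words = [token.decode('utf-8') for token in tokens]
```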

Breaking changes

Deprecations

0.15.0

Major Features and Improvements
* This release introduces initial beta support for TF 2.0. TF 2.0 programs
running in "safety" mode (i.e. using TF 1.X APIs through the
`tensorflow.compat.v1` compatibility module) are expected to work. Newly
written TF 2.0 programs may not work if they exercise functionality that is
not yet supported. If you do encounter an issue when using
`tensorflow-transform` with TF 2.0, please create an issue at
https://github.com/tensorflow/transform/issues with instructions on how to
reproduce it. A minimal sketch of "safety" mode appears after this list.
* Performance improvements for `preprocessing_fns` with many Quantiles
analyzers.
* `tft.quantiles` and `tft.bucketize` are now using new TF core quantiles ops
instead of contrib ops.
* Performance improvements due to packing multiple combine analyzers into a
single Beam Combiner.
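
A minimal sketch of the "safety" mode mentioned above:

```python
# Use TF 1.X APIs under TF 2.0 via the compatibility module.
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()
# preprocessing_fns written against TF 1.X APIs are expected to work from
# this point on.
```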

Bug Fixes and Other Changes
* Existing analyzer cache is invalidated.
* Saved transforms now support composite tensors (such as `tf.RaggedTensor`).
* Vocabulary's cache coder now supports non-UTF-8-encodable tokens.
* Fixes encoding of the `tft.covariance` accumulator cache.
* Fixes encoding of per-key analyzers' accumulator cache.
* Make various utility methods in `tft.inspect_preprocessing_fn` support
`tf.RaggedTensor`.
* Moved beam/shared lib to `tfx-bsl`. If running with latest master, `tfx-bsl`
must also be latest master.
* `preprocessing_fn`s now have beta support for calls to `tf.function`s, as
long as they don't contain calls to `tf.Transform` analyzers/mappers or table
initializers.
* `tft.quantiles` and `tft.bucketize` are now using core TF ops.
* Depends on `tfx-bsl>=0.15,<0.16`.
* Depends on `tensorflow-metadata>=0.15,<0.16`.
* Depends on `apache-beam[gcp]>=2.16,<3`.
* Depends on `tensorflow>=1.15,<2.2`.
* Starting from 1.15, the `tensorflow` pip package comes with GPU support, so
users no longer need to choose between `tensorflow` and `tensorflow-gpu`.
* Caveat: `tensorflow` 2.0.0 is an exception and does not have GPU
support. If `tensorflow-gpu` 2.0.0 is installed before installing
`tensorflow-transform`, it will be replaced with `tensorflow` 2.0.0.
Re-install `tensorflow-gpu` 2.0.0 if needed.

Breaking changes
* `always_return_num_quantiles` changed to default to True in `tft.quantiles`
and `tft.bucketize`, so the exact requested bucket count is returned; see the
sketch after this list.
* Removes the `input_fn_maker` module which has been deprecated since TFT 0.11.
For idiomatic construction of `input_fn`, see `tensorflow_transform` examples.
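
A minimal sketch of the new default behavior; the feature name is
hypothetical:

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    # With always_return_num_quantiles now defaulting to True, exactly
    # num_buckets buckets are returned.
    return {'x_bucketized': tft.bucketize(inputs['x'], num_buckets=10)}
```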

Deprecations

0.14.0

Major Features and Improvements
* New `tft.word_count` mapper to identify the number of tokens for each row
(for pre-tokenized strings).
* All `tft.scale_to_*` mappers now have per-key variants, along with analyzers
for `mean_and_var_per_key` and `min_and_max_per_key`; see the sketch after
this list.
* New `tft_beam.AnalyzeDatasetWithCache` allows analyzing ranges of data while
producing and utilizing cache. `tft.analyzer_cache` can help read and write
such cache to a filesystem between runs. This caching feature is worth using
when analyzing a rolling range of data in a continuous pipeline. This is an
experimental feature.
* Added `reduce_instance_dims` support to `tft.quantiles` and `elementwise` to
`tft.bucketize`, while avoiding separate beam calls for each feature.
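
A minimal `preprocessing_fn` sketch of the new mappers; the feature names are
hypothetical, and `tf.compat.v1.string_split` stands in for whatever
tokenization produces the `SparseTensor` of tokens that `tft.word_count`
expects:

```python
import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    # Hypothetical tokenization of a batch of strings.
    tokens = tf.compat.v1.string_split(inputs['text'])
    return {
        # Number of tokens in each row.
        'num_tokens': tft.word_count(tokens),
        # z-score scaling computed separately for each key.
        'value_scaled': tft.scale_to_z_score_per_key(
            inputs['value'], key=inputs['key']),
    }
```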

Bug Fixes and Other Changes
* `sparse_tensor_to_dense_with_shape` now accepts an optional `default_value`
parameter.
* `tft.vocabulary` and `tft.compute_and_apply_vocabulary` now support
`fingerprint_shuffle` to sort the vocabularies by fingerprint instead of by
count. This is useful for load balancing the training parameter servers. This
is an experimental feature; see the sketch after this list.
* Fix numerical instability in `tft.vocabulary` mutual information calculations.
* `tft.vocabulary` and `tft.compute_and_apply_vocabulary` now support computing
vocabularies over integer categoricals and multivalent input features, and
computing mutual information for non-binary labels.
* New numeric normalization method available:
`tft.apply_buckets_with_interpolation`.
* Changes to make this library more compatible with TensorFlow 2.0.
* Fix sanitizing of vocabulary filenames.
* Emit a friendly error message when context isn't set.
* Analyzer output dtypes are enforced to be TensorFlow dtypes, and by extension
`ptransform_analyzer`'s `output_dtypes` is enforced to be a list of TensorFlow
dtypes.
* Make `tft.apply_buckets_with_interpolation` support SparseTensors.
* Adds an experimental API for analyzers to annotate the post-transform schema.
* `TFTransformOutput.transform_raw_features` now accepts an optional
`drop_unused_features` parameter to exclude unused features in output.
* If not specified, the `min_diff_from_avg` parameter of `tft.vocabulary` now
defaults to a reasonable value based on the size of the dataset (relevant
only if computing vocabularies using mutual information).
* Convert some `tf.contrib` functions to be compatible with TF 2.0.
* New `tft.bag_of_words` mapper to compute the unique set of ngrams for each row
(for pre-tokenized strings).
* Fixed a bug in `tf_utils.reduce_batch_count_mean_and_var` (and, as a
result, the `mean_and_var` analyzer) that miscalculated variance in the
sparse `elementwise=True` case.
* Added test utility `tft_unit.cross_named_parameters` for creating
parameterized tests that involve the cartesian product of various parameters.
* Depends on `tensorflow-metadata>=0.14,<0.15`.
* Depends on `apache-beam[gcp]>=2.14,<3`.
* Depends on `numpy>=1.16,<2`.
* Depends on `absl-py>=0.7,<2`.
* Allow `preprocessing_fn` to emit a `tf.RaggedTensor`. In this case, the
output `Schema` proto cannot be converted to a feature spec, so the output
data cannot be materialized with `tft.coders`.
* Ability to directly set exact `num_buckets` with new parameter
`always_return_num_quantiles` for `analyzers.quantiles` and
`mappers.bucketize`, defaulting to False in general but True when
`reduce_instance_dims` is False.
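
A minimal sketch combining two of the items above (fingerprint-shuffled
vocabularies and interpolated bucket application); the feature names and
parameter values are hypothetical:

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    # Vocabulary sorted by fingerprint rather than frequency (experimental).
    text_ids = tft.compute_and_apply_vocabulary(
        inputs['text'], fingerprint_shuffle=True)
    # Normalize a numeric feature by interpolating within quantile buckets.
    boundaries = tft.quantiles(inputs['x'], num_buckets=10, epsilon=0.01)
    x_normalized = tft.apply_buckets_with_interpolation(
        inputs['x'], boundaries)
    return {'text_ids': text_ids, 'x_normalized': x_normalized}
```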

Breaking changes
* `tf_utils.reduce_batch_count_mean_and_var`, which feeds into
`tft.mean_and_var`, now returns 0 instead of inf for empty columns of a
sparse tensor.
* `tensorflow_transform.tf_metadata.dataset_schema.Schema` class is removed.
Wherever a `dataset_schema.Schema` was used, users should now provide a
`tensorflow_metadata.proto.v0.schema_pb2.Schema` proto. For backwards
compatibility, `dataset_schema.Schema` is now a factory method that produces
a `Schema` proto. Updating code should be straightforward because the
`dataset_schema.Schema` class was already a wrapper around the `Schema` proto.
* Only explicitly public analyzers are exported to the `tft` module, e.g.
combiners are no longer exported and have to be accessed directly through
`tft.analyzers`.
* Requires pre-installed TensorFlow >=1.14,<2.

Deprecations
* `DatasetSchema` is now a deprecated factory method (see above).
* `tft.tf_metadata.dataset_schema.from_feature_spec` is now deprecated.
Equivalent functionality is provided by
`tft.tf_metadata.schema_utils.schema_from_feature_spec`; see the sketch below.
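
A minimal sketch of the replacement API; the feature spec is hypothetical:

```python
import tensorflow as tf
from tensorflow_transform.tf_metadata import schema_utils

# Build a tensorflow_metadata Schema proto directly from a feature spec.
schema = schema_utils.schema_from_feature_spec({
    'text': tf.io.FixedLenFeature([], tf.string),
    'label': tf.io.FixedLenFeature([], tf.int64),
})
```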

0.13.0

Major Features and Improvements
* Now `AnalyzeDataset`, `TransformDataset` and `AnalyzeAndTransformDataset` can
accept input data that contains only the columns needed for that operation, as
opposed to all columns defined in the schema. Utility methods to infer the
list of needed columns are added to `tft.inspect_preprocessing_fn`. This makes
it easier to take advantage of columnar projection when data is stored in
columnar storage formats; see the sketch after this list.
* Python 3.5 is supported.
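
A minimal sketch of inferring the needed columns, assuming the
`get_analyze_input_columns`/`get_transform_input_columns` helpers of
`inspect_preprocessing_fn`; the feature spec and `preprocessing_fn` are
hypothetical:

```python
import tensorflow as tf
import tensorflow_transform as tft
from tensorflow_transform import inspect_preprocessing_fn

# Hypothetical schema and preprocessing_fn.
feature_spec = {
    'x': tf.io.FixedLenFeature([], tf.float32),
    'text': tf.io.FixedLenFeature([], tf.string),
}

def preprocessing_fn(inputs):
    return {'x_scaled': tft.scale_to_0_1(inputs['x'])}

# Only these columns need to be read for each operation, so the rest can
# be projected away by a columnar reader.
analyze_columns = inspect_preprocessing_fn.get_analyze_input_columns(
    preprocessing_fn, feature_spec)    # e.g. ['x']
transform_columns = inspect_preprocessing_fn.get_transform_input_columns(
    preprocessing_fn, feature_spec)    # e.g. ['x']
```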

Bug Fixes and Other Changes
* Version is now accessible as `tensorflow_transform.__version__`.
* Depends on `apache-beam[gcp]>=2.11,<3`.
* Depends on `protobuf>=3.7,<4`.

Breaking changes
* Coders now return index and value features rather than a combined feature for
`SparseFeature`.
* Requires pre-installed TensorFlow >=1.13,<2.

Deprecations

0.12.0

Major Features and Improvements
* Python 3.5 readiness complete (all tests pass). Full Python 3.5 compatibility
is expected to be available with the next version of Transform (after
Apache Beam 2.11 is released).
* Performance improvements for vocabulary generation when using `top_k`.
* A new, optimized, highly experimental API for analyzing a dataset was added:
`AnalyzeDatasetWithCache`, which allows reading and writing analyzer cache.
* Update `DatasetMetadata` to be a wrapper around the
`tensorflow_metadata.proto.v0.schema_pb2.Schema` proto. TensorFlow Metadata
will be the schema used to define data parsing across TFX. The serialized
`DatasetMetadata` is now the `Schema` proto in ASCII format, but the previous
format can still be read.
* Change `ApplySavedModel` implementation to use `tf.Session.make_callable`
instead of `tf.Session.run` for improved performance.

Bug Fixes and Other Changes

* `tft.vocabulary` and `tft.compute_and_apply_vocabulary` now support
filtering based on adjusted mutual information when
`use_adjusted_mutual_info` is set to True.
* `tft.vocabulary` and `tft.compute_and_apply_vocabulary` now take a
regularization term `min_diff_from_avg` that adjusts mutual information to
zero whenever the difference between the count of the feature with any label
and its expected count is lower than the threshold.
* Added an option to `tft.vocabulary` and `tft.compute_and_apply_vocabulary`
to compute a coverage vocabulary, using the new `coverage_top_k`,
`coverage_frequency_threshold` and `key_fn` parameters; see the sketch after
this list.
* Added `tft.ptransform_analyzer` for advanced use cases.
* Modified `QuantilesCombiner` to use `tf.Session.make_callable` instead of
`tf.Session.run` for improved performance.
* ExampleProtoCoder now also supports non-serialized Example representations.
* `tft.tfidf` now accepts a scalar Tensor as `vocab_size`.
* Occurrences of `assertItemsEqual` in unit tests are replaced by
`assertCountEqual`.
* `NumPyCombiner` now outputs TF dtypes in `output_tensor_infos` instead of
numpy dtypes.
* Adds function `tft.apply_pyfunc` that provides limited support for
`tf.py_func`. Note that this is incompatible with serving. See documentation
for more details.
* `CombinePerKey` now adds a dimension for the key.
* Depends on `numpy>=1.14.5,<2`.
* Depends on `apache-beam[gcp]>=2.10,<3`.
* Depends on `protobuf==3.7.0rc2`.
* `ExampleProtoCoder.encode` now converts a feature whose value is `None` to an
empty value, where before it did not accept `None` as a valid value.
* `AnalyzeDataset`, `AnalyzeAndTransformDataset` and `TransformDataset` can now
accept dictionaries which contain `None`, and which will be interpreted the
same as an empty list. They will never produce an output containing `None`.
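
A minimal sketch of the coverage-vocabulary option; the feature name,
thresholds and `key_fn` are hypothetical:

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    # Keep the global top 10000 tokens, but also cover the top 100 tokens
    # within each key produced by the (hypothetical) key_fn.
    _ = tft.vocabulary(
        inputs['text'],
        top_k=10000,
        coverage_top_k=100,
        key_fn=lambda token: token.split('|')[0],  # hypothetical key scheme
        vocab_filename='coverage_vocab')
    return {'text': inputs['text']}
```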

Breaking changes
* `ColumnSchema` and related classes (`Domain`, `Axis` and
`ColumnRepresentation` and their subclasses) have been removed. In order to
create a schema, use `from_feature_spec`. In order to inspect a schema
use the `as_feature_spec` and `domains` methods of `Schema`. The
constructors of these classes are replaced by functions that still work when
creating a `Schema` but this usage is deprecated.
* Requires pre-installed TensorFlow >=1.12,<2.
* `ExampleProtoCoder.decode` now converts a feature with empty value (e.g.
`features { feature { key: "varlen" value { } } }`) or missing key for a
feature (e.g. `features { }`) to a `None` in the output dictionary. Before
it would represent these with an empty list. This better reflects the
original example proto and is consistent with TensorFlow Data Validation.
* Coders now return a `list` instead of an `ndarray` for a `VarLenFeature`.

Deprecations
