Tensorflow-data-validation

Latest version: v1.16.1

0.23.0

Major Features and Improvements

* Data validation is now able to handle arbitrarily nested Arrow
List/LargeList types. Schema entries for features with multiple nest levels
describe the value count at each level in the value_counts field.
* Add combiner stats generator to estimate top-K and uniques using Misra-Gries
and K-Minimum Values sketches.
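
As a rough illustration of the Misra-Gries idea mentioned above, the following is a minimal pure-Python sketch of the textbook algorithm. It is not TFDV's implementation; the function name `misra_gries` and the parameter `k` are chosen here for illustration only. The sketch keeps at most `k - 1` counters, and each reported count underestimates the true count by at most `n / k` for a stream of `n` items.

```python
def misra_gries(stream, k):
    """Textbook Misra-Gries summary (illustrative, not TFDV's code).

    Keeps at most k - 1 counters. Any item occurring more than
    len(stream) / k times is guaranteed to survive in the result.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


stream = ["a", "b", "a", "c", "a", "d", "a", "b", "a"]
sketch = misra_gries(stream, k=3)
print(sketch)
```

The counts in the summary are lower bounds, so a second pass over the data (or an accepted error margin) is needed when exact top-K frequencies matter.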

Bug Fixes and Other Changes

* Validate that enough supported images are present (if
image_domain.minimum_supported_image_fraction is provided).
* Stopped requiring avro-python3.
* Depends on `apache-beam[gcp]>=2.23,<3`.
* Depends on `pyarrow>=0.17,<0.18`.
* Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<3`.
* Depends on `tensorflow-metadata>=0.23,<0.24`.
* Depends on `tensorflow-transform>=0.23,<0.24`.
* Depends on `tfx-bsl>=0.23,<0.24`.

Known Issues

* N/A

Breaking Changes

* N/A

Deprecations

* Note: We plan to remove Python 3.5 support after this release.

0.22.2

Major Features and Improvements

Bug Fixes and Other Changes
* Fixed a bug that prevented tfx 0.22.0 from working with TFDV 0.22.1.
* Depends on `avro-python3>=1.8.1,<1.9.2` on Python 3.5 + MacOS.

Known Issues

Breaking Changes

Deprecations

0.22.1

Major Features and Improvements

* Statistics generation is now able to handle arbitrarily nested Arrow
List/LargeList types. Stats about the list elements' presence and valency
are computed at each nest level, and stored in a newly added field,
`valency_and_presence_stats` in `CommonStatistics`.
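
To make "presence and valency at each nest level" concrete, here is a minimal sketch that works on plain Python lists rather than Arrow arrays. It is an assumption-laden illustration, not TFDV's generator: the function name `valency_and_presence` and the dictionary keys are hypothetical and chosen here for readability.

```python
def valency_and_presence(values, depth):
    """Compute per-level presence/valency stats for nested lists.

    values: one entry per example, each either None (missing) or a list
            nested `depth` levels deep.
    Returns one dict per nest level (keys are illustrative only).
    """
    stats = []
    current = values
    for _ in range(depth):
        present = [v for v in current if v is not None]
        lengths = [len(v) for v in present]
        stats.append({
            "num_non_missing": len(present),
            "num_missing": len(current) - len(present),
            "min_num_values": min(lengths) if lengths else 0,
            "max_num_values": max(lengths) if lengths else 0,
            "tot_num_values": sum(lengths),
        })
        # Descend one level: the children of present lists become the
        # "examples" for the next nest level.
        current = [child for v in present for child in v]
    return stats


examples = [[[1, 2], [3]], None, [[], [4, 5, 6]]]
levels = valency_and_presence(examples, depth=2)
for level, level_stats in enumerate(levels):
    print(level, level_stats)
```

In this example the outer level sees two present lists and one missing example, while the inner level sees four lists whose lengths range from 0 to 3.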

Bug Fixes and Other Changes

* Trigger DATASET_HIGH_NUM_EXAMPLES when a dataset has more than the specified
limit on number of examples.
* Fix bug in display_anomalies that prevented dataset-level anomalies from
being displayed.
* Trigger anomalies when a feature has a number of unique values that does not
conform to the specified minimum/maximum.
* Trigger anomalies when a float feature has unexpected Inf / -Inf values.
* Depends on `apache-beam[gcp]>=2.22,<3`.
* Depends on `pandas>=0.24,<2`.
* Depends on `tensorflow-metadata>=0.22.2,<0.23.0`.
* Depends on `tfx-bsl>=0.22.1,<0.23.0`.
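
The unique-value and Inf / -Inf anomalies listed above can be pictured with a small standalone check. This is only a conceptual sketch in plain Python, assuming the constraints are passed in directly; the function `check_feature`, its parameters, and the message strings are hypothetical and do not reflect TFDV's schema fields or anomaly types.

```python
import math


def check_feature(values, min_unique=None, max_unique=None):
    """Return a list of human-readable anomaly descriptions (illustrative)."""
    anomalies = []
    # Flag infinite values in float data.
    if any(isinstance(v, float) and math.isinf(v) for v in values):
        anomalies.append("unexpected Inf / -Inf value")
    # Compare the number of distinct values against the allowed range.
    num_unique = len(set(values))
    if min_unique is not None and num_unique < min_unique:
        anomalies.append(f"too few unique values: {num_unique} < {min_unique}")
    if max_unique is not None and num_unique > max_unique:
        anomalies.append(f"too many unique values: {num_unique} > {max_unique}")
    return anomalies


found = check_feature([1.0, 2.0, float("inf")], min_unique=1, max_unique=2)
print(found)
```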

Known Issues

Breaking Changes

Deprecations

0.22.0

Major Features and Improvements

Bug Fixes and Other Changes

* Crop values in natural language stats generator.
* Switch to using PyBind11 instead of SWIG for wrapping C++ libraries.
* CSV decoder support for multivalent columns by using tfx_bsl's decoder.
* When inferring a schema entry for a feature, do not add a shape with dim = 0
when min_num_values = 0.
* Add utility methods `tfdv.get_slice_stats` to get statistics for a slice and
`tfdv.compare_slices` to compare statistics of two slices using Facets.
* Make `tfdv.load_stats_text` and `tfdv.write_stats_text` public.
* Add PTransforms `tfdv.WriteStatisticsToText` and
`tfdv.WriteStatisticsToTFRecord` to write statistics proto to text and
tfrecord files respectively.
* Modify `tfdv.load_statistics` to handle reading statistics from TFRecord and
text files.
* Added an extra requirement group `mutual-information`. As a result, barebone
TFDV does not require `scikit-learn` any more.
* Added an extra requirement group `visualization`. As a result, barebone TFDV
does not require `ipython` any more.
* Added an extra requirement group `all` that specifies all the extra
dependencies TFDV needs. Use `pip install tensorflow-data-validation[all]`
to pull in those dependencies.
* Depends on `pyarrow>=0.16,<0.17`.
* Depends on `apache-beam[gcp]>=2.20,<3`.
* Depends on `ipython>=7,<8;python_version>="3"`.
* Depends on `scikit-learn>=0.18,<0.24`.
* Depends on `tensorflow>=1.15,!=2.0.*,<3`.
* Depends on `tensorflow-metadata>=0.22.0,<0.23`.
* Depends on `tensorflow-transform>=0.22,<0.23`.
* Depends on `tfx-bsl>=0.22,<0.23`.

Known Issues

* (Known issue resolution) It is no longer necessary to use Apache Beam 2.17
when running TFDV on Windows. The current release of Apache Beam will work.

Breaking Changes

* `tfdv.GenerateStatistics` now accepts a PCollection of `pa.RecordBatch`
instead of `pa.Table`.
* All the TFDV coders now output a PCollection of `pa.RecordBatch` instead of
a PCollection of `pa.Table`.
* `tfdv.validate_instances` and
`tfdv.api.validation_api.IdentifyAnomalousExamples` now take
`pa.RecordBatch` as input instead of `pa.Table`.
* The `StatsGenerator` interface (and all its sub-classes) now takes
`pa.RecordBatch` as the input data instead of `pa.Table`.
* Custom slicing functions now accept a `pa.RecordBatch` instead of
`pa.Table` as input and should output a tuple `(slice_key, record_batch)`.

Deprecations

* Deprecating Py2 support.

0.21.5

Major Features and Improvements

* Add `label_feature` to `StatsOptions` and enable `LiftStatsGenerator` when
`label_feature` and `schema` are provided.
* Add JSON serialization support for StatsOptions.

Bug Fixes and Other Changes
* Only requires `avro-python3>=1.8.1,!=1.9.2.*,<2.0.0` on Python 3.5 + MacOS.

Breaking Changes

Deprecations

0.21.4

Major Features and Improvements

* Support visualizing feature value lift in facets visualization.

Bug Fixes and Other Changes

* Fix issue writing out string feature values in LiftStatsGenerator.
* Requires `apache-beam[gcp]>=2.17,<3`.
* Requires `tensorflow-transform>=0.21.1,<0.22`.
* Requires `tfx-bsl>=0.21.3,<0.22`.

Breaking Changes

Deprecations
