Tensorflow-datasets

Latest version: v4.9.7

4.7.0

Added

- [API] Added
[TfDataBuilder](https://www.tensorflow.org/datasets/format_specific_dataset_builders#datasets_based_on_tfdatadataset),
which is handy for storing experimental ad hoc TFDS datasets in notebook-like
environments so that they can be versioned, described, and easily shared
with teammates.
- [API] Added options to create format-specific dataset builders. The new API
now includes a number of NLP-specific builders, such as:
  - [CoNLL](https://www.tensorflow.org/datasets/format_specific_dataset_builders#conll)
  - [CoNLL-U](https://www.tensorflow.org/datasets/format_specific_dataset_builders#conllu)
- [API] Added `tfds.beam.inc_counter` to reduce `beam.metrics.Metrics.counter`
boilerplate
- [API] Added options to group together existing TFDS datasets into
[dataset collections](https://www.tensorflow.org/datasets/dataset_collections)
and to perform simple operations over them.
- [Documentation] update, specifically:
  - [New guide](https://www.tensorflow.org/datasets/format_specific_dataset_builders)
    on format-specific dataset builders;
  - [New guide](https://www.tensorflow.org/datasets/add_dataset_collection)
    on adding new dataset collections to TFDS;
  - Updated [TFDS CLI](https://www.tensorflow.org/datasets/cli)
    documentation.
- [TFDS CLI] Supports custom config through JSON (e.g.
`tfds build my_dataset --config='{"name": "my_custom_config", "description": "Abc"}'`).
- New datasets:
  - [conll2003](https://www.tensorflow.org/datasets/catalog/conll2003)
  - [universal_dependency 2.10](https://www.tensorflow.org/datasets/catalog/universal_dependency)
  - [bucc](https://www.tensorflow.org/datasets/catalog/bucc)
  - [i_naturalist2021](https://www.tensorflow.org/datasets/catalog/i_naturalist2021)
  - [mtnt](https://www.tensorflow.org/datasets/catalog/mtnt): Machine
    Translation of Noisy Text.
  - [placesfull](https://www.tensorflow.org/datasets/catalog/placesfull)
  - [tatoeba](https://www.tensorflow.org/datasets/catalog/tatoeba)
  - [user_libri_audio](https://www.tensorflow.org/datasets/catalog/user_libri_audio)
  - [user_libri_text](https://www.tensorflow.org/datasets/catalog/user_libri_text)
  - [xtreme_pos](https://www.tensorflow.org/datasets/catalog/xtreme_pos)
  - [yahoo_ltrc](https://www.tensorflow.org/datasets/catalog/yahoo_ltrc)
- Updated datasets:
  - [C4](https://www.tensorflow.org/datasets/catalog/c4) was updated to
    version 3.1.
  - [common_voice](https://www.tensorflow.org/datasets/catalog/common_voice)
    was updated to a more recent snapshot.
  - [wikipedia](https://www.tensorflow.org/datasets/catalog/wikipedia) was
    updated with the `20220620` snapshot.
- New dataset collections, such as
[xtreme](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/dataset_collections/xtreme/xtreme.py)
and
[LongT5](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/dataset_collections/longt5/longt5.py)
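
The JSON form of the `--config` flag listed above can be produced programmatically rather than typed by hand. The following is a minimal sketch that only builds the command string; `build_tfds_command` is a hypothetical helper name, and the dataset and config values are the ones from the changelog example.

```python
import json
import shlex


def build_tfds_command(dataset, config):
    """Return a `tfds build` command line with `config` serialized as JSON.

    `build_tfds_command` is a hypothetical helper, not part of TFDS.
    """
    config_json = json.dumps(config)
    # shlex.quote protects the JSON (spaces, double quotes) from the shell.
    return f"tfds build {shlex.quote(dataset)} --config={shlex.quote(config_json)}"


command = build_tfds_command(
    "my_dataset",
    {"name": "my_custom_config", "description": "Abc"},
)
print(command)
```

Quoting the JSON with `shlex.quote` keeps the whole object as a single shell argument, which matters because the serialized config contains spaces and double quotes.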

Changed

- The base `Logger` class expects more information to be passed to the
`as_dataset` method. This should only be relevant to people who have
implemented and registered custom `Logger` class(es).
- You can set `DEFAULT_BUILDER_CONFIG_NAME` in a `DatasetBuilder` to change
the default config if it shouldn't be the first builder config defined in
`BUILDER_CONFIGS`.
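
The selection rule described for `DEFAULT_BUILDER_CONFIG_NAME` can be illustrated without TFDS itself. The sketch below is not the library's implementation: `HypotheticalConfig` and `select_default_config` are stand-in names, and the code only mirrors the behavior stated above (first config unless a default name is given).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class HypotheticalConfig:
    """Stand-in for a builder config; only the name matters here."""
    name: str


def select_default_config(
    builder_configs: Sequence[HypotheticalConfig],
    default_name: Optional[str] = None,
) -> Optional[HypotheticalConfig]:
    """Pick the default config: the named one if set, else the first defined."""
    if not builder_configs:
        return None
    if default_name is None:
        return builder_configs[0]
    for config in builder_configs:
        if config.name == default_name:
            return config
    raise ValueError(f"No builder config named {default_name!r}")


configs = [HypotheticalConfig("small"), HypotheticalConfig("large")]
print(select_default_config(configs).name)
print(select_default_config(configs, default_name="large").name)
```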

Deprecated

Removed

Fixed

- Various datasets
- In Linux, when loading a dataset from a directory that is not your home
(`~`) directory, a new `~` directory is not created in the current directory
(fixes [4117](https://github.com/tensorflow/datasets/issues/4117)).

Security

4.6.0

Added

- Support for community datasets on GCS.
- [API] `tfds.builder_from_directory` and `tfds.builder_from_directories`, see
https://www.tensorflow.org/datasets/external_tfrecord#directly_from_folder.
- [API] Dash ("-") support in split names.
- [API] `file_format` argument to `download_and_prepare` method, allowing user
to specify an alternative file format to store prepared data (e.g.
"riegeli").
- [API] `file_format` to `DatasetInfo` string representation.
- [API] Expose the return value of Beam pipelines. This allows for users to
read the Beam metrics.
- [API] Expose Feature `tf_example_spec` to public.
- [API] `doc` kwarg on `Feature`s, to describe a feature.
- [Documentation] Features description is shown on
[TFDS Catalog](https://www.tensorflow.org/datasets/catalog/overview).
- [Documentation] More metadata about HuggingFace datasets in TFDS catalog.
- [Performance] Parallel load of metadata files.
- [Testing] TFDS tests are now run using GitHub actions - misc improvements
such as caching and sharding.
- [Testing] Improvements to MockFs.
- New datasets.
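
As a rough illustration of `tfds.builder_from_directory` from the list above, the sketch below loads an already-prepared dataset from its version folder. It assumes `tensorflow_datasets` is installed and that the folder holds both the data files and the TFDS metadata; `load_from_directory` is a hypothetical wrapper name.

```python
def load_from_directory(builder_dir, split="train"):
    """Open a prepared TFDS dataset directly from its version folder.

    `builder_dir` is assumed to point at a folder that already contains
    the prepared data files and the TFDS metadata files.
    """
    # Imported lazily so this module can be loaded without TFDS installed.
    import tensorflow_datasets as tfds

    builder = tfds.builder_from_directory(builder_dir)
    dataset = builder.as_dataset(split=split)
    return builder.info, dataset
```

Calling it with a path such as `/path/to/my_dataset/1.0.0` (a placeholder) would return the dataset's `DatasetInfo` together with a `tf.data.Dataset` for the requested split.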

Changed

- [API] `num_shards` is now optional in the shard name.

Removed

- TFDS pathlib API, migrated to a self-contained `etils.epath` (see
https://github.com/google/etils).

Fixed

- Various datasets.
- Dataset builders that are defined ad hoc (e.g. in Colab).
- Better `DatasetNotFoundError` messages.
- Don't set `deterministic` on a global level but locally in interleave, so it
only applies to interleave and not to all transformations.
- Google drive downloader.

4.5.2

Added

- [API] `split=tfds.split_for_jax_process('train')` (alias of
`tfds.even_splits('train', n=jax.process_count())[jax.process_index()]`).
- [Documentation] update.

Fixed

- Import bug on Windows (3709).

4.5.0

Added

- [API] Better split API:
  - Splits can be selected using shards: `split='train[3shard]'`.
  - Underscores are supported in numbers for better readability:
    `split='train[:500_000]'`.
  - Select the union of all splits with `split='all'`.
  - [`tfds.even_splits`](https://www.tensorflow.org/datasets/splits#tfdseven_splits_multi-host_training)
    is more precise and flexible:
    - Returns splits of exactly the same size when passed
      `tfds.even_splits('train', n=3, drop_remainder=True)`.
    - Works on subsplits `tfds.even_splits('train[:75%]', n=3)` or even
      nested.
    - Can be composed with other splits:
      `tfds.even_splits('train', n=3)[0] + 'test'`.
- [API] `serialize_example` / `deserialize_example` methods on features to
encode/decode example to proto: `example_bytes =
features.serialize_example(example_data)`.
- [API] `Audio` feature now supports `encoding='zlib'` for better compression.
- [API] Features specs are exposed in proto for better compatibility with
other languages.
- [API] Create beam pipeline using TFDS as input with
[tfds.beam.ReadFromTFDS](https://www.tensorflow.org/datasets/api_docs/python/tfds/beam/ReadFromTFDS).
- [API] Support setting the file formats in `tfds build
--file_format=tfrecord`.
- [API] Typing annotations exposed in `tfds.typing`.
- [API] `tfds.ReadConfig` has a new `assert_cardinality=False` argument to
disable cardinality.
- [API] `tfds.display_progress_bar(True)` for functional control.
- [API] DatasetInfo exposes `.release_notes`.
- Support for huge number of shards (>99999).
- [Performance] Faster dataset generation (using tfrecords).
- [Testing] Mock dataset now supports nested datasets
- [Testing] Customize the number of sub examples
- [Documentation] Community datasets:
https://www.tensorflow.org/datasets/community_catalog/overview.
- [Documentation]
[Guide on TFDS and determinism](https://www.tensorflow.org/datasets/determinism).
- [[RLDS](https://github.com/google-research/rlds)] Support for nested
datasets features.
- [[RLDS](https://github.com/google-research/rlds)] New datasets: Robomimic,
D4RL Ant Maze, RLU Real World RL, and RLU Atari with ordered episodes.
- New datasets.
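
To make the effect of `drop_remainder` in `tfds.even_splits` concrete, the sketch below computes how many examples each of `n` sub-splits would receive. It is an illustration of the behavior described above, not TFDS code; `even_split_sizes` is a hypothetical helper, and distributing leftover examples to the first sub-splits is an assumption.

```python
def even_split_sizes(num_examples, n, drop_remainder=False):
    """Return the number of examples in each of `n` even sub-splits.

    Assumption: without `drop_remainder`, leftover examples go to the
    first sub-splits, so sizes differ by at most one.
    """
    base, remainder = divmod(num_examples, n)
    if drop_remainder:
        # Every sub-split gets the same size; the remainder is discarded.
        return [base] * n
    return [base + 1 if i < remainder else base for i in range(n)]


print(even_split_sizes(11, 3))
print(even_split_sizes(11, 3, drop_remainder=True))
```

In a multi-host setup, each process would then take the entry at its own index, which is the pattern the `tfds.split_for_jax_process` alias from 4.5.2 wraps.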

Deprecated

- Python 3.6 support: this is the last version of TFDS supporting Python 3.6.
Future versions will use Python 3.7.

Fixed

- Misc bugs.

4.4.0

Added

- [API]
[`PartialDecoding` support](https://www.tensorflow.org/datasets/decode#only_decode_a_sub-set_of_the_features),
to decode only a subset of the features (for performance).
- [API] `tfds.features.LabeledImage` for semantic segmentation (like image but
with additional `info.features['image_label'].name` label metadata).
- [API] float32 support for `tfds.features.Image` (e.g. for depth map).
- [API] Loading datasets from files now supports custom
`tfds.features.FeatureConnector`.
- [API] All `FeatureConnector`s can now have a `None` dimension anywhere
(previously restricted to the first position).
- [API] `tfds.features.Tensor()` can have an arbitrary number of dynamic
dimensions (`Tensor(..., shape=(None, None, 3, None))`).
- [API] `tfds.features.Tensor` can now be serialised as bytes, instead of
float/int values (to allow better compression): `Tensor(...,
encoding='zlib')`.
- [API] Support for datasets with `None` in `tfds.as_numpy`.
- Script to add TFDS metadata files to existing TF-record (see
[doc](https://www.tensorflow.org/datasets/external_tfrecord)).
- [TESTING] `tfds.testing.mock_data` now supports:
  - non-scalar tensors with dtype `tf.string`;
  - `builder_from_files` and path-based community datasets.
- [Documentation] Catalog now exposes links to
[KnowYourData visualisations](https://knowyourdata-tfds.withgoogle.com/).
- [Documentation] Guide on
[common implementation gotchas](https://www.tensorflow.org/datasets/common_gotchas).
- Many new reinforcement learning datasets.

Changed

- [API] Datasets generated with `disable_shuffling=True` are now read in
generation order.
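
The `encoding='zlib'` option listed for `tfds.features.Tensor` in this release stores values as compressed bytes. The sketch below shows the general idea with the standard library only; it is not how TFDS serializes tensors internally, and `encode_values` / `decode_values` are hypothetical helper names.

```python
import struct
import zlib


def encode_values(values):
    """Pack floats as little-endian float32 bytes, then zlib-compress them."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return zlib.compress(raw)


def decode_values(blob, count):
    """Reverse `encode_values` for a known number of float32 values."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{count}f", raw))


# Repetitive data (e.g. a mostly flat depth map) compresses well.
values = [0.0] * 900 + [1.5] * 100
blob = encode_values(values)
print(len(values) * 4, "raw bytes ->", len(blob), "compressed bytes")
```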

Fixed

- File format automatically restored (for datasets generated with
`tfds.builder(..., file_format=)`).
- Dynamically set number of worker threads during extraction.
- Update progress bar during download even if downloads are cached.
- Misc bug fixes.

4.3.0

Added

- [API] `dataset.info.splits['train'].num_shards` to expose the number of
shards to the user.
- [API] `tfds.features.Dataset` to have a field containing sub-datasets (e.g.
used in RL datasets).
- [API] dtype and `tf.uint16` support in `tfds.features.Video`.
- [API] `DatasetInfo.license` field to add redistributing information.
- [API] `.copy`, `.format` methods to GPath objects.
- [Performances] `tfds.benchmark(ds)` (compatible with any iterator, not just
`tf.data`, better colab representation).
- [Performances] Faster `tfds.as_numpy()` (avoid extra `tf.Tensor` <>
`np.array` copy).
- [Testing] Support for custom `BuilderConfig` in `DatasetBuilderTest`.
- [Testing] `DatasetBuilderTest` now has a `dummy_data` class property which
can be used in `setUpClass`.
- [Testing] `add_tfds_id` and cardinality support to `tfds.testing.mock_data`.
- [Documentation] Better `tfds.as_dataframe` visualisation (Sequence, ragged
tensor, semantic masks with `use_colormap`).
- [Experimental] Community datasets support, which allows dynamically
importing datasets defined outside the TFDS repository.
- [Experimental] Hugging Face compatibility wrapper to use Hugging Face
datasets directly in TFDS.
- [Experimental] Riegeli format support.
- [Experimental] `DatasetInfo.disable_shuffling` to force examples to be read
in generation order.
- New datasets.

Fixed

- Many bugs.
