Datasets

Latest version: v3.3.0


3.1.0

Added

- [API] `tfds.builder_cls(name)` to access a DatasetBuilder class by name
- [API] `info.split['train'].filenames` for access to the tf-record files.
- [API] `tfds.core.add_data_dir` to register an additional data dir.
- [Testing] Support for custom decoders in `tfds.testing.mock_data`.
- [Documentation] Shows which datasets are only present in `tfds-nightly`.
- [Documentation] Display images for supported datasets.

Changed

- Rename `tfds.core.NamedSplit`, `tfds.core.SplitBase` -> `tfds.Split`. Now
`tfds.Split.TRAIN`,... are instances of `tfds.Split`.
- Rename `interleave_parallel_reads` -> `interleave_cycle_length` for
`tfds.ReadConfig`.
- Invert ds, ds_info argument orders for `tfds.show_examples`.

Deprecated

- `tfds.features.text` encoding API. Please use `tensorflow_text` instead.

Removed

- `num_shards` argument from `tfds.core.SplitGenerator`. This argument was
ignored as shards are automatically computed.
- Most `ds.with_options` calls that were applied by TFDS; the `tf.data`
defaults are now used instead.

Fixed

- Better error messages.
- Windows compatibility.

3.0.0

Added

- `DownloadManager` is now picklable (can be used inside Beam pipelines).
- `tfds.features.Audio`:
- Support float as returned value.
- Expose sample_rate through `info.features['audio'].sample_rate`.
- Support for encoding audio features from file objects.
- More datasets.

Changed

- New `image_classification` section. Some datasets have been moved there from
`images`.
- `DownloadConfig` does not append the dataset name anymore (manual data
should be in `<manual_dir>/` instead of `<manual_dir>/<dataset_name>/`).
- Tests now check that all `dl_manager.download` URLs have registered
checksums. To opt out, add `SKIP_CHECKSUMS = True` to your
`DatasetBuilderTestCase`.
- `tfds.load` now always returns `tf.compat.v2.Dataset`. If you're still
using `tf.compat.v1`:
- Use `tf.compat.v1.data.make_one_shot_iterator(ds)` rather than
`ds.make_one_shot_iterator()`.
- Use `isinstance(ds, tf.compat.v2.Dataset)` instead of `isinstance(ds,
tf.data.Dataset)`.
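The migration steps above can be sketched with a plain `tf.data.Dataset` standing in for a `tfds.load` result:

```python
import tensorflow as tf

# Stand-in for the tf.compat.v2 dataset that tfds.load now returns.
ds = tf.data.Dataset.range(3)

# v1-style iteration: call the free function instead of the removed
# ds.make_one_shot_iterator() method.
it = tf.compat.v1.data.make_one_shot_iterator(ds)
values = [it.get_next().numpy() for _ in range(3)]
```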

Deprecated

- The `tfds.features.text` encoding API is deprecated. Please use
[tensorflow_text](https://www.tensorflow.org/tutorials/tensorflow_text/intro)
instead.
- `num_shards` argument of `tfds.core.SplitGenerator` is currently ignored and
will be removed in the next version.

Removed

- Legacy mode `tfds.experiment.S3` has been removed.
- `in_memory` argument has been removed from `as_dataset`/`tfds.load` (small
datasets are now auto-cached).
- `tfds.Split.ALL`.

Fixed

- Various bugs, better error messages, documentation improvements.

2.1.0

Added

- Datasets expose `info.dataset_size` and `info.download_size`.
- [Auto-caching small datasets](https://www.tensorflow.org/datasets/performances#auto-caching).
- Datasets expose their cardinality: `num_examples =
tf.data.experimental.cardinality(ds)` (requires tf-nightly or TF >= 2.2.0).
- Get the number of examples in a sub-split with:
`info.splits['train[70%:]'].num_examples`
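The cardinality API can be illustrated with an ordinary `tf.data` pipeline as a stand-in for a TFDS-built dataset (requires TF >= 2.2):

```python
import tensorflow as tf

# Any tf.data.Dataset reports its cardinality when statically known.
ds = tf.data.Dataset.range(100)
num_examples = int(tf.data.experimental.cardinality(ds))

# Transformations that hide the size report UNKNOWN_CARDINALITY.
filtered = ds.filter(lambda x: x % 2 == 0)
unknown = tf.data.experimental.cardinality(filtered)
```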

Changed

- Datasets generated with 2.1.0 cannot be loaded with previous versions
(datasets generated with previous versions can still be read with `2.1.0`,
however).

Deprecated

- `in_memory` argument is deprecated and will be removed in a future version.

2.0.0

Added

- Several new datasets. Thanks to all the
[contributors](https://github.com/tensorflow/datasets/graphs/contributors)!
- Support for nested `tfds.features.Sequence` and `tf.RaggedTensor`
- Custom `FeatureConnector`s can override the `decode_batch_example` method
for efficient decoding when wrapped inside a
`tfds.features.Sequence(my_connector)`.
- Beam datasets can use a `tfds.core.BeamMetadataDict` to store additional
metadata computed as part of the Beam pipeline.
- Beam datasets' `_split_generators` accepts an additional `pipeline` kwargs
to define a pipeline shared between all splits.

Changed

- The default versions of all datasets are now using the S3 slicing API. See
the [guide](https://www.tensorflow.org/datasets/splits) for details.
- `shuffle_files` defaults to False so that dataset iteration is deterministic
by default. You can customize the reading pipeline, including shuffling and
interleaving, through the new `read_config` parameter in
[`tfds.load`](https://www.tensorflow.org/datasets/api_docs/python/tfds/load).
- The `urls` kwarg was renamed to `homepage` in `DatasetInfo`.

Deprecated

- Python2 support: this is the last version of TFDS that will support
Python 2. Going forward, we'll only support and test against Python 3.
- The previous split API is still available, but is deprecated. If you wrote
`DatasetBuilder`s outside the TFDS repository, please make sure they do not
use `experiments={tfds.core.Experiment.S3: False}`. This will be removed in
the next version, as well as the `num_shards` kwargs from `SplitGenerator`.

Fixed

- Various other bug fixes and performance improvements. Thank you for all the
reports and fixes!

1.3.0

Fixed

- Misc bugs and performance improvements.

1.2.0

Added

Features

- Add `shuffle_files` argument to the `tfds.load` function. The semantics are
the same as for `builder.as_dataset`: for now, files are shuffled by default
for the `TRAIN` split and not for other splits. The default behaviour will
change to always be False in the next major release.
- Most datasets now support the new S3 API
([documentation](https://github.com/tensorflow/datasets/blob/master/docs/splits.md#two-apis-s3-and-legacy)).
- Support for uint16 PNG images.

Datasets

- AFLW2000-3D
- Amazon_US_Reviews
- binarized_mnist
- BinaryAlphaDigits
- Caltech Birds 2010
- Coil100
- DeepWeeds
- Food101
- MIT Scene Parse 150
- RockYou leaked password
- Stanford Dogs
- Stanford Online Products
- Visual Domain Decathlon

Fixed

- Crash while shuffling on Windows
- Various documentation improvements
