Added
- Several new datasets. Thanks to all the
[contributors](https://github.com/tensorflow/datasets/graphs/contributors)!
- Support for nested `tfds.features.Sequence` and `tf.RaggedTensor`.
- Custom `FeatureConnector`s can override the `decode_batch_example` method
for efficient decoding when wrapped inside a
`tfds.features.Sequence(my_connector)`.
- Beam datasets can use a `tfds.core.BeamMetadataDict` to store additional
metadata computed as part of the Beam pipeline.
- Beam datasets' `_split_generators` accepts an additional `pipeline` kwarg
to define a pipeline shared between all splits.
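
As an illustration of the nested `tfds.features.Sequence` support, a feature spec might look like the sketch below. The feature name `token_ids` and the helper `build_nested_features` are hypothetical, and the imports sit inside the function only so that the snippet can be defined without TFDS installed.

```python
def build_nested_features():
    """Returns a feature spec with one sequence nested inside another."""
    # Imported lazily so the function can be defined without TF/TFDS installed.
    import tensorflow as tf
    import tensorflow_datasets as tfds

    # Hypothetical feature: a variable-length list of variable-length int lists.
    return tfds.features.FeaturesDict({
        "token_ids": tfds.features.Sequence(
            tfds.features.Sequence(
                tfds.features.Tensor(shape=(), dtype=tf.int64)
            )
        ),
    })
```

With both lengths left unspecified, this is the kind of data that a `tf.RaggedTensor` is suited to represent.
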
Changed
- The default versions of all datasets now use the S3 slicing API. See
the [guide](https://www.tensorflow.org/datasets/splits) for details.
- `shuffle_files` defaults to False so that dataset iteration is deterministic
by default. You can customize the reading pipeline, including shuffling and
interleaving, through the new `read_config` parameter in
[`tfds.load`](https://www.tensorflow.org/datasets/api_docs/python/tfds/load).
- The `urls` kwarg of `DatasetInfo` is renamed to `homepage`.
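
A minimal sketch of how the slicing API and `read_config` might be combined in a `tfds.load` call. The dataset name `mnist`, the seed value, and the helper `load_train_subset` are illustrative choices, not part of these release notes, and the call downloads data on first use.

```python
# Slicing API: the split is a string with Python-style slices.
SPLIT = "train[:10%]"  # first 10% of the train split


def load_train_subset(shuffle=False):
    """Loads a slice of the train split with an explicit read configuration."""
    # Imported lazily so the function can be defined without TFDS installed.
    import tensorflow_datasets as tfds

    # Assumption for illustration: fix the seed used when file shuffling is on.
    read_config = tfds.ReadConfig(shuffle_seed=42)
    return tfds.load(
        "mnist",
        split=SPLIT,
        shuffle_files=shuffle,  # False (the new default) keeps file order fixed
        read_config=read_config,
    )
```

Passing `shuffle=True` opts back in to file shuffling, while the seed in `read_config` is intended to keep that shuffling reproducible.
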
Deprecated
- Python 2 support: this is the last version of TFDS that will support
Python 2. Going forward, we'll only support and test against Python 3.
- The previous split API is still available, but is deprecated. If you wrote
`DatasetBuilder`s outside the TFDS repository, please make sure they do not
use `experiments={tfds.core.Experiment.S3: False}`. This option will be removed
in the next version, along with the `num_shards` kwarg of `SplitGenerator`.
Fixed
- Various other bug fixes and performance improvements. Thank you for all the
reports and fixes!