Added
- [API] Added
[TfDataBuilder](https://www.tensorflow.org/datasets/format_specific_dataset_builders#datasets_based_on_tfdatadataset)
that is handy for storing experimental ad hoc TFDS datasets in notebook-like
environments such that they can be versioned, described, and easily shared
with teammates.
- [API] Added options to create format-specific dataset builders. The new API
now includes a number of NLP-specific builders, such as:
- [CoNLL](https://www.tensorflow.org/datasets/format_specific_dataset_builders#conll)
- [CoNLL-U](https://www.tensorflow.org/datasets/format_specific_dataset_builders#conllu)
- [API] Added `tfds.beam.inc_counter` to reduce `beam.metrics.Metrics.counter`
boilerplate.
- [API] Added options to group together existing TFDS datasets into
[dataset collections](https://www.tensorflow.org/datasets/dataset_collections)
and to perform simple operations over them.
- [Documentation] Updated the documentation, specifically:
- [New guide](https://www.tensorflow.org/datasets/format_specific_dataset_builders)
on format-specific dataset builders;
- [New guide](https://www.tensorflow.org/datasets/add_dataset_collection)
on adding new dataset collections to TFDS;
- Updated [TFDS CLI](https://www.tensorflow.org/datasets/cli)
documentation.
- [TFDS CLI] Supports custom configs passed as JSON (e.g.
`tfds build my_dataset --config='{"name": "my_custom_config", "description": "Abc"}'`).
- New datasets:
- [conll2003](https://www.tensorflow.org/datasets/catalog/conll2003)
- [universal_dependency 2.10](https://www.tensorflow.org/datasets/catalog/universal_dependency)
- [bucc](https://www.tensorflow.org/datasets/catalog/bucc)
- [i_naturalist2021](https://www.tensorflow.org/datasets/catalog/i_naturalist2021)
- [mtnt](https://www.tensorflow.org/datasets/catalog/mtnt) (Machine
Translation of Noisy Text)
- [placesfull](https://www.tensorflow.org/datasets/catalog/placesfull)
- [tatoeba](https://www.tensorflow.org/datasets/catalog/tatoeba)
- [user_libri_audio](https://www.tensorflow.org/datasets/catalog/user_libri_audio)
- [user_libri_text](https://www.tensorflow.org/datasets/catalog/user_libri_text)
- [xtreme_pos](https://www.tensorflow.org/datasets/catalog/xtreme_pos)
- [yahoo_ltrc](https://www.tensorflow.org/datasets/catalog/yahoo_ltrc)
- Updated datasets:
- [C4](https://www.tensorflow.org/datasets/catalog/c4) was updated to
version 3.1.
- [common_voice](https://www.tensorflow.org/datasets/catalog/common_voice)
was updated to a more recent snapshot.
- [wikipedia](https://www.tensorflow.org/datasets/catalog/wikipedia) was
updated with the `20220620` snapshot.
- New dataset collections, such as
[xtreme](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/dataset_collections/xtreme/xtreme.py)
and
[LongT5](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/dataset_collections/longt5/longt5.py).
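
The JSON value passed to `--config` in the TFDS CLI entry above is easy to get wrong when quoting it by hand. The following is a minimal sketch, using only the Python standard library, of how the same command line could be assembled programmatically. The dataset name and config values are the ones from the example above; the script only builds and prints the command and does not invoke `tfds`.

```python
import json
import shlex

# Config values copied from the CLI example in the changelog entry.
config = {"name": "my_custom_config", "description": "Abc"}

# Serialize the config once so the JSON is always well-formed.
config_json = json.dumps(config)

# Build the argument list. Passing a list like this to subprocess.run
# would avoid shell quoting issues entirely.
argv = ["tfds", "build", "my_dataset", f"--config={config_json}"]

# shlex.join produces a shell-safe string for copy-pasting into a terminal.
command_line = shlex.join(argv)
print(command_line)
```

Keeping the arguments as a list and quoting only at the last step means the JSON never has to be escaped manually.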
Changed
- The base `Logger` class expects more information to be passed to the
`as_dataset` method. This should only be relevant to people who have
implemented and registered custom `Logger` class(es).
- You can set `DEFAULT_BUILDER_CONFIG_NAME` in a `DatasetBuilder` to change
the default config if it shouldn't be the first builder config defined in
`BUILDER_CONFIGS`.
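
The default-config rule described above can be modelled in a few lines. This is a simplified sketch of the selection behaviour as stated in this entry, not the TFDS implementation: `BuilderConfig`, `Builder`, and `default_config` are hypothetical stand-ins written for illustration, and only the attribute names `BUILDER_CONFIGS` and `DEFAULT_BUILDER_CONFIG_NAME` come from the changelog.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BuilderConfig:
    """Hypothetical stand-in for a TFDS builder config."""
    name: str


class Builder:
    """Hypothetical stand-in for a DatasetBuilder base class."""
    BUILDER_CONFIGS: List[BuilderConfig] = []
    DEFAULT_BUILDER_CONFIG_NAME: Optional[str] = None

    @classmethod
    def default_config(cls) -> Optional[BuilderConfig]:
        if not cls.BUILDER_CONFIGS:
            return None
        # Without an explicit name, the first defined config is the default.
        if cls.DEFAULT_BUILDER_CONFIG_NAME is None:
            return cls.BUILDER_CONFIGS[0]
        # Otherwise look the default up by name.
        for config in cls.BUILDER_CONFIGS:
            if config.name == cls.DEFAULT_BUILDER_CONFIG_NAME:
                return config
        raise ValueError(
            f"Unknown default config: {cls.DEFAULT_BUILDER_CONFIG_NAME!r}"
        )


class FirstIsDefault(Builder):
    BUILDER_CONFIGS = [BuilderConfig("small"), BuilderConfig("large")]


class ExplicitDefault(Builder):
    BUILDER_CONFIGS = [BuilderConfig("small"), BuilderConfig("large")]
    DEFAULT_BUILDER_CONFIG_NAME = "large"


print(FirstIsDefault.default_config().name)
print(ExplicitDefault.default_config().name)
```

Selecting the default by name lets the configs stay in whatever order reads best without changing which one users get when they do not specify a config.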
Deprecated
Removed
Fixed
- Fixes to various datasets.
- On Linux, loading a dataset from a directory other than your home (`~`)
directory no longer creates a new `~` directory in the current directory
(fixes [4117](https://github.com/tensorflow/datasets/issues/4117)).
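
For readers unfamiliar with this class of bug, the sketch below illustrates the general failure mode, assuming it arises from a path containing an unexpanded `~`. It is not the actual TFDS fix: `resolve_data_dir` is a hypothetical helper, and the demonstration runs inside a temporary directory so nothing is created in your working directory.

```python
import os
import tempfile


def resolve_data_dir(path: str) -> str:
    """Hypothetical helper: expand a leading `~` and return an absolute path."""
    return os.path.abspath(os.path.expanduser(path))


with tempfile.TemporaryDirectory() as workdir:
    # The filesystem treats "~" as an ordinary name; only the shell or
    # os.path.expanduser maps it to the home directory.
    unexpanded = os.path.join(workdir, "~", "tensorflow_datasets")
    os.makedirs(unexpanded)
    created_literal_tilde = os.path.isdir(os.path.join(workdir, "~"))

# Expanding first yields an absolute path that does not depend on the
# current directory.
resolved = resolve_data_dir("~/tensorflow_datasets")
print(created_literal_tilde, resolved)
```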
Security