Datasets

Latest version: v3.1.0


4.2.0

Added

- [CLI] `tfds build` to the CLI. See
[documentation](https://www.tensorflow.org/datasets/cli#tfds_build_download_and_prepare_a_dataset).
- [API] `tfds.features.Dataset` to represent nested datasets.
- [API] `tfds.ReadConfig(add_tfds_id=True)` to add a unique id to the example
`ex['tfds_id']` (e.g. `b'train.tfrecord-00012-of-01024__123'`).
- [API] `num_parallel_calls` option to `tfds.ReadConfig` to override the
default `AUTOTUNE` option.
- [API] `tfds.ImageFolder` support for `tfds.decode.SkipDecoder`.
- [API] Multichannel audio support to `tfds.features.Audio`.
- [API] `try_gcs` to `tfds.builder(..., try_gcs=True)`
- Better `tfds.as_dataframe` visualization (ffmpeg video if installed,
bounding boxes,...).
- [TESTING] Allow `max_examples_per_splits=0` in `tfds build
--max_examples_per_splits=0` to test `_split_generators` only (without
`_generate_examples`).
- New datasets.
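
As a rough illustration of the `tfds_id` value shown above, the sketch below splits an id into its shard file name and trailing integer. The helper `parse_tfds_id` is hypothetical (it is not part of TFDS), and it assumes every id has the form `<shard file name>__<integer>`, as in the single example given.

```python
from typing import NamedTuple


class TfdsId(NamedTuple):
    shard: str  # shard file name, e.g. 'train.tfrecord-00012-of-01024'
    index: int  # trailing integer after the '__' separator


def parse_tfds_id(tfds_id: bytes) -> TfdsId:
    """Hypothetical helper: split an id of the assumed form b'<shard>__<int>'."""
    shard, sep, index = tfds_id.decode("utf-8").rpartition("__")
    if not sep or not index.isdigit():
        raise ValueError(f"Unexpected tfds_id format: {tfds_id!r}")
    return TfdsId(shard=shard, index=int(index))


parsed = parse_tfds_id(b"train.tfrecord-00012-of-01024__123")
print(parsed.shard, parsed.index)
```

Splitting on the last `__` keeps the sketch tolerant of shard names that themselves contain underscores.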

Changed

- [API] DownloadManager now returns
[Pathlib-like](https://docs.python.org/3/library/pathlib.html#basic-use)
objects.
- [API] Simpler `BuilderConfig` definition: class `VERSION` and
`RELEASE_NOTES` are applied to all `BuilderConfig`. Config description is
now optional.
- [API] To guarantee better determinism, new validations are performed on
the keys when creating a dataset (to avoid filenames as keys, which are
non-deterministic, and to restrict keys to `str`, `bytes` and `int`). New
errors likely indicate an issue in the dataset implementation.
- [API] `tfds.core.benchmark` now returns a `pd.DataFrame` (instead of a
`dict`).
- [API] `tfds.units` is not visible anymore from the public API.
- Datasets updates.
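
The key restriction described above can be pictured with a small check. This is a minimal sketch, not the validation TFDS performs: `validate_key` is a hypothetical helper, and rejecting `bool` is a choice made for this example only.

```python
from typing import Union

Key = Union[str, bytes, int]


def validate_key(key: object) -> Key:
    """Hypothetical check: example keys must be str, bytes or int."""
    # bool is a subclass of int; rejecting it is this sketch's own choice.
    if isinstance(key, bool) or not isinstance(key, (str, bytes, int)):
        raise TypeError(
            f"Invalid key type {type(key).__name__}; expected str, bytes or int"
        )
    return key


validate_key("img_0001")  # accepted
validate_key(42)          # accepted
try:
    validate_key(3.14)    # rejected: float is not an allowed key type
except TypeError as err:
    print(err)
```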

Deprecated

Removed

- Configs for all text datasets. Only plain text version is kept. For example:
`multi_nli/plain_text` -> `multi_nli`.

Fixed

- [API] Datasets returned by `tfds.as_numpy` are compatible with `len(ds)`.
- Support 0-len sequence with images of dynamic shape (Fix 2616).
- Progression bar correctly updated when copying files.
- Better debugging and error message (e.g. human readable size,...).
- Many bug fixes (GPath consistency with pathlib, s3 compatibility, TQDM
visual artifacts, GCS crash on windows, re-download when checksums updated,
...).

4.1.0

Added

- It is now easier to create datasets outside the TFDS repository (see our updated
[dataset creation guide](https://www.tensorflow.org/datasets/add_dataset)).
- When generating a dataset, if download fails for any reason, it is now
possible to manually download the data. See
[doc](https://www.tensorflow.org/datasets/overview#manual_download_if_download_fails).
- `tfds.core.as_path` to create pathlib.Path-like objects compatible with GCS
(e.g. `tfds.core.as_path('gs://my-bucket/labels.csv').read_text()`).
- `verify_ssl=` option to `tfds.download.DownloadConfig` to disable SSL
certificate verification during download.
- New datasets.

Changed

- All datasets inherit from `tfds.core.GeneratorBasedBuilder`. Converting a
dataset to Beam now only requires changing `_generate_examples` (see
[example and doc](https://www.tensorflow.org/datasets/beam_datasets#instructions)).
- `_split_generators` should now return `{'split_name':
self._generate_examples(), ...}` (but current datasets are backward
compatible).
- Better `pathlib.Path`, `os.PathLike` compatibility: `dl_manager.manual_dir`
now returns a pathlib-like object. Example:
`text = (dl_manager.manual_dir / 'downloaded-text.txt').read_text()`.
Note: Other `dl_manager.download`, `.extract`,... will return pathlib-like
objects in future versions. `FeatureConnector`,... and most functions should
accept `PathLike` objects. Let us know if some functions you need are missing.
- `--record_checksums` now assumes the new dataset-as-folder model.
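
To make the new `_split_generators` return shape and the pathlib-like `manual_dir` more concrete, here is a minimal sketch in plain Python. `MyDatasetSketch` is a hypothetical stand-in that does not subclass `tfds.core.GeneratorBasedBuilder`. It takes a directory directly rather than a download manager, and the file names are invented for the example.

```python
import pathlib
import tempfile
from typing import Dict, Iterator, Tuple

Example = Tuple[int, dict]


class MyDatasetSketch:
    """Plain-Python stand-in; a real builder would subclass the TFDS base class."""

    def _split_generators(self, manual_dir: pathlib.Path) -> Dict[str, Iterator[Example]]:
        # New style: map each split name directly to its example generator.
        return {
            "train": self._generate_examples(manual_dir / "train.txt"),
            "test": self._generate_examples(manual_dir / "test.txt"),
        }

    def _generate_examples(self, path: pathlib.Path) -> Iterator[Example]:
        # Pathlib-like objects can be read without an explicit open().
        for i, line in enumerate(path.read_text(encoding="utf-8").splitlines()):
            yield i, {"text": line}


with tempfile.TemporaryDirectory() as tmp:
    manual_dir = pathlib.Path(tmp)  # stand-in for dl_manager.manual_dir
    (manual_dir / "train.txt").write_text("a\nb\n", encoding="utf-8")
    (manual_dir / "test.txt").write_text("c\n", encoding="utf-8")
    generators = MyDatasetSketch()._split_generators(manual_dir)
    # Generators are lazy, so consume them while the files still exist.
    splits = {name: list(gen) for name, gen in generators.items()}

print(splits)
```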

Deprecated

- `tfds.core.SplitGenerator`, `tfds.core.BeamBasedBuilder` are deprecated and
will be removed in a future version.

Fixed

- `BuilderConfig` is now compatible with Beam datasets (2348).
- `tfds.features.Image` can accept encoded `bytes` images directly (useful
when used with `img_name, img_bytes =
dl_manager.iter_archive('images.zip')`).
- The API docs now show deprecated methods, and abstract methods to override
are now documented.
- You can generate `imagenet2012` with only a single split (e.g. only the
validation data). Other splits are skipped if not present.

4.0.1

Fixed

- `tfds.load` when generation code isn't present.
- GCS compatibility.

4.0.0

Added

- Dataset-as-folder: Datasets can now be self-contained modules in a folder with
checksums, dummy data,... This simplifies implementing datasets outside the
TFDS repository.
- `tfds.load` can now load a dataset without using the generation class. So
`tfds.load('my_dataset:1.0.0')` can work even if `MyDataset.VERSION ==
'2.0.0'` (See 2493).
- TFDS CLI (see https://www.tensorflow.org/datasets/cli for detail).
- `tfds.testing.mock_data` does not require metadata files anymore!
- `tfds.as_dataframe(ds, ds_info)` with custom visualisation
([example](https://www.tensorflow.org/datasets/overview#tfdsas_dataframe)).
- `tfds.even_splits` to generate subsplits (e.g. `tfds.even_splits('train',
n=3) == ['train[0%:33%]', 'train[33%:67%]', ...]`).
- `DatasetBuilder.RELEASE_NOTES` property.
- `tfds.features.Image` now supports PNG with 4-channels.
- `tfds.ImageFolder` now supports custom shape, dtype.
- Downloaded URLs are available through `MyDataset.url_infos`.
- `skip_prefetch` option to `tfds.ReadConfig`.
- `as_supervised=True` support for `tfds.show_examples`, `tfds.as_dataframe`.
- `tfds.features` can now be saved/loaded; you may have to override
[FeatureConnector.from_json_content](https://www.tensorflow.org/datasets/api_docs/python/tfds/features/FeatureConnector?version=nightly#from_json_content)
and `FeatureConnector.to_json_content` to support this feature.
- Script to detect dead-urls.
- New datasets.
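
The percent-based subsplits produced by `tfds.even_splits` can be approximated in a few lines. The function below is a hypothetical re-implementation written for illustration. It assumes boundaries are whole percentages obtained by rounding, which matches the two slices shown in the entry above but is not confirmed as the library's exact behavior.

```python
from typing import List


def even_splits_sketch(split: str, n: int) -> List[str]:
    """Hypothetical re-implementation: n contiguous percent slices of `split`."""
    if not 1 <= n <= 100:
        raise ValueError("n must be between 1 and 100")
    # Whole-percent boundaries, e.g. n=3 gives 0, 33, 67, 100.
    bounds = [round(i * 100 / n) for i in range(n + 1)]
    return [f"{split}[{lo}%:{hi}%]" for lo, hi in zip(bounds, bounds[1:])]


print(even_splits_sketch("train", n=3))
```

Each returned string uses the same slicing syntax as the split names in the entry above, so under these assumptions it could be passed wherever a split name is accepted.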

Changed

- `tfds.as_numpy()` now returns an iterable which can be iterated multiple
times. To migrate: `next(ds)` -> `next(iter(ds))`.
- Rename `tfds.features.text.Xyz` -> `tfds.deprecated.text.Xyz`.
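
The `next(ds)` -> `next(iter(ds))` migration follows from the difference between an iterator and a re-iterable object. The sketch below uses a hypothetical stand-in class, `ReiterableDataset`, in place of the object returned by `tfds.as_numpy()`; only the iteration protocol is modeled.

```python
class ReiterableDataset:
    """Stand-in for an iterable (not an iterator) that can be traversed repeatedly."""

    def __init__(self, examples):
        self._examples = list(examples)

    def __iter__(self):
        # A fresh iterator on every call, so each pass starts from the beginning.
        return iter(self._examples)


ds = ReiterableDataset([{"label": 0}, {"label": 1}])

first = next(iter(ds))   # new style; calling next(ds) directly raises TypeError
first_pass = list(ds)
second_pass = list(ds)   # a second pass yields the same examples again
print(first, first_pass == second_pass)
```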

Removed

- `DatasetBuilder.IN_DEVELOPMENT` property.
- `tfds.core.disallow_positional_args` (should use Py3 `*,` instead).
- Testing against TF 1.15. Requires Python 3.6.8+.

Fixed

- Better archive extension detection for `dl_manager.download_and_extract`.
- Fix `tfds.__version__` in TFDS nightly to be PEP440 compliant.
- Fix crash when GCS not available.
- Improved open-source workflow, contributor guide, documentation.
- Many other internal cleanups, bugs, dead code removal, py2->py3 cleanup,
pytype annotations,...
- Datasets updates.

3.2.1

Fixed

- Issue with GCS on Windows.

3.2.0

Added

- [API] `tfds.ImageFolder` and `tfds.TranslateFolder` to easily create custom
datasets with your custom data.
- [API] `tfds.ReadConfig(input_context=)` to shard dataset, for better
multi-worker compatibility (1426).
- [API] The default `data_dir` can be controlled by the `TFDS_DATA_DIR`
environment variable.
- [API] Better usability when developing datasets outside TFDS: downloads are
always cached, checksums are optional.
- Scripts to help deployment/documentation (Generate catalog documentation,
export all metadata files, ...).
- [Documentation] Catalog display images
([example](https://www.tensorflow.org/datasets/catalog/sun397#sun397standard-part2-120k)).
- [Documentation] Catalog shows which datasets have been recently added and are
only available in `tfds-nightly` (marked with the nights_stay icon).
- [API] `tfds.show_statistics(ds_info)` to display
[FACETS OVERVIEW](https://pair-code.github.io/facets/). Note: This requires
the dataset to have been generated with the statistics.
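
One way to picture the `TFDS_DATA_DIR` entry above is as a lookup with a fallback. This is a sketch under stated assumptions, not TFDS code: `resolve_data_dir` is a hypothetical helper, and both the precedence (explicit argument first, then the environment variable) and the `~/tensorflow_datasets` fallback are assumptions of this example.

```python
import os
import pathlib
from typing import Mapping, Optional


def resolve_data_dir(
    data_dir: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> pathlib.Path:
    """Hypothetical lookup: explicit argument, then TFDS_DATA_DIR, then a default."""
    if data_dir:
        return pathlib.Path(data_dir).expanduser()
    env_value = environ.get("TFDS_DATA_DIR")
    if env_value:
        return pathlib.Path(env_value).expanduser()
    # Assumed fallback location for this sketch.
    return pathlib.Path("~/tensorflow_datasets").expanduser()


print(resolve_data_dir(environ={"TFDS_DATA_DIR": "/data/tfds"}))
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.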

Deprecated

- `tfds.features.text` encoding API. Please use
[tensorflow_text](https://www.tensorflow.org/tutorials/tensorflow_text/intro)
instead.

Removed

- `tfds.load('image_label_folder')` in favor of the more user-friendly
`tfds.ImageFolder`.

Fixed

- Fix deterministic example order on Windows when path was used as key (this
only impacts a few datasets). Now example order should be the same on all
platforms.
- Misc performance improvements for both generation and reading (e.g. use
`__slots__`, fix parallelisation bug in `tf.data.TFRecordReader`, ...).
- Misc fixes (typo, types annotations, better error messages, fixing dead
links, better windows compatibility, ...).
