Torchtext

Latest version: v0.18.0

0.2.1

This is a minor release: it contains no breaking API changes, but it adds several new features that are compatible with existing APIs.

We have always intended to support lazy datasets (specifically, those implemented as Python generators), but this version includes a bugfix that makes that support more useful. See [this demo](https://drive.google.com/file/d/11yqYCQJiwIevZguXl7x9Oo6dpl7keRpy/view?usp=sharing) of it in action.
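To illustrate why generator-backed datasets are useful, here is a minimal sketch in plain Python that batches examples from a generator without ever building the full list. It does not use torchtext's API; `read_examples` and `batches` are hypothetical names chosen for this illustration.

```python
from itertools import islice


def read_examples(n):
    """Hypothetical lazy source: yields one tokenized example at a time."""
    for i in range(n):
        yield ["tok"] * (i % 3 + 1)


def batches(examples, batch_size):
    """Group any iterable into lists of up to batch_size items, lazily."""
    it = iter(examples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Only the first four examples are ever produced, even though the
# source could yield a billion of them.
first = next(batches(read_examples(10**9), batch_size=4))
```

Because both functions are generators, memory use depends on the batch size rather than on the size of the dataset.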

Datasets:

- Added support for sequence tagging (e.g., NER/POS/chunking) datasets and wrapped the Universal Dependencies POS-tagged corpus (157, thanks sivareddyg!)

Features:

- Added `pad_first` keyword argument to `Field` constructors, allowing left-padding in addition to right-padding (161, thanks GregorySenay!)
- Support loading word vectors from a local folder (168, thanks ahhegazy!)
- Support using `list` (character tokenization) in `ReversibleField` (188)
- Added hooks for Sphinx/RTD documentation (179, thanks keon and EntilZha, whose preliminary version is available at torch-text.readthedocs.io)
- Added support for `torchtext.__version__` (179, thanks keon!)
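
The difference between left- and right-padding that `pad_first` controls can be sketched in a few lines of plain Python. The `pad` helper below is hypothetical and only mirrors the behavior described above; it is not torchtext's implementation, and the `<pad>` token string is an assumption.

```python
def pad(tokens, length, pad_token="<pad>", pad_first=False):
    """Pad a token list to `length`; pad on the left when pad_first is True."""
    padding = [pad_token] * max(0, length - len(tokens))
    return padding + list(tokens) if pad_first else list(tokens) + padding


right = pad(["a", "b"], 4)                  # padding appended after the tokens
left = pad(["a", "b"], 4, pad_first=True)   # padding placed before the tokens
```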

Bugfixes:

- Fixed deprecated word vector usage in WT2 dataset (166, thanks keon!)
- Fixed bug in word vector loading (168, thanks ahhegazy!)
- Fixed bug in word vector aliases (191, thanks ryanleary!)
- Fixed side effects of building a vocabulary (193 + 181, thanks donglixp!)
- Fixed arithmetic mistake in language modeling dataset length calculation (182, thanks jihunchoi!)
- Avoid materializing an otherwise-lazy dataset when using `filter_pred` (194)
- Fixed bug in raw float fields (159)
- Avoid providing a misleading `len` when using `batch_size_fn` (192)
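
The `filter_pred` fix is about filtering without consuming the whole dataset up front. A rough sketch of that idea in plain Python follows; `lazy_filter` and `source` are hypothetical names, not torchtext functions.

```python
def lazy_filter(examples, filter_pred):
    """Yield only the examples accepted by filter_pred, one at a time."""
    return (ex for ex in examples if filter_pred(ex))


seen = []  # records which items the source has actually produced


def source():
    """Hypothetical lazy dataset that logs every item it yields."""
    for i in range(1000):
        seen.append(i)
        yield i


evens = lazy_filter(source(), lambda x: x % 2 == 0)
first_two = [next(evens), next(evens)]
# At this point `seen` holds only the items needed to find two matches.
```

A list comprehension in place of the generator expression would have pulled all 1000 items from the source before returning anything.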

0.2.0

Breaking changes:
- By default, examples are now sorted within a batch by decreasing sequence length (95, 139). This is required for use of PyTorch `PackedSequence`s, and it can be flexibly overridden with a `Dataset` constructor flag.
- The unknown token is now included as part of `specials` and can be overridden or removed in the `Field` constructor (part of 107).
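
As a minimal sketch of the new default ordering, the plain-Python snippet below sorts a batch by decreasing sequence length and returns the lengths alongside it, which is the shape of input that packed sequences expect. The `sort_batch` helper is hypothetical and stands in for what the iterator does internally.

```python
def sort_batch(batch):
    """Return the examples ordered by decreasing length, plus their lengths."""
    ordered = sorted(batch, key=len, reverse=True)
    return ordered, [len(ex) for ex in ordered]


batch = [["a"], ["b", "c", "d"], ["e", "f"]]
ordered, lengths = sort_batch(batch)
```

Since `sorted` is stable, examples of equal length keep their original relative order.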

New features:
- New word vector API with classes for GloVe and FastText; string descriptors are still accepted for backwards compatibility (94, 102, 115, 120, thanks nelson-liu and bmccann!)
- Reversible tokenization (107). Introduces a new `Field` subclass, `ReversibleField`, with a `.reverse` method that detokenizes. All implementations of `ReversibleField` should guarantee that the tokenization+detokenization round-trip is idempotent; torchtext provides wrappers for the [revtok](https://github.com/jekbradbury/revtok) tokenizer and subword segmenter that satisfy this property.
- Skip header line in CSV/TSV loading (146)
- `RawField`s that represent any data type without processing (147, thanks kylegao91!)
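
The round-trip property expected of a `ReversibleField` can be illustrated with a toy character-level tokenizer in plain Python. The `tokenize` and `reverse` functions below are hypothetical stand-ins, not the revtok or torchtext implementations; a whitespace split is included to show a tokenization that does not round-trip.

```python
def tokenize(text):
    """Character-level tokenization: every character is its own token."""
    return list(text)


def reverse(tokens):
    """Detokenize by concatenating the tokens back together."""
    return "".join(tokens)


text = "Hello,  world!"  # note the double space
round_trip = reverse(tokenize(text))

# Splitting on whitespace and rejoining loses the double space,
# so this scheme is not reversible.
lossy = " ".join(text.split())
```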

New datasets:
- TREC (92, thanks bmccann!)
- IMDb (93, thanks bmccann!)
- Multi30k (116, thanks bmccann!)
- IWSLT (126, 128, thanks bmccann!)
- WMT14 (138)

Bugfixes:
- Fix pretrained word vector loading (99, thanks matt-peters!)
- Fix JSON loader silently ignoring requested columns not present in the file (105, thanks nelson-liu!)
- Many fixes for Python 2, especially surrounding Unicode (105, 112, 135, 153, thanks nelson-liu!)
- Fix `Pipeline.call` behavior (113, thanks nelson-liu!)
- Fix README example (134, thanks czhang99!)
- Fix WikiText2 loader (138)
- Fix typo in MT loader (142, thanks sivareddyg!)
- Fix `Example.fromlist` behavior on non-strings (145)
- Update test set URL for Multi30k (149)
- Fix SNLI data loader (150, thanks sivareddyg!)
- Fix language modeling iterator (151)
- Remove transpose as a side effect of `Field.reverse` (155)

0.1.1

This release was tagged so that we can develop v0.2 on master, with refactored and extended word vectors (minimally breaking) and revtok support (a reversible tokenizer with optional wordpieces; a major feature, but one that shouldn't break the API).
