Breaking changes:
- By default, examples are now sorted within a batch by decreasing sequence length (95, 139). This is required for use of PyTorch `PackedSequence`s, and it can be flexibly overridden with a `Dataset` constructor flag.
- The unknown token is now included as part of `specials` and can be overridden or removed in the `Field` constructor (part of 107).
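The first breaking change matters because, when these notes were written, PyTorch's `torch.nn.utils.rnn.pack_padded_sequence` expected a padded batch whose sequence lengths were in decreasing order. Below is a minimal plain-Python sketch of that preprocessing step. It is not torchtext's implementation. The helper `sort_and_pad_batch` and the `"<pad>"` token are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding token, assumed for this sketch


def sort_and_pad_batch(batch: List[List[str]]) -> Tuple[List[List[str]], List[int]]:
    """Sort token lists by decreasing length, then pad each to the longest.

    Returns the padded batch together with the unpadded lengths, which are
    also in decreasing order.
    """
    # Longest example first. sorted() stays stable with reverse=True,
    # so equal-length examples keep their input order.
    ordered = sorted(batch, key=len, reverse=True)
    lengths = [len(ex) for ex in ordered]
    max_len = lengths[0] if lengths else 0
    # Right-pad every example to the length of the first (longest) one.
    padded = [ex + [PAD] * (max_len - len(ex)) for ex in ordered]
    return padded, lengths


padded, lengths = sort_and_pad_batch([["a", "b"], ["c", "d", "e", "f"], ["g"]])
print(lengths)  # [4, 2, 1]
```

The returned `lengths` list is what would accompany the padded batch when it is packed.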

New features:
- New word vector API with classes for GloVe and FastText; string descriptors are still accepted for backwards compatibility (94, 102, 115, 120, thanks nelson-liu and bmccann!)
- Reversible tokenization (107). Introduces a new `Field` subclass, `ReversibleField`, with a `.reverse` method that detokenizes. All implementations of `ReversibleField` should guarantee that the tokenization+detokenization round-trip is idempotent; torchtext provides wrappers for the [revtok](https://github.com/jekbradbury/revtok) tokenizer and subword segmenter that satisfy this property.
- Skip header line in CSV/TSV loading (146)
- `RawField`s, which hold data of any type without processing it (147, thanks kylegao91!)
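The round-trip guarantee that `ReversibleField` asks for can be checked mechanically. Detokenizing a tokenization must reproduce the input exactly, and re-tokenizing that output must yield the same tokens. The sketch below uses a toy regex tokenizer that keeps each token's preceding whitespace attached to it. That design is an assumption made for illustration. It is not how revtok represents spacing, and `tokenize`/`detokenize` here are hypothetical functions, not torchtext's API.

```python
import re
from typing import List

# Toy tokenizer (assumption, not revtok): each token is a word or a single
# punctuation character plus any whitespace before it. Trailing whitespace
# at the end of the text becomes its own final token, so no character is lost.
_TOKEN_RE = re.compile(r"\s*(?:\w+|[^\w\s])|\s+\Z")


def tokenize(text: str) -> List[str]:
    return _TOKEN_RE.findall(text)


def detokenize(tokens: List[str]) -> str:
    # Whitespace is stored inside the tokens, so plain concatenation restores the text.
    return "".join(tokens)


def round_trips(text: str) -> bool:
    tokens = tokenize(text)
    # Lossless: detokenize(tokenize(x)) == x.
    # Stable: tokenizing the restored text gives the same tokens again.
    return detokenize(tokens) == text and tokenize(detokenize(tokens)) == tokens


print(tokenize("Hello, world!"))  # ['Hello', ',', ' world', '!']
```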

New datasets:
- TREC (92, thanks bmccann!)
- IMDb (93, thanks bmccann!)
- Multi30k (116, thanks bmccann!)
- IWSLT (126, 128, thanks bmccann!)
- WMT14 (138)

Bugfixes:
- Fix pretrained word vector loading (99, thanks matt-peters!)
- Fix JSON loader silently ignoring requested columns not present in the file (105, thanks nelson-liu!)
- Many fixes for Python 2, especially around Unicode handling (105, 112, 135, 153, thanks nelson-liu!)
- Fix `Pipeline.call` behavior (113, thanks nelson-liu!)
- Fix README example (134, thanks czhang99!)
- Fix WikiText2 loader (138)
- Fix typo in MT loader (142, thanks sivareddyg!)
- Fix `Example.fromlist` behavior on non-strings (145)
- Update test set URL for Multi30k (149)
- Fix SNLI data loader (150, thanks sivareddyg!)
- Fix language modeling iterator (151)
- Remove the transpose that `ReversibleField.reverse` performed as a side effect (155)
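The JSON loader fix (105) concerns requested columns that are absent from the input. Below is a minimal sketch of the stricter behavior, assuming one JSON object per line. The loader fails loudly instead of silently dropping the column. The function name `load_json_lines` and the choice of `ValueError` are assumptions for illustration, not torchtext's API.

```python
import json
from typing import Dict, Iterable, List


def load_json_lines(lines: Iterable[str], fields: List[str]) -> List[Dict[str, object]]:
    """Parse one JSON object per line, keeping only the requested fields.

    Raises ValueError when a requested field is missing instead of ignoring it.
    """
    examples = []
    for lineno, line in enumerate(lines, start=1):
        record = json.loads(line)
        missing = [name for name in fields if name not in record]
        if missing:
            # The pre-fix behavior described above would have skipped these silently.
            raise ValueError(f"line {lineno}: requested field(s) not found: {missing}")
        examples.append({name: record[name] for name in fields})
    return examples
```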