Major Updates
- Rewrote the encoders to better support generic encoders such as ``LabelEncoder``. Furthermore, added broad support for ``batch_encode``, ``batch_decode``, and ``enforce_reversible``. For example:
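A minimal sketch (the constructor arguments and return values here are assumed from the 0.4 ``LabelEncoder`` API):

```python3
from torchnlp.encoders import LabelEncoder

# Build the label vocabulary from a sample of labels.
encoder = LabelEncoder(['label_a', 'label_b'])

index = encoder.encode('label_a')  # a scalar index tensor
encoder.decode(index)  # 'label_a'

# Encode and decode a whole batch at once.
batch = encoder.batch_encode(['label_a', 'label_b'])
encoder.batch_decode(batch)  # ['label_a', 'label_b']
```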
- Rearchitected the default reserved tokens so that they are configurable while still providing the convenience of sensible defaults. For example:
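A hypothetical sketch of overriding the defaults (the ``reserved_tokens`` keyword and the token names here are assumptions, not a confirmed signature):

```python3
from torchnlp.encoders.text import WhitespaceEncoder

# Assumed API: pass a custom list of reserved tokens (e.g. padding and
# unknown tokens) instead of the defaults; the keyword name is an assumption.
encoder = WhitespaceEncoder(['hello world'], reserved_tokens=['<pad>', '<unk>'])
```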
- Added support for collating sequences with ``torch.utils.data.dataloader.DataLoader``. For example:

```python3
import torch

from functools import partial
from torchnlp.utils import collate_tensors
from torchnlp.encoders.text import stack_and_pad_tensors

# Pad the variable-length sequences in each batch to the same length.
collate_fn = partial(collate_tensors, stack_tensors=stack_and_pad_tensors)
torch.utils.data.dataloader.DataLoader(*args, collate_fn=collate_fn, **kwargs)
```
- Added doctest support, ensuring that the documented examples are tested.
- Removed SRU support; it's too heavy a module to maintain. Please use https://github.com/taolei87/sru instead. Happy to accept a PR with a better-tested and better-documented SRU module!
- Updated version requirements to support Python 3.6 and 3.7, dropping support for Python 3.5.
- Updated version requirements to support PyTorch 1.0+.
- Merged https://github.com/PetrochukM/PyTorch-NLP/pull/66, reducing the memory requirements for pre-trained word vectors by 2x.
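The pre-trained vector API itself is unchanged; a brief sketch (assuming the ``GloVe`` wrapper and its string indexing behave as in prior releases):

```python3
from torchnlp.word_to_vector import GloVe

vectors = GloVe()  # downloads and caches the pre-trained GloVe vectors
vectors['hello']   # the embedding tensor for the token 'hello'
```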
Minor Updates
- Formatted the code base with YAPF.
- Fixed ``pandas`` and ``collections`` warnings.
- Added an invariant assertion to ``Encoder`` via ``enforce_reversible``. For example:

```python3
from torchnlp.encoders import Encoder

encoder = Encoder().enforce_reversible()
```

Ensuring ``Encoder.decode(Encoder.encode(object)) == object``.
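Concretely, a sketch with a text encoder (assuming out-of-vocabulary tokens decode to an unknown placeholder, and that a failed round trip raises ``ValueError``):

```python3
from torchnlp.encoders.text import WhitespaceEncoder

encoder = WhitespaceEncoder(['a b c']).enforce_reversible()

encoder.decode(encoder.encode('a b c'))  # 'a b c', the round trip holds

# 'd' is out-of-vocabulary; its encoding would decode to an unknown token,
# so the reversibility check is assumed to raise a ValueError here.
encoder.encode('a b d')
```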
- Fixed the accuracy metric for PyTorch 1.0.
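For reference, a sketch of computing the metric (the ``get_accuracy`` helper and its return values are assumed from the documented metrics API):

```python3
import torch
from torchnlp.metrics import get_accuracy

targets = torch.tensor([1, 2, 3])
predictions = torch.tensor([1, 2, 4])

# Assumed to return the accuracy along with the correct and total counts.
accuracy, n_correct, n_total = get_accuracy(targets, predictions)
```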