Models trained with this release perform better than before, most notably models that do not use
BERT.
Added
- The `scripts/zenodo_upload.py` script, a helper for uploading files to a Zenodo deposit.
Changed
- The CharRNN lexer now represents words with the last hidden (instead of cell) state of the LSTM
  and does not run on padding anymore.
- The minimum PyTorch version is now `1.9.0`.
- The minimum Transformers version is now `4.19.0`.
- Use `torch.inference_mode` instead of `torch.no_grad` in all the parser methods.
- BERT lexer batches no longer have an obsolete, always-zero `word_indices` attribute.
- `DependencyDataset` no longer has lexicon attributes (`ito(lab|tag)` and their inverses), since
  they are not needed anymore.
- The `train_model` script now skips incomplete runs with a warning.
- The `train_model` script has nicer logging, including progress bars to help keep track of the
experiments.
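
To illustrate the CharRNN and `torch.inference_mode` changes above, here is a minimal sketch of an LSTM that skips padding and keeps the last hidden state. It is not the parser's actual lexer code: the layer sizes, the variable names, and the use of `pack_padded_sequence` to skip padding are all assumptions made for this example.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Hypothetical toy layers and sizes; not the parser's actual CharRNN lexer.
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=0)
lstm = torch.nn.LSTM(input_size=4, hidden_size=6, batch_first=True)

# Two words as character indices, right-padded with 0.
chars = torch.tensor([[1, 2, 3], [4, 5, 0]])
lengths = torch.tensor([3, 2])

# inference_mode disables autograd tracking for everything computed inside it.
with torch.inference_mode():
    # Packing tells the LSTM each word's true length, so padding is never processed.
    packed = pack_padded_sequence(
        embedding(chars), lengths, batch_first=True, enforce_sorted=False
    )
    # h_n holds the hidden state after the last real character of each word,
    # c_n the corresponding cell state.
    _, (h_n, c_n) = lstm(packed)
    word_repr = h_n[-1]  # shape: (number of words, hidden size)
```

Because the sequences are packed, `word_repr[1]` should match what the LSTM returns for the second word alone, without its padding character.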
Fixed
- The first word in the word embeddings lexer vocabulary is not used as padding anymore and has a
real embedding.
- BERT embeddings are now correctly computed with an attention mask to ignore padding.
- The root token embedding coming from BERT lexers is now the average of the non-padding words'
  embeddings.
- FastText embeddings are now computed by averaging over non-padding subwords' embeddings.
- In server mode, models are now correctly in eval mode and processing is done
in `torch.inference_mode`.
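
The padding-related fixes above rely on averaging over non-padding positions only. Here is a minimal sketch of such a masked average. It is not the parser's actual code: the tensor values and variable names are made up for this example, and the mask is assumed to follow the usual convention of 1 for real tokens and 0 for padding.

```python
import torch

# Hypothetical toy batch: 2 sequences, up to 3 positions, embedding size 2.
embeddings = torch.tensor(
    [
        [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
        [[2.0, 4.0], [9.0, 9.0], [9.0, 9.0]],  # last two positions are padding
    ]
)
# Assumed convention: 1 marks a real token, 0 marks padding.
attention_mask = torch.tensor([[1, 1, 1], [1, 0, 0]])

# Add a trailing dimension so the mask broadcasts over the embedding dimension.
mask = attention_mask.unsqueeze(-1).to(embeddings.dtype)
# Zero out padding, sum over positions, then divide by the number of real tokens.
summed = (embeddings * mask).sum(dim=1)
counts = mask.sum(dim=1).clamp(min=1.0)  # guards against all-padding rows
mean_embeddings = summed / counts
```

A plain `embeddings.mean(dim=1)` would instead let the padding vectors leak into the second sequence's average.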