Added
- Added the part-of-speech tagging task from the Danish Dependency Treebank.
Can be loaded with `load_ddt_pos` and used in `Benchmark` as `ddt-pos`.
- Added the dependency parsing task from the Danish Dependency Treebank.
Can be loaded with `load_ddt_ddt` and used in `Benchmark` as `ddt-dep`.
- Added a documentation section and link to the `README`.
- The `Benchmark` class and the CLI now accept a `batch_size` argument.
Changed
- `Benchmark` arguments `languages`, `tasks`, `model_ids` and `datasets` have
been renamed to `language`, `task`, `model_id` and `dataset`, to keep them
consistent with the CLI.
- Loaded datasets are now returned as four dictionaries instead of lists,
to allow for distinguishing features and labels.
- The `batch_size` argument must now be one of 1, 2, 4, 8, 16 and 32, and the
corresponding gradient accumulation is set to 32, 16, 8, 4, 2 and 1,
respectively. This ensures that all finetuning uses the same effective batch
size of 32, allowing fair comparisons.
- Batch sizes are automatically halved if the GPU runs out of memory, with
gradient accumulation correspondingly doubled.
- Evaluation of `SpaCy` models on token classification tasks is now more accurate.
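
The batch size rules above can be illustrated with a minimal sketch. The names `EFFECTIVE_BATCH_SIZE`, `ALLOWED_BATCH_SIZES`, `gradient_accumulation_steps` and `halve_batch_size` are hypothetical and chosen for illustration only; they are not taken from the package, and its actual implementation may differ.

```python
from typing import Tuple

# Hypothetical constants, derived from the values listed in the entries above.
EFFECTIVE_BATCH_SIZE = 32
ALLOWED_BATCH_SIZES = (1, 2, 4, 8, 16, 32)


def gradient_accumulation_steps(batch_size: int) -> int:
    """Return the accumulation steps that keep the effective batch size at 32."""
    if batch_size not in ALLOWED_BATCH_SIZES:
        raise ValueError(
            f"batch_size must be one of {ALLOWED_BATCH_SIZES}, got {batch_size}"
        )
    return EFFECTIVE_BATCH_SIZE // batch_size


def halve_batch_size(batch_size: int) -> Tuple[int, int]:
    """Halve the batch size (e.g. after running out of GPU memory).

    Returns the new batch size together with the doubled gradient
    accumulation, so that their product is unchanged.
    """
    if batch_size <= 1:
        raise RuntimeError("Cannot reduce the batch size below 1.")
    new_batch_size = batch_size // 2
    return new_batch_size, gradient_accumulation_steps(new_batch_size)
```

In this sketch the product of batch size and gradient accumulation is the same for every allowed value, which is what makes results comparable across GPUs with different amounts of memory.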
Fixed
- Fixed typos in the `README`, and the image now renders correctly.