Scandeval

Latest version: v13.0.0


Page 21 of 21

0.4.0

Not secure

Added
- Added confidence intervals for finetuned models, where there is a 95%
likelihood that the true score belongs to the interval, given infinite
data from the same distribution. For "raw" pretrained models, this
radius is added onto the existing interval, so that both the uncertainty in
model initialisation and the sample size of the validation dataset affect
the size of the interval.
- Added garbage collection after each benchmark, which will (hopefully) prevent
memory leaks when benchmarking several models.
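
To make the interval arithmetic above concrete, here is a minimal sketch. It assumes a normal approximation (mean plus or minus 1.96 standard errors); the changelog does not say how ScandEval computes its intervals, so the function names and the method are illustrative assumptions, not the library's API.

```python
import math
import statistics

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def interval_radius(scores):
    """Radius of a normal-approximation 95% interval around the mean score."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate the spread")
    return Z_95 * statistics.stdev(scores) / math.sqrt(len(scores))


def confidence_interval(scores, extra_radius=0.0):
    """Return (low, high); `extra_radius` widens the interval additively.

    `extra_radius` is a hypothetical parameter standing in for a second
    source of uncertainty that is added onto the existing radius.
    """
    mean = statistics.fmean(scores)
    radius = interval_radius(scores) + extra_radius
    return mean - radius, mean + radius
```

Passing a non-zero `extra_radius` mirrors the "raw" pretrained case, where one radius is added onto another so that both sources of uncertainty widen the interval.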

Changed
- New logo, including the Faroe Islands!
- All languages and/or tasks can now be included in the CLI and
the `Benchmark` class.
- Added Icelandic and Faroese to default list of languages in CLI and the
`Benchmark` class.
- The default value for `task` is now all tasks, which also includes models
that haven't been assigned any task on the HuggingFace Hub.
- If a model cannot be trained without running out of CUDA memory, even with a
batch size of 1, then the model will be skipped in `Benchmark` and the CLI.
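
The skip behaviour in the last item can be sketched as a retry loop. This is an assumption-laden illustration: `OutOfMemory`, `make_model` and `train` are hypothetical stand-ins, not ScandEval names, and the starting batch size of 32 is chosen only for the example.

```python
class OutOfMemory(RuntimeError):
    """Hypothetical stand-in for a CUDA out-of-memory error."""


def finetune_or_skip(make_model, train, batch_size=32):
    """Retry training with smaller batches; return None if the model is skipped.

    `make_model` builds a fresh model and `train(model, batch_size)` runs one
    training attempt. Both are placeholders supplied by the caller.
    """
    while batch_size >= 1:
        model = make_model()  # fresh model for every attempt
        try:
            return train(model, batch_size)
        except OutOfMemory:
            batch_size //= 2  # 32 -> 16 -> ... -> 1 -> 0, which ends the loop
    return None  # even a batch size of 1 did not fit, so the model is skipped
```

Building a fresh model inside the loop also avoids continuing to train a model left over from a failed attempt.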

Fixed
- A new model is initialised if CUDA runs out of memory, to ensure that we are
not continuing to train the previous model.
- Dependency parsing now implemented properly as two-label classification, with
associated UAS and LAS metric computations. Works for pretrained SpaCy models
as well as finetuning general language models.
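
For readers unfamiliar with the two metrics, the sketch below uses the standard definitions: UAS is the fraction of tokens whose predicted head is correct, and LAS is the fraction whose head and relation label are both correct. The function name and the `(head, label)` tuple layout are assumptions for illustration, not ScandEval's internal representation.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for one or more sentences.

    `gold` and `pred` are equally long lists of (head_index, relation_label)
    pairs, one pair per token.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head only
    both_hits = sum(g == p for g, p in zip(gold, pred))  # head and label
    return head_hits / len(gold), both_hits / len(gold)


gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (1, "amod")]
uas, las = attachment_scores(gold, pred)  # uas == 0.75, las == 0.5
```

Because a token only counts towards LAS when its head is also right, LAS can never exceed UAS.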

0.3.1

Not secure

Fixed
- The batch size is now reduced if CUDA runs out of memory during evaluation.
- Loading of text classification datasets now works properly.

0.3.0

Not secure

Changed
- The `W036` warning message from SpaCy is no longer shown.

Fixed
- Raise `InvalidBenchmark` if model cannot be loaded from the HuggingFace Hub.

0.2.0

Not secure

Added
- Added the part-of-speech tagging task from the Danish Dependency Treebank.
Can be loaded with `load_ddt_pos` and used in `Benchmark` as `ddt-pos`.
- Added the dependency parsing task from the Danish Dependency Treebank.
Can be loaded with `load_ddt_ddt` and used in `Benchmark` as `ddt-dep`.
- Documentation section and link to `README`.
- The `Benchmark` class and the CLI now accept a `batch_size` argument.

Changed
- `Benchmark` arguments `languages`, `tasks`, `model_ids` and `datasets` have
been renamed to `language`, `task`, `model_id` and `dataset`, to keep it
consistent with the CLI.
- Loaded datasets are now returned as four dictionaries instead of lists,
to allow for distinguishing features and labels.
- `batch_size` arguments can now only be among 1, 2, 4, 8, 16 and 32, and the
corresponding gradient accumulation will be set to 32, 16, 8, 4, 2 and 1,
respectively. This is to ensure that all finetuning is done using the same
effective batch size, to ensure fair comparisons.
- Batch sizes are automatically halved if the GPU runs out of memory, with
gradient accumulation correspondingly doubled.
- Evaluation of `SpaCy` models on token classification tasks is more accurate.
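
The batch size and gradient accumulation pairs listed above all multiply to 32, which is what keeps the effective batch size constant. A minimal sketch of that relationship follows; the constant and function names are illustrative assumptions, not ScandEval identifiers.

```python
EFFECTIVE_BATCH_SIZE = 32
ALLOWED_BATCH_SIZES = (1, 2, 4, 8, 16, 32)


def gradient_accumulation_steps(batch_size):
    """Accumulation steps that keep batch_size * steps equal to 32."""
    if batch_size not in ALLOWED_BATCH_SIZES:
        raise ValueError(f"batch_size must be one of {ALLOWED_BATCH_SIZES}")
    return EFFECTIVE_BATCH_SIZE // batch_size


def halve_after_oom(batch_size):
    """Return the (batch_size, accumulation_steps) pair to try next."""
    if batch_size <= 1:
        raise ValueError("batch size cannot be reduced below 1")
    smaller = batch_size // 2
    return smaller, gradient_accumulation_steps(smaller)
```

Halving the batch size doubles the accumulation steps, so the number of examples contributing to each optimiser update stays at 32.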

Fixed
- `README` typos fixed, and the image now renders correctly.

0.1.0

Not secure

Added
- First beta release
- Features Danish sentiment, hate speech detection and named entity
recognition datasets for benchmarking
