Modelgauge

Latest version: v0.6.3




0.3.3

What's Changed
* Switched SafeTest to the `data_april04` release.
* More prompts
* Removed safe-ben

**Full Changelog**: https://github.com/mlcommons/newhelm/compare/v0.3.2...v0.3.3

0.3.2

What's Changed
* `max_test_items` returns a relatively stable set of prompts
* Loading bar for plugins
* The `list` command now reports secret values in a prettier format
* Time out requests stuck on TogetherAI
* Updated docs
* Move `simple_test_runner` out of plugins and into core library
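One way to get a repeatable subset of prompts, in the spirit of the `max_test_items` change above, is sketched here. It assumes each test item carries a stable identifier and that the subset is drawn with a fixed-seed random generator over the items sorted by that identifier. The `TestItem` dataclass, its `uid` field, and `select_test_items` are hypothetical stand-ins, not newhelm's actual API.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TestItem:
    """Hypothetical stand-in for a test item: a stable id plus the prompt text."""
    uid: str
    prompt: str


def select_test_items(
    items: List[TestItem], max_test_items: Optional[int], seed: int = 0
) -> List[TestItem]:
    """Return at most `max_test_items` items, chosen the same way on every run.

    Sorting by uid first removes any dependence on the order the items were
    loaded in; the fixed seed makes the sample itself repeatable.
    """
    ordered = sorted(items, key=lambda item: item.uid)
    if max_test_items is None or max_test_items >= len(ordered):
        return ordered
    rng = random.Random(seed)  # hypothetical fixed seed; same seed -> same sample
    sample = rng.sample(ordered, max_test_items)
    # Keep the sampled items in uid order so downstream output is stable too.
    return sorted(sample, key=lambda item: item.uid)
```

Because the selection does not depend on load order, the same prompts come back run after run. That also matters for caching: repeated runs send the same requests, so cached responses can be reused.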

**Full Changelog**: https://github.com/mlcommons/newhelm/compare/v0.3.1...v0.3.2

0.3.1

What's Changed
* Fix a bad version specification for the `together` dependency, which prevented 0.3.0 from installing.
* Add the Deepseek model, which is now available on Together.
* Stabilize the order of TestItems in SafeTest to better utilize caching.

**Full Changelog**: https://github.com/mlcommons/newhelm/compare/v0.3.0...v0.3.1

0.3.0

What's Changed

* Reorganized the `run_data` folder and made several improvements to caching. **This breaks backward compatibility.** Old files should simply be ignored, but if you run into issues, it is probably best to delete your `run_data` folder.
* Updated SafeTest to 02apr2024.
* We now have all SUTs in the [requested set](https://docs.google.com/document/d/11HsLhVFPsiwcwWIsou275u1HHbp8ZM8vkUCTjAcqLXE/edit), minus Deepseek.
* Simplified the command line to be `newhelm` once installed or `poetry run newhelm` when using the local repo.
* Annotations are now recorded per completion instead of per TestItem.
* HuggingFace now sets the pad token to its default, which should eliminate the related warning messages.
* Added some enforcement of SUTCapabilities to help them be accurate.
* Removed all "Base" prefixes except `BaseTest`.
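The SUTCapabilities enforcement can be pictured as a subset check. A SUT declares the capabilities it has, a test declares the ones it requires, and a runner refuses to pair them if anything is missing. This is a minimal sketch, not newhelm's implementation. The capability names come from the 0.2.5 notes. The `Capability` base class, the `sut_capabilities` decorator, the `missing_capabilities` and `check_compatible` helpers, and `ExampleChatSUT` are hypothetical.

```python
from typing import FrozenSet, Type


class Capability:
    """Hypothetical marker base class for capabilities."""


class AcceptsTextPrompt(Capability):
    pass


class AcceptsChatPrompt(Capability):
    pass


class ProducesPerTokenLogProbabilities(Capability):
    pass


def sut_capabilities(*caps: Type[Capability]):
    """Hypothetical decorator that records a SUT's declared capabilities on the class."""
    def wrap(cls):
        cls.capabilities = frozenset(caps)
        return cls
    return wrap


def missing_capabilities(
    sut_cls: type, required: FrozenSet[Type[Capability]]
) -> FrozenSet[Type[Capability]]:
    """Return the required capabilities the SUT does not declare (empty if compatible)."""
    return required - getattr(sut_cls, "capabilities", frozenset())


def check_compatible(sut_cls: type, required: FrozenSet[Type[Capability]]) -> None:
    """Raise if the SUT lacks any capability the test requires."""
    missing = missing_capabilities(sut_cls, required)
    if missing:
        names = ", ".join(sorted(c.__name__ for c in missing))
        raise ValueError(f"{sut_cls.__name__} lacks required capabilities: {names}")


@sut_capabilities(AcceptsTextPrompt, AcceptsChatPrompt)
class ExampleChatSUT:
    """Hypothetical SUT that accepts text and chat prompts but returns no logprobs."""


missing = missing_capabilities(
    ExampleChatSUT, frozenset({AcceptsTextPrompt, ProducesPerTokenLogProbabilities})
)
print(sorted(c.__name__ for c in missing))  # ['ProducesPerTokenLogProbabilities']
```

Declaring capabilities as classes rather than strings means a typo fails at import time instead of silently never matching.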

**Full Changelog**: https://github.com/mlcommons/newhelm/compare/v0.2.6...v0.3.0

0.2.6

What's Changed
* Bug fix for SafeTest

**Full Changelog**: https://github.com/mlcommons/newhelm/compare/v0.2.5...v0.2.6

0.2.5

What's Changed

* Tests no longer have a `get_metadata()` method. Dependency helper uses a Test's class name instead.
* Introduced the concept of SUT capabilities (`ProducesPerTokenLogProbabilities`, `AcceptsChatPrompt`, `AcceptsTextPrompt`). SUTs and Tests must specify their capabilities/requirements in the `newhelm_sut` and `newhelm_test` decorators.
* SUTs can now return per-token log probabilities in a `SUTCompletion`. OpenAIChat is updated with this capability.
* SafeTest updates:
  * Restructured to have one test per hazard, grouping all applicable persona types (typical, malicious, or vulnerable).
  * Results are reported as a mapping from persona type to `PersonaResult`, which includes `num_items` in addition to `frac_safe`.
  * Added tests for new hazards.
* Added new test DiscrimEval
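The 0.2.5 SafeTest result shape, a mapping from persona type to a `PersonaResult` carrying `num_items` and `frac_safe`, can be sketched as follows. Only `PersonaResult`, `num_items`, `frac_safe`, and the persona types come from these notes. The input format (a list of `(persona, is_safe)` pairs) and the `aggregate_by_persona` helper are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class PersonaResult:
    num_items: int
    frac_safe: float


def aggregate_by_persona(
    results: Iterable[Tuple[str, bool]],
) -> Dict[str, PersonaResult]:
    """Group per-item safety judgments (hypothetical input) by persona type."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # persona -> [total, safe]
    for persona, is_safe in results:
        counts[persona][0] += 1
        counts[persona][1] += int(is_safe)
    return {
        persona: PersonaResult(num_items=total, frac_safe=safe / total)
        for persona, (total, safe) in counts.items()
    }


judgments = [("typical", True), ("typical", True), ("malicious", False), ("malicious", True)]
summary = aggregate_by_persona(judgments)
print(summary["malicious"].frac_safe)  # 0.5
```

Reporting `num_items` next to `frac_safe` lets a reader judge how much weight a fraction deserves: 0.5 over two items says far less than 0.5 over two hundred.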

**Full Changelog**: https://github.com/mlcommons/newhelm/compare/v0.2.4...v0.2.5

