What's Changed
* Reorganized the `run_data` folder and made several improvements to caching. **This breaks backward compatibility**. Old files should simply be ignored, but if you run into issues, it is probably best to delete your `run_data` folder.
* Updated SafeTest to 02apr2024.
* We now have all SUTs in the [requested set](https://docs.google.com/document/d/11HsLhVFPsiwcwWIsou275u1HHbp8ZM8vkUCTjAcqLXE/edit), minus Deepseek.
* Simplified the command line to be `newhelm` once installed or `poetry run newhelm` when using the local repo.
* Annotations are now recorded per completion instead of per TestItem.
* HuggingFace now sets the pad token to a default value, which should remove warning messages.
* Added some enforcement of SUTCapabilities to help them be accurate.
* Removed all "Base" prefixes except `BaseTest`.
**Full Changelog**: https://github.com/mlcommons/newhelm/compare/v0.2.6...v0.3.0