Swebench

Latest version: v2.1.7



0.6.9

> ⚠️ Do NOT use these versions. The PyPI package was still under development for these versions; specifically, some of the evaluation configurations required re-validation. The failures and our recovery from them are described in [Bug Report 4/5/2024](docs/reports/20240405_eval_bug/README.md).

0.6.1

* Added minor conditions to make `run_evaluation` more robust (e.g. exit on empty predictions)
* Added logic that selects the conda download link based on the architecture/platform (e.g. x86, arm) the code is running on.
* Added classes to unify `subprocess` execution arguments and make them more consistent throughout the codebase. Also removed the `shell=True` flag where it is not necessary.
* Added deterministic hashing of the model name when creating certain testbed paths, which defends against https://github.com/conda/conda/issues/12250
* Fixed key errors across the `metrics/` folder.
* Reorganized `harness` code. Moved constants into a separate file to improve readability.
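
To illustrate the deterministic-hashing entry above, here is a minimal sketch of deriving a stable directory name from a model name. It is not the harness's actual code: the function name `stable_model_dir`, the choice of SHA-256, and the 12-character truncation are all assumptions made for illustration. The point it shows is that a `hashlib` digest is the same on every run, whereas Python's built-in `hash()` for strings is randomized per process by default.

```python
import hashlib
from pathlib import Path


def stable_model_dir(testbed: str, model_name: str, length: int = 12) -> Path:
    """Return a testbed subdirectory whose name is a stable hash of model_name.

    Hypothetical helper: the name, SHA-256, and the truncation length are
    illustrative assumptions, not taken from the SWE-bench codebase.
    """
    # hashlib digests depend only on the input bytes, so the same model name
    # always maps to the same directory, across processes and machines.
    digest = hashlib.sha256(model_name.encode("utf-8")).hexdigest()[:length]
    # The hex digest contains only [0-9a-f], so characters such as "/" in a
    # model name never end up in the path component.
    return Path(testbed) / digest
```

Calling `stable_model_dir("testbed", "org/some-model")` twice returns the same path, which keeps testbed paths short and predictable regardless of how long or unusual the model name is.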

0.4.8

* `run_evaluation` can be imported to make running the evaluation harness of SWE-bench more accessible.
* Add condition in `harness/context_manager.py` to skip installation if no instructions are provided.
* Add functionality to check for and remove logs containing `AttributeError` or `ImportError`.
* Add support for HumanEval dataset.
* Add support for relative paths for `log_dir` and `testbed` arguments of evaluation.
* Minor renaming for `metrics/report.py` variables.
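
The log-cleanup and relative-path entries above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: the helper names `find_error_logs` and `remove_error_logs`, the `*.log` file pattern, and the plain substring match are all hypothetical.

```python
from pathlib import Path

# Error names taken from the changelog entry; matching them as plain
# substrings is an assumption made for this sketch.
ERROR_MARKERS = ("AttributeError", "ImportError")


def find_error_logs(log_dir: str) -> list[Path]:
    """Return the log files under log_dir that mention one of ERROR_MARKERS."""
    # resolve() turns a relative log_dir into an absolute path, so the
    # function behaves the same whatever the caller's working directory is.
    root = Path(log_dir).expanduser().resolve()
    flagged = []
    for log_path in sorted(root.glob("*.log")):  # "*.log" is an assumed pattern
        text = log_path.read_text(encoding="utf-8", errors="replace")
        if any(marker in text for marker in ERROR_MARKERS):
            flagged.append(log_path)
    return flagged


def remove_error_logs(log_dir: str) -> int:
    """Delete the flagged logs and return how many were removed."""
    flagged = find_error_logs(log_dir)
    for log_path in flagged:
        log_path.unlink()
    return len(flagged)
```

Separating the check from the removal lets a caller inspect the flagged files before deleting anything.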

0.4.3

> ⚠️ Do NOT use these versions. The PyPI package was under development for these versions and will not work properly.

