Major release: the SWE-bench evaluation harness has been upgraded to run evaluations in containerized, sandboxed execution environments based on Docker. This results in several changes to the API:
* Removal of the `swebench.metrics` module
* Updates to the API of `swebench.harness` functionality
* Significant modifications to underlying evaluation logic
* Minor updates to installation specifications for different repositories and versions

Read the [full report](https://github.com/princeton-nlp/SWE-bench/tree/main/docs/20240627_docker).
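
Because the harness now depends on Docker and `swebench.metrics` has been removed, a small pre-flight check can help when migrating older scripts. The following is a minimal sketch, not part of SWE-bench itself: the helper names (`docker_cli_available`, `module_available`, `preflight_report`) are hypothetical, and it only uses the Python standard library to check whether a `docker` executable is on `PATH` and whether the legacy module can still be located in the current environment.

```python
import importlib.util
import shutil


def docker_cli_available() -> bool:
    """Return True if a `docker` executable is found on PATH."""
    return shutil.which("docker") is not None


def module_available(name: str) -> bool:
    """Return True if the module `name` can be located.

    For dotted names, `find_spec` imports the parent package first, so a
    missing or broken parent is reported as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. `swebench`) is not installed.
        return False


def preflight_report() -> dict:
    """Collect the checks relevant to the Docker-based harness (hypothetical helper)."""
    return {
        "docker_cli": docker_cli_available(),
        "swebench_installed": module_available("swebench"),
        # Expected to be False on releases where `swebench.metrics` was removed.
        "legacy_metrics_module": module_available("swebench.metrics"),
    }


if __name__ == "__main__":
    for check, ok in preflight_report().items():
        print(f"{check}: {ok}")
```

If `legacy_metrics_module` is reported as `True`, the environment likely still has a pre-Docker release installed. Note that finding the `docker` executable does not guarantee that the Docker daemon is running.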