- Enhanced the calculation of the ground truth pass rate and addressed the issue mentioned in https://github.com/bigcode-project/bigcodebench/pull/12#issuecomment-2199186199.
- Updated the README docs.
0.1.7
Fixed some identified issues:
- The ground truth pass rate was previously not computed correctly.
- RAM limits passed as arguments raised errors because they were stored as floats.
- User permissions were not set up correctly in the Evaluate Docker image.
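Python's `resource.setrlimit` expects its `(soft, hard)` limits as integers, so one plausible guard against the float issue is to convert a user-supplied limit into an integer byte count before applying it. The sketch below is not BigCodeBench's code: it assumes the limit arrives in megabytes and is applied as `RLIMIT_AS`, and both helper names are hypothetical. The `resource` module is Unix-only.

```python
import resource


def to_rlimit_bytes(limit_mb: float) -> int:
    """Hypothetical helper: turn a (possibly float) megabyte value into the
    integer byte count that resource.setrlimit expects."""
    return int(limit_mb * 1024 * 1024)


def apply_memory_limit(limit_mb: float) -> None:
    """Hypothetical helper: cap the current process's address space.

    Lowering the hard limit cannot be undone by an unprivileged process,
    so this is meant to run inside a sandboxed worker, not the main process.
    """
    limit = to_rlimit_bytes(limit_mb)
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
```

Casting once at the boundary keeps every later call working with plain integers, regardless of whether the CLI parsed the value as `int` or `float`.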
Features:
- `check-gt-only` now prints the ground truth pass rate on completion.
0.1.6
New features:
- The RAM setup is now adjustable via dedicated arguments.
- Parallel ground truth checking is supported. Potentially failing checks are skipped during execution, and a warning is issued if the ground truth pass rate falls below 0.95.
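The parallel ground truth check with its 0.95 warning threshold can be sketched roughly as follows. This is a minimal illustration, not BigCodeBench's implementation: `check` is a hypothetical callable standing in for whatever runs a task's reference solution against its tests, and a thread pool is just one way to parallelize. The failing task IDs are returned so a caller can skip them.

```python
import warnings
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple

PASS_RATE_THRESHOLD = 0.95  # threshold stated in the 0.1.6 notes


def ground_truth_pass_rate(
    task_ids: Iterable[str],
    check: Callable[[str], bool],
    max_workers: int = 4,
) -> Tuple[float, List[str]]:
    """Run `check` on every task in parallel; return (pass_rate, failed_ids).

    `check` is hypothetical: it should return True when the task's
    ground truth solution passes its tests.
    """
    ids = list(task_ids)
    if not ids:
        raise ValueError("no tasks to check")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with ids
        outcomes = list(pool.map(check, ids))
    failed = [tid for tid, ok in zip(ids, outcomes) if not ok]
    pass_rate = (len(ids) - len(failed)) / len(ids)
    if pass_rate < PASS_RATE_THRESHOLD:
        warnings.warn(
            f"Ground truth pass rate {pass_rate:.3f} is below "
            f"{PASS_RATE_THRESHOLD}; failing tasks: {failed}"
        )
    return pass_rate, failed
```

For example, `ground_truth_pass_rate(["t0", "t1", "t2", "t3"], lambda t: t != "t3")` returns `(0.75, ["t3"])` and emits a warning, since 0.75 is below the threshold.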
0.1.5
New features:
- The data is downloaded from the Hugging Face (HF) Hub by default.
- The data format is now the same on the HF Hub and on GitHub.