What's Changed

* fix make_raw_chat_prompt when prefill is disabled by zhangchen-xu in https://github.com/bigcode-project/bigcodebench/pull/75
* Specify a unique cache directory before each code execution by shwinshaker in https://github.com/bigcode-project/bigcodebench/pull/77
* fix E2b execution debug by terryyz in https://github.com/bigcode-project/bigcodebench/pull/79
* fix e2b by terryyz in https://github.com/bigcode-project/bigcodebench/pull/80
* Add support for Hugging Face Serverless Inference by hvaara in https://github.com/bigcode-project/bigcodebench/pull/85
* Reintroduce progress checker from 48 by hvaara in https://github.com/bigcode-project/bigcodebench/pull/86
* Fixes for tasks 211 and 215 by hvaara in https://github.com/bigcode-project/bigcodebench/pull/49

New Contributors

* zhangchen-xu made their first contribution in https://github.com/bigcode-project/bigcodebench/pull/75
* shwinshaker made their first contribution in https://github.com/bigcode-project/bigcodebench/pull/77
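
For readers wondering what "a unique cache directory before each code execution" can look like in practice, here is a minimal sketch. It is not the code from the pull request: the function name `run_with_fresh_cache` and the choice of the `XDG_CACHE_HOME` and `MPLCONFIGDIR` environment variables are assumptions made for illustration only.

```python
import os
import subprocess
import sys
import tempfile


def run_with_fresh_cache(code: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter that sees a fresh, private cache directory.

    Hypothetical helper for illustration; not part of bigcodebench.
    """
    # A new temporary directory is created for every call and removed afterwards,
    # so cached files from one execution cannot leak into the next.
    with tempfile.TemporaryDirectory(prefix="exec_cache_") as cache_dir:
        env = dict(os.environ)
        # Assumed variables: many tools consult these when choosing a cache location.
        env["XDG_CACHE_HOME"] = cache_dir
        env["MPLCONFIGDIR"] = cache_dir
        return subprocess.run(
            [sys.executable, "-c", code],
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
```

Calling the helper twice, for example with `run_with_fresh_cache("import os; print(os.environ['XDG_CACHE_HOME'])")`, should report a different directory each time, and each directory is deleted once its run finishes.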

What's Changed

- Fix Docker image and its dependencies
- Support more models with reasoning effort
- Optional chat prefilling
- E2B, Gradio, and Local code execution

What's Changed

- Fix `calibration` setting in the code evaluation.
- Add `--no_execute` argument for code evaluation.
- Support concurrent API inference for `o1` and `deepseek-chat`.
- Fix API inference for Google Gemini.
- Add `--instruction_prefix` and `--response_prefix` arguments for code generation.
- Change `--id_range` input type.
- Add `--revision` argument for code generation.