What's Changed

- Fix Docker image and its dependencies
- Support more models with reasoning effort
- Optional chat prefilling
- E2B, Gradio, and local code execution

What's Changed

- Fix `calibration` setting in the code evaluation.
- Add `--no_execute` argument for code evaluation.
- Support concurrent API inference for `o1` and `deepseek-chat`.
- Fix API inference for Google Gemini.
- Add `--instruction_prefix` and `--response_prefix` arguments for code generation.
- Change `--id_range` input type.
- Add `--revision` argument for code generation.