Bigcodebench

Latest version: v0.2.3.post5

Safety actively analyzes 706267 Python packages for vulnerabilities to keep your Python projects secure.

Scan your dependencies

Page 1 of 2

0.2.3.post1

What's Changed
- Fix Docker image and its dependencies
- Support more models with reasoning effort
- Optional chat prefilling
- E2B, Gradio, and Local code execution

Evaluated LLMs (173 models)
- o3-mini
- DeepSeek R1

**Full Changelog**: https://github.com/bigcode-project/bigcodebench/compare/v0.2.1.post7...v0.2.3.post1

0.2.1.post7

What's Changed
- Fix Docker image and its dependencies
- Fix o1 concurrent generation output collection
- Update the code sanitization

Evaluated LLMs (157 models)
- o1-2024-12-17
- Gemini-2.0 series

**Full Changelog**: https://github.com/bigcode-project/bigcodebench/compare/v0.2.1.post3...v0.2.1.post7

0.2.1.post2

What's Changed
- Fix `calibration` setting in the code evaluation.
- Add `--no_execute` argument for code evaluation.
- Support concurrent API inference for `o1` and `deepseek-chat`.
- Fix API inference for Google Gemini.
- Add `--instruction_prefix` and `--response_prefix` arguments for code generation.
- Change `--id_range` input type.
- Add `--revision` arguments for code generation.

Evaluated LLMs (144 models)
- Qwen2.5-Coder-32B-Instruct
- grok-beta
- claude-3-5-haiku-20241022

**Full Changelog**: https://github.com/bigcode-project/bigcodebench/compare/v0.2.0...v0.2.1.post2

0.2.0.post3

**Full Changelog**: https://github.com/bigcode-project/bigcodebench/compare/v0.1.9...v0.2.0.post3

0.1.9

**Full Changelog**: https://github.com/bigcode-project/bigcodebench/compare/v0.1.8...v0.1.9

0.1.8

Features:
- Support `BigCodeBench-Hard` subset: https://github.com/bigcode-project/bigcodebench/pull/17
- Identify and fix tokenizer setup: https://github.com/bigcode-project/bigcodebench/issues/21
- Customize the tokenizer: https://github.com/bigcode-project/bigcodebench/pull/20
- Add the pass rate result log: https://github.com/bigcode-project/bigcodebench/pull/20

Contributors:
- marianna13: https://github.com/bigcode-project/bigcodebench/pull/20

Models:
- A total of 96 models at the time of the release

Acknowledgement:
- ethanc8
- takkyu2
- imamnurby

**Full Changelog**: https://github.com/bigcode-project/bigcodebench/compare/v0.1.7...v0.1.8

Page 1 of 2

© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.