Inspect-ai

Latest version: v0.3.82

0.3.10

- **BREAKING:** The `pattern` scorer has been modified to match against any (or all) regex match groups. Previously, when a pattern had more than one group, only the second group was matched.
- Improved performance for Inspect View on very large datasets (virtualized sample list).
- New `ToolChoice` option `any` to indicate that the model should use at least one tool (supported by Anthropic and Mistral; mapped to `auto` for OpenAI).
- Tool calls can now return a simple scalar or `list[ContentText | ContentImage]`.
- Support for updated Anthropic tools beta (`tool_choice` and image tool results).
- Report `tool_error` back to the model if it provides invalid JSON for tool call arguments (formerly this halted the entire eval with an error).
- New `max_samples` option to control how many samples are run in parallel (still defaults to running all samples in parallel).
- Add `boolq.py` benchmark.
- Add `piqa.py` benchmark.
- View: Improved markdown rendering (properly escape reference links).
- Improved typing for the `example_dataset()` function.
- Setuptools entry point for loading custom model extensions.
- Break optional `tuple` return out of `ToolResult` type.
- Bugfix: always read original sample message(s) for `TaskState.input_text`.
- Bugfix: remove write counter from log (could have resulted in incomplete/invalid logs propagating to the viewer).
- Bugfix: handle task names that include spaces in log viewer.
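
To illustrate the new `pattern` scorer semantics, here is a minimal sketch using Python's standard `re` module. It is not Inspect's implementation. The helper `match_any_or_all` and its `match_all` flag are hypothetical names. The sketch compares the target against every capture group that took part in the match, then reports whether any (or all) of them agree.

```python
import re


def match_any_or_all(pattern: str, text: str, target: str, match_all: bool = False) -> bool:
    """Hypothetical sketch: compare `target` to every capture group of a regex match."""
    match = re.search(pattern, text, re.IGNORECASE)
    if match is None:
        return False
    # Keep only groups that actually participated in the match (others are None).
    groups = [g for g in match.groups() if g is not None]
    if not groups:
        # Pattern has no capture groups: fall back to the whole match.
        groups = [match.group(0)]
    hits = [g.strip().lower() == target.strip().lower() for g in groups]
    # any() for "match any group", all() for "match all groups".
    return all(hits) if match_all else any(hits)


# Alternation: only the second group participates here.
print(match_any_or_all(r"ANSWER: (\w+)|FINAL: (\w+)", "FINAL: Paris", "paris"))  # True
# Both groups must equal the target when match_all=True.
print(match_any_or_all(r"(\d+)-(\d+)", "score 3-4", "3", match_all=True))  # False
```

Under the old behaviour the first example would have looked only at the second group. The new behaviour considers every group that matched.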
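
The invalid-JSON change can be pictured as catching the decode error and turning it into a message for the model, rather than raising. This is a minimal sketch assuming a hypothetical `parse_tool_arguments` helper that returns either the parsed arguments or an error string. It is not Inspect's actual `tool_error` plumbing.

```python
from __future__ import annotations

import json
from typing import Any


def parse_tool_arguments(raw: str) -> tuple[dict[str, Any] | None, str | None]:
    """Hypothetical sketch: return (arguments, None) on success or (None, error) on failure."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as ex:
        # Report the problem back to the model rather than aborting the whole eval.
        return None, f"Invalid JSON in tool call arguments: {ex}"
    if not isinstance(args, dict):
        return None, "Tool call arguments must be a JSON object."
    return args, None


args, error = parse_tool_arguments('{"city": "Paris"')  # missing closing brace
print(args, error is not None)  # None True
```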

0.3.9

- Add `ollama` local model provider.
- Add `multi_scorer()` and `majority_vote()` functions for combining multiple scorers into a single score.
- Add support for multiple model graders in `model_graded_qa()`.
- Raise `TypeError` for solvers and scorers not declared as `async`.
- Fall back to standard parsing if `NaN` or `Inf` is encountered while reading the log file header.
- Remove deprecated support for matching partial model names (e.g. "gpt" or "claude").
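
Majority voting across several graders reduces to picking the most common grade. Below is a minimal sketch with `collections.Counter`. `majority_vote_sketch` is a hypothetical stand-in, not Inspect's `majority_vote()`. It breaks ties in favour of the grade seen first, which may differ from the library's rule.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def majority_vote_sketch(grades: Sequence[Hashable]) -> Hashable:
    """Return the most frequent grade; ties go to the grade seen first."""
    if not grades:
        raise ValueError("need at least one grade")
    # most_common() orders equal counts by first occurrence, so ties resolve to the earliest grade.
    return Counter(grades).most_common(1)[0][0]


# Three model graders scoring the same answer as Correct (C) or Incorrect (I).
print(majority_vote_sketch(["C", "I", "C"]))  # C
```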
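
The `TypeError` check for non-`async` solvers and scorers can be approximated with the standard library's `inspect.iscoroutinefunction`. This is a minimal sketch. The decorator-style helper `require_async` and the two example solvers are hypothetical, and Inspect's own check may be applied at a different point.

```python
import inspect
from typing import Any, Callable


def require_async(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical check: reject solvers/scorers that are not `async def` functions."""
    if not inspect.iscoroutinefunction(fn):
        raise TypeError(f"'{fn.__name__}' must be declared async")
    return fn


async def good_solver(state):  # accepted: declared with async def
    return state


def bad_solver(state):  # rejected: plain def
    return state


require_async(good_solver)
try:
    require_async(bad_solver)
except TypeError as ex:
    print(ex)  # 'bad_solver' must be declared async
```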
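
The `NaN`/`Inf` fallback follows a common pattern: try a strict fast path first, then fall back to a more permissive parser on failure. This is a minimal sketch, assuming the strict path rejects non-finite constants. Here that path is simulated with `json.loads(..., parse_constant=...)`, and `read_header` is a hypothetical name. Python's standard `json.loads` accepts `NaN`, `Infinity` and `-Infinity` by default.

```python
import json
import math
from typing import Any


def _reject_constant(name: str) -> Any:
    # json calls this for NaN, Infinity and -Infinity; the strict path refuses them.
    raise ValueError(f"non-finite constant {name!r} not supported")


def read_header(text: str) -> Any:
    """Hypothetical sketch: strict parse first, standard parse as the fallback."""
    try:
        # Stand-in for a fast/strict parser that cannot handle non-finite numbers.
        return json.loads(text, parse_constant=_reject_constant)
    except ValueError:
        # Fall back to the standard parser, which maps NaN/Infinity to floats.
        return json.loads(text)


header = read_header('{"score": NaN, "max": Infinity}')
print(math.isnan(header["score"]), header["max"])  # True inf
```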

0.3.8

- Exclude null config values from listings in log viewer.

0.3.7

- Add support for logprobs to the HF provider, and create a uniform API for the other providers that support logprobs (Together and OpenAI).
- Provide an option to merge consecutive assistant messages and use it for Anthropic models (as they don't allow consecutive assistant messages).
- Supporting infrastructure in Inspect CLI for VS Code extension (additional list and info commands).
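
Merging consecutive assistant messages can be sketched as a single pass. Each assistant message is folded into the previous one when both come from the assistant. The `Message` dataclass and `merge_assistant_messages` function are hypothetical, and joining contents with a newline is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical minimal chat message."""
    role: str
    content: str


def merge_assistant_messages(messages: list[Message]) -> list[Message]:
    """Collapse runs of consecutive assistant messages into one message each."""
    merged: list[Message] = []
    for msg in messages:
        if merged and msg.role == "assistant" and merged[-1].role == "assistant":
            # Build a new message instead of mutating the caller's objects.
            merged[-1] = Message("assistant", merged[-1].content + "\n" + msg.content)
        else:
            merged.append(msg)
    return merged


history = [Message("user", "Hi"), Message("assistant", "Part 1"), Message("assistant", "Part 2")]
print(len(merge_assistant_messages(history)))  # 2
```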

0.3.6

- Show the first log file immediately (don't wait to fetch metadata for other logs).
- Add `--version` CLI arg and `inspect info version` command for interrogating version and runtime source path.
- Fix: exclude `null` config values in output from `inspect info log-file`.

0.3.5

- Fix issue with logs from S3 buckets in `inspect view`.
- Add `sort()` method to `Dataset` (defaults to sorting by sample input length).
- Improve tokenization for the HF provider (left padding, attention mask, and support for a custom chat template).
- Improve batching for the HF provider (generate as soon as the queue fills; thread safety for `future.set_result`).
- Various improvements to documentation.
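
Sorting by input length and left padding work together when batching prompts for a decoder-only model. Prompts of similar length need less padding. Padding on the left keeps each prompt's last token at the end of its row, which is where generation continues. This sketch uses plain Python lists of token ids and hypothetical names (`sort_by_length`, `left_pad`). In Hugging Face `transformers`, the padding side is normally set through the tokenizer's `padding_side` attribute.

```python
from __future__ import annotations


def sort_by_length(prompts: list[list[int]]) -> list[list[int]]:
    """Order token sequences shortest-first (mirrors sorting a dataset by input length)."""
    return sorted(prompts, key=len)


def left_pad(batch: list[list[int]], pad_id: int) -> tuple[list[list[int]], list[list[int]]]:
    """Left-pad a batch to equal length and build the matching attention mask."""
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        pad = width - len(seq)
        input_ids.append([pad_id] * pad + list(seq))
        # 0 marks padding the model should ignore; 1 marks real tokens.
        attention_mask.append([0] * pad + [1] * len(seq))
    return input_ids, attention_mask


prompts = sort_by_length([[5, 6, 7], [8], [9, 10]])
ids, mask = left_pad(prompts[:2], pad_id=0)
print(ids)   # [[0, 8], [9, 10]]
print(mask)  # [[0, 1], [1, 1]]
```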
