Inspect-ai

Latest version: v0.3.82

0.3.64

- [Reference documentation](https://inspect.aisi.org.uk/reference/) for Python API and CLI commands.
- Add support for [clustered standard errors](https://inspect.aisi.org.uk/scorers.html#clustered-standard-errors) via a new `cluster` parameter for the `stderr()` metric.
- Improvements to [scoring workflow](https://inspect.aisi.org.uk/scorers.html#sec-scorer-workflow) (`inspect score` command and `score()` function).
- Metrics now take `list[SampleScore]` rather than `list[Score]` (previous signature is deprecated but still works with a warning).
- Use a sample adjustment for the `var()` metric.
- Google: Speculative fix for completion candidates not being returned as a list.
- Python and Bash tools: Add `sandbox` argument for running in non-default sandboxes.
- Transcript: Log `ScoreEvent` (with `intermediate=True`) when the `score()` function is called.
- Transcript: Add `source` field to `InfoEvent` and use it for events logged by the human agent.
- Docker: Support Dockerfiles with `.Dockerfile` extension.
- Docker: Raise error when there is an explicitly configured `container_name` (incompatible with epochs > 1).
- Docker: Dynamically set `compose up` timeout when there are `healthcheck` entries for services.
- Log: Validate that `log_dir` is writeable at startup.
- Log: Write eval config defaults into log file (rather than `None`).
- Bugfix: Always honor `log-level-transcript` setting for transcript logging.
- Bugfix: Fix some dynamic layout issues for sample sandbox view.
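The two statistical changes in this release (the sample adjustment for `var()` and clustered standard errors for `stderr()`) can be illustrated with a small, standalone sketch. It does not call Inspect and is not taken from its source: `clustered_stderr` is a hypothetical helper implementing one common cluster-robust estimator for the standard error of a mean, without any finite-cluster correction, so Inspect's own numbers may differ.

```python
from collections import defaultdict
from math import sqrt
from statistics import pvariance, variance

scores = [1.0, 1.0, 0.0, 0.0]

# Population variance divides by n; the sample-adjusted variance divides by n - 1.
population_var = pvariance(scores)
sample_var = variance(scores)


def clustered_stderr(values, clusters):
    """Cluster-robust standard error of the mean (hypothetical helper).

    Residuals are summed within each cluster before squaring, so scores that
    are correlated inside a cluster widen the resulting error estimate.
    """
    n = len(values)
    mean = sum(values) / n
    residual_sums = defaultdict(float)
    for value, cluster in zip(values, clusters):
        residual_sums[cluster] += value - mean
    return sqrt(sum(s * s for s in residual_sums.values())) / n


# Every sample in its own cluster: reduces to the unclustered estimate.
unclustered = clustered_stderr(scores, [0, 1, 2, 3])

# Two clusters whose members agree perfectly: the standard error grows.
clustered = clustered_stderr(scores, ["q1", "q1", "q2", "q2"])
```

With these four scores the clustered estimate is larger than the unclustered one, which is the expected direction when samples within a cluster move together.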

0.3.63

- Add [OpenRouter](https://inspect.aisi.org.uk/providers.html#openrouter) model provider.
- Inspect View: Convert codebase from JS/Preact to TypeScript/React.
- Add `shuffle_choices` to dataset and dataset loading functions. Deprecate `shuffle` parameter to the `multiple_choice` solver.
- Add `stop_words` param to the `f1` scorer. `stop_words` will be removed from the target and answer during normalization.
- Tools: Handle return of empty list from tool calls.
- Computer: Moved out of beta (i.e. from `inspect_ai.tool.beta` into `inspect_ai.tool`).
- Sandboxes: Docker now uses `tee` for write_file operations.
- Inspect View: Handle Zip64 zip files (for log files greater than 4GB)
- Bugfix: Change `type` parameter of `answer()` to `pattern` to address registry serialisation error.
- Bugfix: Restore printing of request payloads for 400 errors from Anthropic.
- Bugfix: Log transcript event for solver provided scores (improves log viewer display of solver scoring)
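As a rough illustration of what removing stop words during normalization does to an F1 score, here is a standalone sketch. It is not Inspect's `f1` scorer: `normalize` and `f1_score` are hypothetical helpers, and the normalization (lowercasing and stripping ASCII punctuation) is an assumption made for the example.

```python
import string
from collections import Counter


def normalize(text, stop_words=None):
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    stop = {word.lower() for word in (stop_words or [])}
    return [token for token in text.split() if token not in stop]


def f1_score(answer, target, stop_words=None):
    """Token-overlap F1 between answer and target (hypothetical helper)."""
    answer_tokens = normalize(answer, stop_words)
    target_tokens = normalize(target, stop_words)
    if not answer_tokens or not target_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(answer_tokens == target_tokens)
    overlap = sum((Counter(answer_tokens) & Counter(target_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(answer_tokens)
    recall = overlap / len(target_tokens)
    return 2 * precision * recall / (precision + recall)


without_stop_words = f1_score("The cat sat.", "A cat sat")
with_stop_words = f1_score("The cat sat.", "A cat sat", stop_words=["the", "a"])
```

Here the differing articles lower the score unless they are listed as stop words, in which case the remaining tokens match exactly.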

0.3.62

- Various improvements for [reasoning models](https://github.com/UKGovernmentBEIS/inspect_ai/pull/1229) including extracting reasoning content from assistant messages.
- OpenAI: Handle `reasoning_effort`, `max_tokens`, `temperature`, and `parallel_tool_calls` correctly for o3 models.
- OpenAI: Map some additional 400 status codes to `content_filter` stop reason.
- Anthropic: Handle 413 status code (Payload Too Large) and map to `model_length` StopReason.
- Tasks: Log sample with error prior to raising task-ending exception.
- Python: Enhance prompt to emphasise that it is a script rather than a notebook.
- Computer: Various improvements to image including desktop, python, and VS Code configuration.
- Bugfix: Don't download full log from S3 for header_only reads.

0.3.61

- Computer: Enable viewing computer tool's remote mouse cursor via VNC.
- Computer: Disable lock screen on computer tool reference image.
- Limits: Amend `SampleLimitExceededError` with current `state` so that messages, etc. are preserved when limits are hit.
- Tools: Properly handle image dispatching when multiple tool calls are made by assistant.
- Anthropic: Raise error on 400 status not identified as model_length or content_filter.
- Basic Agent: `incorrect_message` can now optionally be an async function.
- Bugfix: Remove `suffix` from `eval-set` CLI args.
- Bugfix: Only catch `Exception` from sandboxenv_init (allow cancelled to propagate)
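The last bugfix relies on a general Python behaviour: since Python 3.8, `asyncio.CancelledError` derives from `BaseException`, so an `except Exception` handler lets cancellation pass through. The sketch below demonstrates this with plain `asyncio`. `init_with_cleanup` is a hypothetical stand-in for a slow initialisation step, not Inspect's `sandboxenv_init`.

```python
import asyncio


async def init_with_cleanup(log):
    """Hypothetical slow initialisation that only handles ordinary errors."""
    try:
        await asyncio.sleep(10)  # stand-in for real setup work
    except Exception:
        # Not reached on cancellation: CancelledError is not an Exception.
        log.append("caught by except Exception")
        raise


async def main():
    log = []
    task = asyncio.create_task(init_with_cleanup(log))
    await asyncio.sleep(0)  # yield once so the task starts running
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append("cancellation propagated")
    return log


result = asyncio.run(main())
```

Catching `BaseException` (or using a bare `except:`) in the same place would intercept the cancellation instead.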

0.3.60

- [Agent Bridge](https://inspect.aisi.org.uk/agent-bridge.html) for integrating external agent frameworks with Inspect.
- [Goodfire](https://inspect.aisi.org.uk/models.html#goodfire) model provider.
- Add `wraps` to functions wrapped by Inspect decorators to preserve type information.
- Hugging Face: Add support for stop sequences for HF models.
- Docker: More robust parsing of version strings (handle development versions).
- Vertex: Support for Anthropic models hosted on Vertex.
- OpenAI: Read `refusal` field from assistant message when provided.
- OpenAI: Use qualifiers rather than model args for OpenAI on other providers (`openai/azure`)
- Anthropic: Don't insert '(no content)' into canonical messages list (do only on replay).
- Anthropic: Use qualifiers rather than model args for Anthropic on other providers (`anthropic/bedrock`, `anthropic/vertex`).
- Anthropic: Support for `extra_body` model arg (for adding additional JSON properties to the request).
- Basic Agent: Append `tools` to `state` so that tools added in `init` are preserved.
- Scoring: Always provide half-again the sample time limit for scoring.
- Bugfix: Fix issue w/ approvals for samples with id==0.
- Bugfix: Use "plain" display when running eval_async() outside of eval().
- Bugfix: Fix issue with multiple scorers of the same type in a task.

0.3.59

- Beta version of [computer()](https://inspect.aisi.org.uk/tools-standard.html#sec-computer) tool, which provides models with a computer desktop environment.
- `user_message()` solver for appending parameterised user messages.
- `prompt_template()`, `system_message()` and `user_message()` solver now also include the sample `store` in substitution parameters.
- Limits: Enforce token and message limit at lower level (no longer required to check `state.completed` for limit enforcement).
- Limits: Enforce [custom limits](https://inspect.aisi.org.uk/errors-and-limits.html#custom-limit) for samples by raising `SampleLimitExceededError`.
- Tasks: Optional ability for solvers to [yield scores](https://inspect.aisi.org.uk/solvers.html#sec-scoring-in-solvers) for a task.
- Model API: Log model calls that result in bad request errors.
- Tools: `model_input` option that determines how tool call result content is played back to the model.
- Tools: Don't attempt to marshall arguments of dynamic `ToolDef` with `**kwargs: Any` (just pass them through).
- Log warning when a non-fatal sample error occurs (i.e. errors permitted by the `fail_on_error` option)
- Inspect View: allow filtering samples by compound expressions including multiple scorers. (thanks andrei-apollo)
- Inspect View: improve rendering performance and stability for the viewer when viewing very large eval logs or samples with a large number of steps.
- Task display: Improved `plain` mode with periodic updates on progress, metrics, etc.
- Google: Update to v0.8.4 of google-generativeai (py.typed support and removal of logprobs generation options)
- Google: Support for string enums (e.g. `Literal["a", "b", "c"]`) in tool function declarations.
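The string enum item above can be pictured as mapping a `Literal` type hint to a JSON-schema-style `enum`. The sketch below uses only the standard `typing` module. `literal_enum_schema` and `set_mode` are hypothetical names, and the output shape is an assumption for illustration rather than the declaration format Inspect sends to Google.

```python
from typing import Literal, get_args, get_origin, get_type_hints


def literal_enum_schema(tp):
    """Return a string-enum schema for Literal["..."] hints, else None."""
    if get_origin(tp) is Literal:
        values = list(get_args(tp))
        if all(isinstance(value, str) for value in values):
            return {"type": "string", "enum": values}
    return None


def set_mode(mode: Literal["a", "b", "c"]) -> str:
    """Hypothetical tool function with a string-enum parameter."""
    return mode


hints = get_type_hints(set_mode)
schema = literal_enum_schema(hints["mode"])
not_an_enum = literal_enum_schema(hints["return"])
```

Non-`Literal` hints, and `Literal` hints with non-string members, fall through to `None` here so that a caller could handle them separately.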
