Inspect-ai

Latest version: v0.3.82

Safety actively analyzes 723650 Python packages for vulnerabilities to keep your Python projects secure.

Scan your dependencies

Page 9 of 14

0.3.34

- Support for `max_tokens` on OpenAI o1 models (map to `max_completion_tokens`).
- Fix regression of log and debug options on `inspect view`
- Improved focus management for Inspect View
- Raise error if `epochs` is less than 1
- Improve code parsing for HumanEval (compatibility with Llama model output)

0.3.33

- StopReason: Added "model_length" for exceeding token window and renamed "length" to "max_tokens".
- Capture solver input params for subtasks created by `fork()`.
- Option to disable ANSI terminal output with `--no-ansi` or `INSPECT_NO_ANSI`
- Add chain of thought option to `multiple_choice()` and export `MultipleChoiceTemplate` enumeration
- Allow Docker sandboxes configured with `x-default` to be referred to by their declared service name.
- Improved error messages for Docker sandbox initialisation.
- Improve legibility of Docker sandbox log entries (join rather than displaying as array)
- Display user message immediately proceeding assistant message in model call transcripts.
- Display images created by tool calls in the Viewer.
- Fix duplicated tool call output display in Viewer for Gemini and Llama models.
- Require a `max_messages` for use of `basic_agent()` (as without it, the agent could end up in an infinite loop).
- Load extension entrypoints per-package (prevent unnecessary imports from packages not being referenced).
- Track sample task state in solver decorator rather than solver transcript.
- Display solver input parameters for forked subtasks.
- Improvements to docker compose down cleanup: timeout, survive missing compose files.
- Always produce epoch sample reductions even when there is only a single epoch.
- Scores produced after being reduced retain `answer`, `explanation`, and `metadata` only if equal across all epochs.

0.3.32

- Fix issue w/ subtasks not getting a fresh store() (regression from introduction of `fork()` in v0.3.30)
- Fix issue w/ subtasks that return None invalidating the log file.
- Make subtasks collapsible in Inspect View.
- Improved error reporting for missing `web_search()` provider environment variables.

0.3.31

- Deprecated `Plan` in favor of `Solver` (with `chain()` function to compose multiple solvers).
- Added `max_tool_output` generation option (defaults to 16KB).
- Improve performance of `header_only` log reading (switch from json-stream to ijson).
- Add support for 0 retries to `eval-set` (run a single `eval` then stop).
- Tool calling fixes for update to Mistral v1.1. client.
- Always show `epochs` in task status (formerly wasn't included for multiple task display)
- Render transcript `info()` strings as markdown
- Eliminate log spam from spurious grpc fork message.
- Fix issue with hf_dataset shuffle=True not actually shuffling.
- Improved error handling when loading invalid setuptools entrypoints.
- Don't catch TypeError when calling tools (we now handle this in other ways)

0.3.30

- Added [fork()](https://inspect.aisi.org.uk/agents-api.html#sec-forking) function to fork a `TaskState` and evaluate it against multiple solvers in parallel.
- Ensure that Scores produced after being reduced still retain `answer`, `explanation`, and `metadata`.
- Fix error when running `inspect info log-types`
- Improve scorer names imported from modules by not including the the module names.
- Don't mark messages read from cache with source="cache" (as this breaks the cache key)
- Add `cache` argument to `basic_agent()` for specifying cache policy for the agent.
- Add `cache` field to `ModelEvent` to track cache reads and writes.
- Compatibility with Mistral v1.1 client (now required for Mistral).
- Catch and propagate Anthropic content filter exceptions as normal "content_filter" responses.
- Fix issue with failure to report metrics if all samples had a score value of 0.
- Improve concurrency of Bedrock models by using aioboto3.
- Added [SWE Bench](https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/swe_bench), [GAIA](https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/gaia), and [GDM CTF](https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/gdm_capabilities/in_house_ctf) evals.

0.3.29

- Added `--plan` and `-P` arguments to `eval` and `eval-set` commands for replacing the task default plan with another one.
- Improved support for eval retries when calling `eval()` or `eval_set()` with a `plan` argument.
- Don't log base64 images by default (re-enable logging with `--log-images`).
- Provide unique tool id when parsing tool calls for models that don't support native tool usage.
- Fix bug that prevented `epoch_reducer` from being used in eval-retry.
- Fix bug that prevented eval() level `epoch` from overriding task level `epoch`.

Page 9 of 14

© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.