Inspect-ai

Latest version: v0.3.82

0.3.76

- [bash_session()](https://inspect.ai-safety-institute.org.uk/tools-standard.html#sec-bash-session) tool for creating a stateful bash shell that retains its state across calls from the model.
- [text_editor()](https://inspect.ai-safety-institute.org.uk/tools-standard.html#sec-text-editor) tool which enables viewing, creating and editing text files.
- Structured Output: Properly handle Pydantic BaseModel that contains other BaseModel definitions in its schema.
- OpenAI: Support for .wav files in audio inputs for gpt-4o-audio-preview.
- OpenAI: Strip 'azure' prefix from model_name so that model type checks all work correctly.
- OpenAI: Don't send `reasoning_effort` parameter to o1-preview (as it is not supported).
- Inspect View: Fix error sorting numeric or categorical score results.
- Inspect View: Properly wrap model API call text in the transcript.
- Bugfix: Only initialise display in `eval_set` if it wasn't initialised from the CLI.
- Bugfix: Set the global log level based on the specified Inspect log level.
- Bugfix: Resolve issue when deserialising a SubtaskEvent from a log file which does not have a completed time.
- Bugfix: Fix unnecessary warnings about task arguments.
- Bugfix: When a task does not take a kwargs argument, only warn if the provided argument is not valid.
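
The last two fixes in this release concern when a warning about task arguments is shown. The sketch below illustrates that rule using only the standard library. It is not Inspect's implementation: `check_task_args`, `my_task`, and `flexible_task` are hypothetical names, and the exact warning text is an assumption.

```python
import inspect
import warnings


def check_task_args(task_fn, task_args):
    """Return the names in task_args that task_fn cannot accept.

    Hypothetical helper: a warning is issued only for names that are not
    parameters of task_fn, and never when task_fn declares **kwargs.
    """
    params = inspect.signature(task_fn).parameters
    accepts_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_kwargs:
        # Any keyword is acceptable, so there is nothing to warn about.
        return []
    invalid = [name for name in task_args if name not in params]
    for name in invalid:
        warnings.warn(
            f"task '{task_fn.__name__}' does not accept argument '{name}'"
        )
    return invalid


def my_task(difficulty="easy"):
    # Hypothetical task with a single named parameter and no **kwargs.
    return difficulty


def flexible_task(**kwargs):
    # Hypothetical task that accepts arbitrary keyword arguments.
    return kwargs
```

Calling `check_task_args(my_task, {"difficulty": "hard"})` produces no warning, while adding an unknown key such as `shots` warns for that key only.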

0.3.75

- Model API: Specifying a default model (e.g. `--model`) is no longer required (as some evals have no model or use `get_model()` for model access).
- Tasks can now directly specify a `model`, and model is no longer a required axis for parallel tasks.
- Eval Set: Improved parallelisation in scheduler (all pending tasks are now run together rather than in model groups).
- Don't generate `id` for `ChatMessage` when deserialising (`id` is now `str | None` and is only populated when messages are directly created).
- Log: Support for zip64 extensions required to read some log files that are larger than 4GB.
- Anthropic: Provide `reasoning_tokens` for standard thinking blocks (redacted thinking not counted).
- Google: Improve checking of `APIError` status codes for retry.
- CLI: Added `--env` option for defining environment variables for the duration of the `inspect` process.
- Inspect View: Fix issue generating diffs for nested arrays.
- Inspect View: Fix layout issue with sample error display in sample detail summary.
- Inspect View: Better support large eval files (in excess of 4GB).
- Inspect View: Correctly display 'None' when passed in tool calls.
- Inspect View: Fix 'Access Denied' error when using `inspect view` and viewing the log in a browser.
- Bugfix: Properly handle nested Pydantic models when reading typed store (`store_as()`) from log.
- Bugfix: Enable passing `solver` list to `eval()` (decorate `chain` function with `solver`).
- Bugfix: Support deserializing custom sandbox configuration objects when said sandbox plugin is not installed.
- Bugfix: Fix error in sample filtering autocomplete (could cause autocomplete to fail and show an error in js console).

0.3.74

- Bugfix: Exclude chat message `id` from cache key (fixes regression in model output caching).
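
To see why an auto-generated `id` would break output caching, consider a cache key derived from the message list: two otherwise identical requests would hash differently if the ids were included. The following is a minimal sketch of the idea, assuming a SHA-256 key over JSON. `Message` and `cache_key` are hypothetical stand-ins, not Inspect's classes or functions.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class Message:
    # Hypothetical stand-in for a chat message; not the Inspect class.
    role: str
    content: str
    # Each directly created message gets a fresh, unique id.
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


def cache_key(model, messages):
    """Hash the model name and message contents, ignoring per-message ids."""
    payload = {
        "model": model,
        "messages": [
            {k: v for k, v in asdict(m).items() if k != "id"}
            for m in messages
        ],
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```

With the `id` excluded, two separately constructed but identical conversations map to the same key, so a cached output can be reused.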

0.3.73

- Constrain model output to a particular JSON schema using [Structured Output](https://inspect.aisi.org.uk/structured.html) (supported for OpenAI, Google, and Mistral).
- New "HTTP Retries" display (replacing the "HTTP Rate Limits" display) which counts all retries and does so much more consistently and accurately across providers.
- The `ModelAPI` class now has a `should_retry()` method that replaces the deprecated `is_rate_limit()` method.
- The "Generate..." progress message in the Running Samples view now shows the number of retries for the active call to `generate()`.
- New `inspect trace http` command which will show all HTTP requests for a run.
- More consistent use of `max_retries` and `timeout` configuration options. These options now exclusively control Inspect's outer retry handler; model providers use their default behaviour for the inner request, which is typically 2-4 retries and a service-appropriate timeout.
- Improved async implementation using AnyIO (can now optionally run Trio rather than asyncio as the [async backend](https://inspect.aisi.org.uk/parallelism.html#async-backends)).
- Agent Bridge: Correct handling for `tool_choice` option.
- Model API: `ChatMessage` now includes an `id` field (defaults to auto-generated uuid).
- OpenAI: More flexible parsing of content parts (some providers omit the "type" field); support for "reasoning" content parts.
- Anthropic: Retry api connection errors and remote protocol errors that occur during streaming.
- Mistral: Update to new Mistral API (v1.5.1 of `mistralai` is now required).
- Logging: Inspect no longer sets the global log level nor does it allow its own messages to propagate to the global handler (eliminating the possibility of duplicate display). This should improve compatibility with applications that have their own custom logging configured.
- Tasks: For filesystem based tasks, no longer switch to the task file's directory during execution (directory switching still occurs during task loading). Specify `task(chdir=True)` to preserve the previous behavior.
- Bugfix: Fix issue with deserializing custom sandbox configuration objects.
- Bugfix: Handle `parallel_tool_calls` correctly for OpenAI models served through Azure.
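
The retry changes in this release describe an outer retry handler governed by `max_retries` and `timeout`, with a `should_retry()` decision per error. The sketch below shows that general pattern with the standard library only. It is an illustration under stated assumptions, not Inspect's code: `TransientError`, the free function `should_retry`, `generate_with_retries`, and `base_delay` are hypothetical, and treating `timeout` as total elapsed time is an assumption.

```python
import time


class TransientError(Exception):
    """Hypothetical error standing in for a retryable HTTP failure."""


def should_retry(ex):
    # Assumption: only transient errors are worth retrying.
    return isinstance(ex, TransientError)


def generate_with_retries(call, max_retries=3, timeout=60.0, base_delay=0.0):
    """Invoke call() until it succeeds or a retry limit is reached.

    Returns (result, retries) so a caller could display the retry count.
    Re-raises the last error if it is not retryable, if max_retries is
    exhausted, or if `timeout` seconds have elapsed since the first attempt.
    """
    start = time.monotonic()
    retries = 0
    while True:
        try:
            return call(), retries
        except Exception as ex:
            out_of_time = time.monotonic() - start >= timeout
            if not should_retry(ex) or retries >= max_retries or out_of_time:
                raise
            # Exponential backoff between attempts (no delay by default).
            time.sleep(base_delay * (2 ** retries))
            retries += 1
```

Returning the retry count alongside the result mirrors the idea of surfacing retries in a progress display. A real handler would typically also add jitter to the backoff.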

0.3.72

- Computer: Updated tool definition to match improvements in Claude Sonnet 3.7.

0.3.71

- Anthropic: Support for [extended thinking](https://inspect.aisi.org.uk/reasoning.html#claude-3.7-sonnet) features of Claude Sonnet 3.7 (minimum version of `anthropic` package bumped to 0.47.1).
- Reasoning: `ContentReasoning` type for representing model reasoning blocks.
- Reasoning: `reasoning_tokens` for setting maximum reasoning tokens (currently only supported by Claude Sonnet 3.7).
- Reasoning: `reasoning_history` can now be specified as "none", "all", "last", or "auto" (which yields a provider specific recommended default).
- Web Browser: [Various improvements](https://github.com/UKGovernmentBEIS/inspect_ai/pull/1314) to performance and robustness along with several bug fixes.
- OpenAI: Provide long connection (reasoning friendly) socket defaults in http client.
- OpenAI: Capture `reasoning_tokens` when reported.
- OpenAI: Retry on rate limit errors with the message "Request too large".
- OpenAI: Tolerate `None` for assistant content (can happen when there is a refusal).
- Google: Retry requests on more HTTP status codes (selected 400 errors and all 500 errors).
- Event Log: Add `working_start` attribute to events and `completed` and `working_time` to model, tool, and subtask events.
- Human Agent: Add `task quit` command for giving up on tasks.
- Human Agent: Don't emit sandbox events for human agent.
- Inspect View: Improve rendering of JSON within logging events.
- Inspect View: Improve virtualized rendering of Sample List, Sample Transcript, and Sample Messages.
- Task Display: Let plugins display counters ('rich' and 'full' display modes only).
- Inspect View: Fix layout issues with human agent terminal session playback.
- Inspect View: Improve tool input / output appearance when rendered in VSCode.
- Inspect View: Display reasoning tokens in model usage for the samples and for the complete eval.
- Inspect View: Improve model api request / response output when rendered in VSCode.
- Inspect View: Improve rendering of some tool calls in the transcript.
- Bugfix: Fix audio and video inputs for new Google GenAI client.
- Bugfix: Ensure that token limits are not enforced during model graded scoring.
- Bugfix: Catch standard `TimeoutError` for running shell commands in the computer tool container.
- Bugfix: Correct combination of consecutive string based user messages for Anthropic provider.
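
The four `reasoning_history` values introduced in this release can be pictured as a filter over the message history before it is replayed to the model. The sketch below is one plausible reading, not Inspect's implementation: `Msg` and `filter_reasoning` are hypothetical, and the interpretation of "last" (keep reasoning only on the most recent assistant message that has any) and of "auto" (defer to a `provider_default` argument) are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Msg:
    # Hypothetical message type; `reasoning` holds an optional reasoning block.
    role: str
    text: str
    reasoning: Optional[str] = None


def filter_reasoning(messages, mode, provider_default="all"):
    """Return a copy of messages with reasoning kept according to mode."""
    if mode == "auto":
        # Assumption: "auto" defers to a provider-recommended setting.
        mode = provider_default
    if mode == "all":
        return list(messages)
    if mode not in ("none", "last"):
        raise ValueError(f"unsupported reasoning_history mode: {mode!r}")
    # Index of the most recent assistant message that carries reasoning.
    last = max(
        (
            i
            for i, m in enumerate(messages)
            if m.role == "assistant" and m.reasoning is not None
        ),
        default=-1,
    )
    keep = last if mode == "last" else -1
    return [
        m if i == keep else replace(m, reasoning=None)
        for i, m in enumerate(messages)
    ]
```

For example, with two assistant turns that each carry reasoning, `"last"` keeps only the second turn's reasoning and `"none"` strips both.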
