Inspect-ai

Latest version: v0.3.82

0.3.58

- Support for [audio and video](https://inspect.aisi.org.uk/multimodal.html) inputs for OpenAI and Google Gemini models.
- Task display: Added Timeout Tool button for manually timing out a tool call.
- Task display: Automatically switch to "plain" mode when running in a background thread
- Sandboxes: Setup and initialisation errors are now handled at the sample level.
- Sandboxes: Increase setup script timeout to 5 minutes (from 30 seconds) and do not retry setup scripts (in case they aren't idempotent).
- Sandboxes: Add `timeout_retry` option (defaulting to `True`) to `exec()` function.
- Sandboxes: Add `type` and optional `container` properties to `SandboxConnection`.
- Docker: Services which exit with status 0 during setup no longer cause an error.
- `task_with()` function for creating task variants.
- Added `--filter` argument to trace CLI commands for filtering on trace log message content.
- Print model conversations to terminal with `--display=conversation` (was formerly `--trace`, which is now deprecated).
- HuggingFace: Support models that don't provide a chat template (e.g. gpt2)
- Eval Set: Ensure that logs with status 'started' are retried.
- Rename the built-in `bootstrap_std` metric to `bootstrap_stderr` (`bootstrap_std` is now deprecated).
- Bugfix: Fix duplication of summaries when eval log file is rewritten.
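
The new name `bootstrap_stderr` suggests that the metric reports a standard error, not a standard deviation of the raw scores. A minimal sketch of a bootstrap standard error of the mean follows. It is an illustration only, not Inspect's implementation; the function signature, the `num_samples` and `seed` parameters, and the example scores are assumptions.

```python
import random
import statistics


def bootstrap_stderr(scores, num_samples=1000, seed=0):
    """Estimate the standard error of the mean by bootstrap resampling."""
    rng = random.Random(seed)
    n = len(scores)
    means = []
    for _ in range(num_samples):
        # Draw n scores with replacement and record the mean of the resample.
        resample = [scores[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    # The spread of the resampled means approximates the standard error.
    return statistics.stdev(means)


# Hypothetical per-sample scores (1.0 = correct, 0.0 = incorrect).
scores = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0]
estimate = bootstrap_stderr(scores)
# Textbook standard error of the mean, for comparison.
analytic = statistics.stdev(scores) / len(scores) ** 0.5
print(f"bootstrap: {estimate:.3f}, analytic: {analytic:.3f}")
```

The two numbers should be close but not identical, since the bootstrap resamples from the observed scores rather than applying the closed-form formula.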

0.3.57

- [Tracing API](https://inspect.aisi.org.uk/tracing.html#tracing-api) for custom trace logging.
- Inspect View: never truncate tool result images and display at default width of 800px.
- Inspect View: display tool error messages in transcript when tool errors occur.
- Inspect View: display any completed samples even if the task fails because of an error
- Inspect View: don't display the 'input' column heading if there isn't an input
- OpenAI: Handle additional bad request status codes (mapping them to appropriate `StopReason`)
- OpenAI: Use new `max_completion_tokens` option for o1 full.
- Web Browser: raise error when both `error` and `web_at` fields are present in response.
- Sandboxes: Apply dataset filters (limit and sample id) prior to sandbox initialisation.
- Docker: Prevent issue with container/project names that have a trailing underscore.
- Store: initialise `Store` from existing dictionary.
- Log: provide `metadata_as` and `store_as` typed accessors for sample metadata and store.
- Tool parameters with a default of `None` are now supported.
- More fine-grained HTML escaping for sample transcripts displayed in terminal.
- Bugfix: prevent errors when a state or storage value uses a tilde or slash in the key name.
- Bugfix: Include input in sample summary when the sample input contains a simple string.
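
The tilde and slash bugfix above is the kind of problem that appears when key names are embedded in JSON Pointer paths (RFC 6901), where `/` separates path segments and `~` introduces an escape. Whether Inspect's fix works this way is an assumption; the helper names below are hypothetical, and the sketch only shows the standard escaping rules.

```python
def escape_pointer_token(key: str) -> str:
    # Escape "~" first; otherwise the "~" produced by "~1" would be escaped again.
    return key.replace("~", "~0").replace("/", "~1")


def unescape_pointer_token(token: str) -> str:
    # Reverse order when decoding: "~1" first, then "~0".
    return token.replace("~1", "/").replace("~0", "~")


def pointer_for(*keys: str) -> str:
    # Build a JSON Pointer from raw key names, escaping each segment.
    return "".join("/" + escape_pointer_token(k) for k in keys)


# Hypothetical store key containing both reserved characters.
print(pointer_for("store", "results/v1~draft"))
```

Without the escaping step, a key such as `results/v1~draft` would be read as two path segments, and the `~d` sequence would be an invalid escape.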

0.3.56

- [Human Agent](https://inspect.aisi.org.uk/human-agent.html) solver for human baselining of computing tasks.
- [Typed interfaces](https://inspect.aisi.org.uk/typing.html) to `Sample` store and metadata using Pydantic models.
- [Approval policies](https://inspect.aisi.org.uk/approval.html#task-approvers) can now be defined at the `Task` level (`eval` level approval policies take precedence).
- Tools can now return `ContentText` and `ContentImage`.
- Move tool result images into subsequent user messages for models that don't support tools returning images.
- `SandboxConnection` that contains login information from sandboxes.
- `display_type()` function for detecting the current display type (e.g. "full", "rich", etc.)
- Trace: improved handling of `eval()` running in multiple processes at once (trace file per-process)
- Docker: don't apply timeouts to `docker build` and `docker pull` commands.
- Bugfix: fix issue with `store.get()` not auto-inserting `default` value.
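
To illustrate why auto-inserting the default matters for `store.get()`, here is a minimal sketch with a hypothetical `MiniStore` class. It is not Inspect's `Store`; it assumes that "auto-inserting" means the default is written into the store on a miss, so that later mutations of the returned object are visible to subsequent reads.

```python
from typing import Any, Dict, Optional


class MiniStore:
    """Hypothetical stand-in for a key-value store; not Inspect's Store class."""

    def __init__(self, data: Optional[Dict[str, Any]] = None) -> None:
        self._data: Dict[str, Any] = dict(data or {})

    def get(self, key: str, default: Any = None) -> Any:
        # On a miss, insert the default so the caller and the store share one object.
        if key not in self._data and default is not None:
            self._data[key] = default
        return self._data.get(key, default)


store = MiniStore()
history = store.get("history", [])  # miss: the empty list is inserted
history.append("step-1")            # mutates the list held by the store
print(store.get("history"))
```

If `get()` returned the default without inserting it, the appended item would be lost, because the list would never have been attached to the store.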

0.3.55

- Bedrock: redact authentication model args from eval logs.
- OpenAI: warn when `temperature` is used with o1 models (as it is not supported).
- Bugfix: spread args for cache trace logging.

0.3.54

- [Tracing](https://inspect.aisi.org.uk/tracing.html) for diagnosing runs with unterminated actions (e.g. model calls, docker commands, etc.).
- Provide default timeout/retry for docker compose commands to mitigate unreliability in some configurations.
- Switch to sync S3 writes to overcome unreliability observed when using async interface.
- Task display: Added `--no-score-display` option to disable realtime scoring metrics.
- Bugfix: Fix failure to fully clone samples that have message lists as input.
- llama-cpp-python: Support for `logprobs`.
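
The default timeout/retry for docker compose commands mentioned above follows a common pattern: bound each attempt with a timeout and try again if it expires. The sketch below shows that pattern with `asyncio`. It is not Inspect's code; `run_with_retry`, `flaky_command`, and the timeout values are hypothetical.

```python
import asyncio


async def run_with_retry(make_call, timeout: float, attempts: int = 2):
    """Await make_call() with a per-attempt timeout, retrying on timeout."""
    for attempt in range(1, attempts + 1):
        try:
            # wait_for cancels the pending call if the timeout expires.
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except asyncio.TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the timeout to the caller


calls = {"count": 0}


async def flaky_command():
    # Hypothetical stand-in for a command that hangs on its first invocation.
    calls["count"] += 1
    if calls["count"] == 1:
        await asyncio.sleep(10)
    return "ok"


result = asyncio.run(run_with_retry(flaky_command, timeout=0.1))
print(result, calls["count"])
```

Retrying is only safe when the command can be repeated without side effects, which is why a retry option is usually something callers can turn off.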

0.3.53

- OpenAI: Support for o1 including native tool calling and `reasoning_effort` generation option.
- Task API: Introduce `setup` step that always runs even if `solver` is replaced.
- Bedrock: Support for tool calling on Nova models.
- Bedrock: Support for custom `model_args` passed through to `session.Client`.
- Bedrock: Support for `jpeg` images.
- Bedrock: Correct max_tokens for llama3-8b, llama3-70b models on Bedrock.
- Inspect View: Various improvements to appearance of tool calls in transcript.
- Task display: Ensure that widths of progress elements are kept consistent across tasks.
- Sandboxes: New `max_sandboxes` option for (per-provider) maximum number of running sandboxes.
- Sandboxes: Remove use of aiofiles to mitigate potential for threading deadlocks.
- Concurrency: Do not use `max_tasks` as a lower bound for `max_samples`.
- Log recorder: Always re-open log buffer for `eval` format logs.
- Bugfix: Proper handling of text find for eval raw JSON display
- Bugfix: Correct handling for `--sample-id` integer comparisons.
- Bugfix: Proper removal of model_args with falsey values (explicit check for `None`)
- Bugfix: Properly handle custom metrics that return dictionaries or lists
- Bugfix: Proper sample count display when retrying an evaluation
- Bugfix: Fix inability to define and run tasks in a notebook.
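
One reading of the `model_args` bugfix above is that a truthiness test was replaced with an explicit `None` check. The sketch below contrasts the two; the function names and argument names are hypothetical and are not taken from Inspect.

```python
def drop_unset_args(model_args: dict) -> dict:
    # Only None means "not provided"; 0, False, and "" are legitimate values.
    return {k: v for k, v in model_args.items() if v is not None}


def drop_falsey_args(model_args: dict) -> dict:
    # Buggy variant for comparison: a truthiness test also discards 0 and False.
    return {k: v for k, v in model_args.items() if v}


# Hypothetical arguments: two falsey but meaningful values and one unset value.
args = {"temperature": 0, "stream": False, "region": None}
print(drop_unset_args(args))
print(drop_falsey_args(args))
```

With the truthiness test, an explicit `temperature` of 0 would be silently dropped and the provider default used instead.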
