Automox-mcp

Latest version: v2.2.2

2.2.2

Added

- **`discover_capabilities` is now runtime-aware and self-checking** ([217](https://github.com/AutomoxCommunity/automox-mcp/issues/217)). Discovery previously answered from a static catalog that could diverge from the callable registry three ways: env-gated tools were listed while unregistered, read-only mode hid all 48 write tools but discovery still listed them, and `AUTOMOX_MCP_MODULES`-filtered domains stayed listed. Each tool entry now carries `available` (introspected from the live registry at call time, tool-prefix aware); the four opt-in gated tools always name their `gated_by` env var so clients can filter to actually-callable tools or surface how to enable a capability. Cross-listed entries are flagged `cross_listed` and `compound` is marked `alias_domain` (per-domain `tool_count` values were never summable — the naive sum double-counts four tools). The no-arg call now returns self-check totals (`unique_tool_count`, `registered_tool_count`, per-domain counts, any `unavailable_tools`) plus a note documenting that the directory excludes `discover_capabilities` itself; `list_all_tools=true` returns a flat deduplicated tool list. Field-motivated: a deliberate full enumeration from a client silently missed a domain because nothing in the response made the omission detectable. Guarded by new tests asserting discovery output matches the registered set under full / gated-off / read-only / module-filtered / prefixed configurations, and the smoke suite now reconciles discovery totals against the live session's advertised tool surface.
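A client might consume the runtime-aware fields along these lines. This is a minimal sketch: only the `available`, `gated_by`, and `cross_listed` field names come from the entry above. The `domains` → `tools` payload layout, the `callable_tools` helper, and the sample data (including which domain each tool sits in) are assumptions made for illustration, not the documented response shape.

```python
from typing import Any


def callable_tools(discovery: dict[str, Any]) -> tuple[list[str], dict[str, str]]:
    """Split a discovery payload into callable tools and gated-off tools.

    The payload layout (domains -> tools) is hypothetical.
    """
    available: set[str] = set()
    gated: dict[str, str] = {}
    for domain in discovery.get("domains", {}).values():
        for tool in domain.get("tools", []):
            name = tool["name"]
            if tool.get("available"):
                # A set de-duplicates tools that are cross-listed in several domains.
                available.add(name)
            elif tool.get("gated_by"):
                # Not callable right now; remember which env var would enable it.
                gated[name] = tool["gated_by"]
    return sorted(available), gated


# Made-up sample payload; the domain assignments are illustrative only.
sample = {
    "domains": {
        "devices": {"tools": [
            {"name": "search_devices", "available": True},
            {"name": "get_compliance_snapshot", "available": True, "cross_listed": True},
        ]},
        "compound": {"tools": [
            {"name": "get_compliance_snapshot", "available": True, "cross_listed": True},
        ]},
        "remediation": {"tools": [
            {"name": "apply_remediation_actions", "available": False,
             "gated_by": "AUTOMOX_MCP_ALLOW_APPLY_REMEDIATION_ACTIONS"},
        ]},
    }
}

usable, gated_off = callable_tools(sample)
```

Counting `len(usable)` rather than summing per-domain counts avoids the double-counting of cross-listed tools that the entry warns about.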

Fixed

- **`docs/tool-reference.md` — stale counts corrected in un-CI-guarded sections, plus a new "At a Glance" capability summary.** The Tool Safety Annotations table still carried the pre-2.0.0 split ("58 read / 22 write" → **85 / 48**), the Special Parameters and Idempotency Keys sections claimed `request_id` on "all 22" / "21 idempotent" write tools (→ all **48** write tools, verified by registered-server introspection), and the markdown `output_format` parameter was documented on "13 list tools" (→ **64** read tools). Added the missing Tool Safety Annotations / Pagination entries to the table of contents and an At-a-Glance table up top (tools, API coverage, Apps, resources, prompts, safety, enterprise features). The headline counts these sections drifted from were already CI-guarded; these prose spots were not.

2.2.1

Added

- **Create "Patch by Severity" policies via `apply_policy_changes`.** A patch policy with `configuration.patch_rule='filter'` + `filter_type='severity'` + a `severity_filter` list (e.g. `['critical']`) is now constructed correctly — previously the builder required a name-pattern `filters` and rejected any severity-only policy. `severity_filter` values are normalized (lowercased, de-duplicated) and validated against the API enum (`no_known_cves`, `none`, `unknown`, `low`, `medium`, `high`, `critical`); an unknown value is rejected with the allowed set rather than an opaque API error. Verified end-to-end against a live tenant (create→read-back→delete; the API persisted all seven severity values). Replaces the prior limitation that pointed severity policies to `clone_policy`/the console.
- **`docs/release-notes.md` — customer-facing feature highlights.** A curated list of notable features and capabilities, primarily to inform customers and secondarily as source material for go-to-market content. It is intentionally not comprehensive — `CHANGELOG.md` remains the authoritative, complete record of every change — and links here for full detail. Authoring scope and conventions are documented in `CLAUDE.md`.
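The `severity_filter` handling described in the "Patch by Severity" entry above (lowercase, de-duplicate, validate against the API enum) can be sketched as follows. The seven allowed values are taken from that entry; the function name, the order-preserving de-duplication, and the error wording are assumptions, not the package's actual implementation.

```python
ALLOWED_SEVERITIES = (
    "no_known_cves", "none", "unknown", "low", "medium", "high", "critical",
)


def normalize_severity_filter(values: list[str]) -> list[str]:
    """Lowercase, de-duplicate (keeping first-seen order), and validate."""
    normalized: list[str] = []
    for value in values:
        item = value.lower()
        if item not in ALLOWED_SEVERITIES:
            # Fail locally with the allowed set instead of an opaque API error.
            allowed = ", ".join(ALLOWED_SEVERITIES)
            raise ValueError(f"unknown severity {value!r}; allowed values: {allowed}")
        if item not in normalized:
            normalized.append(item)
    return normalized
```

For example, `normalize_severity_filter(["Critical", "critical", "HIGH"])` collapses to two entries, while a value such as `"severe"` raises a `ValueError` that names the allowed set.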

Fixed

- **Worklet/policy code fields are no longer corrupted on read-back.** The response sanitizer rewrote markdown-link syntax `[text](url)` → `text`, stripped `<…>` as HTML, and removed fenced code blocks — mangling legitimate PowerShell/shell syntax in `evaluation_code`/`remediation_code`/`installation_code` (e.g. `[bool](Get-…)` and other `[type](…)` casts shown as the bare word, `[xml]`/here-string/heredoc content containing `<`, and embedded code fences in notes). Code-bearing fields are now exempt from the structural mutators and instruction-prefix stripping, retaining only the syntax-neutral defenses (Unicode NFKC normalization + invisible/zero-width-character stripping). This is a display-only fix — outbound request bodies were never sanitized, so the code stored in Automox was always intact. The canonical "fields that carry code" set is now defined once (`utils.sanitize.CODE_BEARING_FIELDS`) and shared with the device-payload trimmer so the two definitions can't drift.
- **"Patch all" (and manual/advanced) patch policies can now be created via `apply_policy_changes`.** Normalization stripped `configuration.filter_type` and the payload builder only re-added it for `patch_rule='filter'`, so a non-filter patch policy was sent without `filter_type` and the API rejected it (`400 — "filter_type field is required"`). `filter_type` is now preserved through normalization and defaults to `'all'` for non-filter rules (`all`/`manual`/`advanced`), honoring a caller-supplied value when present — matching the live API contract.
- **`apply_policy_changes` now returns the new policy's id after a create.** `POST /policies` responds `201` with an empty body (no id), so the wrapper previously returned `policy_id: null` and could not echo back the created policy. It now resolves the id by listing the org's policies and matching the new policy's name (newest match), then fetches and returns the stored policy. (Surfaced by the new `206` live create→delete smoke round-trip, which depends on this to self-clean.)
- **The `schedule` helper block is no longer dropped by `apply_policy_changes`.** The tool's request model declared only the expanded `schedule_days`/`schedule_time` bitmask fields and silently discarded the documented `schedule: {days, time}` helper (it accepted `extra="ignore"`), so a create using the helper failed local validation with "missing required fields: schedule_days, schedule_time." The model now accepts `schedule` and expands it server-side.
- **Corrected the patch-policy resource documentation.** The `patch_rule` enum advertised values that do not exist in the API (`severity`, `custom`) and omitted the real ones (`manual`, `advanced`). "Patch by Severity" is `patch_rule='filter'` + `filter_type='severity'` + a `severity_filter` list — not a `patch_rule='severity'`. The build templates and configuration reference now match the spec; `filter_type` is documented as required (auto-set to `'include'` for `filter`, `'all'` for non-filter rules). (Constructing severity-filtered policies through `apply_policy_changes` is now implemented — see the Added entry above.)
- **Corrected a stale tool count in the 2.1.0 entry** (130 → 133). 2.1.0 changed no tools, so its count matches 2.0.0's **133** (85 read / 48 write); 130 was the pre-2.0.0 base, before `delete_device`, `upload_policy_file`, and `list_webhook_deliveries` landed in 2.0.0. Consistent with the 2.0.0 and 2.2.0 entries and the CI-guarded registered count (`test_doc_tool_counts.py`).
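The code-field exemption from the first Fixed entry above can be illustrated with a small sketch. It assumes a single structural mutator (markdown-link flattening) and a short, hypothetical list of invisible code points; the real sanitizer applies more rules, and `sanitize_field` is an illustrative name. Only the three field names and the NFKC-plus-invisible-character defenses come from the entry.

```python
import re
import unicodedata

# Field names taken from the entry; the real set lives in utils.sanitize.CODE_BEARING_FIELDS.
CODE_BEARING_FIELDS = frozenset({"evaluation_code", "remediation_code", "installation_code"})

# Hypothetical subset of zero-width / invisible code points.
_INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# Structural mutator: rewrites [text](url) to text.
_MD_LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")


def sanitize_field(name: str, value: str) -> str:
    """Apply syntax-neutral defenses to every field, structural ones only to prose."""
    value = unicodedata.normalize("NFKC", value)
    value = "".join(ch for ch in value if ch not in _INVISIBLE)
    if name in CODE_BEARING_FIELDS:
        # Code fields skip the structural mutators so casts like [bool](...) survive.
        return value
    return _MD_LINK.sub(r"\1", value)
```

With this split, a PowerShell cast such as `[bool](Get-Service)` is returned intact from `evaluation_code`, while the same text in a prose field would still be flattened to `bool`.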

2.2.0

**The MCP Apps release.** Ships five interactive `ui://` **MCP App** surfaces (the `io.modelcontextprotocol/ui` extension) on a new structured-output (`outputSchema`) foundation: a read-only compliance-triage pilot, plus patch-approval, policy blast-radius, remediation-apply, and RBAC access-certification review flows. Write-flow Apps drive the **existing gated tools** through the host's `CallTool` confirmation — no new tools and no new gates — and degrade gracefully to structured output on non-Apps hosts; every App UI is self-contained (inline JS/CSS, no external/CDN loads) under the host's deny-all CSP. The model-facing tool set is unchanged (133 tools / 85 read / 48 write); MCP resources grow 9 → 14. Also in this release: `outputSchema` on the compound and report tools (FastMCP validates returns at runtime, so the models are deliberately permissive), a demand-driven schema policy (`CLAUDE.md`), `maybe_format_markdown` now returns a `ToolResult` so markdown output and an object schema can coexist, and a `list_zone_users` secret-redaction hardening (V-182).

Added

- **Read-first MCP App: access certification (RBAC) review.** `list_users` now attaches an `AppConfig` → new `ui://automox/access-certification.html` resource and advertises an `outputSchema`. Apps-capable hosts render an inline review of account users with their RBAC roles and 2FA status; the operator can **certify** or **flag** each user in-session (a review acknowledgment; it writes nothing). The App is intentionally read-first by **deliberate scope choice**, not because the API can't write. Acting on a finding splits three ways: **API-key revocation** is fully wireable today (numeric-keyed `list_user_api_keys` → `update_user_api_key`/`delete_user_api_key`) and is the fast-follow ([#192](https://github.com/AutomoxCommunity/automox-mcp/issues/192)); **role change** has no API tool (`update_user` is profile-only; role is set only at invite); **membership revoke** exists but is blocked on the account-user UUID gap ([#193](https://github.com/AutomoxCommunity/automox-mcp/issues/193)). MCP resource count 13 → 14.
- **Write-flow MCP App: remediation-apply review** (the gated operation — most design care). `get_action_set_solutions` now attaches an `AppConfig` → new `ui://automox/remediation-apply.html` resource and advertises an `outputSchema`. Apps-capable hosts render an inline review of each remediation solution — its vulnerabilities and the devices it targets — expandable to the per-device names, OS, and status ("what will be applied, where") — with a per-solution **patch-now** apply that drives the existing **Tier-2 env-gated** `apply_remediation_actions` through the host `CallTool` bridge. The review UI *is* the gate's required mitigation. Patch-now is offered **only for solutions whose remediation is a direct patch**; worklet-based (and other non-patch) solutions are shown for review with a note but no apply button — `patch-with-worklet` (arbitrary model-authored code — the trigger the env gate exists for) is deliberately **not** offered in the UI. The apply is inert until `AUTOMOX_MCP_ALLOW_APPLY_REMEDIATION_ACTIONS` is set (the tool is otherwise unregistered); the review stays usable. MCP resource count 12 → 13.
- **Write-flow MCP App: policy change + blast-radius review.** `apply_policy_changes` (which has a `preview` flag) now attaches an `AppConfig` → new `ui://automox/policy-blast-radius.html` resource and advertises an `outputSchema`. On a `preview=true` call, Apps-capable hosts render an inline review of each proposed operation and its **affected-device scope** (server groups + device filters), with an optional on-demand resolution of the concrete affected devices (via `preview_policy_device_filters`) shown as an expandable list of device names + OS, and server-group **names** resolved (via `list_server_groups`) rather than bare ids; **Apply** re-invokes `apply_policy_changes` with `preview=false`. Reuses the existing Tier-1 gate (host-confirmed; unregistered in read-only) — no new write tool, no new gate. Extends the shared host bridge with `onInput` (captures the entry tool's arguments to re-issue the write). MCP resource count 11 → 12.
- **Write-flow MCP App: patch-approval review** (flagship interactive flow). `patch_approvals_summary` now attaches an `AppConfig` pointing at a new `ui://automox/patch-approval.html` resource and advertises an `outputSchema`; Apps-capable hosts render an inline review surface where the operator approves/rejects each pending patch approval. Each decision drives the **existing** Tier-1 `decide_patch_approval` write tool through the host `CallTool` bridge — no new write tool and no new gate: the host's confirmation dialog remains the gate, and in read-only mode the write tool is unregistered so the App degrades to view-only. Introduces a shared, self-contained host bridge (`resources/_app_bridge.py`, `window.AutomoxApp`) reused by future write-flow Apps. MCP resource count 10 → 11.
- **Structured `outputSchema` on the report tools** (`prepatch_report`, `noncompliant_report`). Both now advertise a JSON Schema for their `{"data", "metadata"}` envelope, derived from permissive Pydantic models in `schemas.py` (phase 2 of the structured-output work; see the compound tools above). The schema validates in both JSON and markdown modes.
- **Read-only MCP App: compliance triage UI** (the `io.modelcontextprotocol/ui` Apps extension). `get_compliance_snapshot` now attaches an `AppConfig` pointing at a new `ui://automox/triage.html` resource (MIME `text/html;profile=mcp-app`); Apps-capable hosts render an interactive non-compliant-triage surface (compliance posture, non-compliant devices, stale devices, policy summary) inline, fed by the tool's structured output. The UI is fully self-contained (inline JS/CSS, no CDN imports) and embeds the shared `window.AutomoxApp` host bridge (`resources/_app_bridge.py`) — consistent with the write-flow Apps — degrading gracefully on non-Apps hosts (they ignore the App link and receive the structured snapshot unchanged). The device tables get the same **name-surfacing** treatment as the policy blast-radius App: a non-compliant device's `server_group_id` resolves to a group **name** via the read tool `list_server_groups` (the table re-renders when it returns), and both tables read the **live** snapshot row shape (`server_name`/`display_name` for the device, `platform`/`os_family` for OS, `needs_reboot`/`connected` for state) — fixing four columns (device, OS, group, reboot flag) that previously rendered blank against an invented shape. The only tool this read-only UI drives is that one read; no new model-facing tool. MCP resource count 9 → 10 (resource-count guard added later in this release — see Changed).
- **Structured `outputSchema` on the three compound tools** (`get_compliance_snapshot`, `get_patch_tuesday_readiness`, `get_device_full_profile`). Each now advertises a JSON Schema (derived from a Pydantic model in `schemas.py`) for its `{"data", "metadata"}` envelope, so schema-aware MCP hosts can validate results and render them richly. The runtime return value is unchanged. The output models are deliberately permissive (all-optional fields, `dict[str, Any]` for variable sub-objects, never `extra="forbid"`) because FastMCP validates returned structured content against the schema at runtime — a too-strict schema would reject the legitimately-mutated envelope (token-budget truncation, section summaries, correlation id). Tool count unchanged.
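The reasoning behind the deliberately permissive output models can be shown without FastMCP or Pydantic. In the sketch below, a hand-rolled `check` function stands in for runtime schema validation; the section names and the truncation step are hypothetical and exist only to show why a strict schema would reject a legitimately mutated result.

```python
from typing import Any


def check(payload: dict[str, Any], known: set[str], required: set[str],
          forbid_extra: bool) -> list[str]:
    """Return validation errors; an empty list means the payload is accepted."""
    present = set(payload)
    errors = [f"missing required field: {key}" for key in sorted(required - present)]
    if forbid_extra:
        errors += [f"unexpected field: {key}" for key in sorted(present - known)]
    return errors


# Hypothetical section names for a snapshot's "data" object.
KNOWN = {"compliance", "noncompliant_devices", "stale_devices", "policy_summary"}

# The same object after a made-up truncation step dropped one section
# and replaced it with a summary.
mutated = {
    "compliance": {"rate": 92.5},
    "noncompliant_devices": [],
    "policy_summary": {},
    "stale_devices_summary": "12 devices omitted",
}

# Strict: every field required, extras forbidden. Rejects the mutated object.
strict_errors = check(mutated, KNOWN, required=KNOWN, forbid_extra=True)

# Permissive: all fields optional, extras allowed. Accepts it.
permissive_errors = check(mutated, KNOWN, required=set(), forbid_extra=False)
```

The strict configuration reports both the dropped section and the added summary as errors, which is the failure mode the all-optional models are meant to avoid.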

Changed

- **`maybe_format_markdown` now returns a FastMCP `ToolResult` in markdown mode** instead of replacing `data` with a markdown string. The `ToolResult` carries the rendered table as text content (for human-facing hosts) *and* the full, unchanged `{data, metadata}` object as `structuredContent` (for schema-aware hosts and MCP Apps). This unblocks advertising an object `outputSchema` on read tools while still offering markdown output — previously the markdown string could not satisfy an object schema. The return type of the ~66 markdown-capable read tools was widened accordingly (`ToolResult | dict`); their JSON-mode behavior is unchanged.
- **MCP resource counts are now CI-guarded.** `tests/test_doc_tool_counts.py` asserts the **"N MCP resources"** strings in `docs/tool-reference.md` and `mcpb/manifest.json` against the actually-registered resource set (real-FastMCP introspection, mirroring the tool guard), closing the previously silent resource-count drift now that `ui://` MCP App resources exist. (The resource-table row and the `MCP Apps` note remain manual because they are not derivable from a count.)
- **fastmcp 3.4 `AppConfig` migration.** The `fastmcp` 3.4.2 bump renamed the `AppConfig` constructor argument `resourceUri` → `resource_uri` (snake_case); the five MCP-App entry tools were updated accordingly. The serialized `ui` tool-meta still emits the camelCase `resourceUri` wire alias, so the App wire contract is unchanged (covered by the existing App tests).
- **Maintenance dependency refresh (currency only — no open vuln above the prior lock).** cryptography 46.0.7 → 48.0.0, pydantic 2.12.5 → 2.13.4, starlette 1.1.0 → 1.2.1, requests 2.33.1 → 2.34.2, idna 3.16 → 3.18, authlib 1.7.1 → 1.7.2, uvicorn 0.42.0 → 0.49.0, python-multipart 0.0.27 → 0.0.32, plus sse-starlette and watchfiles. Each was checked against its changelog and the GitHub Advisory DB: every relevant CVE floor was already satisfied by the prior lock, so these carry no security fix above it.
- **Dev tooling.** mypy 1.20.0 → 2.1.0 (whole-repo type-check passes clean under 2.x) and ruff 0.15.8 → 0.15.16 (no format or lint changes). Synced the `.pre-commit-config.yaml` hook pins to the lock — ruff `v0.14.1` → `v0.15.16`, bandit `1.8.3` → `1.9.4` — closing a commit-stage-hook vs. project-venv version skew.
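The `maybe_format_markdown` change at the top of this list (text content and the unchanged object travelling together) can be sketched as below. `ToolResultSketch` is a hypothetical stand-in for FastMCP's `ToolResult`, and both the table renderer and the assumption that `data` is a list of row dicts are illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class ToolResultSketch:
    """Hypothetical stand-in for a result carrying text plus structured content."""
    text: str
    structured_content: dict[str, Any]


def render_table(rows: list[dict[str, Any]]) -> str:
    """Render row dicts as a markdown table, using the first row's keys as headers."""
    if not rows:
        return "_no rows_"
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row.get(h, "")) for h in headers) + " |")
    return "\n".join(lines)


def maybe_format_markdown_sketch(
    envelope: dict[str, Any], output_format: str = "json"
) -> Union[ToolResultSketch, dict[str, Any]]:
    """JSON mode returns the envelope as-is; markdown mode adds a rendered table."""
    if output_format != "markdown":
        return envelope
    # The full {data, metadata} object is kept, so an object schema still validates.
    return ToolResultSketch(text=render_table(envelope["data"]), structured_content=envelope)
```

Because the structured object is no longer replaced by a string, a schema-aware host and a human-facing host can both be served from the same call.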

Security

- **`list_zone_users` no longer forwards the raw user DTO** (V-182). It now projects each zone-member user through a secret-stripping allowlist (identity + RBAC roles + membership key, plus a `uuid` if present), so the `intercom_hmac` chat-auth secret the User DTO carries is never surfaced to the model — matching the redaction every other user listing (`list_users`/`get_user`) already applied. The previous raw `_envelope` forward had no projection, and `intercom_hmac` is not covered by the error-payload-only field redaction. (Live exploitability depends on whether the upstream zone-users endpoint populates the field; the fix restores the documented "excluded from every projection" invariant as defense-in-depth regardless.)
- **Dependency security bumps (durably floored).** `mcp` 1.26.0 → **1.27.2** closes two high-severity advisories the prior lock was in-range for: **GHSA-jpw9-pfvf-9f58** (CVSS 7.1 — HTTP transport sessions not bound to the authenticated principal) and **GHSA-hvrp-rf83-w775** (CVSS 7.6 — experimental task handlers reachable across sessions). `fastmcp` 3.2.0 → **3.4.2** picks up first-party OAuth-proxy / response-cache / inbound-header-forwarding hardening landed in 3.2.4 and 3.3.0. Both floors were raised in `pyproject.toml` (`mcp>=1.27.2`, `fastmcp>=3.4.2`) so the fix survives a fresh resolve rather than living only in the lock.
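The allowlist projection behind the `list_zone_users` fix (V-182) follows a familiar pattern: copy only named fields so that anything unlisted, including secrets, never reaches the output. In this sketch only `uuid` and `intercom_hmac` are names from the entry; the other field names and the helper are hypothetical.

```python
from typing import Any

# Hypothetical allowlist: identity, RBAC roles, membership key, and uuid when present.
ALLOWED_USER_FIELDS = (
    "id", "email", "firstname", "lastname", "rbac_roles", "membership_key", "uuid",
)


def project_zone_user(raw: dict[str, Any]) -> dict[str, Any]:
    """Keep only allowlisted keys; everything else is dropped by construction."""
    return {key: raw[key] for key in ALLOWED_USER_FIELDS if key in raw}


raw_user = {
    "id": 42,
    "email": "ops@example.com",
    "rbac_roles": ["zone-admin"],
    "intercom_hmac": "secret-value",  # must never be forwarded
}
projected = project_zone_user(raw_user)
```

An allowlist fails closed: a new upstream field stays hidden until it is explicitly added, whereas a denylist would leak it by default.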

2.1.0

**The projection-audit release.** Every model-facing tool output went through a 12-domain audit (54 adversarially-verified findings), an independent re-audit that live-verified each fix against the production API and caught 23 further defects, and a live verification campaign that upgraded dozens of field legends from spec-assumed to live-verified provenance — 24 PRs in total. The tool set is unchanged (133 tools); the theme is **self-describing output**: raw values are preserved for fidelity, with decoded siblings and `metadata.field_notes` legends stating verified units, vocabularies, and semantics, so a model no longer has to guess what an integer status, a unit-less number, or a bare enum string means. No tool signatures were removed or narrowed; some output shapes changed (phantom keys that never carried data were dropped, misdocumented fields renamed, flat counts regrouped) — each is detailed below. Where the audit found the *upstream* spec or API at fault, the affected legend says so explicitly rather than papering over it.

Added

- **`device_detail` now emits a device-level `compliance` rollup.** Counts per policy state plus the named policies in `needs_remediation`, and a note encoding the rule (verified live across ~20 devices): a device is non-compliant when at least one policy needs remediation — pending policies alone do not count against compliance. Previously the model had to derive all of this from raw per-policy entries.
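The compliance rule stated above can be expressed in a few lines. This sketch uses the `0 needs_remediation` / `1 up_to_date` / `2 pending` status enum documented elsewhere in this release; the `policy_name` key and the exact output layout are assumptions, not the tool's actual projection.

```python
from collections import Counter
from typing import Any

STATUS_LABELS = {0: "needs_remediation", 1: "up_to_date", 2: "pending"}


def compliance_rollup(policy_status: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarize per-policy states into a device-level compliance verdict."""
    counts = Counter(STATUS_LABELS.get(p["status"], "unknown") for p in policy_status)
    needs = [p["policy_name"] for p in policy_status if p["status"] == 0]
    return {
        "policy_status_counts": dict(counts),
        "needs_remediation": needs,
        # Pending policies alone do not make a device non-compliant.
        "device_compliant": not needs,
    }


pending_only = compliance_rollup([
    {"policy_name": "Patch All", "status": 1},
    {"policy_name": "Patch Critical", "status": 2},
])
one_failing = compliance_rollup([
    {"policy_name": "Patch All", "status": 1},
    {"policy_name": "Patch Critical", "status": 0},
])
```

Here `pending_only` is reported as compliant even though a policy is still pending, while `one_failing` is not, and it names the policy that needs remediation.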

Fixed

- **Approvals decision-axis and remediation-status legends corrected/upgraded from a device-bearing test-org verification (issue 165; live-verified 2026-06-06).** Legend/description text plus one tiny projection fallback — no tool-count changes. (1) **Approvals decision axis.** On a 12-row device-bearing approvals queue, `status` on a DECIDED row carries the decision OUTCOME, not an execution status: `manual_approval=true` co-occurred 1:1 with `status='approved'` (8/8) and `false` with `status='rejected'` (4/4), and `manual_approval_time` is a `'YYYY-MM-DD HH:MM:SS'` string. The earlier claim that an awaiting row carries `status='active'` is removed from the `get_patch_tuesday_readiness` comment, the `patch_approvals_summary` legend, and the test fixtures — an awaiting row could not be generated in the verification cycle, so status-while-awaiting stays unobserved and is no longer asserted; the pending count remains keyed on `manual_approval is None`, which is now better supported (status is the decision word, so keying pending on status would be wrong regardless). The record has no top-level `severity` (12/12) and the envelope is `{results, size}`. (2) **`patch_approvals_summary` nested-severity fallback.** The live record carries a nested `software.severity` key (null on every observed row, but present in the shape); `summarize_patch_approvals` read only the absent top-level severity, so a populated nested value would be missed. It now falls back to `software.severity` before bucketing to `unspecified`. 
(3) **Remediation status vocabulary.** A full gated patch-now execution observed `devices[].status` transition `not-started -> in_progress` (per-device and execution-scoped — an untargeted sibling device stayed `not-started`), with `in_progress` emitted within seconds of the 202; the legend now calls out the SEPARATOR INCONSISTENCY (`not-started` hyphenated, `in_progress` underscored), notes the spec's `pending` example was not seen live, and keeps the terminal value open (the device held `in_progress` for a ~24-minute poll without a captured terminal string). `solutions[].status` is confirmed null/absent on automox-patch across the lifecycle (status lives on `devices[]`), and patch-now does NOT create a persistent policy — it dispatches a direct `InstallUpdate` device command (`policy_id=0`).
- **Policy-window and action-set status legend provenance upgraded from live re-verifications (issue 165; live-verified 2026-06-06).** Legend/description text only — no projection logic or tool-count changes. (1) **`use_local_tz` / `dtstart`.** The prior "use_local_tz=true semantics are per spec, unverified live (the probe used false)" framing is replaced with the live facts: the upstream REJECTS `use_local_tz=true` with HTTP 400 (`invalidFields.useLocalTz="use_local_tz cannot be set to true"`, unconditional of recurrence on this tenant — possibly tenant/plan-conditional, not asserted universal), so `false` is the only persistable value here; `dtstart` is echoed verbatim with no normalization; and the entity carries no timezone-resolution field, so the true-case "same wall-clock in each device's local timezone" meaning is spec-only AND device-side — structurally unprovable from the API entity even where the flag is allowed. (2) **`duration_minutes`.** New note: it may be recomputed upstream for `recurrence=once` windows — live, sent 30 but create AND get echoed 389494 (≈ the dtstart→UNTIL span); the wrapper passes the value verbatim, so callers must read back the echoed value rather than trusting their input. (3) **`get_action_set_detail.status`.** A full upload-to-completion poll observed the lifecycle `building -> ready` settling in ~2s; `'building'` is emitted by the API itself in the 201 create response (not a wrapper default), `'ready'` is the confirmed terminal value, and `'active'` was NEVER observed live (spec example only) — replacing the earlier claim that mixed the spec example with live values.
- **`get_data_extract` now accepts an integer `extract_id` (re-audit release blocker; live-verified 2026-06-05).** Live extract ids are integers (`list_data_extracts` returns e.g. `479737`), so a model relaying an id from the list call passed an int and the tool rejected it at the parameter boundary with a pydantic `string_type` error before any request was made. The tool parameter is widened to `int | str` and coerced to a string for the URL path; `GetDataExtractParams.extract_id` gains a before-validator that coerces an int to str (the path-safety regex still applies; a bool is not treated as an int). Verified end-to-end through the FastMCP tool boundary: an int id now returns the correct record. The `list_data_extracts` smoke check was upgraded from `resp is not None` to an envelope-unwrap reconciliation — `total_extracts` must be the integer envelope `size`, every returned row must be an unwrapped extract dict carrying an `id` (never the leaked `{results,size}` envelope), and `total_extracts >= page_row_count` (size is the cross-page grand total) — so a regressed unwrap that left the whole envelope as one bogus row would now fail instead of passing on a 200.
- **Device-surface compliance and severity legends reconciled with the authoritative signals (re-audit cluster device-surfaces; live-verified 2026-06-05).** (N2) The legacy device-level policy-status string (e.g. `"non-compliant"`) can contradict the authoritative `compliant` boolean: live, a device with `compliant=true` and 13 pending policies still reports the string `"non-compliant"`, and fleet-wide the legacy axis read 175 non-compliant while the authoritative axis read 129. The raw strings are kept (no rename/drop), but each contradicting surface now carries a reconciling `metadata.field_notes` entry pointing to the authoritative rollup/boolean: `device_detail.core.status` → `compliance.device_compliant`; `list_devices.devices[].policy_status` → `device_detail`/`device_health_metrics`; `device_health_metrics.policy_execution_breakdown` → `compliance_breakdown`. (N10) `device_detail`'s two per-policy breakdowns — `compliance.policy_status_counts` (from `policy_status[]`) and `policy_assignments.status_breakdown` (from `server_policies[]`) — are computed from different upstream arrays and can differ by a policy or two (live: 31 vs 30); the `status_breakdown` legend now explains the two source arrays so the model knows which to cite. (N4) `search_devices`'s description and `docs/tool-reference.md` dropped the false guarantee that "every returned device has at least one missing patch at that severity" (false on ~41% of live `critical` results) while keeping the correct `pending_patches`-scoping disclaimer. (N5) ADS/saved-search device results (`advanced_device_search`, `get_saved_search_results`, `get_cached_search_results`, `run_saved_search`) now carry a `metadata.field_notes.outstanding_patch_severity` legend (and `advanced_device_search`'s description states it) distinguishing the string `'none'` (assessed, no outstanding patches — clean) from JSON null/absent (device not yet assessed — unknown, NOT clean); live distribution confirmed both states present. 
All of these changes are legend-only: raw values pass through unchanged, and no tool counts change.
- **Policy-domain projection re-audit fixes (audit; live-verified or spec-attributed 2026-06-05).** Seven items from the re-audit final list. (1) **banner_stats unit legend** — `policy_history_detail` and `policy_runs_for_policy` now attach a `metadata.field_notes` legend stating that `banner_stats.policy_success_rate` is a percentage in the 0–100 range, NOT a 0–1 fraction (live-verified; observed values `0.0`/`60.0`/`100.0`), and that `total_policies_applied`/`total_successful_devices` are plain counts — so the model stops reading the rate as a fraction or a count (baseline finding 22). (2) **`get_patch_tuesday_readiness` pending-approval count** — the count keyed on `status in ('pending','Pending')`, but the decision axis is `manual_approval` (per 154: True=approved, False=rejected, null=awaiting decision); the count is re-keyed on `manual_approval is None`, and the compound test fixtures that encoded the wrong `status:'pending'` assumption (the 132 trap) were corrected (baseline finding 51). The exact `status` value an awaiting row carries remains unobserved live — the later issue-165 verification (see above) found `status` holds the decision outcome `approved`/`rejected` on decided rows and withdrew any specific awaiting-value claim. (3) **`policy_catalog` phantom `next_run`** — the projection read `next_run`, a key the `/policies` list endpoint never returns (confirmed absent live), emitting a confident `next_run:null` that read as "nothing scheduled"; it is removed and replaced by the spec-defined `next_remediation`, surfaced only when present, with a legend warning that absence does not mean unscheduled (per spec, not observed live on this tenant) (finding N8). (4) **`policy_runs_v2.result_status` semantics** — the description now states the filter is any-device-with-this-outcome (live-verified: `result_status='failed'` returns runs with 1 failed device alongside 200+ not-failed, and a run can match multiple statuses) (finding N7). 
(5) **`search_policy_windows(recurrences=)` case** — the spec declares an UPPERCASE enum; tokens are now coerced to uppercase before sending (case-insensitive accept) so a lowercase token can no longer silently match zero windows (spec-attributed, not live-verified — tenant had 0 windows) (finding N11). (6) **`create_policy_window` dtstart description** — the `use_local_tz=true` case is now qualified as per spec/unverified-live (the controlled-object probe used `use_local_tz=false`), matching the workflow legend's provenance discipline. (7) **policy-history test fixture** — the `_POLICY_RUNS` fixture gained the `device_count` field the projection surfaces and the legend depends on (captured shape: per-outcome device counts sum to `device_count`, live-verified across 8 runs), with an assertion exercising the sum.
- **Re-audit legend/projection corrections across packages, worklets, audit-v2, vuln-sync, and the patch-readiness compound (re-audit 2026-06-05; all read-side items live-re-verified).** (1) **Package `severity` legend.** `low` was reclassified from `spec_only_unverified` to `observed_live` in the shared `_SEVERITY_FIELD_NOTE` and both package tool descriptions — a fresh org-package probe returned `low` in the live severity distribution (spec-only set is now `none`/`unknown`). (2) **`search_org_packages` page-scoped count.** The endpoint returns a bare list with no total, so the prior `total_packages` (which fell back to the page length, e.g. 500, while >1000 packages exist) was relabeled `returned_package_count` and accompanied by `metadata.pagination` (`page`/`page_size`/`upstream_total`/`has_more`) and a field-note — a full page now signals more pages instead of masquerading as a fleet-wide total. (3) **Worklet detail safety flag.** `user_interaction_required` (a live-present unattended-execution safety flag, observed on 10/60 worklets) was added to the `get_worklet_detail` projection allowlist; it had been silently dropped. (4) **`audit_events_ocsf` event identifier.** The projection read a non-existent top-level `uid` (always null); it now surfaces the real `_id` plus `metadata.uid`. (5) **`investigate_noncompliant_device` prompt.** Rewrote the step that told the model to filter packages on the phantom `patch_status` field (removed by the 159 projection fix) to use the real `installed` boolean and `severity`. (6) **`get_patch_tuesday_readiness` severity legend.** Aligned the compound description to the full value set the prepatch projection emits (`critical/high/medium/low/none/no_known_cves/unknown`) and corrected the false "null when unrated" claim — the projection emits the string `unknown`, never JSON null. 
(7) **`get_action_set_detail` status provenance.** The legend had claimed `active`/`ready`/`building` were all "observed live"; only `ready` was observed on this tenant (`active` is the spec example, `building` is the wrapper's upload default) — provenance is now stated honestly per value. (8) **`user_access` audit-category provenance.** Corrected the comment/legend that lumped `account_change` and `user_access` together as "spec-example derived": `account_change` has a spec example, `user_access` does not (it is an inference) — labeled accordingly. Unit fixtures corrected to captured live shapes (worklet `device_type` is a list `['SERVER','WORKSTATION']`, not the scalar `"endpoint"`; OCSF event identifier is `_id`+`metadata.uid`, not top-level `uid`); smoke upgraded to assert the page-scoped count and truncation signal on `search_org_packages`.
- **`device_detail` and `get_device_full_profile` stopped surfacing phantom keys; `devices_needing_attention` and `search_devices` got real triage signal (audit; live-verified 2026-06-05).** (1) `software_preview` dropped an always-null `status` field (no such key exists on `/servers/{id}/packages` items) and now projects the real per-package signals — `installed`, `ignored`, `severity`, `agent_severity`, `cve_score`, `cves`, `requires_reboot`, `deferred_until`. Observed severity vocab: `critical`/`high`/`no_known_cves`/null; `agent_severity` is surfaced raw (per spec it may be a text severity or a numeric CVSS score). (2) `pending_commands` was reading three keys that don't exist on the queue item (`command`/`scheduled_time`/`status` — every queued command came back null); it now maps the live Command fields (`command_type` from `command_type_name`, `scheduled_time` from `exec_time`, plus `policy_id`/`args`/`response`). The command vocabulary is non-exhaustive (kept raw). (3) `policy_assignments.status_breakdown` was emitting raw integer codes (`{"1": 17, "2": 13}`) with no legend; a per-policy crosstab confirmed `server_policies[].status` is the same `0 needs_remediation` / `1 up_to_date` / `2 pending` enum as `policy_status[].status` (all three values observed live), so it now decodes to labels. Server groups were a list of integer IDs that the projection tried to read as `.name` objects (always empty); they now surface as `server_group_ids`. (4) `devices_needing_attention` now carries the per-policy `severity`, failure `reason`, and `policy_create_time` the report DTO exposes (present on every live entry; observed severities unknown/critical/high) so triage no longer loses the priority signal. (5) `search_devices` documents that per-device `pending_patches` is the all-severity outstanding total, not the severity-filtered subset. 
The compound `get_device_full_profile` forwards the corrected `pending_commands`/`policy_assignments` verbatim, so it is fixed by the same projection change. Unit fixtures rewritten from the live (sanitized) shapes (integer policy status, integer group IDs, no package `status` key, `command_type_name`/`exec_time` queue items).
- **Reports tools distinguish patch severity states and explain compliance (audit cluster reports).** Live-verified 2026-06-05. (1) `prepatch_report` no longer collapses `no_known_cves` (patches carry no associated CVE — benign) into `unknown` (severity undetermined): the per-device `highest_severity` keeps them as separate values, and the wrapper's recomputed `summary` now has a distinct `no_known_cves` bucket (the raw upstream summary already carried separate `no_known_cves`/`unknown` counters and is still passed through verbatim under `api_summary` — only the per-device projection and the recomputed summary had been folding them). A `metadata.field_notes` legend plus the tool description state the vocabulary. (2) `prepatch_report` documents that its `compliant` boolean follows the platform rule (a device is non-compliant only when a policy needs remediation; pending work alone does not count against compliance, consistent with 149/155) — the boolean is the upstream device value, passed through raw, so the model no longer treats a `compliant:true` device with pending patches as a contradiction. (3) `noncompliant_report` now surfaces each failing policy's `reason_for_fail` (upstream failure text, truncated past 2000 chars), `severity`, `type`, and a `package_count` instead of only id+name, so the model can state why a device is non-compliant and prioritize (live-verified these fields are populated upstream). (4) Clarified that `prepatch_report.total_pending_patches` is an upstream-reported relabel whose unit the spec does not state and whose per-severity buckets are not guaranteed to sum to it.
- **Policy-execution timeline and per-device run results are now self-describing (audit clusters; live-verified 2026-06-05).** (1) `policy_execution_timeline` previously labeled any run with no successes and no failures as status `unknown`/null and dropped the `remediation_not_applicable` (and `blocked`) device counts entirely — so a run that completed as an all-not-applicable no-op gave the model zero signal and read as ambiguous or failed. Each execution now groups its `pending`/`success`/`failed`/`not_included`/`remediation_not_applicable`/`blocked` values under `device_outcomes` (device counts per outcome, not run statuses — they sum to `device_count`), and a `metadata.field_notes` legend plus the tool description state that a run with no successes or failures but nonzero pending/not-included/not-applicable is a benign no-op or still in progress, not an error. (2) `policy_health_overview`'s run-level `status_breakdown` catch-all bucket is renamed from the opaque `unknown` to `no_success_or_failure` and legended for the same reason. (3) `policy_run_results` now carries the same `metadata.field_notes` exit-code legend already shipped for `policy_run_detail_v2`: `exit_code` is the raw process exit code (0 = success; negative values on Windows are NTSTATUS codes as signed 32-bit ints, e.g. `-1073741502` = 0xC0000142 STATUS_DLL_INIT_FAILED; when null it falls back to the Automox internal `error_code`, a different namespace), and `result_status` is a lowercase per-device outcome string. Raw values are preserved for fidelity; no tool counts change.
- **Package tools no longer advertise or emit phantom fields, and `severity` now carries a vocabulary legend (audit cluster: packages).** Live re-probe 2026-06-05 confirmed, per endpoint, that the `/servers/{id}/packages` and `/orgs/{id}/packages` responses each lack `status`/`patch_status` and `awaiting` (checked across 800+ org packages and the device inventory). `list_device_packages` was forwarding `pkg.get("status") or pkg.get("patch_status")` — a falsy-drop chain over fields that never exist — and its description claimed it "Returns ... patch status"; `search_org_packages` projected a nonexistent `awaiting` output key. Both phantom projections are removed: install state is conveyed by the real `installed` boolean, and `awaiting` remains a correct request-only filter (per spec: `awaiting=1` = available-but-not-installed, `awaiting=0` = installed). The same probe also showed `device_count` is absent from the live org Packages DTO (the org endpoint returns the same per-package shape as the device endpoint, not an aggregated rollup), so the dead `device_count` projection on `search_org_packages` was removed as well. Both tools now attach a `metadata.field_notes.severity` legend and the descriptions state the verified vocabulary: critical/high/medium/no_known_cves and JSON null observed live, with low/none/unknown present in the spec enum but unobserved on the probed tenant — JSON null means no severity assessment was recorded (not a safety claim) and no_known_cves means scanned with no known CVEs, so the model stops guessing the difference. Unit fixtures rewritten from invented shapes to captured (sanitized) live DTOs.
- **Vuln-sync remediation tools now label the coded status/severity strings the model previously received bare (audit findings 35–37).** `get_action_set_solutions` gains a `metadata.field_notes` legend for the three coded fields the API types as bare strings with no enum. Per-vulnerability `severity`: `critical` was observed live, consistent with the lowercase patch-severity scale topping at `critical` (not the capitalized OCSF Low/High/Critical/Fatal scale), but the full value set and ceiling were not confirmed, so the scale stays spec-derived/unverified-live and the model is told to treat severity as the source-reported rating and cross-check CVSS rather than assert a ceiling. Per-device `status`: observed live as `not-started` (the spec's only example, `pending`, was not seen live); the legend reports the observed value and flags the value space as open rather than inventing siblings. Solution-level `status` (defined in the spec for the rapid7/unmatched sub-types) was not present on the live rapid7 solutions, so the legend marks it spec-defined/not-observed-live. `get_action_set_detail` gains a `status` legend stating the lifecycle vocabulary observed live (active/ready/building) while flagging that the spec defines no enum and the terminal value is unconfirmed. Tool descriptions updated to point at the legends; the prior invented `_SOLUTIONS` unit-test fixture is replaced with a sanitized live capture (rapid7-solution shape) plus a spec-derived automox-patch entry for sub-type coverage. Raw values pass through verbatim — legend-only, no projection mutation.
- **Policy-window projections now label verified vocabularies and units instead of forwarding ambiguous raw values (audit clusters 10–11).** `create_policy_window`/`update_policy_window` no longer claim generic "RFC 5545 RRULE" — the descriptions and the `rrule` parameter now state the validator-enforced grammar (recurrence=once requires `FREQ=DAILY;UNTIL=YYYYMMDDTHHMMSSZ`; recurrence=recurring requires `FREQ=YEARLY`+`BYMONTH`+`BYDAY` only; `FREQ=WEEKLY` is rejected with a 400), preventing the model from generating rrules the upstream refuses. `get_policy_window`/`search_policy_windows` now emit a `metadata.field_notes` legend documenting that `status` is lowercase `active`/`inactive` and `recurrence` is UPPERCASE `ONCE`/`RECURRING` on read (while create/update accept lowercase), and that `dtstart`'s trailing `Z` is literal UTC only when `use_local_tz=false` — when `use_local_tz=true` the same wall-clock is applied in each device's local timezone, so the model no longer misreads the `Z` as UTC. The scheduled-windows tools now note that `start`/`end` are derived occurrence times (the window stores `dtstart`+`duration_minutes`+`rrule`, with no stored start/end). `search_policy_windows` pagination metadata now preserves a genuine zero count (`total_elements`/`total_pages`, and `has_more` when `page`/`size` are supplied) instead of dropping it on an empty result set via a falsy-`0` `or`-chain. All vocab/unit claims live-verified 2026-06-05 via an authorized controlled-object oracle; the `use_local_tz=true` timezone semantics are labeled spec-only/unverified.
- **Data-extract tools surfaced fictional or undefined fields and mis-read the API's response shape (audit).** `list_data_extracts` now unwraps the `{"results": […], "size": N}` envelope the API actually returns (live-verified 2026-06-05, same class as the 154 approvals fix) — previously the whole envelope was wrapped as one bogus row, so every list returned a single junk entry — and drops the phantom `name`/`file_size` keys that never existed on the record. `get_data_extract` now reads the real `download_expires_at` key (it was reading a non-existent `expires_at`, so link-expiry never populated) and drops phantom `file_size`/`row_count`. Both now surface the `is_completed` boolean as the reliable readiness oracle and label `status` against the spec enum (`queued|running|complete|failed|canceled|expired`). `create_data_extract` no longer invents the out-of-enum status `"pending"` and handles the array response the spec documents (per spec; the POST is a write so this path is unverified live, with a single-mapping fallback retained). A `metadata.field_notes` legend records that `has_download_url` (not a set `download_expires_at`) is the reliable can-download signal — on the expired records observed live the download link was gone (`download_url` null) regardless of the expiry timestamp. Fixtures replaced with the sanitized live shape (integer ids, the envelope, real keys).
- **Splashtop status tools now document their booleans (audit).** `splashtop_device_status` states that `installation_status` and `registration_status` are independent (a device can be installed but not yet registered) — attributed to the spec `DeviceStatusResponseDto` since installing Splashtop to verify the derivations would be a write, and added as a `metadata.field_notes` legend beside the verbatim payload. `splashtop_session_status` states `can_start_new_session` reflects capacity only (true when `current_sessions < max_sessions`, per the spec examples) and that attended-access consent is a separate precondition — so the field alone does not mean a session can begin.
- **Compliance semantics corrected in two aggregate views (audit; semantics confirmed with maintainer).** (1) `device_health_metrics` counted any device with `pending: true` as non-compliant, contradicting the platform rule verified in the 2.0.x compliance work (a device is non-compliant only when at least one policy needs remediation). It now trusts the upstream `compliant` boolean, exposes the full split as `compliance_breakdown`, and tracks pending work separately as `devices_with_pending_policies`. (2) `policy_compliance_stats` dropped the `pending` count that `/policystats` returns (live-verified 2026-06-05), so a policy with 2 compliant / 0 noncompliant / 166 pending read as "100% compliant over 2 devices". Pending now rides along per policy and overall (`pending_devices`, `pending_rate_percent` over all targeted devices); `compliance_rate_percent` stays computed over evaluated devices but is **null** (not a misleading 0%) when nothing has been evaluated, and `metadata.rate_semantics` states the rule. `policy_type` is now surfaced per row. Fixture updated to the live key shape (`noncompliant`, `pending`).
- **`patch_approvals_summary` no longer silently reports zero approvals — every approval was being dropped by an envelope mismatch.** `/approvals` returns a `{"size": N, "results": [...]}` wrapper (per `components/schemas/Approvals`; envelope confirmed live 2026-06-05), but the workflow required a bare Sequence, so a conforming response always became `[]` — the tool (and `get_patch_tuesday_readiness`, which consumes it) reported an empty approvals queue regardless of reality. Same class as the 132 envelope bugs, hidden by a fixture invented in the code's own image. Also fixed in the same pass: the per-approval projection read five keys that don't exist on the documented record (`title`/`severity`/`device_count`/`created_at`/`deadline` — every value was null); it now projects the real shape (`title` from `software.display_name`, `software.version`/`os_family`, CVE ids, the policy block, and `manual_approval`: true = approved / false = rejected / null = awaiting decision). The upstream record has no severity field, so `severity_breakdown` buckets those as `unspecified` instead of conflating them with a literal "unknown". Fixtures rewritten to the spec envelope shape.
- **Schedule bitmasks decoded in catalog views; worklet catalog projects its real fields (audit clusters 5 + 8).** (1) `policy_catalog` and `get_patch_tuesday_readiness` rows now carry `schedule_days_decoded` next to the raw `schedule_days` bitmask (reusing the decoder `describe_policy` already had), and both tool descriptions note that `schedule_time` is a bare HH:MM string with no timezone marker. (2) The worklet catalog projection read a `category` key that does not exist on live `/wis/search` items (verified 2026-06-05 — items carry plural `categories`), so every catalog row showed a null category; it now projects `categories` (with legacy-singular fallback) plus the trust/availability signals the projection silently dropped (`verified`, `access` tier, `license_required`, `language`, `version`, `device_type`). Worklet fixtures updated to the captured live shape.
- **Policy-run outcome counts and exit codes are now self-describing (audit clusters 3–4).** Live-verified 2026-06-05: (1) the v2 run records' `pending`/`success`/`failed`/`not_included`/`remediation_not_applicable`/`blocked` fields are **device counts per outcome**, not run statuses — bare keys like `success: 13` invited misreading, so `policy_runs_v2`/`policy_runs_for_policy`/`policy_history_detail` now group them under `device_outcomes` (and surface the previously-dropped `device_count` total). (2) `policy_run_detail_v2`'s per-device `exit_code` is the raw process exit code from the policy script — 0 = success, negative values on Windows are NTSTATUS as signed 32-bit ints (observed live: `-1073741502` = 0xC0000142 STATUS_DLL_INIT_FAILED) — now explained by a `metadata.field_notes` legend and the tool description. (3) `list_events` documents that `data.status` on policy/patch events is the same raw exit code, arriving as string **or** int.
- **`list_org_api_keys` no longer reports zero keys when the org has them, and `get_user`/`get_account_user` close two account-security projection gaps (re-audit cluster account-security; live-verified 2026-06-05).** (1) `list_org_api_keys` assumed `GET /orgs/{id}/api_keys` returns a bare list, but the endpoint returns a `{"results": [...], "size": N}` envelope (live-verified: `size=21`, 21 results) — the same envelope class as the 154 (`/approvals`) and 163 (`/data-extracts`) fixes — so a conforming response always collapsed to zero keys. The workflow now unwraps the envelope, projects the real per-key fields (`is_enabled` — the live DTO key, NOT the previously-read phantom `enabled` — plus `id`/`name`/`created_at`/`expires_at`), drops the embedded `user` contact blob to stay lean, and surfaces the envelope `size` as `metadata.total_size` for count reconciliation. The invented bare-list unit fixture (the 132 trap) is replaced with a captured sanitized `{results,size}` payload, and the smoke assertion was upgraded from `resp is not None` to envelope reconciliation (`total_keys == size`). (2) `get_user` re-projects each nested `orgs[]` object through an allowlist (`_USER_ORG_FIELDS`, mirroring `_ZONE_FIELDS`) so a populated per-org `access_key` (and `saml`/`metadata` config blobs) is never forwarded into model context; `plan` is kept so the existing `orgs[].plan` legend stays meaningful. (The `/users` endpoint is scope-gated on the audit key, so the no-`access_key` guarantee is implemented defensively rather than live-confirmed.) (3) The `get_account_user` `two_factor_authentication` field note and tool description were rewritten: the field returns the literal string `"disabled"` when 2FA is OFF (re-audit-observed live), so the prior framing ("a string value means 2FA of that type is configured") would have a model read `"disabled"` as a configured 2FA type — the opposite of the truth. The legend now enumerates `"disabled"` = off (live-verified), other strings (e.g. 
`email`/`google`, per spec) = configured type, and keeps the null/absent ambiguity note.
- **`audit_events_ocsf` no longer forces the model to decode raw OCSF plumbing (audit finding cluster 2).** Three changes, all live-verified 2026-06-05: (1) event `time` arrives from the upstream as **epoch seconds** (a float — despite the OCSF standard specifying milliseconds); the projection now converts it to an ISO 8601 UTC string, with a defensive milliseconds branch for values too large to be seconds. (2) `severity`/`status` string labels are filled in from the `severity_id`/`status_id` integer enums (spec `x-enumDescriptions`) when the upstream omits the strings — live events sometimes carry only the ids, and these OCSF integers mean something entirely different from the `/servers` policy-status enum (OCSF: 1=success, 2=failure). (3) The tool description documents both scales. Test fixture updated from invented ISO-string times to the captured live shape.
- **`get_device_by_uuid` no longer returns the raw `/servers/{id}` payload unexplained — it was the unfixed twin of the `device_detail` defects below.** The tool returned the same ServerWithPolicies DTO as `device_detail` but verbatim: integer policy codes with no legend, the unit-less (and spec-misdocumented) `uptime`, and no compliance projection. Found by a 12-domain projection audit; all three findings adversarially verified. The raw dict now passes through a shared enrichment (`enrich_raw_device_payload`): each integer policy code gains a `status_label` sibling (raw code kept for fidelity), `uptime` is replaced by `uptime_minutes`, and the device-level `compliance` rollup is attached. The tool description also no longer claims a "Server Groups API v2" lookup — it has used the canonical `/servers/{id}` endpoint since the 92 fix. Smoke now asserts every integer policy code carries a label and the unit-less key is absent.
- **Per-policy status codes in `device_detail` are now translated instead of passed through raw.** `GET /servers` `policy_status[].status` is an integer enum — `0 = needs_remediation`, `1 = up_to_date`, `2 = pending` (confirmed against the Console API spec and cross-checked live against the `status.policy_statuses[].compliant` booleans: code 1 is the only value paired with `compliant: true`). The summarizer previously stringified the integers verbatim (`"1"`/`"2"`), forcing the model to guess a mapping — and in observed use it guessed one that inverted compliant and non-compliant. Worse, code 0 (needs remediation — falsy) fell through a truthiness chain to absent alternate keys and surfaced as `"unknown"`, so the one state that demands action was the one reported least clearly. The mapping is applied in the policy-entry summarizer, not in the generic status normalizer, because other Automox enums reuse these integers with different meanings.
- **`uptime` renamed to `uptime_minutes` in `device_detail` — the public spec's unit is wrong.** The Console API spec describes `Server.uptime` as "measured in seconds", but live verification against known device boot times (2026-06-04) shows the value is **minutes**, sampled at the device's last full scan — so it can also lag the current boot session. The unit-less raw key invited bad inferences (an observed session read ~6.7k minutes ≈ 4.7 days as possibly "~9 months of uptime"). The projection now emits `uptime_minutes` as an integer, and the tool description carries the sampled-at-last-scan caveat.
- **Account- and server-group tools now label opaque / unit-silent fields so the model stops confabulating (audit cluster account-groups).** `list_server_groups` / `get_server_group` annotate `refresh_interval` as **minutes** via `metadata.field_notes` (live-verified 2026-06-05: console 24h/4h scan cadences read back as `1440`/`240`; spec range 240–1440), and the `create_server_group` / `update_server_group` write params now state the minute unit on the model-facing schema (previously a bare integer — an off-by-60× risk). `get_account_user` documents that a null/absent `two_factor_authentication` is **ambiguous** (per spec null means disabled, but that was not live-verifiable and the field is non-required — null/absent may mean disabled *or* not-reported) rather than implying a clean 2FA boolean. `get_user` notes that `orgs[].plan` is a spec-only billing slug (`basic`/`manage`/`tier3`, unverified live, absent on some tenants) and is a distinct vocabulary from `list_organizations`' `tier` — the model is told not to reconcile or rank them. `list_organizations` no longer advertises "feature-tier checks": its `tier` slug has no defined ordering and is absent on some tenants (live-verified: this tenant returns no `tier` field at all), so the description steers the model away from inferring a paid-plan / capability ranking. Projection values are unchanged (raw fidelity preserved); the fixes are description/legend-only.
- **`audit_events_ocsf` no longer silently returns zero events when filtered by category.** The upstream OCSF audit events carry no `category_name` field (verified live 2026-06-05 across every returned event) — only an integer `category_uid` that, live-confirmed, maps 1:N across categories (`category_uid=3` covers BOTH Authentication and Entity Management). The prior client-side `category_name` equality filter matched the absent field, so it zeroed every result and made a model conclude "no authentication/web-resource activity" on dates that had it (reproduced live: a `web_resource_activity` filter that should return 7 events returned 0). Category filtering is now applied against the event `type_name` prefix (the string the upstream actually populates: "Authentication:", "Entity Management:", "Web Resources Activity:" — colon+space boundary, live-verified; `account_change`/`user_access` prefixes are spec-derived and labeled unverified-live). An unmappable token no longer zeroes the result — it leaves events unfiltered and sets `metadata.applied_filters.category_name_matched=false`; `events_before_filter` is retained so an empty filtered result is distinguishable from no activity. The tool/parameter descriptions and a `metadata.field_notes` legend state all of this. The legend also notes that `category_uid`/`type_uid`/`class_uid`/`activity_id` are raw OCSF taxonomy integers with no decode table in the upstream spec (so `type_name`/`activity` are the reliable human-readable signals) and that the `date` parameter's timezone boundary is unspecified by the upstream. The unit fixture was rebuilt from a sanitized live capture (no invented `category_name` key; `category_uid=3` correctly shared by Authentication and Entity Management) so the test exercises the live contract rather than a fictional field.
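
Two of the decoding rules described in the entries above can be sketched in a few lines: the `policy_status[].status` integer-to-label mapping (including the falsy code 0 that previously fell through a truthiness chain) and the reading of a negative Windows exit code as an unsigned NTSTATUS value. This is a minimal illustration, not the project's code. The helper names `label_policy_status` and `ntstatus_hex` and the fallback strings for missing or unrecognized codes are assumptions made for the example.

```python
from typing import Optional

# Mapping stated in the changelog for `policy_status[].status`.
POLICY_STATUS_LABELS = {0: "needs_remediation", 1: "up_to_date", 2: "pending"}


def label_policy_status(code: Optional[int]) -> str:
    """Translate an integer policy status code to its label.

    The explicit `is None` check matters: code 0 is falsy, so a
    truthiness test such as `code or fallback` would discard it.
    The fallback strings below are hypothetical.
    """
    if code is None:
        return "unknown"
    return POLICY_STATUS_LABELS.get(code, f"unrecognized({code})")


def ntstatus_hex(exit_code: int) -> str:
    """Render a signed 32-bit exit code as unsigned 32-bit hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


if __name__ == "__main__":
    print(label_policy_status(0))
    print(ntstatus_hex(-1073741502))
```

Masking with `0xFFFFFFFF` reinterprets the two's-complement value without changing its bits, which is why the raw signed integer can be kept for fidelity and decoded only for display.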

- **`verify-publish` no longer loses the race to PyPI index propagation (CI only).** The v2.0.3 run failed despite a healthy publish, for two compounding reasons: uv caches simple-index responses, so after a too-early first attempt every retry replayed the cached "no such version" miss instead of re-checking PyPI; and the retry window (6×20s ≈ 2 min) was shorter than occasional index-propagation lag. The install check now passes `--refresh-package automox-mcp` and retries for up to ~15 minutes (30×30s; job timeout raised 10→20 min). The loop still exits on the first success, so a normal release pays nothing — the ceiling only spends free runner minutes on slow days, instead of attended minutes diagnosing and rerunning a red release.
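
The retry behaviour described above (a bounded number of attempts, a fixed pause between them, and an exit on the first success) can be sketched as follows. This is an illustration only: the actual CI job is not shown in this changelog, `retry_until_success` is a hypothetical helper, and the `check` callable stands in for whatever install check the job runs with `--refresh-package automox-mcp`.

```python
import time
from typing import Callable


def retry_until_success(
    check: Callable[[], bool],
    attempts: int = 30,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `check` until it returns True or the attempts run out.

    Returns the 1-based number of the attempt that succeeded, or 0 if
    every attempt failed. No pause is taken after the final attempt.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay_seconds)
    return 0
```

Because the loop returns as soon as `check` succeeds, a release whose version is already visible costs one attempt and no sleep; the larger ceiling is only paid when propagation is slow. Injecting `sleep` keeps the helper testable without real waiting.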

2.0.3

Changed

- **ADS-family 403s are now self-diagnosing.** The upstream Server Groups v2 search endpoints return a bare `403` that is indistinguishable from an RBAC denial, so callers (and models) chase permissions instead of the usual fix. The search/saved-search/assignments workflows now append a hint to any 403 at the exact failure point: global/account keys behave inconsistently on these endpoints while org-scoped keys work reliably, so try an org-scoped key for the target org (zone Settings > Secrets & Keys). The hint requires no configuration, so there is nothing to declare and nothing to drift. The metadata endpoints (`get_searchable_fields`, `get_search_scopes`, `get_device_metadata_fields`) are deliberately excluded: they accepted either key type throughout, so a 403 there means something else. (A declared `KEY_SCOPE` env var was considered and rejected: users who don't know their key type would set it wrong, and it drifts on key rotation.)
- **README now explains the two API key types, not just which to pick.** Added an org-scoped vs global/account key comparison (scope, where each is created, tool coverage) to the credentials section, a symptom line (403 on the search family while reads work elsewhere usually means the key, not permissions), and a key-type hint in `.env.example` — so a holder of a global key can recognize it and mint the right one instead of just being told "use org-scoped."
- **Documented Advanced Device Search key behavior and the `TAGS` search scope.** Observed live (2026-06-03/04): global/account-scoped API keys behave **inconsistently** on the Server Groups API v2 search endpoints (`advanced_device_search`, `device_search_typeahead`, saved-search CRUD, `list_searches_for_device`, `get_device_assignments`) — a full-admin account key 403'd uniformly across all orgs on day one, then (untouched) worked in exactly one org a day later while still 403ing elsewhere — whereas a freshly-issued org-scoped key worked immediately and reliably. Mechanism unconfirmed upstream (an interim revision claimed the endpoints flatly "only accept org-scoped keys"; the observation matrix in `docs/api-coverage.md` replaces that overclaim). The README previously claimed "both global and org-scoped API keys work"; it now recommends an org-scoped key and lists the affected tools. The `advanced_device_search` description also now shows that tag search uses scope `TAGS` (not `DEVICE`): `{"scope": "TAGS", "field": "tag", "operator": "IN", "values": [...]}` — confirmed live against a known tag census.
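
The 403 hinting described in the entries above amounts to a small conditional: add the key-type hint only for the search-family tools and leave every other error untouched. The sketch below is an assumption-laden illustration, not the project's implementation. `annotate_forbidden`, `KEY_SENSITIVE_TOOLS`, and the hint wording are hypothetical, and the saved-search CRUD tools are omitted because their names are not given here.

```python
# Search-family tools named in the changelog as key-sensitive.
KEY_SENSITIVE_TOOLS = frozenset(
    {
        "advanced_device_search",
        "device_search_typeahead",
        "list_searches_for_device",
        "get_device_assignments",
    }
)

# Hypothetical wording; the real hint text is not quoted in the changelog.
ORG_KEY_HINT = (
    "Global/account keys behave inconsistently on these endpoints; "
    "try an org-scoped key for the target org "
    "(zone Settings > Secrets & Keys)."
)


def annotate_forbidden(tool_name: str, status_code: int, message: str) -> str:
    """Append the key-type hint to a 403 from a key-sensitive tool."""
    if status_code == 403 and tool_name in KEY_SENSITIVE_TOOLS:
        return f"{message} Hint: {ORG_KEY_HINT}"
    return message
```

Metadata tools such as `get_search_scopes` fall through unchanged, which matches the stated reasoning that a 403 there points to a different cause.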

2.0.2

Security

- **HTML sanitizer now suppresses the text content of elements carrying a dangerous attribute.** `_HTMLTextExtractor` detected `on*` event handlers and `javascript:`/`data:` URLs in `href`/`src`/`action` but took no action — it dropped the tag (as it does for any tag) while still emitting the element's inner text. The dangerous *value* never reached output (attributes are never emitted), so impact was limited to defence-in-depth, but the detection was effectively dead. The extractor now tracks suppression with a `(tag, skipping)` stack that correctly releases on the matching end tag, skips void elements (`<img>`, `<br>`, …) that have no end tag, tolerates unclosed inner tags, and relies on `HTMLParser` CDATA mode for `<script>`/`<style>`. (The obvious one-line `_skip_depth += 1` fix was rejected — it leaks skip state on non-`script`/`style` tags and self-closing tags, silently swallowing all trailing text.)
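
The `(tag, skipping)` stack idea can be illustrated with Python's standard `html.parser.HTMLParser`. The sketch below is written from the description above and is not the project's `_HTMLTextExtractor`. The class name, the void-tag list, and the `extract_text` helper are assumptions for the example.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed.
VOID_TAGS = frozenset(
    {"area", "base", "br", "col", "embed", "hr", "img", "input",
     "link", "meta", "source", "track", "wbr"}
)
URL_ATTRS = frozenset({"href", "src", "action"})


def _is_dangerous(attrs) -> bool:
    """True for on* handlers or javascript:/data: URLs in URL attributes."""
    for name, value in attrs:
        if name.startswith("on"):
            return True
        if name in URL_ATTRS and value:
            if value.strip().lower().startswith(("javascript:", "data:")):
                return True
    return False


class SuppressingTextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._stack = []  # list of (tag, skipping) pairs
        self._parts = []

    def _skipping(self) -> bool:
        return bool(self._stack) and self._stack[-1][1]

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # no end tag will arrive, so never push
        skipping = (
            self._skipping()
            or tag in ("script", "style")
            or _is_dangerous(attrs)
        )
        self._stack.append((tag, skipping))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, discarding unclosed inner tags.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                return
        # Stray end tag with no matching open tag: ignore it.

    def handle_data(self, data):
        if not self._skipping():
            self._parts.append(data)

    def text(self) -> str:
        return "".join(self._parts)


def extract_text(html: str) -> str:
    parser = SuppressingTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Storing the skip flag on each stack entry means the suppression ends exactly when the element that started it closes, which is the property a bare depth counter fails to guarantee for void and unclosed tags.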

Fixed

- **`preview_policy_device_filters` mis-reported every result and 500'd on filter-only targets — both fixed.** Verified live (2026-06-03): (1) The endpoint returns a `{"results": [...], "size": N}` envelope, which `extract_list` didn't recognize — it wrapped the whole envelope as a single device and reported `total_devices: 1` for *any* result set (a 14-device group and its 6-device filtered subset both came back as "1"). The wrapper now parses `results`/`size` directly, falling back to the bare-list / `data` shapes. (2) A filter-only request (`device_filters` with no `server_groups`) and an empty request both return an opaque upstream HTTP 500; the wrapper now pre-empts them with actionable guidance ("pass the `server_groups` this policy targets…"). Confirmed live that `device_filters` **are** applied within a `server_groups` scope — a `tag` clause narrows a group's set to exactly the tagged devices — so previewing a tag-targeted policy works when the target groups are supplied. (This corrects an earlier note that the endpoint was simply broken: it works when `server_groups` is present.)
- **Policy `device_filters` no longer silently dropped — wrong-shape clauses are rejected, and structured filters are auto-enabled on all policy types.** Three issues, all verified against the live tenant (2026-06-03): (1) The model-facing guidance resources (`resource://filters/syntax` and the policy how-to) documented filters as `{"type": "tag", "tag_name": "..."}`, but the API uses `{"field", "op", "value"}` (fields `tag`/`ip_addr`/`hostname`/`os_family`/`os_version_id`/`serial_number`/`organizational_unit`, singular; ops `in`/`not_in`/`like_any`) and **rejects the legacy shape with HTTP 400** — every real stored policy uses the `{field, op, value}` form. Both resources now document the correct shape. (2) `_normalize_device_filters` forwarded any mapping clause verbatim without validation, so a wrong-shape filter reached the API as an opaque 400; it now validates clause shape locally and rejects the legacy `{type, tag_name}` form with an actionable message. (3) `device_filters_enabled` (which the API requires for filters to take effect — all live policies carry it `True`) was only auto-set for `patch` policies; structured filters on `custom`/worklet and `required_software` policies shipped with the flag unset, so a tag filter could be present-but-disabled and the policy would target every assigned group. The flag is now auto-set for all policy types when filters are present. (Previewing the resulting targeting: see the `preview_policy_device_filters` entry above.)
- **`docs/tool-reference.md` Table of Contents counts and anchors corrected.** The TOC listed Device Management as 9 tools (header says 11) and Vulnerability Sync as 7 (header says 9); the stale counts also broke the in-page anchor links. The section headers — validated against the registered tool set by `test_doc_tool_counts.py` — were already correct.
- **`create_policy`/`update_policy` schedule-days error message no longer advertises an unsupported range.** The "Unrecognized day name" message offered numeric indexes "0-6 or 1-7", but `_normalize_schedule_days_input` accepts 0–6 only (1–7 was intentionally removed for its ambiguous Monday/Sunday mapping). Dropped the "or 1-7" clause.
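
The envelope handling described for `preview_policy_device_filters` (parse `results`/`size` first, then fall back to the `data` and bare-list shapes) can be sketched as below. This is a minimal sketch under assumptions: `parse_list_response` is a hypothetical name, and raising `ValueError` on an unrecognized shape is a choice made for the example, not a statement about the project's behaviour.

```python
from collections.abc import Mapping
from typing import Any, List, Tuple


def parse_list_response(payload: Any) -> Tuple[List[Any], int]:
    """Return (items, total) from an envelope or bare-list response."""
    if isinstance(payload, Mapping):
        results = payload.get("results")
        if isinstance(results, list):
            size = payload.get("size")
            # Trust `size` only when it is a real integer (bool is excluded).
            if isinstance(size, int) and not isinstance(size, bool):
                return results, size
            return results, len(results)
        data = payload.get("data")
        if isinstance(data, list):
            return data, len(data)
        raise ValueError("mapping response has no 'results' or 'data' list")
    if isinstance(payload, list):
        return payload, len(payload)
    raise ValueError(f"unsupported response type: {type(payload).__name__}")
```

Rejecting an unrecognized mapping, instead of wrapping it as a single row, is what prevents the "every result set reports one device" failure mode from recurring silently.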

Changed

- **`preview_policy_device_filters` description and smoke coverage now reflect the verified contract.** The tool description and `docs/tool-reference.md` state that `server_groups` is required when `device_filters` is provided (pass the groups the policy targets; clauses apply within that scope). The smoke suite's preview check now asserts correctness instead of "got a response": the reported `total_devices` must match the returned device list (catches the envelope-wrapped-as-one-device regression), and a filter for a nonexistent tag must narrow the group scope to exactly 0 devices (catches a silently-ignored filter).
- **Code-quality cleanup (CodeQL + AI code-scanning findings), no behavior change.** Removed unused module-level `logger` bindings across 18 tool modules and a dead instruction-strip allowlist superseded by the preserve-list denylist; consolidated redundant/duplicate imports (`json` hoisted in `schemas.py`, `asynccontextmanager` in `server.py`, dead local re-imports in `utils/tooling.py` and `workflows/devices.py`); parenthesized two implicitly-concatenated multi-line strings inside reference list literals to make intent explicit. Added a regression test for the `list_device_packages` auto-pagination safety cap (`_MAX_PACKAGE_PAGES`) — the `metadata.complete = False` truncation signal was previously unverified.
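
The auto-pagination safety cap mentioned above pairs a hard page limit with an explicit truncation signal. The sketch below shows one way such a loop might look. It is not the project's `list_device_packages` code: `fetch_all_pages`, its parameters, the zero-based page numbering, and the `pages_fetched` field are assumptions for illustration, with only `metadata.complete` taken from the changelog.

```python
from typing import Any, Callable, Dict, List


def fetch_all_pages(
    fetch_page: Callable[[int], List[Any]],
    page_size: int,
    max_pages: int,
) -> Dict[str, Any]:
    """Collect pages until a short page arrives or the cap is reached.

    `metadata.complete` is True only when a page shorter than
    `page_size` was seen, meaning the upstream ran out of items.
    """
    items: List[Any] = []
    complete = False
    pages_fetched = 0
    for page in range(max_pages):
        batch = fetch_page(page)
        pages_fetched += 1
        items.extend(batch)
        if len(batch) < page_size:
            complete = True
            break
    return {
        "items": items,
        "metadata": {"complete": complete, "pages_fetched": pages_fetched},
    }
```

A caller that sees `complete: False` knows the list was cut off at the cap and should not treat the item count as a total.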

