**The projection-audit release.** Every model-facing tool output went through a 12-domain audit (54 adversarially-verified findings), an independent re-audit that live-verified each fix against the production API and caught 23 further defects, and a live verification campaign that upgraded dozens of field legends from spec-assumed to live-verified provenance — 24 PRs in total. The tool set is unchanged (133 tools); the theme is **self-describing output**: raw values are preserved for fidelity, with decoded siblings and `metadata.field_notes` legends stating verified units, vocabularies, and semantics, so a model no longer has to guess what an integer status, a unit-less number, or a bare enum string means. No tool signatures were removed or narrowed; some output shapes changed (phantom keys that never carried data were dropped, misdocumented fields renamed, flat counts regrouped) — each is detailed below. Where the audit found the *upstream* spec or API at fault, the affected legend says so explicitly rather than papering over it.
Added
- **`device_detail` now emits a device-level `compliance` rollup.** The rollup carries counts per policy state, the named policies in `needs_remediation`, and a note encoding the rule (verified live across ~20 devices): a device is non-compliant when at least one policy needs remediation; pending policies alone do not count against compliance. Previously the model had to derive all of this from raw per-policy entries.
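
The compliance rule above can be sketched in a few lines. This is an illustrative sketch only, not the shipped projection: the function name `compliance_rollup` and the `policy_name` key are hypothetical, while the integer codes follow the `0`/`1`/`2` enum described elsewhere in these notes.

```python
from collections import Counter

# Integer enum on policy_status[].status, as described in this changelog.
STATUS_LABELS = {0: "needs_remediation", 1: "up_to_date", 2: "pending"}


def compliance_rollup(policy_status):
    """Build a device-level rollup from raw per-policy entries (hypothetical helper)."""
    counts = Counter()
    needs_remediation = []
    for entry in policy_status:
        # Look the code up directly: 0 is falsy, so an `or` chain would lose it.
        label = STATUS_LABELS.get(entry.get("status"), "unknown")
        counts[label] += 1
        if label == "needs_remediation":
            # "policy_name" is an assumed key, used here only for illustration.
            needs_remediation.append(entry.get("policy_name"))
    return {
        # Pending policies alone do not make a device non-compliant.
        "device_compliant": not needs_remediation,
        "policy_status_counts": dict(counts),
        "needs_remediation": needs_remediation,
    }


pending_only = [
    {"policy_name": "Patch All", "status": 2},
    {"policy_name": "Baseline", "status": 1},
]
failing = pending_only + [{"policy_name": "Critical Only", "status": 0}]

print(compliance_rollup(pending_only)["device_compliant"])
print(compliance_rollup(failing)["needs_remediation"])
```

A device with only pending and up-to-date policies stays compliant in this sketch; adding a single code-0 policy flips it and names the offending policy.
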
Fixed
- **Approvals decision-axis and remediation-status legends corrected/upgraded from a device-bearing test-org verification (issue 165; live-verified 2026-06-06).** Legend/description text plus one tiny projection fallback — no tool-count changes. (1) **Approvals decision axis.** On a 12-row device-bearing approvals queue, `status` on a DECIDED row carries the decision OUTCOME, not an execution status: `manual_approval=true` co-occurred 1:1 with `status='approved'` (8/8) and `false` with `status='rejected'` (4/4), and `manual_approval_time` is a `'YYYY-MM-DD HH:MM:SS'` string. The earlier claim that an awaiting row carries `status='active'` is removed from the `get_patch_tuesday_readiness` comment, the `patch_approvals_summary` legend, and the test fixtures — an awaiting row could not be generated in the verification cycle, so status-while-awaiting stays unobserved and is no longer asserted; the pending count remains keyed on `manual_approval is None`, which is now better supported (status is the decision word, so keying pending on status would be wrong regardless). The record has no top-level `severity` (12/12) and the envelope is `{results, size}`. (2) **`patch_approvals_summary` nested-severity fallback.** The live record carries a nested `software.severity` key (null on every observed row, but present in the shape); `summarize_patch_approvals` read only the absent top-level severity, so a populated nested value would be missed. It now falls back to `software.severity` before bucketing to `unspecified`. 
(3) **Remediation status vocabulary.** A full gated patch-now execution observed `devices[].status` transition `not-started -> in_progress` (per-device and execution-scoped — an untargeted sibling device stayed `not-started`), with `in_progress` emitted within seconds of the 202; the legend now calls out the SEPARATOR INCONSISTENCY (`not-started` hyphenated, `in_progress` underscored), notes the spec's `pending` example was not seen live, and keeps the terminal value open (the device held `in_progress` for a ~24-minute poll without a captured terminal string). `solutions[].status` is confirmed null/absent on automox-patch across the lifecycle (status lives on `devices[]`), and patch-now does NOT create a persistent policy — it dispatches a direct `InstallUpdate` device command (`policy_id=0`).
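
As a rough illustration of items (1) and (2), the sketch below counts decisions from `manual_approval` and falls back to the nested `software.severity` before bucketing to `unspecified`. It assumes the rows are already unwrapped from the `{results, size}` envelope; the function name `summarize_approvals_sketch` and the output key `decisions` are hypothetical and are not the wrapper's real `summarize_patch_approvals`.

```python
def summarize_approvals_sketch(rows):
    """Count decisions and bucket severity for already-unwrapped approval rows."""
    decisions = {"approved": 0, "rejected": 0, "awaiting": 0}
    severity_breakdown = {}
    for row in rows:
        decision = row.get("manual_approval")
        # Pending is keyed on manual_approval being None, never on `status`.
        if decision is None:
            decisions["awaiting"] += 1
        elif decision:
            decisions["approved"] += 1
        else:
            decisions["rejected"] += 1
        # Top-level severity is absent on live records; try the nested key next.
        software = row.get("software") or {}
        severity = row.get("severity") or software.get("severity") or "unspecified"
        severity_breakdown[severity] = severity_breakdown.get(severity, 0) + 1
    return {"decisions": decisions, "severity_breakdown": severity_breakdown}


sample_rows = [
    {"manual_approval": True, "status": "approved", "software": {"severity": None}},
    {"manual_approval": False, "status": "rejected", "software": {"severity": None}},
    {"manual_approval": None, "software": {"severity": "high"}},
]
print(summarize_approvals_sketch(sample_rows))
```

The third row is a constructed example: a populated nested severity was not observed live, which is exactly the case the fallback guards against.
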
- **Policy-window and action-set status legend provenance upgraded from live re-verifications (issue 165; live-verified 2026-06-06).** Legend/description text only — no projection logic or tool-count changes. (1) **`use_local_tz` / `dtstart`.** The prior "use_local_tz=true semantics are per spec, unverified live (the probe used false)" framing is replaced with the live facts: the upstream REJECTS `use_local_tz=true` with HTTP 400 (`invalidFields.useLocalTz="use_local_tz cannot be set to true"`, unconditional of recurrence on this tenant — possibly tenant/plan-conditional, not asserted universal), so `false` is the only persistable value here; `dtstart` is echoed verbatim with no normalization; and the entity carries no timezone-resolution field, so the true-case "same wall-clock in each device's local timezone" meaning is spec-only AND device-side — structurally unprovable from the API entity even where the flag is allowed. (2) **`duration_minutes`.** New note: it may be recomputed upstream for `recurrence=once` windows — live, sent 30 but create AND get echoed 389494 (≈ the dtstart→UNTIL span); the wrapper passes the value verbatim, so callers must read back the echoed value rather than trusting their input. (3) **`get_action_set_detail.status`.** A full upload-to-completion poll observed the lifecycle `building -> ready` settling in ~2s; `'building'` is emitted by the API itself in the 201 create response (not a wrapper default), `'ready'` is the confirmed terminal value, and `'active'` was NEVER observed live (spec example only) — replacing the earlier claim that mixed the spec example with live values.
- **`get_data_extract` now accepts an integer `extract_id` (re-audit release blocker; live-verified 2026-06-05).** Live extract ids are integers (`list_data_extracts` returns e.g. `479737`), so a model relaying an id from the list call passed an int and the tool rejected it at the parameter boundary with a pydantic `string_type` error before any request was made. The tool parameter is widened to `int | str` and coerced to a string for the URL path; `GetDataExtractParams.extract_id` gains a before-validator that coerces an int to str (the path-safety regex still applies; a bool is not treated as an int). Verified end-to-end through the FastMCP tool boundary: an int id now returns the correct record. The `list_data_extracts` smoke check was upgraded from `resp is not None` to an envelope-unwrap reconciliation — `total_extracts` must be the integer envelope `size`, every returned row must be an unwrapped extract dict carrying an `id` (never the leaked `{results,size}` envelope), and `total_extracts >= page_row_count` (size is the cross-page grand total) — so a regressed unwrap that left the whole envelope as one bogus row would now fail instead of passing on a 200.
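
The coercion described above can be approximated without any framework. The sketch below is a plain-Python stand-in for the before-validator, not the actual pydantic code; `coerce_extract_id` and the `_SAFE_ID` pattern are assumptions, since the real path-safety regex is not reproduced in these notes.

```python
import re

# Hypothetical path-safety pattern; the real regex is not shown in the changelog.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def coerce_extract_id(value):
    """Return a URL-path-safe string id, accepting int or str but not bool."""
    # bool is a subclass of int in Python, so reject it before the int check.
    if isinstance(value, bool):
        raise TypeError("extract_id must be an int or str, not bool")
    if isinstance(value, int):
        value = str(value)
    # The safety check still applies after coercion.
    if not isinstance(value, str) or not _SAFE_ID.fullmatch(value):
        raise ValueError(f"unsafe extract_id: {value!r}")
    return value


print(coerce_extract_id(479737))
print(coerce_extract_id("479737"))
```

Checking `bool` first matters because `isinstance(True, int)` is true in Python; without that guard `True` would silently become the path segment `"True"`.
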
- **Device-surface compliance and severity legends reconciled with the authoritative signals (re-audit cluster device-surfaces; live-verified 2026-06-05).** (N2) The legacy device-level policy-status string (e.g. `"non-compliant"`) can contradict the authoritative `compliant` boolean: live, a device with `compliant=true` and 13 pending policies still reports the string `"non-compliant"`, and fleet-wide the legacy axis read 175 non-compliant while the authoritative axis read 129. The raw strings are kept (no rename/drop), but each contradicting surface now carries a reconciling `metadata.field_notes` entry pointing to the authoritative rollup/boolean: `device_detail.core.status` → `compliance.device_compliant`; `list_devices.devices[].policy_status` → `device_detail`/`device_health_metrics`; `device_health_metrics.policy_execution_breakdown` → `compliance_breakdown`. (N10) `device_detail`'s two per-policy breakdowns — `compliance.policy_status_counts` (from `policy_status[]`) and `policy_assignments.status_breakdown` (from `server_policies[]`) — are computed from different upstream arrays and can differ by a policy or two (live: 31 vs 30); the `status_breakdown` legend now explains the two source arrays so the model knows which to cite. (N4) `search_devices`'s description and `docs/tool-reference.md` dropped the false guarantee that "every returned device has at least one missing patch at that severity" (false on ~41% of live `critical` results) while keeping the correct `pending_patches`-scoping disclaimer. (N5) ADS/saved-search device results (`advanced_device_search`, `get_saved_search_results`, `get_cached_search_results`, `run_saved_search`) now carry a `metadata.field_notes.outstanding_patch_severity` legend (and `advanced_device_search`'s description states it) distinguishing the string `'none'` (assessed, no outstanding patches — clean) from JSON null/absent (device not yet assessed — unknown, NOT clean); live distribution confirmed both states present. 
All four changes are legend- or description-only: raw values pass through unchanged, and no tool counts change.
- **Policy-domain projection re-audit fixes (audit; live-verified or spec-attributed 2026-06-05).** Seven items from the re-audit final list. (1) **banner_stats unit legend** — `policy_history_detail` and `policy_runs_for_policy` now attach a `metadata.field_notes` legend stating that `banner_stats.policy_success_rate` is a percentage in the 0–100 range, NOT a 0–1 fraction (live-verified; observed values `0.0`/`60.0`/`100.0`), and that `total_policies_applied`/`total_successful_devices` are plain counts — so the model stops reading the rate as a fraction or a count (baseline finding 22). (2) **`get_patch_tuesday_readiness` pending-approval count** — the count keyed on `status in ('pending','Pending')`, but the decision axis is `manual_approval` (per 154: True=approved, False=rejected, null=awaiting decision); the count is re-keyed on `manual_approval is None`, and the compound test fixtures that encoded the wrong `status:'pending'` assumption (the 132 trap) were corrected (baseline finding 51). The exact `status` value an awaiting row carries remains unobserved live — the later issue-165 verification (see above) found `status` holds the decision outcome `approved`/`rejected` on decided rows and withdrew any specific awaiting-value claim. (3) **`policy_catalog` phantom `next_run`** — the projection read `next_run`, a key the `/policies` list endpoint never returns (confirmed absent live), emitting a confident `next_run:null` that read as "nothing scheduled"; it is removed and replaced by the spec-defined `next_remediation`, surfaced only when present, with a legend warning that absence does not mean unscheduled (per spec, not observed live on this tenant) (finding N8). (4) **`policy_runs_v2.result_status` semantics** — the description now states the filter is any-device-with-this-outcome (live-verified: `result_status='failed'` returns runs with 1 failed device alongside 200+ not-failed, and a run can match multiple statuses) (finding N7). 
(5) **`search_policy_windows(recurrences=)` case** — the spec declares an UPPERCASE enum; tokens are now coerced to uppercase before sending (case-insensitive accept) so a lowercase token can no longer silently match zero windows (spec-attributed, not live-verified — tenant had 0 windows) (finding N11). (6) **`create_policy_window` dtstart description** — the `use_local_tz=true` case is now qualified as per spec/unverified-live (the controlled-object probe used `use_local_tz=false`), matching the workflow legend's provenance discipline. (7) **policy-history test fixture** — the `_POLICY_RUNS` fixture gained the `device_count` field the projection surfaces and the legend depends on (captured shape: per-outcome device counts sum to `device_count`, live-verified across 8 runs), with an assertion exercising the sum.
- **Re-audit legend/projection corrections across packages, worklets, audit-v2, vuln-sync, and the patch-readiness compound (re-audit 2026-06-05; all read-side items live-re-verified).** (1) **Package `severity` legend.** `low` was reclassified from `spec_only_unverified` to `observed_live` in the shared `_SEVERITY_FIELD_NOTE` and both package tool descriptions — a fresh org-package probe returned `low` in the live severity distribution (spec-only set is now `none`/`unknown`). (2) **`search_org_packages` page-scoped count.** The endpoint returns a bare list with no total, so the prior `total_packages` (which fell back to the page length, e.g. 500, while >1000 packages exist) was relabeled `returned_package_count` and accompanied by `metadata.pagination` (`page`/`page_size`/`upstream_total`/`has_more`) and a field-note — a full page now signals more pages instead of masquerading as a fleet-wide total. (3) **Worklet detail safety flag.** `user_interaction_required` (a live-present unattended-execution safety flag, observed on 10/60 worklets) was added to the `get_worklet_detail` projection allowlist; it had been silently dropped. (4) **`audit_events_ocsf` event identifier.** The projection read a non-existent top-level `uid` (always null); it now surfaces the real `_id` plus `metadata.uid`. (5) **`investigate_noncompliant_device` prompt.** Rewrote the step that told the model to filter packages on the phantom `patch_status` field (removed by the 159 projection fix) to use the real `installed` boolean and `severity`. (6) **`get_patch_tuesday_readiness` severity legend.** Aligned the compound description to the full value set the prepatch projection emits (`critical/high/medium/low/none/no_known_cves/unknown`) and corrected the false "null when unrated" claim — the projection emits the string `unknown`, never JSON null. 
(7) **`get_action_set_detail` status provenance.** The legend had claimed `active`/`ready`/`building` were all "observed live"; only `ready` was observed on this tenant (`active` is the spec example, `building` is the wrapper's upload default), so provenance is now stated honestly per value. The later issue-165 verification (see above) observed `building` in the API's own 201 create response, which supersedes the wrapper-default attribution. (8) **`user_access` audit-category provenance.** Corrected the comment/legend that lumped `account_change` and `user_access` together as "spec-example derived": `account_change` has a spec example, `user_access` does not (it is an inference), and each is now labeled accordingly. Unit fixtures corrected to captured live shapes (worklet `device_type` is a list `['SERVER','WORKSTATION']`, not the scalar `"endpoint"`; OCSF event identifier is `_id`+`metadata.uid`, not top-level `uid`); smoke upgraded to assert the page-scoped count and truncation signal on `search_org_packages`.
- **`device_detail` and `get_device_full_profile` stopped surfacing phantom keys; `devices_needing_attention` and `search_devices` got real triage signal (audit; live-verified 2026-06-05).** (1) `software_preview` dropped an always-null `status` field (no such key exists on `/servers/{id}/packages` items) and now projects the real per-package signals — `installed`, `ignored`, `severity`, `agent_severity`, `cve_score`, `cves`, `requires_reboot`, `deferred_until`. Observed severity vocab: `critical`/`high`/`no_known_cves`/null; `agent_severity` is surfaced raw (per spec it may be a text severity or a numeric CVSS score). (2) `pending_commands` was reading three keys that don't exist on the queue item (`command`/`scheduled_time`/`status` — every queued command came back null); it now maps the live Command fields (`command_type` from `command_type_name`, `scheduled_time` from `exec_time`, plus `policy_id`/`args`/`response`). The command vocabulary is non-exhaustive (kept raw). (3) `policy_assignments.status_breakdown` was emitting raw integer codes (`{"1": 17, "2": 13}`) with no legend; a per-policy crosstab confirmed `server_policies[].status` is the same `0 needs_remediation` / `1 up_to_date` / `2 pending` enum as `policy_status[].status` (all three values observed live), so it now decodes to labels. Server groups were a list of integer IDs that the projection tried to read as `.name` objects (always empty); they now surface as `server_group_ids`. (4) `devices_needing_attention` now carries the per-policy `severity`, failure `reason`, and `policy_create_time` the report DTO exposes (present on every live entry; observed severities unknown/critical/high) so triage no longer loses the priority signal. (5) `search_devices` documents that per-device `pending_patches` is the all-severity outstanding total, not the severity-filtered subset. 
The compound `get_device_full_profile` forwards the corrected `pending_commands`/`policy_assignments` verbatim, so it is fixed by the same projection change. Unit fixtures rewritten from the live (sanitized) shapes (integer policy status, integer group IDs, no package `status` key, `command_type_name`/`exec_time` queue items).
- **Reports tools distinguish patch severity states and explain compliance (audit cluster reports).** Live-verified 2026-06-05. (1) `prepatch_report` no longer collapses `no_known_cves` (patches carry no associated CVE — benign) into `unknown` (severity undetermined): the per-device `highest_severity` keeps them as separate values, and the wrapper's recomputed `summary` now has a distinct `no_known_cves` bucket (the raw upstream summary already carried separate `no_known_cves`/`unknown` counters and is still passed through verbatim under `api_summary` — only the per-device projection and the recomputed summary had been folding them). A `metadata.field_notes` legend plus the tool description state the vocabulary. (2) `prepatch_report` documents that its `compliant` boolean follows the platform rule (a device is non-compliant only when a policy needs remediation; pending work alone does not count against compliance, consistent with 149/155) — the boolean is the upstream device value, passed through raw, so the model no longer treats a `compliant:true` device with pending patches as a contradiction. (3) `noncompliant_report` now surfaces each failing policy's `reason_for_fail` (upstream failure text, truncated past 2000 chars), `severity`, `type`, and a `package_count` instead of only id+name, so the model can state why a device is non-compliant and prioritize (live-verified these fields are populated upstream). (4) Clarified that `prepatch_report.total_pending_patches` is an upstream-reported relabel whose unit the spec does not state and whose per-severity buckets are not guaranteed to sum to it.
- **Policy-execution timeline and per-device run results are now self-describing (audit clusters; live-verified 2026-06-05).** (1) `policy_execution_timeline` previously labeled any run with no successes and no failures as status `unknown`/null and dropped the `remediation_not_applicable` (and `blocked`) device counts entirely — so a run that completed as an all-not-applicable no-op gave the model zero signal and read as ambiguous or failed. Each execution now groups its `pending`/`success`/`failed`/`not_included`/`remediation_not_applicable`/`blocked` values under `device_outcomes` (device counts per outcome, not run statuses — they sum to `device_count`), and a `metadata.field_notes` legend plus the tool description state that a run with no successes or failures but nonzero pending/not-included/not-applicable is a benign no-op or still in progress, not an error. (2) `policy_health_overview`'s run-level `status_breakdown` catch-all bucket is renamed from the opaque `unknown` to `no_success_or_failure` and legended for the same reason. (3) `policy_run_results` now carries the same `metadata.field_notes` exit-code legend already shipped for `policy_run_detail_v2`: `exit_code` is the raw process exit code (0 = success; negative values on Windows are NTSTATUS codes as signed 32-bit ints, e.g. `-1073741502` = 0xC0000142 STATUS_DLL_INIT_FAILED; when null it falls back to the Automox internal `error_code`, a different namespace), and `result_status` is a lowercase per-device outcome string. Raw values are preserved for fidelity; no tool counts change.
- **Package tools no longer advertise or emit phantom fields, and `severity` now carries a vocabulary legend (audit cluster: packages).** Live re-probe 2026-06-05 confirmed, per endpoint, that the `/servers/{id}/packages` and `/orgs/{id}/packages` responses each lack `status`/`patch_status` and `awaiting` (checked across 800+ org packages and the device inventory). `list_device_packages` was forwarding `pkg.get("status") or pkg.get("patch_status")` — a falsy-drop chain over fields that never exist — and its description claimed it "Returns ... patch status"; `search_org_packages` projected a nonexistent `awaiting` output key. Both phantom projections are removed: install state is conveyed by the real `installed` boolean, and `awaiting` remains a correct request-only filter (per spec: `awaiting=1` = available-but-not-installed, `awaiting=0` = installed). The same probe also showed `device_count` is absent from the live org Packages DTO (the org endpoint returns the same per-package shape as the device endpoint, not an aggregated rollup), so the dead `device_count` projection on `search_org_packages` was removed as well. Both tools now attach a `metadata.field_notes.severity` legend and the descriptions state the verified vocabulary: critical/high/medium/no_known_cves and JSON null observed live, with low/none/unknown present in the spec enum but unobserved on the probed tenant — JSON null means no severity assessment was recorded (not a safety claim) and no_known_cves means scanned with no known CVEs, so the model stops guessing the difference. Unit fixtures rewritten from invented shapes to captured (sanitized) live DTOs.
- **Vuln-sync remediation tools now label the coded status/severity strings the model previously received bare (audit findings 35–37).** `get_action_set_solutions` gains a `metadata.field_notes` legend for the three coded fields the API types as bare strings with no enum. Per-vulnerability `severity`: `critical` was observed live, consistent with the lowercase patch-severity scale topping at `critical` (not the capitalized OCSF Low/High/Critical/Fatal scale), but the full value set and ceiling were not confirmed, so the scale stays spec-derived/unverified-live and the model is told to treat severity as the source-reported rating and cross-check CVSS rather than assert a ceiling. Per-device `status`: observed live as `not-started` (the spec's only example, `pending`, was not seen live); the legend reports the observed value and flags the value space as open rather than inventing siblings. Solution-level `status` (defined in the spec for the rapid7/unmatched sub-types) was not present on the live rapid7 solutions, so the legend marks it spec-defined/not-observed-live. `get_action_set_detail` gains a `status` legend stating the lifecycle vocabulary observed live (active/ready/building) while flagging that the spec defines no enum and the terminal value is unconfirmed. Tool descriptions updated to point at the legends; the prior invented `_SOLUTIONS` unit-test fixture is replaced with a sanitized live capture (rapid7-solution shape) plus a spec-derived automox-patch entry for sub-type coverage. Raw values pass through verbatim — legend-only, no projection mutation.
- **Policy-window projections now label verified vocabularies and units instead of forwarding ambiguous raw values (audit clusters 10–11).** `create_policy_window`/`update_policy_window` no longer claim generic "RFC 5545 RRULE" — the descriptions and the `rrule` parameter now state the validator-enforced grammar (recurrence=once requires `FREQ=DAILY;UNTIL=YYYYMMDDTHHMMSSZ`; recurrence=recurring requires `FREQ=YEARLY`+`BYMONTH`+`BYDAY` only; `FREQ=WEEKLY` is rejected with a 400), preventing the model from generating rrules the upstream refuses. `get_policy_window`/`search_policy_windows` now emit a `metadata.field_notes` legend documenting that `status` is lowercase `active`/`inactive` and `recurrence` is UPPERCASE `ONCE`/`RECURRING` on read (while create/update accept lowercase), and that `dtstart`'s trailing `Z` is literal UTC only when `use_local_tz=false` — when `use_local_tz=true` the same wall-clock is applied in each device's local timezone, so the model no longer misreads the `Z` as UTC. The scheduled-windows tools now note that `start`/`end` are derived occurrence times (the window stores `dtstart`+`duration_minutes`+`rrule`, with no stored start/end). `search_policy_windows` pagination metadata now preserves a genuine zero count (`total_elements`/`total_pages`, and `has_more` when `page`/`size` are supplied) instead of dropping it on an empty result set via a falsy-`0` `or`-chain. All vocab/unit claims live-verified 2026-06-05 via an authorized controlled-object oracle; the `use_local_tz=true` timezone semantics are labeled spec-only/unverified.
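
A caller could pre-check an rrule against the grammar stated above before sending it. The following is a hedged client-side sketch, not the upstream validator: `rrule_problem` is a hypothetical helper, and it assumes the `once` form must appear in exactly the order shown (`FREQ` then `UNTIL`), which the notes do not confirm.

```python
import re

# Exact shape stated for recurrence=once; the part ordering is an assumption.
_ONCE_RRULE = re.compile(r"FREQ=DAILY;UNTIL=\d{8}T\d{6}Z")


def rrule_problem(recurrence, rrule):
    """Return None if the rrule fits the documented grammar, else a short reason."""
    kind = recurrence.lower()  # create/update accept lowercase recurrence values
    parts = dict(p.split("=", 1) for p in rrule.split(";") if "=" in p)
    if kind == "once":
        if _ONCE_RRULE.fullmatch(rrule):
            return None
        return "once requires FREQ=DAILY;UNTIL=YYYYMMDDTHHMMSSZ"
    if kind == "recurring":
        if parts.get("FREQ") != "YEARLY":
            return "recurring requires FREQ=YEARLY"
        if set(parts) != {"FREQ", "BYMONTH", "BYDAY"}:
            return "recurring allows only FREQ, BYMONTH and BYDAY"
        return None
    return "recurrence must be once or recurring"


print(rrule_problem("once", "FREQ=DAILY;UNTIL=20260605T120000Z"))
print(rrule_problem("recurring", "FREQ=WEEKLY;BYDAY=MO"))
```

A pre-check like this only saves a round trip; the upstream 400 remains the authority on what is accepted.
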
- **Data-extract tools surfaced fictional or undefined fields and mis-read the API's response shape (audit).** `list_data_extracts` now unwraps the `{"results": […], "size": N}` envelope the API actually returns (live-verified 2026-06-05, same class as the 154 approvals fix) — previously the whole envelope was wrapped as one bogus row, so every list returned a single junk entry — and drops the phantom `name`/`file_size` keys that never existed on the record. `get_data_extract` now reads the real `download_expires_at` key (it was reading a non-existent `expires_at`, so link-expiry never populated) and drops phantom `file_size`/`row_count`. Both now surface the `is_completed` boolean as the reliable readiness oracle and label `status` against the spec enum (`queued|running|complete|failed|canceled|expired`). `create_data_extract` no longer invents the out-of-enum status `"pending"` and handles the array response the spec documents (per spec; the POST is a write so this path is unverified live, with a single-mapping fallback retained). A `metadata.field_notes` legend records that `has_download_url` (not a set `download_expires_at`) is the reliable can-download signal — on the expired records observed live the download link was gone (`download_url` null) regardless of the expiry timestamp. Fixtures replaced with the sanitized live shape (integer ids, the envelope, real keys).
- **Splashtop status tools now document their booleans (audit).** `splashtop_device_status` states that `installation_status` and `registration_status` are independent (a device can be installed but not yet registered). The claim is attributed to the spec `DeviceStatusResponseDto`, because verifying the derivations live would require installing Splashtop (a write), and it is attached as a `metadata.field_notes` legend beside the verbatim payload. `splashtop_session_status` states that `can_start_new_session` reflects capacity only (true when `current_sessions < max_sessions`, per the spec examples) and that attended-access consent is a separate precondition, so the field alone does not mean a session can begin.
- **Compliance semantics corrected in two aggregate views (audit; semantics confirmed with maintainer).** (1) `device_health_metrics` counted any device with `pending: true` as non-compliant — contradicting the platform rule verified in the 2.0.x compliance work (a device is non-compliant only when at least one policy needs remediation). It now trusts the upstream `compliant` boolean, exposes the full split as `compliance_breakdown`, and tracks pending work separately as `devices_with_pending_policies`. (2) `policy_compliance_stats` dropped the `pending` count that `/policystats` returns (live-verified 2026-06-05), so a policy with 2 compliant / 0 noncompliant / 166 pending read as "100% compliant over 2 devices". Pending now rides along per policy and overall (`pending_devices`, `pending_rate_percent` over all targeted devices); `compliance_rate_percent` stays computed over evaluated devices but is **null** (not a misleading 0%) when nothing has been evaluated, and `metadata.rate_semantics` states the rule. `policy_type` surfaced per row. Fixture updated to the live key shape (`noncompliant`, `pending`).
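
The rate semantics in item (2) reduce to simple arithmetic. The sketch below is illustrative: `policy_rates` is a hypothetical helper, the two-decimal rounding is an assumption, and returning `None` for `pending_rate_percent` when no devices are targeted is also an assumption the notes do not state.

```python
def policy_rates(compliant, noncompliant, pending):
    """Compute per-policy rates from /policystats-style counts (hypothetical helper)."""
    evaluated = compliant + noncompliant      # devices with a compliance verdict
    targeted = evaluated + pending            # every device the policy targets
    return {
        # None, not 0, when nothing has been evaluated yet.
        "compliance_rate_percent": round(100 * compliant / evaluated, 2) if evaluated else None,
        "pending_devices": pending,
        "pending_rate_percent": round(100 * pending / targeted, 2) if targeted else None,
    }


# The example from the entry above: 2 compliant / 0 noncompliant / 166 pending.
print(policy_rates(2, 0, 166))
print(policy_rates(0, 0, 5))
```

For the 2/0/166 case the compliance rate is still 100.0 over the two evaluated devices, but the 98.81 pending rate now sits beside it so the figure cannot be read as fleet-wide.
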
- **`patch_approvals_summary` no longer silently reports zero approvals — every approval was being dropped by an envelope mismatch.** `/approvals` returns a `{"size": N, "results": [...]}` wrapper (per `components/schemas/Approvals`; envelope confirmed live 2026-06-05), but the workflow required a bare Sequence, so a conforming response always became `[]` — the tool (and `get_patch_tuesday_readiness`, which consumes it) reported an empty approvals queue regardless of reality. Same class as the 132 envelope bugs, hidden by a fixture invented in the code's own image. Also fixed in the same pass: the per-approval projection read five keys that don't exist on the documented record (`title`/`severity`/`device_count`/`created_at`/`deadline` — every value was null); it now projects the real shape (`title` from `software.display_name`, `software.version`/`os_family`, CVE ids, the policy block, and `manual_approval`: true = approved / false = rejected / null = awaiting decision). The upstream record has no severity field, so `severity_breakdown` buckets those as `unspecified` instead of conflating them with a literal "unknown". Fixtures rewritten to the spec envelope shape.
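
The envelope mismatch recurs across several endpoints in these notes, so a small unwrapping helper illustrates the fix. This is a sketch under assumptions: `unwrap_envelope` is a hypothetical name, and keeping a bare-list fallback is a design choice made here for illustration, not something the notes say the wrapper does.

```python
from collections.abc import Mapping, Sequence


def _is_row_list(value):
    """True for list-like values, excluding strings and bytes."""
    return isinstance(value, Sequence) and not isinstance(value, (str, bytes))


def unwrap_envelope(payload):
    """Return (rows, total) from a {results, size} envelope or a bare list."""
    if isinstance(payload, Mapping):
        results = payload.get("results")
        rows = list(results) if _is_row_list(results) else []
        size = payload.get("size")
        # Trust an integer size; otherwise fall back to the page length.
        total = size if isinstance(size, int) and not isinstance(size, bool) else len(rows)
        return rows, total
    if _is_row_list(payload):
        return list(payload), len(payload)
    return [], 0


rows, total = unwrap_envelope({"size": 2, "results": [{"id": 1}, {"id": 2}]})
print(len(rows), total)
```

Returning the envelope `size` alongside the rows is what makes a reconciliation check such as `total >= len(rows)` possible in a smoke test.
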
- **Schedule bitmasks decoded in catalog views; worklet catalog projects its real fields (audit clusters 5 + 8).** (1) `policy_catalog` and `get_patch_tuesday_readiness` rows now carry `schedule_days_decoded` next to the raw `schedule_days` bitmask (reusing the decoder `describe_policy` already had), and both tool descriptions note that `schedule_time` is a bare HH:MM string with no timezone marker. (2) The worklet catalog projection read a `category` key that does not exist on live `/wis/search` items (verified 2026-06-05 — items carry plural `categories`), so every catalog row showed a null category; it now projects `categories` (with legacy-singular fallback) plus the trust/availability signals the projection silently dropped (`verified`, `access` tier, `license_required`, `language`, `version`, `device_type`). Worklet fixtures updated to the captured live shape.
- **Policy-run outcome counts and exit codes are now self-describing (audit clusters 3–4).** Live-verified 2026-06-05: (1) the v2 run records' `pending`/`success`/`failed`/`not_included`/`remediation_not_applicable`/`blocked` fields are **device counts per outcome**, not run statuses — bare keys like `success: 13` invited misreading, so `policy_runs_v2`/`policy_runs_for_policy`/`policy_history_detail` now group them under `device_outcomes` (and surface the previously-dropped `device_count` total). (2) `policy_run_detail_v2`'s per-device `exit_code` is the raw process exit code from the policy script — 0 = success, negative values on Windows are NTSTATUS as signed 32-bit ints (observed live: `-1073741502` = 0xC0000142 STATUS_DLL_INIT_FAILED) — now explained by a `metadata.field_notes` legend and the tool description. (3) `list_events` documents that `data.status` on policy/patch events is the same raw exit code, arriving as string **or** int.
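
The exit-code reading in items (2) and (3) can be reproduced directly: masking a negative signed 32-bit value with `0xFFFFFFFF` recovers the unsigned NTSTATUS code. The helper name `decode_exit_code` and its output keys are hypothetical; only the arithmetic and the `-1073741502` example come from the notes.

```python
def decode_exit_code(raw):
    """Annotate a raw exit code that may arrive as int, numeric string, or None."""
    if raw is None:
        return None
    try:
        code = int(raw)  # list_events may deliver the code as a string
    except (TypeError, ValueError):
        return {"exit_code": raw, "meaning": "unparsed"}
    if code == 0:
        meaning = "success"
    elif code < 0:
        # Reinterpret the signed 32-bit value as unsigned to get the NTSTATUS hex.
        meaning = f"NTSTATUS 0x{code & 0xFFFFFFFF:08X}"
    else:
        meaning = "nonzero exit"
    return {"exit_code": code, "meaning": meaning}


print(decode_exit_code(-1073741502))
print(decode_exit_code("0"))
```

The raw integer is kept next to the decoded form, in line with the release's rule of preserving raw values for fidelity.
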
- **`list_org_api_keys` no longer reports zero keys when the org has them, and `get_user`/`get_account_user` close two account-security projection gaps (re-audit cluster account-security; live-verified 2026-06-05).** (1) `list_org_api_keys` assumed `GET /orgs/{id}/api_keys` returns a bare list, but the endpoint returns a `{"results": [...], "size": N}` envelope (live-verified: `size=21`, 21 results), the same envelope class as the 154 (`/approvals`) and 163 (`/data-extracts`) fixes, so a conforming response always collapsed to zero keys. The workflow now unwraps the envelope, projects the real per-key fields (`is_enabled`, which is the live DTO key rather than the previously-read phantom `enabled`, plus `id`/`name`/`created_at`/`expires_at`), drops the embedded `user` contact blob to stay lean, and surfaces the envelope `size` as `metadata.total_size` for count reconciliation. The invented bare-list unit fixture (the 132 trap) is replaced with a captured sanitized `{results,size}` payload, and the smoke assertion was upgraded from `resp is not None` to envelope reconciliation (`total_keys == size`). (2) `get_user` re-projects each nested `orgs[]` object through an allowlist (`_USER_ORG_FIELDS`, mirroring `_ZONE_FIELDS`) so a populated per-org `access_key` (and `saml`/`metadata` config blobs) is never forwarded into model context; `plan` is kept so the existing `orgs[].plan` legend stays meaningful. (The `/users` endpoint is scope-gated on the audit key, so the no-`access_key` guarantee is implemented defensively rather than live-confirmed.) (3) The `get_account_user` `two_factor_authentication` field note and tool description were rewritten: the field returns the literal string `"disabled"` when 2FA is OFF (re-audit-observed live), so the prior framing ("a string value means 2FA of that type is configured") would have a model read `"disabled"` as a configured 2FA type, the opposite of the truth. The legend now enumerates `"disabled"` = off (live-verified), other strings (e.g. `email`/`google`, per spec) = configured type, and keeps the null/absent ambiguity note.
- **`audit_events_ocsf` no longer forces the model to decode raw OCSF plumbing (audit finding cluster 2).** Three changes, all live-verified 2026-06-05: (1) event `time` arrives from the upstream as **epoch seconds** (a float — despite the OCSF standard specifying milliseconds); the projection now converts it to an ISO 8601 UTC string, with a defensive milliseconds branch for values too large to be seconds. (2) `severity`/`status` string labels are filled in from the `severity_id`/`status_id` integer enums (spec `x-enumDescriptions`) when the upstream omits the strings — live events sometimes carry only the ids, and these OCSF integers mean something entirely different from the `/servers` policy-status enum (OCSF: 1=success, 2=failure). (3) The tool description documents both scales. Test fixture updated from invented ISO-string times to the captured live shape.
- **`get_device_by_uuid` no longer returns the raw `/servers/{id}` payload unexplained — it was the unfixed twin of the `device_detail` defects below.** The tool returned the same ServerWithPolicies DTO as `device_detail` but verbatim: integer policy codes with no legend, the unit-less (and spec-misdocumented) `uptime`, and no compliance projection. Found by a 12-domain projection audit; all three findings adversarially verified. The raw dict now passes through a shared enrichment (`enrich_raw_device_payload`): each integer policy code gains a `status_label` sibling (raw code kept for fidelity), `uptime` is replaced by `uptime_minutes`, and the device-level `compliance` rollup is attached. The tool description also no longer claims a "Server Groups API v2" lookup — it has used the canonical `/servers/{id}` endpoint since the 92 fix. Smoke now asserts every integer policy code carries a label and the unit-less key is absent.
- **Per-policy status codes in `device_detail` are now translated instead of passed through raw.** `GET /servers` `policy_status[].status` is an integer enum — `0 = needs_remediation`, `1 = up_to_date`, `2 = pending` (confirmed against the Console API spec and cross-checked live against the `status.policy_statuses[].compliant` booleans: code 1 is the only value paired with `compliant: true`). The summarizer previously stringified the integers verbatim (`"1"`/`"2"`), forcing the model to guess a mapping — and in observed use it guessed one that inverted compliant and non-compliant. Worse, code 0 (needs remediation — falsy) fell through a truthiness chain to absent alternate keys and surfaced as `"unknown"`, so the one state that demands action was the one reported least clearly. The mapping is applied in the policy-entry summarizer, not in the generic status normalizer, because other Automox enums reuse these integers with different meanings.
- **`uptime` renamed to `uptime_minutes` in `device_detail` — the public spec's unit is wrong.** The Console API spec describes `Server.uptime` as "measured in seconds", but live verification against known device boot times (2026-06-04) shows the value is **minutes**, sampled at the device's last full scan — so it can also lag the current boot session. The unit-less raw key invited bad inferences (an observed session read ~6.7k minutes ≈ 4.7 days as possibly "~9 months of uptime"). The projection now emits `uptime_minutes` as an integer, and the tool description carries the sampled-at-last-scan caveat.
- **Account- and server-group tools now label opaque / unit-silent fields so the model stops confabulating (audit cluster account-groups).** `list_server_groups` / `get_server_group` annotate `refresh_interval` as **minutes** via `metadata.field_notes` (live-verified 2026-06-05: console 24h/4h scan cadences read back as `1440`/`240`; spec range 240–1440), and the `create_server_group` / `update_server_group` write params now state the minute unit on the model-facing schema (previously a bare integer — an off-by-60× risk). `get_account_user` documents that a null/absent `two_factor_authentication` is **ambiguous** (per spec null means disabled, but that was not live-verifiable and the field is non-required — null/absent may mean disabled *or* not-reported) rather than implying a clean 2FA boolean. `get_user` notes that `orgs[].plan` is a spec-only billing slug (`basic`/`manage`/`tier3`, unverified live, absent on some tenants) and is a distinct vocabulary from `list_organizations`' `tier` — the model is told not to reconcile or rank them. `list_organizations` no longer advertises "feature-tier checks": its `tier` slug has no defined ordering and is absent on some tenants (live-verified: this tenant returns no `tier` field at all), so the description steers the model away from inferring a paid-plan / capability ranking. Projection values are unchanged (raw fidelity preserved); the fixes are description/legend-only.
- **`audit_events_ocsf` no longer silently returns zero events when filtered by category.** The upstream OCSF audit events carry no `category_name` field (verified live 2026-06-05 across every returned event), only an integer `category_uid` that, live-confirmed, maps 1:N across categories (`category_uid=3` covers BOTH Authentication and Entity Management). The prior client-side `category_name` equality filter compared against that absent field, so it never matched, zeroed every result, and made a model conclude "no authentication/web-resource activity" on dates that had it (reproduced live: a `web_resource_activity` filter that should return 7 events returned 0). Category filtering is now applied against the event `type_name` prefix (the string the upstream actually populates: "Authentication:", "Entity Management:", "Web Resources Activity:"; colon+space boundary, live-verified; `account_change`/`user_access` prefixes are spec-derived and labeled unverified-live). An unmappable token no longer zeroes the result: it leaves events unfiltered and sets `metadata.applied_filters.category_name_matched=false`; `events_before_filter` is retained so an empty filtered result is distinguishable from no activity. The tool/parameter descriptions and a `metadata.field_notes` legend state all of this. The legend also notes that `category_uid`/`type_uid`/`class_uid`/`activity_id` are raw OCSF taxonomy integers with no decode table in the upstream spec (so `type_name`/`activity` are the reliable human-readable signals) and that the `date` parameter's timezone boundary is unspecified by the upstream. The unit fixture was rebuilt from a sanitized live capture (no invented `category_name` key; `category_uid=3` correctly shared by Authentication and Entity Management) so the test exercises the live contract rather than a fictional field.
- **`verify-publish` no longer loses the race to PyPI index propagation (CI only).** The v2.0.3 run failed despite a healthy publish, for two compounding reasons: uv caches simple-index responses, so after a too-early first attempt every retry replayed the cached "no such version" miss instead of re-checking PyPI; and the retry window (6×20s ≈ 2 min) was shorter than occasional index-propagation lag. The install check now passes `--refresh-package automox-mcp` and retries for up to ~15 minutes (30×30s; job timeout raised 10→20 min). The loop still exits on the first success, so a normal release pays nothing — the ceiling only spends free runner minutes on slow days, instead of attended minutes diagnosing and rerunning a red release.
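
The `list_org_api_keys` envelope fix above can be illustrated with a minimal sketch. It is not the project's code: the function name `project_api_keys`, the `_API_KEY_FIELDS` tuple, the output key `api_keys`, and the sample payload are hypothetical. Only the `{results, size}` envelope, the `is_enabled` key, the projected field names, the dropped `user` blob, `metadata.total_size`, and the `total_keys == size` reconciliation come from the entry.

```python
from typing import Any

# Field names taken from the changelog entry; the tuple name is hypothetical.
_API_KEY_FIELDS = ("id", "name", "is_enabled", "created_at", "expires_at")


def project_api_keys(payload: Any) -> dict[str, Any]:
    """Unwrap a {results, size} envelope and project lean per-key records."""
    if isinstance(payload, dict):
        rows = payload.get("results") or []
        total_size = payload.get("size")
    elif isinstance(payload, list):
        # Defensive: still tolerate the bare list the old code assumed.
        rows, total_size = payload, len(payload)
    else:
        rows, total_size = [], None

    # Projecting through an explicit field list drops the embedded `user` blob.
    keys = [
        {field: row.get(field) for field in _API_KEY_FIELDS}
        for row in rows
        if isinstance(row, dict)
    ]
    return {
        "api_keys": keys,
        "total_keys": len(keys),
        "metadata": {"total_size": total_size},
    }


# Invented sample shaped like the envelope described above.
sample = {
    "results": [
        {"id": 1, "name": "ci", "is_enabled": True, "created_at": "2026-01-01",
         "expires_at": None, "user": {"email": "redacted@example.com"}},
        {"id": 2, "name": "old", "is_enabled": False, "created_at": "2025-01-01",
         "expires_at": "2026-01-01", "user": {"email": "redacted@example.com"}},
    ],
    "size": 2,
}
projected = project_api_keys(sample)
```

Comparing `projected["total_keys"]` with `projected["metadata"]["total_size"]` is the kind of reconciliation the upgraded smoke assertion performs; a mismatch would suggest pagination or an unexpected response shape.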
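
The `get_user` allowlist re-projection above could look roughly like the following. The name `_USER_ORG_FIELDS` comes from the entry, but its contents here (`id`, `name`, `plan`) are an assumption, as are the function name `project_user` and the sample record; the entry confirms only that `plan` is kept and that `access_key`, `saml`, and `metadata` are not forwarded.

```python
from typing import Any

# Assumption: illustrative allowlist contents. The entry only confirms `plan`.
_USER_ORG_FIELDS = ("id", "name", "plan")


def project_user(user: dict[str, Any]) -> dict[str, Any]:
    """Copy a user record, re-projecting each nested org through the allowlist."""
    projected = dict(user)
    projected["orgs"] = [
        {key: org[key] for key in _USER_ORG_FIELDS if key in org}
        for org in user.get("orgs") or []
        if isinstance(org, dict)
    ]
    return projected


# Invented sample: a nested org carrying fields that must not reach the model.
raw_user = {
    "id": 7,
    "orgs": [
        {"id": 100, "name": "Example Org", "plan": "basic",
         "access_key": "SECRET", "saml": {"enabled": True}, "metadata": {}},
    ],
}
safe_user = project_user(raw_user)
```

An allowlist fails closed: a sensitive field added upstream later stays out of model context until someone adds it deliberately, which a denylist would not guarantee.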
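
For the `audit_events_ocsf` time conversion above, a minimal sketch of an epoch-seconds to ISO 8601 conversion with a defensive milliseconds branch follows. The function name `ocsf_time_to_iso` and the cutoff value are assumptions; the entry does not say which threshold the projection uses.

```python
from datetime import datetime, timezone

# Assumption: values above this are treated as milliseconds. 1e11 seconds is
# roughly the year 5138, while 1e11 milliseconds is already early 1973, so
# real-world timestamps in either unit fall cleanly on one side.
_SECONDS_CEILING = 1e11


def ocsf_time_to_iso(value: float) -> str:
    """Convert an epoch timestamp (seconds, or defensively ms) to ISO 8601 UTC."""
    seconds = value / 1000.0 if value > _SECONDS_CEILING else float(value)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


from_seconds = ocsf_time_to_iso(1_700_000_000)      # seconds, as observed live
from_millis = ocsf_time_to_iso(1_700_000_000_000)   # ms, as the OCSF standard says
```

Both calls describe the same instant, so the two strings are equal; the branch only decides which unit the raw number is read in.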
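
The falsy-zero bug in the per-policy status entry above, together with the compliance rule from the `device_detail` rollup, can be reproduced in a few lines. This is a sketch under stated assumptions: the alternate key `state`, the `policy_name` key, the rollup's output shape, and all function names are hypothetical. Only the integer mapping and the rule that pending policies alone do not count against compliance come from this changelog.

```python
from typing import Any

# Mapping stated in the entry for GET /servers policy_status[].status.
POLICY_STATUS_LABELS = {0: "needs_remediation", 1: "up_to_date", 2: "pending"}


def buggy_status(entry: dict[str, Any]) -> str:
    """Truthiness chain: code 0 is falsy, so it falls through to "unknown"."""
    return str(entry.get("status") or entry.get("state") or "unknown")


def status_label(entry: dict[str, Any]) -> str:
    """Look the code up explicitly so that 0 is treated as a real value."""
    code = entry.get("status")
    if isinstance(code, bool) or not isinstance(code, int):
        return "unknown"
    return POLICY_STATUS_LABELS.get(code, "unknown")


def compliance_rollup(policies: list[dict[str, Any]]) -> dict[str, Any]:
    """Count policies per state; only needs_remediation breaks compliance."""
    counts = {label: 0 for label in POLICY_STATUS_LABELS.values()}
    counts["unknown"] = 0
    needs_remediation = []
    for policy in policies:
        label = status_label(policy)
        counts[label] += 1
        if label == "needs_remediation":
            needs_remediation.append(policy.get("policy_name"))
    return {
        "counts": counts,
        "needs_remediation": needs_remediation,
        "compliant": not needs_remediation,
    }


rollup = compliance_rollup([
    {"policy_name": "patch", "status": 0},
    {"policy_name": "reboot", "status": 1},
    {"policy_name": "config", "status": 2},
])
```

Here `buggy_status({"status": 0})` yields `"unknown"` while `status_label` yields `"needs_remediation"`, which is the difference the fix turns on.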
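
The category-filter behaviour described in the second `audit_events_ocsf` entry above can be sketched as a prefix match on `type_name`. The token spellings `authentication` and `entity_management`, the function name `filter_by_category`, and the sample `type_name` suffixes are assumptions; the three prefixes, the `web_resource_activity` token, and the metadata keys come from the entry.

```python
from typing import Any

# Prefixes reported as live-verified, including the colon+space boundary.
# Token spellings other than web_resource_activity are illustrative.
CATEGORY_PREFIXES = {
    "authentication": "Authentication: ",
    "entity_management": "Entity Management: ",
    "web_resource_activity": "Web Resources Activity: ",
}


def filter_by_category(
    events: list[dict[str, Any]], token: str
) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Filter events by type_name prefix; an unknown token leaves them unfiltered."""
    prefix = CATEGORY_PREFIXES.get(token)
    metadata = {
        "events_before_filter": len(events),
        "applied_filters": {
            "category_name": token,
            "category_name_matched": prefix is not None,
        },
    }
    if prefix is None:
        # Unmappable token: do not zero the result, just flag the mismatch.
        return events, metadata
    kept = [e for e in events if str(e.get("type_name", "")).startswith(prefix)]
    return kept, metadata


# Invented events; the suffix after each prefix is illustrative only.
events = [
    {"type_name": "Authentication: Logon", "category_uid": 3},
    {"type_name": "Entity Management: Update", "category_uid": 3},
    {"type_name": "Web Resources Activity: Read"},
]
web_events, web_meta = filter_by_category(events, "web_resource_activity")
all_events, miss_meta = filter_by_category(events, "no_such_category")
```

Keeping `events_before_filter` next to the filtered list lets a caller tell an empty match apart from a day with no activity at all.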
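
The retry shape described in the `verify-publish` entry above (a fixed number of attempts, a fixed delay, exit on first success) can be sketched generically. The helper `retry_until_success` is hypothetical, and the real job is a CI step rather than Python; in that job the check would be whatever install command it runs with the cache-refresh flag, so that each attempt re-queries the index instead of replaying a cached miss.

```python
import time
from typing import Callable


def retry_until_success(
    check: Callable[[], bool],
    attempts: int = 30,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run check() up to `attempts` times; return the 1-based attempt that passed.

    Raises TimeoutError if every attempt fails. The loop exits on the first
    success, so a healthy run costs one attempt and no sleeping.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            # No sleep after the final failure; go straight to the error.
            sleep(delay_seconds)
    raise TimeoutError(f"check did not pass after {attempts} attempts")


# Simulated slow propagation: the package appears on the third attempt.
slept: list[float] = []
outcomes = iter([False, False, True])
passed_on = retry_until_success(lambda: next(outcomes), sleep=slept.append)
```

With the defaults, the worst case is 30 attempts separated by 29 sleeps of 30 s, about 14.5 minutes, which fits inside the raised 20-minute job timeout.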