Adds an Anthropic-compatible Messages endpoint, accepts MP3 audio in
multimodal chat, and hardens Gemma 4 input validation.
Added
- **Anthropic Messages endpoint** (`POST /v1/messages`) — translates
Anthropic-style `system`, `messages`, `max_tokens`, `temperature`,
`top_p`, `top_k`, and `stop_sequences` into the internal OpenAI chat
pipeline. Content blocks, `model` validation, and usage tracking are
supported. Streaming, tool use, and extended thinking are rejected with
clear error messages. Works with native MLX, MLX-LM delegated, and
llama.cpp delegated backends.
- **MP3 audio in multimodal chat** — `/v1/chat/completions` now accepts MP3
inline audio in addition to WAV. The container is sniffed from magic bytes
(RIFF → hound WAV decoder, ID3/MPEG sync → symphonia MP3 decoder). MP3
decoding stops at the model's `audio_seq_length` frame cap to bound memory
use. AAC/OGG/FLAC remain unsupported; send pre-computed tensors via
`/v1/generate` for those. The Python SDK preprocessing helper stays WAV-only.
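
The container sniffing described above can be illustrated with a minimal Python sketch. It is not the server's code (the entry mentions the hound and symphonia decoders); the function name `sniff_audio_container` and the exact checks are assumptions made for illustration. The reserved-layer check is an extra guard so that ADTS AAC, which shares the MPEG sync pattern, is not mistaken for MP3.

```python
def sniff_audio_container(data: bytes) -> str:
    """Return "wav" or "mp3" from leading magic bytes (hypothetical helper)."""
    # WAV: a RIFF chunk whose form type at offset 8 is "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 that starts with an ID3v2 tag.
    if data[:3] == b"ID3":
        return "mp3"
    # Bare MPEG audio frame: 11 sync bits set (0xFF, then the top three bits
    # of the next byte). The two layer bits must not be 00, which is reserved
    # in MPEG audio and is what ADTS AAC uses.
    if (
        len(data) >= 2
        and data[0] == 0xFF
        and (data[1] & 0xE0) == 0xE0
        and (data[1] & 0x06) != 0
    ):
        return "mp3"
    raise ValueError("unsupported audio container; only WAV and MP3 are accepted")
```

A caller would route `"wav"` to the WAV decoder and `"mp3"` to the MP3 decoder, and surface the `ValueError` as a client error for AAC/OGG/FLAC input.
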
Changed
- **Gemma 4 video timestamp validation** — `video_timestamp_token_ids` now
requires every entry to be a non-negative integer; booleans, floats, and
negative values are rejected with per-element error messages identifying
the exact video, frame, and index.
- **Serving contract documentation** — `docs/SERVER.md` updated to reflect
WAV/MP3 audio support, magic-byte sniffing, decode cap, and fixed
soft-token budgets.
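
As a rough illustration of the per-element timestamp validation, the Python sketch below assumes `video_timestamp_token_ids` is a nested list indexed as video, then frame, then token position. That shape, the function name, and the message wording are all assumptions, not the project's actual implementation.

```python
def validate_video_timestamp_token_ids(videos):
    """Return a list of per-element error strings (empty if the input is valid).

    Assumes a nested list shaped [video][frame][index]; this layout is a guess.
    """
    errors = []
    for v, frames in enumerate(videos):
        for f, token_ids in enumerate(frames):
            for i, tok in enumerate(token_ids):
                where = f"video {v}, frame {f}, index {i}"
                # bool is a subclass of int in Python, so reject it explicitly.
                if isinstance(tok, bool) or not isinstance(tok, int):
                    errors.append(
                        f"{where}: expected a non-negative integer, "
                        f"got {type(tok).__name__}"
                    )
                elif tok < 0:
                    errors.append(
                        f"{where}: expected a non-negative integer, got {tok}"
                    )
    return errors
```

Collecting every error instead of stopping at the first one lets a client fix all bad entries in a single round trip; a fail-fast variant would raise on the first offending element.
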