- TP: fallback SDPA mode when flash-attn is unavailable (see the sketch after this list)
- Faster filter/grammar path
- Add DRY sampling
- Fix issues since 0.1.9 (streams/graphs) when loading certain models via Tabby
- Banish Râul
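For context on the SDPA fallback item, here is a minimal sketch of the general pattern, not ExLlamaV2's actual code: use flash-attn when the package is installed, otherwise fall back to PyTorch's built-in `scaled_dot_product_attention`. The `attention` wrapper and its tensor shapes are illustrative assumptions; `flash_attn_func` and `F.scaled_dot_product_attention` are the real library entry points.

```python
# Sketch only: prefer flash-attn, fall back to PyTorch SDPA when it is missing.
import torch
import torch.nn.functional as F

try:
    from flash_attn import flash_attn_func  # optional dependency
    HAS_FLASH_ATTN = True
except ImportError:
    HAS_FLASH_ATTN = False

def attention(q, k, v, causal: bool = True):
    """q, k, v: (batch, seq_len, num_heads, head_dim). Hypothetical wrapper."""
    if HAS_FLASH_ATTN:
        # flash_attn_func takes (batch, seq_len, heads, head_dim) directly
        return flash_attn_func(q, k, v, causal=causal)
    # SDPA expects (batch, heads, seq_len, head_dim), so transpose in and out
    q, k, v = (x.transpose(1, 2) for x in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v, is_causal=causal)
    return out.transpose(1, 2)
```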
- Add experimental tensor-parallel mode. Currently supports Llama (1+2+3), Qwen2 and Mistral models
- CUDA Graphs to reduce overhead and CPU bottlenecking (see the sketch after this list)
- Various other optimizations
- Some bugfixes
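For context on the CUDA Graphs item, a minimal sketch of the underlying technique using PyTorch's public `torch.cuda.CUDAGraph` API, not ExLlamaV2's internals; the linear layer, shapes, and warm-up count are placeholder assumptions. The idea is to capture a fixed sequence of kernel launches once and then replay it with a single CPU-side call, which removes per-kernel launch overhead and the CPU bottlenecking it causes during decoding.

```python
# Sketch only: capture one forward pass into a CUDA graph and replay it.
import torch

model = torch.nn.Linear(4096, 4096).cuda().eval()  # placeholder model
static_input = torch.randn(1, 4096, device="cuda")

# Warm up on a side stream so capture sees steady-state allocations
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s), torch.no_grad():
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture one forward pass into a graph
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph), torch.no_grad():
    static_output = model(static_input)

# Replay: write new data into the static input buffer, then relaunch the
# whole captured kernel sequence with a single CPU-side call
static_input.copy_(torch.randn(1, 4096, device="cuda"))
graph.replay()
print(static_output.shape)  # torch.Size([1, 4096])
```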