- feat: Update llama.cpp to ggerganov/llama.cpp@57e2a7a52a819883f40dada8a2edc24ecf48186b
- feat(server): Add ability to load chat format from Hugging Face AutoTokenizer or tokenizer_config.json files by @abetlen in b8fc1c7d83ad4a9207c707ba1d954fe580286a01 (see the AutoTokenizer sketch below)
- feat: Integration of Jinja2 Templating for chat formats by @teleprint-me in #875 (see the Jinja2 sketch below)
- fix: Offload KQV by default by @abetlen in 48c3b77e6f558a9899de0e1155c7dc0c7958d8e8
- fix: Support Accept: text/event-stream in chat and completion endpoints, resolves #1083, by @aniljava in #1088 (see the streaming sketch below)
- fix(cli): allow passing n_ctx=0 to OpenAI API server args to use the model's n_ctx_train field per #1015 by @K-Mistele in #1093 (see the n_ctx sketch below)
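
The chat-format-from-tokenizer feature reads the chat template that ships with a model's tokenizer. As a minimal sketch of the underlying idea (not the server's own loading code), the `transformers` AutoTokenizer exposes that same template via `apply_chat_template`; the model id below is only an example:

```python
# Sketch: rendering a prompt from the chat template stored in a model's
# tokenizer_config.json, the same data the server can now load a chat
# format from. The model id is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [
    {"role": "user", "content": "Hello, who are you?"},
]

# apply_chat_template renders the messages with the bundled template,
# producing the prompt string the model expects.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```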
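
The Jinja2 integration expresses a chat format as a template rendered over an OpenAI-style message list. A minimal sketch, assuming a ChatML-style template purely for illustration (it is not the exact template the PR ships):

```python
# Sketch: rendering an OpenAI-style message list through a Jinja2 chat
# template. The ChatML-style template below is illustrative only.
from jinja2 import Template

CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "<|im_start|>{{ message.role }}\n{{ message.content }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

prompt = Template(CHAT_TEMPLATE).render(
    messages=messages, add_generation_prompt=True
)
print(prompt)
```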
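
The `text/event-stream` fix concerns clients that send an explicit `Accept` header when streaming. A minimal client-side sketch, assuming the OpenAI-compatible server is already running on its default `localhost:8000` and that `local-model` is an illustrative model name:

```python
# Sketch: streaming a chat completion from the OpenAI-compatible server
# with an explicit "Accept: text/event-stream" header.
import json
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    headers={"Accept": "text/event-stream"},
    json={
        "model": "local-model",
        "messages": [{"role": "user", "content": "Say hello."}],
        "stream": True,
    },
    stream=True,
)

# Server-sent events arrive as "data: {...}" lines, terminated by "data: [DONE]".
for line in response.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    chunk = json.loads(payload)
    print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```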
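
The `n_ctx=0` convention defers the context size to the model's `n_ctx_train`. A minimal sketch, assuming the Python `Llama` constructor forwards `n_ctx=0` to llama.cpp the same way the server args do; the model path is illustrative:

```python
# Sketch: relying on n_ctx=0 to fall back to the model's trained context
# length. Assumes Llama passes n_ctx=0 through to llama.cpp unchanged,
# matching the server behavior described above.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=0)

# n_ctx() reports the context size actually allocated by llama.cpp,
# which should equal the model's n_ctx_train here.
print(llm.n_ctx())
```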