Highlights
* Support more models: DBRX, Command-R, Gemma
* Support llava-video (#426, https://llava-vl.github.io/blog/2024-04-30-llava-next-video/)
* Cache performance improvements (#418, #364)
* Marlin quantization kernels
* Many bug fixes
* Update dependencies to be compatible with their latest versions
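
For readers unfamiliar with the cache work listed above (#364 optimizes radix tree matching), the following is a minimal sketch of the underlying idea: look up how many leading tokens of a new request are already cached so that only the remainder needs fresh computation. It is a toy, uncompressed token trie written for illustration; the class and method names (`PrefixCache`, `match_prefix`) are hypothetical and are not taken from the SGLang codebase.

```python
from typing import Dict, List


class PrefixNode:
    """One node per token; children are keyed by the next token id."""

    def __init__(self) -> None:
        self.children: Dict[int, "PrefixNode"] = {}


class PrefixCache:
    """Toy token-level prefix tree (hypothetical, not SGLang's implementation).

    A real radix tree would compress single-child chains into one edge;
    this sketch keeps one token per edge for clarity.
    """

    def __init__(self) -> None:
        self.root = PrefixNode()

    def insert(self, tokens: List[int]) -> None:
        """Record a token sequence so later requests can reuse its prefix."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, PrefixNode())

    def match_prefix(self, tokens: List[int]) -> int:
        """Return how many leading tokens of `tokens` are already cached."""
        node = self.root
        matched = 0
        for tok in tokens:
            nxt = node.children.get(tok)
            if nxt is None:
                break
            node = nxt
            matched += 1
        return matched


cache = PrefixCache()
cache.insert([1, 2, 3, 4])
# The new request shares its first three tokens with the cached sequence.
reused = cache.match_prefix([1, 2, 3, 9, 9])
```

Here `reused` counts the tokens whose cached state could be shared; the faster this lookup is, the less overhead each incoming request pays before generation starts.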
What's Changed
* Fix Runtime missing some ServerArgs options by Qubitium in https://github.com/sgl-project/sglang/pull/281
* adding the triton docker build minimal example by amirarsalan90 in https://github.com/sgl-project/sglang/pull/242
* Fix flashinfer >= 0.0.3 compat by Qubitium in https://github.com/sgl-project/sglang/pull/282
* Fix Incorrect CURL Request Example in README by amirarsalan90 in https://github.com/sgl-project/sglang/pull/287
* enable marlin kernels by qeternity in https://github.com/sgl-project/sglang/pull/286
* Fix env (docker) compat due to __file__ usage by Qubitium in https://github.com/sgl-project/sglang/pull/288
* Fix marlin model loading compat with autogptq by Liurl21 in https://github.com/sgl-project/sglang/pull/290
* Fix outlines-0.0.35 incompatibility by ZhouGongZaiShi in https://github.com/sgl-project/sglang/pull/291
* [Fix/Potential Bugs] Can not correctly import models in python/sglang/srt/models by Luodian in https://github.com/sgl-project/sglang/pull/311
* Use Anthropic messages API by janimo in https://github.com/sgl-project/sglang/pull/304
* Add StableLM model. by janimo in https://github.com/sgl-project/sglang/pull/301
* Support oai in benchmark/mmlu by merrymercy in https://github.com/sgl-project/sglang/pull/323
* Update version to v0.1.14 by merrymercy in https://github.com/sgl-project/sglang/pull/324
* Cleanup codebase: removed unnecessary code/logic by Qubitium in https://github.com/sgl-project/sglang/pull/298
* Update dependencies by janimo in https://github.com/sgl-project/sglang/pull/326
* Openrouter usage example by janimo in https://github.com/sgl-project/sglang/pull/327
* `model_rpc` style improvement by hnyls2002 in https://github.com/sgl-project/sglang/pull/293
* `model_runner` simplify by hnyls2002 in https://github.com/sgl-project/sglang/pull/329
* Logprobs Refactor by hnyls2002 in https://github.com/sgl-project/sglang/pull/331
* `DBRX` support by hnyls2002 in https://github.com/sgl-project/sglang/pull/337
* Add support for new autogptq quant_config.checkpoint_format by Qubitium in https://github.com/sgl-project/sglang/pull/332
* Fix llava parallelism/fork bug by lockon-n in https://github.com/sgl-project/sglang/pull/315
* Eliminate 2 gpu ops during sampling when logit_bias is zero by hnyls2002 in https://github.com/sgl-project/sglang/pull/343
* Revert "Eliminate 2 gpu ops during sampling when logit_bias is zero" by hnyls2002 in https://github.com/sgl-project/sglang/pull/345
* Eliminate 2 gpu ops during sampling when logit_bias is zero by Qubitium in https://github.com/sgl-project/sglang/pull/338
* Add timeout to get_meta_info by SimoneRaponi in https://github.com/sgl-project/sglang/pull/346
* Fix typos in infer_batch.py by tom-doerr in https://github.com/sgl-project/sglang/pull/354
* Time cost utils by hnyls2002 in https://github.com/sgl-project/sglang/pull/355
* Update README.md by eltociear in https://github.com/sgl-project/sglang/pull/358
* support `command-r` by ZhouXingg in https://github.com/sgl-project/sglang/pull/369
* Fix issue 367 – System message not supported for Anthropic (anthropic.BadRequestError) by fronx in https://github.com/sgl-project/sglang/pull/368
* Update model support in readme by Ying1123 in https://github.com/sgl-project/sglang/pull/370
* Optimize radix tree matching by ispobock in https://github.com/sgl-project/sglang/pull/364
* Reduce overhead when `fork(1)` by hnyls2002 in https://github.com/sgl-project/sglang/pull/375
* llama3 instruct template by qeternity in https://github.com/sgl-project/sglang/pull/372
* add `.isort.cfg` by hnyls2002 in https://github.com/sgl-project/sglang/pull/378
* Revert removing the unused imports by hnyls2002 in https://github.com/sgl-project/sglang/pull/385
* Benchmark Updates by hnyls2002 in https://github.com/sgl-project/sglang/pull/382
* Improve performance when running with full parallel by hnyls2002 in https://github.com/sgl-project/sglang/pull/394
* Minor: style improvement of radix_cache and memory_pool by hnyls2002 in https://github.com/sgl-project/sglang/pull/395
* Format Benchmark Code by hnyls2002 in https://github.com/sgl-project/sglang/pull/399
* Fix chatml template by merrymercy in https://github.com/sgl-project/sglang/pull/406
* Adding RAG tracing & eval cookbook using Parea by joschkabraun in https://github.com/sgl-project/sglang/pull/390
* SamplingParams add "spaces_between_special_tokens" argument by ZhouXingg in https://github.com/sgl-project/sglang/pull/392
* Organize Benchmark by hnyls2002 in https://github.com/sgl-project/sglang/pull/381
* Add Cohere Command R chat template by noah-kim-theori in https://github.com/sgl-project/sglang/pull/411
* Fix `sync()` when `fork(1)` by hnyls2002 in https://github.com/sgl-project/sglang/pull/412
* Include finish reason in meta info response by qeternity in https://github.com/sgl-project/sglang/pull/415
* Make public APIs more standard. by hnyls2002 in https://github.com/sgl-project/sglang/pull/416
* Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 by Qubitium in https://github.com/sgl-project/sglang/pull/380
* Optimize the memory usage of logits processor by merrymercy in https://github.com/sgl-project/sglang/pull/420
* Clean up by merrymercy in https://github.com/sgl-project/sglang/pull/422
* Fix logit processor bugs by merrymercy in https://github.com/sgl-project/sglang/pull/427
* Minor fix for the import path by merrymercy in https://github.com/sgl-project/sglang/pull/428
* Move openai api server into a separate file by merrymercy in https://github.com/sgl-project/sglang/pull/429
* Fix flashinfer by merrymercy in https://github.com/sgl-project/sglang/pull/430
* Update version to 0.1.15 by merrymercy in https://github.com/sgl-project/sglang/pull/431
* Misc fixes by merrymercy in https://github.com/sgl-project/sglang/pull/432
* Allow `input_ids` in the input of the `/generate` endpoint by lolipopshock in https://github.com/sgl-project/sglang/pull/363
* Improve error handling by merrymercy in https://github.com/sgl-project/sglang/pull/433
* Cache optimizations by hnyls2002 in https://github.com/sgl-project/sglang/pull/418
* Update readme by merrymercy in https://github.com/sgl-project/sglang/pull/434
* Raise errors for prompts that are too long by merrymercy in https://github.com/sgl-project/sglang/pull/436
* support llava video by ZhangYuanhan-AI in https://github.com/sgl-project/sglang/pull/426
* Fix streaming by merrymercy in https://github.com/sgl-project/sglang/pull/437
* Update version to 0.1.16 by merrymercy in https://github.com/sgl-project/sglang/pull/438
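
As an illustration of the `/generate` change in #363 listed above, a request body can carry pre-tokenized `input_ids` in place of prompt text. The sketch below only builds the JSON body and does not contact a server. The helper name `build_generate_payload`, the example token ids, the `text` and `sampling_params` / `max_new_tokens` field names, and the rule that exactly one of `text` or `input_ids` is supplied are assumptions made for this example; check the server documentation for the accepted fields.

```python
import json
from typing import Any, Dict, List, Optional


def build_generate_payload(
    text: Optional[str] = None,
    input_ids: Optional[List[int]] = None,
    max_new_tokens: int = 16,
) -> Dict[str, Any]:
    """Build a request body for the /generate endpoint (hypothetical helper).

    Exactly one of `text` or `input_ids` must be given; that restriction is
    a choice made by this sketch, not a documented server rule.
    """
    if (text is None) == (input_ids is None):
        raise ValueError("provide exactly one of text or input_ids")

    # Field names below are assumed for illustration.
    payload: Dict[str, Any] = {
        "sampling_params": {"max_new_tokens": max_new_tokens}
    }
    if text is not None:
        payload["text"] = text
    else:
        payload["input_ids"] = list(input_ids)
    return payload


# Example token ids are arbitrary placeholders, not a real tokenization.
body = json.dumps(build_generate_payload(input_ids=[1, 15043, 3186]))
```

The resulting `body` string could then be sent with any HTTP client as a POST to the server's `/generate` route.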
New Contributors
* Qubitium made their first contribution in https://github.com/sgl-project/sglang/pull/281
* amirarsalan90 made their first contribution in https://github.com/sgl-project/sglang/pull/242
* Liurl21 made their first contribution in https://github.com/sgl-project/sglang/pull/290
* ZhouGongZaiShi made their first contribution in https://github.com/sgl-project/sglang/pull/291
* Luodian made their first contribution in https://github.com/sgl-project/sglang/pull/311
* janimo made their first contribution in https://github.com/sgl-project/sglang/pull/304
* lockon-n made their first contribution in https://github.com/sgl-project/sglang/pull/315
* SimoneRaponi made their first contribution in https://github.com/sgl-project/sglang/pull/346
* tom-doerr made their first contribution in https://github.com/sgl-project/sglang/pull/354
* ZhouXingg made their first contribution in https://github.com/sgl-project/sglang/pull/369
* fronx made their first contribution in https://github.com/sgl-project/sglang/pull/368
* ispobock made their first contribution in https://github.com/sgl-project/sglang/pull/364
* joschkabraun made their first contribution in https://github.com/sgl-project/sglang/pull/390
* noah-kim-theori made their first contribution in https://github.com/sgl-project/sglang/pull/411
* lolipopshock made their first contribution in https://github.com/sgl-project/sglang/pull/363
* ZhangYuanhan-AI made their first contribution in https://github.com/sgl-project/sglang/pull/426
**Full Changelog**: https://github.com/sgl-project/sglang/compare/v0.1.13...v0.1.16