SGLang

Latest version: v0.3.6

0.2.0

Highlights
- We performed extensive engineering to improve the base performance. Compared to TensorRT-LLM and vLLM, SGLang now consistently delivers superior or competitive performance in both online and offline scenarios, handling models from Llama-8B to Llama-405B, on A100 and H100 GPUs, using FP8 and FP16. See the latest [blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/). A minimal request sketch follows this list.
- New models: Llama3 405B, Deepseek MoE, InternLM, GPTBigCode, Mistral-Nemo
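
A minimal sketch of the request path that the serving benchmarks above exercise, assuming a server has already been launched separately (for example with `python -m sglang.launch_server --model-path <model> --port 30000`) and that the `/generate` endpoint accepts the usual `text` plus `sampling_params` payload in your version:

```python
# Minimal sketch: send one completion request to a locally running SGLang
# server and time the end-to-end latency. The address, prompt, and sampling
# parameters are placeholders; the server must be launched separately.
import time

import requests

payload = {
    "text": "The capital of France is",
    "sampling_params": {
        "max_new_tokens": 32,
        "temperature": 0.0,
    },
}

start = time.time()
resp = requests.post("http://localhost:30000/generate", json=payload, timeout=60)
resp.raise_for_status()
print(f"end-to-end latency: {time.time() - start:.2f}s")
print(resp.json()["text"])
```

For throughput numbers comparable to the blog post, the serving benchmark script added in this cycle (see the bench_serving changes below) is the better tool; the snippet above only shows the shape of a single request.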

What's Changed
* Optimize mem indices management by hnyls2002 in https://github.com/sgl-project/sglang/pull/619
* Unify index operations by hnyls2002 in https://github.com/sgl-project/sglang/pull/620
* Simplify mem state by wisclmy0611 in https://github.com/sgl-project/sglang/pull/623
* Improve tensor parallel performance by Ying1123 in https://github.com/sgl-project/sglang/pull/625
* Bump version to 0.1.21 by Ying1123 in https://github.com/sgl-project/sglang/pull/626
* Fix model forward grad by hnyls2002 in https://github.com/sgl-project/sglang/pull/628
* Update docker file by Ying1123 in https://github.com/sgl-project/sglang/pull/629
* Disable NCCL_NVLS by default by Ying1123 in https://github.com/sgl-project/sglang/pull/631
* Add qwen2 tie word embedding by yileld in https://github.com/sgl-project/sglang/pull/630
* Add support for VertexAI safety settings by AidanCooper in https://github.com/sgl-project/sglang/pull/624
* Fix vertexai by hnyls2002 in https://github.com/sgl-project/sglang/pull/633
* Reduce docker size by hnyls2002 in https://github.com/sgl-project/sglang/pull/632
* clean up step function by Ying1123 in https://github.com/sgl-project/sglang/pull/635
* feat: support internlm2 by zhyncs in https://github.com/sgl-project/sglang/pull/636
* misc: add pre-commit config by zhyncs in https://github.com/sgl-project/sglang/pull/637
* misc: add issue and pr template by zhyncs in https://github.com/sgl-project/sglang/pull/638
* Flashinfer sample kernel by hnyls2002 in https://github.com/sgl-project/sglang/pull/617
* Move `global_server_args_dict` by hnyls2002 in https://github.com/sgl-project/sglang/pull/642
* Increase the capacity of the memory pool by Ying1123 in https://github.com/sgl-project/sglang/pull/643
* feat: add check_env by zhyncs in https://github.com/sgl-project/sglang/pull/645
* Remove the dependency of rpyc by wisclmy0611 in https://github.com/sgl-project/sglang/pull/646
* misc: rm rpyc from PACKAGE_LIST by zhyncs in https://github.com/sgl-project/sglang/pull/649
* fix: set ulimit -n 65535 by zhyncs in https://github.com/sgl-project/sglang/pull/647
* feat: add lint workflow by zhyncs in https://github.com/sgl-project/sglang/pull/648
* fix: resolve lint error by zhyncs in https://github.com/sgl-project/sglang/pull/650
* Remove useless variables in infer_batch.py by Ying1123 in https://github.com/sgl-project/sglang/pull/651
* Detokenize incrementally when streaming by hnyls2002 in https://github.com/sgl-project/sglang/pull/653
* `TokenizerManager.context_len` should inherit from `server_args.conte… by shrirajh in https://github.com/sgl-project/sglang/pull/654
* Remove cached triton launcher by merrymercy in https://github.com/sgl-project/sglang/pull/656
* perf: reduce ttft and itl with stream_interval 1 by zhyncs in https://github.com/sgl-project/sglang/pull/658
* feat: add benchmark serving by zhyncs in https://github.com/sgl-project/sglang/pull/657
* refactor model loader: initial refactor by Ying1123 in https://github.com/sgl-project/sglang/pull/655
* misc: update SGLang package description by zhyncs in https://github.com/sgl-project/sglang/pull/659
* Update Readme by Ying1123 in https://github.com/sgl-project/sglang/pull/660
* feat: update check env by zhyncs in https://github.com/sgl-project/sglang/pull/661
* Improve docs by Ying1123 in https://github.com/sgl-project/sglang/pull/662
* Add benchmark instructions by Ying1123 in https://github.com/sgl-project/sglang/pull/663
* Fix jump forward when streaming by hnyls2002 in https://github.com/sgl-project/sglang/pull/665
* Fix kill process util by ispobock in https://github.com/sgl-project/sglang/pull/666
* Add support for OpenAI API parallel sampling by yichuan520030910320 in https://github.com/sgl-project/sglang/pull/640
* Update OpenAI API by wisclmy0611 in https://github.com/sgl-project/sglang/pull/667
* Temporary fix invalid sample results by hnyls2002 in https://github.com/sgl-project/sglang/pull/668
* Support random dataset in bench_serving.py by merrymercy in https://github.com/sgl-project/sglang/pull/669
* Revert "Temporary fix invalid sample results" by hnyls2002 in https://github.com/sgl-project/sglang/pull/673
* refactor model loader: initial refactor by Ying1123 in https://github.com/sgl-project/sglang/pull/664
* Fix cuda graph with flashinfer by merrymercy in https://github.com/sgl-project/sglang/pull/675
* Tmp fix illegal sample by hnyls2002 in https://github.com/sgl-project/sglang/pull/676
* Update version to 0.1.22 by Ying1123 in https://github.com/sgl-project/sglang/pull/677
* Fallback when sampling failed by ispobock in https://github.com/sgl-project/sglang/pull/678
* feat: support TRT LLM benchmark and multiple benchmarks by zhyncs in https://github.com/sgl-project/sglang/pull/670
* Decouple kv by hnyls2002 in https://github.com/sgl-project/sglang/pull/679
* Support gpt-bigcode model class by hnyls2002 in https://github.com/sgl-project/sglang/pull/681
* support non-streaming benchmark by merrymercy in https://github.com/sgl-project/sglang/pull/682
* Fix StreamExecutor.fork() losing the current role start index. by max99x in https://github.com/sgl-project/sglang/pull/684
* feat: update bench serving by zhyncs in https://github.com/sgl-project/sglang/pull/685
* misc: update output file logic by zhyncs in https://github.com/sgl-project/sglang/pull/686
* Allow disabling streaming in bench by merrymercy in https://github.com/sgl-project/sglang/pull/687
* docs: update README by zhyncs in https://github.com/sgl-project/sglang/pull/688
* Support Deepseek MoE Model by hnyls2002 in https://github.com/sgl-project/sglang/pull/689
* misc: recommend to use chat model for benchmark by zhyncs in https://github.com/sgl-project/sglang/pull/690
* Support Mistral-Nemo by ispobock in https://github.com/sgl-project/sglang/pull/691
* docs: update README by zhyncs in https://github.com/sgl-project/sglang/pull/692
* fix: update bench serving by zhyncs in https://github.com/sgl-project/sglang/pull/694
* misc: update output token logic by zhyncs in https://github.com/sgl-project/sglang/pull/695
* Tune params by Ying1123 in https://github.com/sgl-project/sglang/pull/696
* Fix trt benchmark by Ying1123 in https://github.com/sgl-project/sglang/pull/697
* misc: fix typo by zhyncs in https://github.com/sgl-project/sglang/pull/698
* Fix flashinfer by Ying1123 in https://github.com/sgl-project/sglang/pull/700
* Fix hf config loading by ispobock in https://github.com/sgl-project/sglang/pull/702
* Use min new token ratio at start by hnyls2002 in https://github.com/sgl-project/sglang/pull/701
* feat: add e2e latency by zhyncs in https://github.com/sgl-project/sglang/pull/704
* Update vllm version to support llama3.1 by Ying1123 in https://github.com/sgl-project/sglang/pull/705
* bump version to 0.1.23 by Ying1123 in https://github.com/sgl-project/sglang/pull/706
* Reduce hardcoded logic of kernel usage by wisclmy0611 in https://github.com/sgl-project/sglang/pull/707
* Fix multi-node deadlock by merrymercy in https://github.com/sgl-project/sglang/pull/709
* Auto adjust new ratio by hnyls2002 in https://github.com/sgl-project/sglang/pull/708
* Fix prefill size by Ying1123 in https://github.com/sgl-project/sglang/pull/711
* docs: update README by zhyncs in https://github.com/sgl-project/sglang/pull/712
* docs: update doc by zhyncs in https://github.com/sgl-project/sglang/pull/713
* fix: llama 3.1 405b fp8 by zhyncs in https://github.com/sgl-project/sglang/pull/714
* misc: update doc by zhyncs in https://github.com/sgl-project/sglang/pull/715
* Improve benchmark scripts by Ying1123 in https://github.com/sgl-project/sglang/pull/717
* Bump version to 0.1.24 by Ying1123 in https://github.com/sgl-project/sglang/pull/718
* docs: update supported models by zhyncs in https://github.com/sgl-project/sglang/pull/719
* docs: update comment by zhyncs in https://github.com/sgl-project/sglang/pull/721
* chore: add close inactive issues workflow by zhyncs in https://github.com/sgl-project/sglang/pull/722
* misc: update build instruction by zhyncs in https://github.com/sgl-project/sglang/pull/724
* fix: fp8 config by Ying1123 in https://github.com/sgl-project/sglang/pull/723
* Fix dockerfile and triton cache manager by hnyls2002 in https://github.com/sgl-project/sglang/pull/720
* chore: bump v0.1.25 by zhyncs in https://github.com/sgl-project/sglang/pull/725
* fix: resolve the logo display issue on the PyPI page by zhyncs in https://github.com/sgl-project/sglang/pull/726
* misc: update bug issue template by zhyncs in https://github.com/sgl-project/sglang/pull/727
* Revert "fix: fp8 config" by Ying1123 in https://github.com/sgl-project/sglang/pull/728
* Fix bugs (fp8 checkpoints, triton cache manager) by Ying1123 in https://github.com/sgl-project/sglang/pull/729
* Bump version to 0.2.0 by Ying1123 in https://github.com/sgl-project/sglang/pull/730

New Contributors
* yileld made their first contribution in https://github.com/sgl-project/sglang/pull/630
* AidanCooper made their first contribution in https://github.com/sgl-project/sglang/pull/624
* zhyncs made their first contribution in https://github.com/sgl-project/sglang/pull/636
* shrirajh made their first contribution in https://github.com/sgl-project/sglang/pull/654
* yichuan520030910320 made their first contribution in https://github.com/sgl-project/sglang/pull/640
* max99x made their first contribution in https://github.com/sgl-project/sglang/pull/684

**Full Changelog**: https://github.com/sgl-project/sglang/compare/v0.1.20...v0.2.0

0.1.20

Highlights
* Enable CUDA graph by default. It brings a 1.5x to 2x speedup for small-batch decoding (612); see the configuration sketch below this list.
* Model support: Gemma2, minicpm, Qwen2 MoE
* Docker support (217)
* Various latency optimizations
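
Since CUDA graphs are now the default, the main user-facing knob is the opt-out switch. A rough sketch of toggling it through the Python `Runtime` wrapper, assuming the `disable_cuda_graph` keyword mirrors the corresponding server flag in this release (the model path is a placeholder):

```python
# Sketch: start an in-process SGLang runtime and opt out of CUDA graphs.
# The model path is a placeholder; disable_cuda_graph is assumed to mirror
# the --disable-cuda-graph server flag in this release.
import sglang as sgl

runtime = sgl.Runtime(
    model_path="meta-llama/Meta-Llama-3-8B-Instruct",
    disable_cuda_graph=True,  # leave unset to keep the new default (graphs on)
)
sgl.set_default_backend(runtime)


@sgl.function
def answer(s, question):
    s += question
    s += sgl.gen("ans", max_tokens=32)


state = answer.run(question="What does CUDA graph capture do? ")
print(state["ans"])
runtime.shutdown()
```

Leaving the flag unset keeps the default, which is where the quoted 1.5x to 2x decoding speedup comes from.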

What's Changed
* Add docker file by Ying1123 in https://github.com/sgl-project/sglang/pull/588
* Add Gemma2 by Ying1123 in https://github.com/sgl-project/sglang/pull/592
* Format by Ying1123 in https://github.com/sgl-project/sglang/pull/593
* Fix Llava model by wisclmy0611 in https://github.com/sgl-project/sglang/pull/594
* fix(detokenizer_manager.py): fix truncated decoded output by Titan-p in https://github.com/sgl-project/sglang/pull/586
* Add `--enable-p2p-check` option by hnyls2002 in https://github.com/sgl-project/sglang/pull/599
* Fix streaming by hnyls2002 in https://github.com/sgl-project/sglang/pull/600
* Reduce number of workspaces for flashinfer by wisclmy0611 in https://github.com/sgl-project/sglang/pull/601
* add `LogitsMetadata` by hnyls2002 in https://github.com/sgl-project/sglang/pull/604
* add minicpm support by Titan-p in https://github.com/sgl-project/sglang/pull/602
* Make sglang compat with vllm 0.5.1 by M0gician in https://github.com/sgl-project/sglang/pull/598
* Add Qwen2 MoE support by M0gician in https://github.com/sgl-project/sglang/pull/603
* Update chat template for qwen and yi-1.5. by for-just-we in https://github.com/sgl-project/sglang/pull/530
* [Feat] Expose logprob options to `sgl.gen` API by huyiwen in https://github.com/sgl-project/sglang/pull/503
* Fix bench latency by merrymercy in https://github.com/sgl-project/sglang/pull/607
* Code clean up: Remove deprecated prefill, move InputMetadata to infer_batch.py by merrymercy in https://github.com/sgl-project/sglang/pull/609
* Clean up the usage of flashinfer by merrymercy in https://github.com/sgl-project/sglang/pull/610
* Cleanup attention backend: flashinfer and triton by merrymercy in https://github.com/sgl-project/sglang/pull/611
* Enable cuda graph by default by merrymercy in https://github.com/sgl-project/sglang/pull/612
* Improve benchmark scripts & fix llava by merrymercy in https://github.com/sgl-project/sglang/pull/613
* Memorypool chunked prefetch by hnyls2002 in https://github.com/sgl-project/sglang/pull/614
* Improve benchmark scripts by merrymercy in https://github.com/sgl-project/sglang/pull/615
* Fix memory pool index error by Ying1123 in https://github.com/sgl-project/sglang/pull/616
* Bump version to 0.1.20 by merrymercy in https://github.com/sgl-project/sglang/pull/618

New Contributors
* wisclmy0611 made their first contribution in https://github.com/sgl-project/sglang/pull/594
* Titan-p made their first contribution in https://github.com/sgl-project/sglang/pull/586
* M0gician made their first contribution in https://github.com/sgl-project/sglang/pull/598
* for-just-we made their first contribution in https://github.com/sgl-project/sglang/pull/530

**Full Changelog**: https://github.com/sgl-project/sglang/compare/v0.1.18...v0.1.20

0.1.18

Highlight
- 2x large-batch prefill improvement with the new flashinfer kernels (579)
- Multi-node tensor parallelism (550); see the frontend sketch below this list
- New model support: ChatGLM (516)
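
Multi-node tensor parallelism is configured entirely on the server side, so the frontend only needs the URL of the head node. A rough sketch under that assumption, with the launch commands shown as comments and the multi-node flag names treated as assumptions for this release:

```python
# Sketch: drive a tensor-parallel SGLang deployment from the frontend DSL.
# Assumes each node was launched separately with launch_server, e.g.
#   python -m sglang.launch_server --model-path <model> --tp-size 8 \
#       --nnodes 2 --node-rank 0 --nccl-init-addr <head-host>:<port>
# (the multi-node flag names are assumptions; the endpoint URL is a placeholder).
import sglang as sgl

sgl.set_default_backend(sgl.RuntimeEndpoint("http://node0:30000"))


@sgl.function
def summarize(s, document):
    s += "Summarize the following text in one sentence.\n\n" + document + "\n\nSummary:"
    s += sgl.gen("summary", max_tokens=64, temperature=0.0)


state = summarize.run(document="SGLang 0.1.18 adds multi-node tensor parallelism.")
print(state["summary"])
```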


What's Changed
* Fix missing numpy dependency in pyproject.toml by fpreiss in https://github.com/sgl-project/sglang/pull/524
* Fix RAG nb, parea setup (parea -> parea-ai) by fpreiss in https://github.com/sgl-project/sglang/pull/525
* [Minor] Correct Optional type hints in api by fpreiss in https://github.com/sgl-project/sglang/pull/526
* Add ChatGLM Model Support by Qubitium in https://github.com/sgl-project/sglang/pull/516
* Fix Regression: Disable p2p for 4090 by ZX-ModelCloud in https://github.com/sgl-project/sglang/pull/531
* Decode Incrementally by hnyls2002 in https://github.com/sgl-project/sglang/pull/517
* Fix dependency by merrymercy in https://github.com/sgl-project/sglang/pull/538
* Fix dependency & crash issues by Ying1123 in https://github.com/sgl-project/sglang/pull/539
* Higher priority for user input of max_prefill_tokens & format by Ying1123 in https://github.com/sgl-project/sglang/pull/540
* Add disk cache for loading ShareGPT dataset. by hnyls2002 in https://github.com/sgl-project/sglang/pull/542
* Fix tp worker only checking req[0] for stream by Qubitium in https://github.com/sgl-project/sglang/pull/546
* Fix the Jump-Forward with Chinese by hnyls2002 in https://github.com/sgl-project/sglang/pull/551
* Update fused_moe by merrymercy in https://github.com/sgl-project/sglang/pull/553
* Multi-node Tensor Parallelism by Ying1123 in https://github.com/sgl-project/sglang/pull/550
* Update flashinfer to 0.0.5 by merrymercy in https://github.com/sgl-project/sglang/pull/554
* Follow-up fixes for flashinfer 0.0.5 by merrymercy in https://github.com/sgl-project/sglang/pull/556
* Fix latency benchmark by hnyls2002 in https://github.com/sgl-project/sglang/pull/557
* Clean up logits processor by merrymercy in https://github.com/sgl-project/sglang/pull/558
* Update test_flashinfer by hnyls2002 in https://github.com/sgl-project/sglang/pull/560
* Allow running with vllm==0.4.3 by merrymercy in https://github.com/sgl-project/sglang/pull/561
* Add a new arguments log_level_http to control the HTTP logging by merrymercy in https://github.com/sgl-project/sglang/pull/563
* Add sglang.bench_latency for offline benchmark by merrymercy in https://github.com/sgl-project/sglang/pull/564
* Warmup cublas by merrymercy in https://github.com/sgl-project/sglang/pull/566
* Increase the number of thread limitation for tp worker managers. by merrymercy in https://github.com/sgl-project/sglang/pull/567
* Update readme by merrymercy in https://github.com/sgl-project/sglang/pull/568
* Expose dtype argument by merrymercy in https://github.com/sgl-project/sglang/pull/569
* Update benchmark script by Ying1123 in https://github.com/sgl-project/sglang/pull/571
* Minor fix in compiler & format by ZackZeng999 in https://github.com/sgl-project/sglang/pull/545
* Update run_batch interface and max_prefill_tokens by Ying1123 in https://github.com/sgl-project/sglang/pull/574
* Fix flashinfer version by PanJason in https://github.com/sgl-project/sglang/pull/576
* [BugFix] gemma loading weights "lm_head.weight" key error by dhgarcia in https://github.com/sgl-project/sglang/pull/577
* Turn on flashinfer by default by Ying1123 in https://github.com/sgl-project/sglang/pull/578
* fix the broken server args by hnyls2002 in https://github.com/sgl-project/sglang/pull/585
* 2x performance improvement for large prefill & Fix workspace conflicts by Ying1123 in https://github.com/sgl-project/sglang/pull/579

New Contributors
* fpreiss made their first contribution in https://github.com/sgl-project/sglang/pull/524
* ZackZeng999 made their first contribution in https://github.com/sgl-project/sglang/pull/545
* PanJason made their first contribution in https://github.com/sgl-project/sglang/pull/576
* dhgarcia made their first contribution in https://github.com/sgl-project/sglang/pull/577

**Full Changelog**: https://github.com/sgl-project/sglang/compare/v0.1.17...v0.1.18

0.1.17

Highlights
- Add data parallelism (480); see the batch-serving sketch below this list
- Add speculative execution for the OpenAI API (250)
- Update vllm to v0.4.3 for new quantization features (511)
- Better error handling (457, 449, 514)
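
Static data parallelism (480) replicates the model so independent requests can be spread across replicas. A minimal sketch using the Python `Runtime` wrapper together with `run_batch`, where `dp_size` is assumed to be the keyword backing the new data-parallel server argument and the model path is a placeholder:

```python
# Sketch: two data-parallel replicas serving a batch of independent prompts.
# dp_size is assumed to map to the data-parallel server argument from PR 480;
# the model path is a placeholder and two GPUs are assumed to be available.
import sglang as sgl

runtime = sgl.Runtime(
    model_path="meta-llama/Llama-2-7b-chat-hf",
    dp_size=2,  # two replicas; requests are spread across them
)
sgl.set_default_backend(runtime)


@sgl.function
def classify(s, review):
    s += "Review: " + review + "\nSentiment (positive/negative):"
    s += sgl.gen("label", max_tokens=4)


states = classify.run_batch(
    [{"review": "Great food."}, {"review": "Terrible service."}]
)
print([st["label"] for st in states])
runtime.shutdown()
```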

What's Changed
* [Feat] Add llava qwen, llava mistral by kcz358 in https://github.com/sgl-project/sglang/pull/419
* Format code by hnyls2002 in https://github.com/sgl-project/sglang/pull/441
* Add finish_reason to OpenAI API by mgerstgrasser in https://github.com/sgl-project/sglang/pull/446
* Simplify port allocation by merrymercy in https://github.com/sgl-project/sglang/pull/447
* Add PUT for generate api by Ying1123 in https://github.com/sgl-project/sglang/pull/448
* Improve error handling & abort disconnected requests by merrymercy in https://github.com/sgl-project/sglang/pull/449
* Fix the broken `--disable-radix-cache` by hnyls2002 in https://github.com/sgl-project/sglang/pull/451
* openai chat speculative execution by ChuyueSun in https://github.com/sgl-project/sglang/pull/250
* Fix openai speculative execution by Ying1123 in https://github.com/sgl-project/sglang/pull/456
* Abort disconnected requests by merrymercy in https://github.com/sgl-project/sglang/pull/457
* Rename api_num_spec_tokens -> num_api_spec_tokens by merrymercy in https://github.com/sgl-project/sglang/pull/458
* Use model loader from vllm by merrymercy in https://github.com/sgl-project/sglang/pull/459
* port fp8 mixtral by merrymercy in https://github.com/sgl-project/sglang/pull/460
* fix test bug in srt_llava_next_test.py by bingwork in https://github.com/sgl-project/sglang/pull/470
* Add the instruction link to the LLaVA-NeXT-Video at README by ZhangYuanhan-AI in https://github.com/sgl-project/sglang/pull/463
* Improve logging & add logit cap by merrymercy in https://github.com/sgl-project/sglang/pull/471
* Optimize retract by hnyls2002 in https://github.com/sgl-project/sglang/pull/440
* Add benchmark scripts by Ying1123 in https://github.com/sgl-project/sglang/pull/476
* [Feat/Fix] Refactoring Llava models into single file by Luodian in https://github.com/sgl-project/sglang/pull/475
* Improve benchmark scripts & rename some scripts by merrymercy in https://github.com/sgl-project/sglang/pull/477
* Improve benchmark scripts & add more models by merrymercy in https://github.com/sgl-project/sglang/pull/484
* Support data parallelism (static) by Ying1123 in https://github.com/sgl-project/sglang/pull/480
* Make the server random by default by merrymercy in https://github.com/sgl-project/sglang/pull/488
* Revert "Make the server random by default" by Ying1123 in https://github.com/sgl-project/sglang/pull/492
* update the script: examples/usage/llava_video/srt_example_llava_v.sh by ZhangYuanhan-AI in https://github.com/sgl-project/sglang/pull/491
* Make the server random by default by merrymercy in https://github.com/sgl-project/sglang/pull/493
* Update vllm to v0.4.3 by merrymercy in https://github.com/sgl-project/sglang/pull/511
* remove redundant pad_input_ids function by amosyou in https://github.com/sgl-project/sglang/pull/500
* Litellm Backend by huyiwen in https://github.com/sgl-project/sglang/pull/502
* Fix rid state map leak + Refactor .finished by Qubitium in https://github.com/sgl-project/sglang/pull/505
* Crash the server when error or OOM happens by merrymercy in https://github.com/sgl-project/sglang/pull/514
* Update version to 0.1.17 by merrymercy in https://github.com/sgl-project/sglang/pull/515

New Contributors
* kcz358 made their first contribution in https://github.com/sgl-project/sglang/pull/419
* mgerstgrasser made their first contribution in https://github.com/sgl-project/sglang/pull/446
* bingwork made their first contribution in https://github.com/sgl-project/sglang/pull/470
* amosyou made their first contribution in https://github.com/sgl-project/sglang/pull/500
* huyiwen made their first contribution in https://github.com/sgl-project/sglang/pull/502

**Full Changelog**: https://github.com/sgl-project/sglang/compare/v0.1.16...v0.1.17

0.1.16

Highlight
* Support more models: DBRX, Command-R, Gemma
* Support llava-video (423, https://llava-vl.github.io/blog/2024-04-30-llava-next-video/); see the vision sketch below this list
* Cache performance improvements (418, 364)
* Marlin quantization kernels
* Many bug fixes
* Update dependencies to be compatible with their latest versions
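
For the LLaVA family, including the new llava-video support (423), the frontend mixes image inputs with text through `sgl.image`. A rough sketch of querying an already-running LLaVA server; the endpoint URL, model launch, and image path are placeholders:

```python
# Sketch: ask a question about a local image via an SGLang LLaVA server.
# Assumes a LLaVA model was launched separately with launch_server; the
# endpoint URL and image path are placeholders.
import sglang as sgl

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))


@sgl.function
def describe(s, image_path, question):
    s += sgl.user(sgl.image(image_path) + question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))


state = describe.run(image_path="example.jpg", question="What is in this picture?")
print(state["answer"])
```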

What's Changed
* Fix Runtime missing some ServerArgs options by Qubitium in https://github.com/sgl-project/sglang/pull/281
* adding the triton docker build minimal example by amirarsalan90 in https://github.com/sgl-project/sglang/pull/242
* Fix flashinfer >= 0.0.3 compat by Qubitium in https://github.com/sgl-project/sglang/pull/282
* Fix Incorrect CURL Request Example in README by amirarsalan90 in https://github.com/sgl-project/sglang/pull/287
* enable marlin kernels by qeternity in https://github.com/sgl-project/sglang/pull/286
* Fix env (docker) compat due to __file__ usage by Qubitium in https://github.com/sgl-project/sglang/pull/288
* Fix marlin model loading compat with autogptq by Liurl21 in https://github.com/sgl-project/sglang/pull/290
* Fix outlines-0.0.35 incompatibility by ZhouGongZaiShi in https://github.com/sgl-project/sglang/pull/291
* [Fix/Potential Bugs] Can not correctly import models in python/sglang/srt/models by Luodian in https://github.com/sgl-project/sglang/pull/311
* Use Anthropic messages API by janimo in https://github.com/sgl-project/sglang/pull/304
* Add StableLM model. by janimo in https://github.com/sgl-project/sglang/pull/301
* Support oai in benchmark/mmlu by merrymercy in https://github.com/sgl-project/sglang/pull/323
* Update version to v0.1.14 by merrymercy in https://github.com/sgl-project/sglang/pull/324
* Cleanup codebase: removed unnecessary code/logic by Qubitium in https://github.com/sgl-project/sglang/pull/298
* Update dependencies by janimo in https://github.com/sgl-project/sglang/pull/326
* Openrouter usage example by janimo in https://github.com/sgl-project/sglang/pull/327
* `model_rpc` style improvement by hnyls2002 in https://github.com/sgl-project/sglang/pull/293
* `model_runner` simplify by hnyls2002 in https://github.com/sgl-project/sglang/pull/329
* Logprobs Refactor by hnyls2002 in https://github.com/sgl-project/sglang/pull/331
* `DBRX` support by hnyls2002 in https://github.com/sgl-project/sglang/pull/337
* Add support for new autogptq quant_config.checkpoint_format by Qubitium in https://github.com/sgl-project/sglang/pull/332
* Fix llava parallelism/fork bug by lockon-n in https://github.com/sgl-project/sglang/pull/315
* Eliminate 2 gpu ops during sampling when logit_bias is zero by hnyls2002 in https://github.com/sgl-project/sglang/pull/343
* Revert "Eliminate 2 gpu ops during sampling when logit_bias is zero" by hnyls2002 in https://github.com/sgl-project/sglang/pull/345
* Eliminate 2 gpu ops during sampling when logit_bias is zero by Qubitium in https://github.com/sgl-project/sglang/pull/338
* Add timeout to get_meta_info by SimoneRaponi in https://github.com/sgl-project/sglang/pull/346
* Fix typos in infer_batch.py by tom-doerr in https://github.com/sgl-project/sglang/pull/354
* Time cost utils by hnyls2002 in https://github.com/sgl-project/sglang/pull/355
* Update README.md by eltociear in https://github.com/sgl-project/sglang/pull/358
* support `command-r` by ZhouXingg in https://github.com/sgl-project/sglang/pull/369
* Fix issue 367 – System message not supported for Anthropic (anthropic.BadRequestError) by fronx in https://github.com/sgl-project/sglang/pull/368
* Update model support in readme by Ying1123 in https://github.com/sgl-project/sglang/pull/370
* Optimize radix tree matching by ispobock in https://github.com/sgl-project/sglang/pull/364
* Reduce overhead when `fork(1)` by hnyls2002 in https://github.com/sgl-project/sglang/pull/375
* llama3 instruct template by qeternity in https://github.com/sgl-project/sglang/pull/372
* add `.isort.cfg` by hnyls2002 in https://github.com/sgl-project/sglang/pull/378
* Revert removing the unused imports by hnyls2002 in https://github.com/sgl-project/sglang/pull/385
* Benchmark Updates by hnyls2002 in https://github.com/sgl-project/sglang/pull/382
* Improve performance when running with full parallel by hnyls2002 in https://github.com/sgl-project/sglang/pull/394
* Minor: style improvement of radix_cache and memory_pool by hnyls2002 in https://github.com/sgl-project/sglang/pull/395
* Format Benchmark Code by hnyls2002 in https://github.com/sgl-project/sglang/pull/399
* Fix chatml template by merrymercy in https://github.com/sgl-project/sglang/pull/406
* Adding RAG tracing & eval cookbook using Parea by joschkabraun in https://github.com/sgl-project/sglang/pull/390
* SamplingParams add "spaces_between_special_tokens" argument by ZhouXingg in https://github.com/sgl-project/sglang/pull/392
* Organize Benchmark by hnyls2002 in https://github.com/sgl-project/sglang/pull/381
* Add Cohere Command R chat template by noah-kim-theori in https://github.com/sgl-project/sglang/pull/411
* Fix `sync()` when `fork(1)` by hnyls2002 in https://github.com/sgl-project/sglang/pull/412
* Include finish reason in meta info response by qeternity in https://github.com/sgl-project/sglang/pull/415
* Make public APIs more standard. by hnyls2002 in https://github.com/sgl-project/sglang/pull/416
* Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 by Qubitium in https://github.com/sgl-project/sglang/pull/380
* Optimize the memory usage of logits processor by merrymercy in https://github.com/sgl-project/sglang/pull/420
* Clean up by merrymercy in https://github.com/sgl-project/sglang/pull/422
* Fix logit processor bugs by merrymercy in https://github.com/sgl-project/sglang/pull/427
* Minor fix for the import path by merrymercy in https://github.com/sgl-project/sglang/pull/428
* Move openai api server into a separate file by merrymercy in https://github.com/sgl-project/sglang/pull/429
* Fix flashinfer by merrymercy in https://github.com/sgl-project/sglang/pull/430
* Update version to 0.1.15 by merrymercy in https://github.com/sgl-project/sglang/pull/431
* Misc fixes by merrymercy in https://github.com/sgl-project/sglang/pull/432
* Allow `input_ids` in the input of the `/generate` endpoint by lolipopshock in https://github.com/sgl-project/sglang/pull/363
* Improve error handling by merrymercy in https://github.com/sgl-project/sglang/pull/433
* Cache optimizations by hnyls2002 in https://github.com/sgl-project/sglang/pull/418
* Update readme by merrymercy in https://github.com/sgl-project/sglang/pull/434
* Raise errors for prompts that are too long by merrymercy in https://github.com/sgl-project/sglang/pull/436
* support llava video by ZhangYuanhan-AI in https://github.com/sgl-project/sglang/pull/426
* Fix streaming by merrymercy in https://github.com/sgl-project/sglang/pull/437
* Update version to 0.1.16 by merrymercy in https://github.com/sgl-project/sglang/pull/438

New Contributors
* Qubitium made their first contribution in https://github.com/sgl-project/sglang/pull/281
* amirarsalan90 made their first contribution in https://github.com/sgl-project/sglang/pull/242
* Liurl21 made their first contribution in https://github.com/sgl-project/sglang/pull/290
* ZhouGongZaiShi made their first contribution in https://github.com/sgl-project/sglang/pull/291
* Luodian made their first contribution in https://github.com/sgl-project/sglang/pull/311
* janimo made their first contribution in https://github.com/sgl-project/sglang/pull/304
* lockon-n made their first contribution in https://github.com/sgl-project/sglang/pull/315
* SimoneRaponi made their first contribution in https://github.com/sgl-project/sglang/pull/346
* tom-doerr made their first contribution in https://github.com/sgl-project/sglang/pull/354
* ZhouXingg made their first contribution in https://github.com/sgl-project/sglang/pull/369
* fronx made their first contribution in https://github.com/sgl-project/sglang/pull/368
* ispobock made their first contribution in https://github.com/sgl-project/sglang/pull/364
* joschkabraun made their first contribution in https://github.com/sgl-project/sglang/pull/390
* noah-kim-theori made their first contribution in https://github.com/sgl-project/sglang/pull/411
* lolipopshock made their first contribution in https://github.com/sgl-project/sglang/pull/363
* ZhangYuanhan-AI made their first contribution in https://github.com/sgl-project/sglang/pull/426

**Full Changelog**: https://github.com/sgl-project/sglang/compare/v0.1.13...v0.1.16

0.1.13

Highlights
* Gemma Support by hnyls2002 in https://github.com/sgl-project/sglang/pull/256 (see the chat sketch below this list)
* Add Together and AzureOpenAI examples by merrymercy in https://github.com/sgl-project/sglang/pull/184
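
Gemma is an instruction-tuned chat model, so the natural way to exercise the new support from the frontend is through the chat role helpers (whose templates several PRs in this release adjust). A minimal sketch, with the model path as a placeholder and the matching chat template assumed to be picked up automatically:

```python
# Sketch: a short chat turn against an in-process Gemma runtime.
# The model path is a placeholder; the matching chat template is assumed
# to be selected automatically for this model.
import sglang as sgl

runtime = sgl.Runtime(model_path="google/gemma-7b-it")
sgl.set_default_backend(runtime)


@sgl.function
def chat(s, question):
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("reply", max_tokens=64))


state = chat.run(question="Give me one fun fact about pandas.")
print(state["reply"])
runtime.shutdown()
```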

What's Changed
* correct a mistake on the README.md by yaya-sy in https://github.com/sgl-project/sglang/pull/182
* correct reference dtype openai.py by yaya-sy in https://github.com/sgl-project/sglang/pull/181
* Add Together and AzureOpenAI examples by merrymercy in https://github.com/sgl-project/sglang/pull/184
* Fix server launch for jupyter notebook by merrymercy in https://github.com/sgl-project/sglang/pull/186
* Refactor decoding logprob and add completion_tokens_wo_jump_forward by comaniac in https://github.com/sgl-project/sglang/pull/189
* Pin outlines version by comaniac in https://github.com/sgl-project/sglang/pull/196
* Adjust outlines version. by hnyls2002 in https://github.com/sgl-project/sglang/pull/200
* Update README.md by eltociear in https://github.com/sgl-project/sglang/pull/207
* Added the ability to Modify the Context Length by psych0v0yager in https://github.com/sgl-project/sglang/pull/210
* Fix logprobs with logprob_start_len by comaniac in https://github.com/sgl-project/sglang/pull/193
* Support outlines > 0.0.31 by comaniac in https://github.com/sgl-project/sglang/pull/219
* Fix stop str merging by hnyls2002 in https://github.com/sgl-project/sglang/pull/225
* Fix interpreter.py `get_var(var_name)` in text iter when `stream` is not enabled by exceedzhang in https://github.com/sgl-project/sglang/pull/198
* fix chatml template by qeternity in https://github.com/sgl-project/sglang/pull/195
* Upload `agent_calls.jsonl` download link by hnyls2002 in https://github.com/sgl-project/sglang/pull/226
* Fix addr reuse in check_port by hnyls2002 in https://github.com/sgl-project/sglang/pull/253
* Add SSL Cert Functionality by nivibilla in https://github.com/sgl-project/sglang/pull/224
* Refactor ChatTemplate for Enhanced Clarity and Efficiency by cubxxw in https://github.com/sgl-project/sglang/pull/201
* Add `set_var` to interpreter.py by 1024th in https://github.com/sgl-project/sglang/pull/263
* Add logo by merrymercy in https://github.com/sgl-project/sglang/pull/275
* Fix qwen config by hnyls2002 in https://github.com/sgl-project/sglang/pull/261
* replace skip_embed with input_embeds by TideDra in https://github.com/sgl-project/sglang/pull/222
* Gemma Support by hnyls2002 in https://github.com/sgl-project/sglang/pull/256
* Improve gemma and documentations by merrymercy in https://github.com/sgl-project/sglang/pull/278
* Organize `server_args` by hnyls2002 in https://github.com/sgl-project/sglang/pull/277
* Add Support for API Key Authentication by alessiodallapiazza in https://github.com/sgl-project/sglang/pull/230
* Fix RuntimeEndpoint by merrymercy in https://github.com/sgl-project/sglang/pull/279
* Update version to v0.1.13 by merrymercy in https://github.com/sgl-project/sglang/pull/280

New Contributors
* psych0v0yager made their first contribution in https://github.com/sgl-project/sglang/pull/210
* exceedzhang made their first contribution in https://github.com/sgl-project/sglang/pull/198
* qeternity made their first contribution in https://github.com/sgl-project/sglang/pull/195
* cubxxw made their first contribution in https://github.com/sgl-project/sglang/pull/201
* 1024th made their first contribution in https://github.com/sgl-project/sglang/pull/263
* TideDra made their first contribution in https://github.com/sgl-project/sglang/pull/222
* alessiodallapiazza made their first contribution in https://github.com/sgl-project/sglang/pull/230

**Full Changelog**: https://github.com/sgl-project/sglang/compare/v0.1.12...v0.1.13
