Sglang

Latest version: v0.4.4.post3


Page 4 of 7

0.3.2

New Contributors
* zifeitong made their first contribution in https://github.com/sgl-project/sglang/pull/1363
* wcsjtu made their first contribution in https://github.com/sgl-project/sglang/pull/1370
* Achazwl made their first contribution in https://github.com/sgl-project/sglang/pull/1371
* josephrocca made their first contribution in https://github.com/sgl-project/sglang/pull/1373
* blacker521 made their first contribution in https://github.com/sgl-project/sglang/pull/1367
* yzh119 made their first contribution in https://github.com/sgl-project/sglang/pull/1403
* hxer7963 made their first contribution in https://github.com/sgl-project/sglang/pull/1397
* Aphoh made their first contribution in https://github.com/sgl-project/sglang/pull/1427
* HaiShaw made their first contribution in https://github.com/sgl-project/sglang/pull/1420
* jasonyux made their first contribution in https://github.com/sgl-project/sglang/pull/1449
* Muennighoff made their first contribution in https://github.com/sgl-project/sglang/pull/1476
* rchen19 made their first contribution in https://github.com/sgl-project/sglang/pull/1481
* wellhowtosay made their first contribution in https://github.com/sgl-project/sglang/pull/1456
* luzengxiangcn made their first contribution in https://github.com/sgl-project/sglang/pull/1499
* TianyiQ made their first contribution in https://github.com/sgl-project/sglang/pull/1508

**Full Changelog**: https://github.com/sgl-project/sglang/compare/v0.3.0...v0.3.2

0.3.1.post2

* Fix env vars in bench_latency by merrymercy in https://github.com/sgl-project/sglang/pull/1472
* feat: update linear deps 1/N by zhyncs in https://github.com/sgl-project/sglang/pull/1305
* minor: add quant eval compared with base by zhyncs in https://github.com/sgl-project/sglang/pull/1475
* Add OLMoE by Muennighoff in https://github.com/sgl-project/sglang/pull/1476
* Fix triton head num by ispobock in https://github.com/sgl-project/sglang/pull/1482
* Add MLA gsm8k eval by ispobock in https://github.com/sgl-project/sglang/pull/1484
* chore: bump v0.3.1.post3 by zhyncs in https://github.com/sgl-project/sglang/pull/1483
* fix incorrect links in documentation by rchen19 in https://github.com/sgl-project/sglang/pull/1481
* doc: update backend by zhyncs in https://github.com/sgl-project/sglang/pull/1486
* Better unit tests for adding a new model by merrymercy in https://github.com/sgl-project/sglang/pull/1488
* Pr fix max workers by wellhowtosay in https://github.com/sgl-project/sglang/pull/1456
* Add a unit test for data parallelism by merrymercy in https://github.com/sgl-project/sglang/pull/1489
* Add AMD tests to CI by Ying1123 in https://github.com/sgl-project/sglang/pull/1491
* Update dockerfile to include datamodel_code_generator by merrymercy in https://github.com/sgl-project/sglang/pull/1492
* [API, Feature] Support response prefill for openai API by Ying1123 in https://github.com/sgl-project/sglang/pull/1490
* minor: add mla fp8 test by zhyncs in https://github.com/sgl-project/sglang/pull/1494
* Fix the overhead due to penalizer in bench_latency by merrymercy in https://github.com/sgl-project/sglang/pull/1496
* MoE torch compile by ispobock in https://github.com/sgl-project/sglang/pull/1497
* [CI] Move AMD test to a separate file by merrymercy in https://github.com/sgl-project/sglang/pull/1500
* Update test_srt_backend.py by merrymercy in https://github.com/sgl-project/sglang/pull/1502
* debug radixcache stack_overflow by luzengxiangcn in https://github.com/sgl-project/sglang/pull/1499
* Simplify bench_latency.py by merrymercy in https://github.com/sgl-project/sglang/pull/1503
* [Fix] Fix clean_up_tokenization_spaces in tokenizer by merrymercy in https://github.com/sgl-project/sglang/pull/1510
* Add support for tie_word_embeddings when loading weights + support for SmolLM by TianyiQ in https://github.com/sgl-project/sglang/pull/1508
* Revert "kernel: use tensor cores for flashinfer gqa kernels" by Ying1123 in https://github.com/sgl-project/sglang/pull/1511

0.3.1.post1

* Enable MLA by default by ispobock in https://github.com/sgl-project/sglang/pull/1447
* Fix attention backend by ispobock in https://github.com/sgl-project/sglang/pull/1448
* fix schedule bug by hnyls2002 in https://github.com/sgl-project/sglang/pull/1450
* Fix schedule bug by hnyls2002 in https://github.com/sgl-project/sglang/pull/1451
* Fixed n>1 causing list index out of range with VLM by jasonyux in https://github.com/sgl-project/sglang/pull/1449
* Add bench_server_latency.py by merrymercy in https://github.com/sgl-project/sglang/pull/1452
* [Bugfix] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1419) by HaiShaw in https://github.com/sgl-project/sglang/pull/1453
* Fix oom issues with fp8 for llama by merrymercy in https://github.com/sgl-project/sglang/pull/1454
* Fuse top_k and top_p in the sampler by merrymercy in https://github.com/sgl-project/sglang/pull/1457
* [Event] Add public meeting invite to README by Ying1123 in https://github.com/sgl-project/sglang/pull/1458
* fix: create new dict every time for putting new frame by Luodian in https://github.com/sgl-project/sglang/pull/1464
* Fix padding in the cuda graph by merrymercy in https://github.com/sgl-project/sglang/pull/1469

0.3.1

* Remove deprecated configs by merrymercy in https://github.com/sgl-project/sglang/pull/1431
* [Feature] Support LoRA path renaming and add LoRA serving benchmarks by Ying1123 in https://github.com/sgl-project/sglang/pull/1433
* Revert "[Minor] Raise exception for wrong import (#1409)" by Ying1123 in https://github.com/sgl-project/sglang/pull/1432
* Add constrained_json_whitespace_pattern to ServerArgs by zifeitong in https://github.com/sgl-project/sglang/pull/1438
* Clean up model loader by merrymercy in https://github.com/sgl-project/sglang/pull/1440
* Simplify sampler and its error handling by merrymercy in https://github.com/sgl-project/sglang/pull/1441
* [Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm by HaiShaw in https://github.com/sgl-project/sglang/pull/1420
* Fix torch compile for deepseek-v2 by ispobock in https://github.com/sgl-project/sglang/pull/1442
* Add OLMoE model by janimo in https://github.com/sgl-project/sglang/pull/1444

0.3.0

Highlights
Check out the release blog post https://lmsys.org/blog/2024-09-04-sglang-v0-3/ for detailed instructions and descriptions of the items below.
- Up to 7x higher throughput for DeepSeek Multi-head Latent Attention (MLA)
- Up to 1.5x lower latency with torch.compile on small batch sizes
- Support for interleaved text and multi-image/video in LLaVA-OneVision
- Support for interleaved window attention and 2x longer context length in Gemma-2
- Chunked prefill is turned on by default (you can choose to run prefill and decode separately or mixed in the same batch).
- Added multi-GPU accuracy and performance tests, and nightly accuracy tests for more models.
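
To make the chunked prefill item above more concrete, here is a minimal toy sketch in plain Python. It does not use SGLang and is not SGLang's scheduler: the function names `chunk_prefill` and `plan_steps` are hypothetical, and the strict alternation between prefill and decode steps in the separate case is an assumption of this toy only. It shows the general idea that a long prompt is prefilled in bounded chunks, and that a step either carries only one kind of work (separate) or a prefill chunk together with decode tokens (mixed).

```python
from typing import List, Tuple


def chunk_prefill(prompt_len: int, chunk_size: int) -> List[int]:
    """Split a prompt of prompt_len tokens into chunks of at most chunk_size tokens."""
    if prompt_len < 0 or chunk_size <= 0:
        raise ValueError("prompt_len must be >= 0 and chunk_size must be > 0")
    chunks = []
    remaining = prompt_len
    while remaining > 0:
        step = min(chunk_size, remaining)
        chunks.append(step)
        remaining -= step
    return chunks


def plan_steps(
    prompt_len: int, chunk_size: int, num_decoding: int, mixed: bool
) -> List[Tuple[int, int]]:
    """Return (prefill_tokens, decode_tokens) for each toy forward step.

    num_decoding is the number of already-running requests, each of which
    produces one token per decode step. Hypothetical model for illustration.
    """
    steps = []
    for chunk in chunk_prefill(prompt_len, chunk_size):
        if mixed:
            # Mixed: one step carries a prefill chunk and the decode tokens.
            steps.append((chunk, num_decoding))
        else:
            # Separate: a step carries either prefill work or decode work.
            # Alternating them is a simplification made by this toy.
            steps.append((chunk, 0))
            steps.append((0, num_decoding))
    return steps


if __name__ == "__main__":
    print(chunk_prefill(10, 4))
    print(plan_steps(10, 4, num_decoding=3, mixed=True))
    print(plan_steps(10, 4, num_decoding=3, mixed=False))
```

In this toy, the mixed plan needs fewer steps for the same work, at the cost of larger individual steps. How the real trade-off plays out depends on the model and hardware; the release blog post linked above is the place to look for actual behavior and configuration.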

What's Changed
* update hyperparameter guide by merrymercy in https://github.com/sgl-project/sglang/pull/1114
* ci: compatible with fork repo by zhyncs in https://github.com/sgl-project/sglang/pull/1115
* fix: resolve Python.h header missing by zhyncs in https://github.com/sgl-project/sglang/pull/1119
* Fix the deadlock in multi-node tp by merrymercy in https://github.com/sgl-project/sglang/pull/1122
* Mixed style of chunked prefill by hnyls2002 in https://github.com/sgl-project/sglang/pull/1013
* Fix port conflicts between local CI and runner CI. by hnyls2002 in https://github.com/sgl-project/sglang/pull/1131
* Fix CI accuracy && time out limit by hnyls2002 in https://github.com/sgl-project/sglang/pull/1133
* fix: use fp16 dtype for sm75 by zhyncs in https://github.com/sgl-project/sglang/pull/1136
* Improve the code style: more comments and remove useless packages by merrymercy in https://github.com/sgl-project/sglang/pull/1139
* Improve benchmark by merrymercy in https://github.com/sgl-project/sglang/pull/1140
* Fix duplicated imports in hf_transformers_utils.py by merrymercy in https://github.com/sgl-project/sglang/pull/1141
* fixed a typo by min-xu-et in https://github.com/sgl-project/sglang/pull/1143
* [Docs] Add instruction for running on clouds and kubernetes with SkyPilot by Michaelvll in https://github.com/sgl-project/sglang/pull/1144
* [Feat]Add support for optional start len of logprobs by yichuan520030910320 in https://github.com/sgl-project/sglang/pull/1035
* Optimize MLA/GQA/MQA Triton decoding by ispobock in https://github.com/sgl-project/sglang/pull/1138
* feat: allow streaming for multi-prompt and/or parallel sampling by vhain in https://github.com/sgl-project/sglang/pull/1134
* Improve docs and warnings by merrymercy in https://github.com/sgl-project/sglang/pull/1164
* [Feature] add disable-custom-all-reduce by Xu-Chen in https://github.com/sgl-project/sglang/pull/1148
* misc: add hypervisor vendor by zhyncs in https://github.com/sgl-project/sglang/pull/1165
* support /v1/health using a 1-token generation by LucienShui in https://github.com/sgl-project/sglang/pull/1154
* fix: resolve README render by zhyncs in https://github.com/sgl-project/sglang/pull/1166
* [Feat] Support update weights without restart server by shanyu-sys in https://github.com/sgl-project/sglang/pull/1157
* Improve multi-node stability by merrymercy in https://github.com/sgl-project/sglang/pull/1171
* fix: custom op fallback forward native when lower sm80 by zhyncs in https://github.com/sgl-project/sglang/pull/1177
* [Feature] Add a function to convert sampling_params to kwargs by gryffindor-rr in https://github.com/sgl-project/sglang/pull/1170
* Support min-p sampling by intervitens in https://github.com/sgl-project/sglang/pull/1167
* [Docs] Fix rendering of details in README by Michaelvll in https://github.com/sgl-project/sglang/pull/1179
* Improve code style of sampler by hnyls2002 in https://github.com/sgl-project/sglang/pull/1168
* [Minor] Improve logging and rename the health check endpoint name by merrymercy in https://github.com/sgl-project/sglang/pull/1180
* Fix broken penalty by hnyls2002 in https://github.com/sgl-project/sglang/pull/1184
* Fix benchmark script by Ying1123 in https://github.com/sgl-project/sglang/pull/1185
* [Feat] add llava-onevision, with support for (1) siglip encoder, (2) qwen2 decoder (3) openai api compatible server. by kcz358 in https://github.com/sgl-project/sglang/pull/1123
* feat: use gelu_tanh_and_mul by zhyncs in https://github.com/sgl-project/sglang/pull/1193
* Cleanup readme, llava examples, usage examples and nccl init by merrymercy in https://github.com/sgl-project/sglang/pull/1194
* Update README.md by merrymercy in https://github.com/sgl-project/sglang/pull/1198
* [CI] Fix the problem of hf runner too slow by Ying1123 in https://github.com/sgl-project/sglang/pull/1202
* [Fix] the issue of random order when input is a list by Ying1123 in https://github.com/sgl-project/sglang/pull/1199
* Relax the assert in moe throughput test to fix the flaky CI by merrymercy in https://github.com/sgl-project/sglang/pull/1207
* [Fix] Fixing the multi-images error for llava-onevision by kcz358 in https://github.com/sgl-project/sglang/pull/1205
* Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1186
* [Minor] Improve the function organization in TokenizerManager & improve loggers by merrymercy in https://github.com/sgl-project/sglang/pull/1208
* [Minor] Temporarily skip flaky test by Ying1123 in https://github.com/sgl-project/sglang/pull/1209
* [CI] Fix the issue of unit test hanging by Ying1123 in https://github.com/sgl-project/sglang/pull/1211
* Update CI workflows by merrymercy in https://github.com/sgl-project/sglang/pull/1210
* Update CI runner docs by merrymercy in https://github.com/sgl-project/sglang/pull/1213
* [Feature] Support fp8 e5m2 kv cache with flashinfer by ispobock in https://github.com/sgl-project/sglang/pull/1204
* Update workflow files by merrymercy in https://github.com/sgl-project/sglang/pull/1214
* improve the threshold and ports in tests by wisclmy0611 in https://github.com/sgl-project/sglang/pull/1215
* [CI] Fix CI by wisclmy0611 in https://github.com/sgl-project/sglang/pull/1217
* [Fix] Multi-images loading error by kcz358 in https://github.com/sgl-project/sglang/pull/1218
* [Minor] improve CI and dependencies by hnyls2002 in https://github.com/sgl-project/sglang/pull/1212
* [CI] Parallelize unit tests in CI by wisclmy0611 in https://github.com/sgl-project/sglang/pull/1219
* Move sampler into CUDA graph by hnyls2002 in https://github.com/sgl-project/sglang/pull/1201
* chore: bump v0.2.14 by zhyncs in https://github.com/sgl-project/sglang/pull/1155
* [FEAT] JSON constrained support by havetc in https://github.com/sgl-project/sglang/pull/1125
* Torch compile CI throughput test by hnyls2002 in https://github.com/sgl-project/sglang/pull/1223
* [FEAT] Support batches cancel by caiyueliang in https://github.com/sgl-project/sglang/pull/1222
* [Minor] add delete test and delete tmp file on ci server by yichuan520030910320 in https://github.com/sgl-project/sglang/pull/1227
* [FIX] Wrong logger by havetc in https://github.com/sgl-project/sglang/pull/1230
* feat: replace get_act_fn for gpt_bigcode by zhyncs in https://github.com/sgl-project/sglang/pull/1231
* Fix readme by ArtificialZeng in https://github.com/sgl-project/sglang/pull/1236
* Fix bench latency benchmark by hnyls2002 in https://github.com/sgl-project/sglang/pull/1225
* [Minor] Add more type annotations by merrymercy in https://github.com/sgl-project/sglang/pull/1237
* feat: support sm75 with FlashInfer v0.1.6 by zhyncs in https://github.com/sgl-project/sglang/pull/1233
* Update README.md by merrymercy in https://github.com/sgl-project/sglang/pull/1239
* hotfix: revert sampler CUDA Graph by zhyncs in https://github.com/sgl-project/sglang/pull/1242
* Add sglang.bench_latency to CI by merrymercy in https://github.com/sgl-project/sglang/pull/1243
* fix: increase max_new_tokens when testing generation models by zhyncs in https://github.com/sgl-project/sglang/pull/1244
* feat: update GemmaRMSNorm by zhyncs in https://github.com/sgl-project/sglang/pull/1232
* Fix llava on multi images by merrymercy in https://github.com/sgl-project/sglang/pull/1247
* feat: replace GeluAndMul by zhyncs in https://github.com/sgl-project/sglang/pull/1234
* fix: resolve qwen2 moe weight loader by zhyncs in https://github.com/sgl-project/sglang/pull/1252
* chore: bump v0.2.14.post2 by zhyncs in https://github.com/sgl-project/sglang/pull/1250
* make json_schema usable from gen by qeternity in https://github.com/sgl-project/sglang/pull/1254
* fix data racing due to mutable reference using deepcopy by xiezhq-hermann in https://github.com/sgl-project/sglang/pull/1255
* Sampler cudagraph by hnyls2002 in https://github.com/sgl-project/sglang/pull/1253
* fix: multimodal_config in monkey_patch_vllm_dummy_weight_loader by lxww302 in https://github.com/sgl-project/sglang/pull/1260
* Transpose mla weight offline by ispobock in https://github.com/sgl-project/sglang/pull/1261
* EXAONE 3.0 Model Support by Deepfocused in https://github.com/sgl-project/sglang/pull/1258
* Update README Support Exaone 3.0 by Deepfocused in https://github.com/sgl-project/sglang/pull/1267
* Report median instead of mean in bench_latency.py by merrymercy in https://github.com/sgl-project/sglang/pull/1269
* Allow more flexible assistant and system response by BabyChouSr in https://github.com/sgl-project/sglang/pull/1256
* fix: resolve the fp8 bug introduced by vLLM 0.5.5 by zhyncs in https://github.com/sgl-project/sglang/pull/1276
* [doc] fix quick start link by ByronHsu in https://github.com/sgl-project/sglang/pull/1282
* Optimize the update flashinfer indices by xiaobochen123 in https://github.com/sgl-project/sglang/pull/1262
* [CI] Add more multi-gpu tests by merrymercy in https://github.com/sgl-project/sglang/pull/1280
* feat: fix fp8 for MLA and support bmm fp8 for DeepSeek V2 by zhyncs in https://github.com/sgl-project/sglang/pull/1285
* [CI] merge all ci tests into one file by merrymercy in https://github.com/sgl-project/sglang/pull/1289
* Support Triton fp8 e5m2 kv cache by ispobock in https://github.com/sgl-project/sglang/pull/1286
* [triton] Remove the zero initialization of qk_acc by directly writing the result by ByronHsu in https://github.com/sgl-project/sglang/pull/1288
* [Chore] Rename model_overide_args to model_override_args by kevin85421 in https://github.com/sgl-project/sglang/pull/1284
* Allow new lines during JSON generation by qeternity in https://github.com/sgl-project/sglang/pull/1277
* fix: resolve fp8 for mixtral by zhyncs in https://github.com/sgl-project/sglang/pull/1290
* ci: add nightly eval by zhyncs in https://github.com/sgl-project/sglang/pull/1291
* Fix the flaky tests in test_moe_eval_accuracy_large.py by merrymercy in https://github.com/sgl-project/sglang/pull/1293
* [doc] Fix more broken links by ByronHsu in https://github.com/sgl-project/sglang/pull/1294
* Fix regex mask by hnyls2002 in https://github.com/sgl-project/sglang/pull/1296
* Fix hang when doing s += None. by max99x in https://github.com/sgl-project/sglang/pull/1297
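
One of the changes listed above adds min-p sampling (pull/1167). As a rough illustration of the technique itself, not of SGLang's implementation, the sketch below filters a token distribution in plain Python: tokens whose probability falls below `min_p` times the top probability are dropped, and the rest are renormalized before sampling. The helper names `min_p_filter` and `sample_min_p` and the dictionary-based representation are assumptions made for this example.

```python
import random
from typing import Dict, Optional


def min_p_filter(probs: Dict[str, float], min_p: float) -> Dict[str, float]:
    """Keep tokens with probability >= min_p * max probability, then renormalize."""
    if not probs:
        raise ValueError("probs must not be empty")
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be in [0, 1]")
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


def sample_min_p(
    probs: Dict[str, float], min_p: float, rng: Optional[random.Random] = None
) -> str:
    """Draw one token from the min-p filtered distribution."""
    rng = rng or random.Random()
    filtered = min_p_filter(probs, min_p)
    tokens = list(filtered)
    weights = [filtered[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    dist = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
    # Threshold is 0.25 * 0.5 = 0.125, so "d" is dropped.
    print(min_p_filter(dist, 0.25))
    print(sample_min_p(dist, 0.25, random.Random(0)))
```

Because the threshold scales with the top probability, the filter is strict when the model is confident and permissive when the distribution is flat. With `min_p=0` nothing is removed.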

0.2.15

* feat: update nightly gsm8k eval by zhyncs in https://github.com/sgl-project/sglang/pull/1304
* Fix bugs in sampler with CUDA graph / torch.compile by hnyls2002 in https://github.com/sgl-project/sglang/pull/1306
* [Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping by merrymercy in https://github.com/sgl-project/sglang/pull/1308
* Support Phi3 mini and medium by janimo in https://github.com/sgl-project/sglang/pull/1299
* Update README.md for llava-onevision instructions by merrymercy in https://github.com/sgl-project/sglang/pull/1313
* Fix llama2 weight loader by merrymercy in https://github.com/sgl-project/sglang/pull/1317
* Fix select by ensuring each request has at least one token by merrymercy in https://github.com/sgl-project/sglang/pull/1318
* misc: speedup load safetensors by zhyncs in https://github.com/sgl-project/sglang/pull/1319
* chore: bump v0.3.0 by zhyncs in https://github.com/sgl-project/sglang/pull/1320
* Fix the flaky test test_moe_eval_accuracy_large.py by merrymercy in https://github.com/sgl-project/sglang/pull/1326
* docs: update news by zhyncs in https://github.com/sgl-project/sglang/pull/1327

New Contributors
* Michaelvll made their first contribution in https://github.com/sgl-project/sglang/pull/1144
* Xu-Chen made their first contribution in https://github.com/sgl-project/sglang/pull/1148
* shanyu-sys made their first contribution in https://github.com/sgl-project/sglang/pull/1157
* intervitens made their first contribution in https://github.com/sgl-project/sglang/pull/1167
* zhaochenyang20 made their first contribution in https://github.com/sgl-project/sglang/pull/1186
* havetc made their first contribution in https://github.com/sgl-project/sglang/pull/1125
* caiyueliang made their first contribution in https://github.com/sgl-project/sglang/pull/1222
* ArtificialZeng made their first contribution in https://github.com/sgl-project/sglang/pull/1236
* lxww302 made their first contribution in https://github.com/sgl-project/sglang/pull/1260
* Deepfocused made their first contribution in https://github.com/sgl-project/sglang/pull/1258
* ByronHsu made their first contribution in https://github.com/sgl-project/sglang/pull/1282
* xiaobochen123 made their first contribution in https://github.com/sgl-project/sglang/pull/1262
* kevin85421 made their first contribution in https://github.com/sgl-project/sglang/pull/1284

**Full Changelog**: https://github.com/sgl-project/sglang/compare/v0.2.13...v0.3.0
