SGLang

Latest version: v0.3.6

0.3.6

Highlights
* Reduce CPU overhead by enabling overlap scheduler by default. **1.1x higher throughput**. (2105, 2067, 2095)
* Support data parallelism for attention and MLA. 1.5x higher decoding throughput. (1970, 2061)
* Cache-aware load balancer. 4x higher cache hit rate. (1934)
* Support xgrammar backend for grammar-guided decoding (2056)
* Support Prometheus metrics (1853, 1981)
* Support torch 2.5.1 (2069) and torch-native tensor parallelism (1876)
* Support graceful termination (1838) and watchdog (1816)
* Support notebook-style documentation (https://sgl-project.github.io/)
* Add an offline benchmark script (1968)
* Bug, deadlock, NaN, and OOM fixes (2083, 1850, 1800, 1779, 1789, 1858)
* New models: Phi3-small (2062), Gemma-2 reward model (1954), GPT-2 (1833)
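
The cache-aware load balancer highlight can be pictured with a toy sketch. This is not SGLang's router (the entries further down describe that as a Rust-based router using an approximate tree); it only illustrates the general idea of sending a request to the worker that most likely already holds its prompt prefix in cache. The names `Worker`, `route`, and `min_match` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from os.path import commonprefix


@dataclass
class Worker:
    """Hypothetical worker record: prompts it has served and its request count."""
    name: str
    cached_prompts: list[str] = field(default_factory=list)
    load: int = 0


def prefix_match_len(worker: Worker, prompt: str) -> int:
    """Length of the longest prefix shared by `prompt` and any cached prompt."""
    return max(
        (len(commonprefix([prompt, cached])) for cached in worker.cached_prompts),
        default=0,
    )


def route(workers: list[Worker], prompt: str, min_match: int = 8) -> Worker:
    """Prefer the worker with the longest cached prefix; else the least loaded."""
    best = max(workers, key=lambda w: prefix_match_len(w, prompt))
    if prefix_match_len(best, prompt) < min_match:
        # No useful cache hit anywhere: balance by load instead.
        best = min(workers, key=lambda w: w.load)
    best.cached_prompts.append(prompt)
    best.load += 1
    return best
```

In this sketch, requests that share a long system prompt keep landing on the same worker, while unrelated requests spread out by load. A real router would also need cache eviction and a more compact prefix structure than a list of strings.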

What's Changed
* Fix edge case for truncated by ByronHsu in https://github.com/sgl-project/sglang/pull/1747
* Fuse more ops & Simplify token mapping by merrymercy in https://github.com/sgl-project/sglang/pull/1758
* [API] add get memory pool size by Ying1123 in https://github.com/sgl-project/sglang/pull/1760
* Fix perf regression for set_kv_buffer by merrymercy in https://github.com/sgl-project/sglang/pull/1765
* [Fix] Fix abort in data parallelism by merrymercy in https://github.com/sgl-project/sglang/pull/1767
* Fix stop condition for <|eom_id|> by merrymercy in https://github.com/sgl-project/sglang/pull/1766
* Update docs by merrymercy in https://github.com/sgl-project/sglang/pull/1768
* Fix missing additional_stop_token_ids by merrymercy in https://github.com/sgl-project/sglang/pull/1769
* Fix out of memory message. by hnyls2002 in https://github.com/sgl-project/sglang/pull/1771
* Crash the server on warnings in CI by merrymercy in https://github.com/sgl-project/sglang/pull/1772
* Fix the perf regression due to additional_stop_token_ids by merrymercy in https://github.com/sgl-project/sglang/pull/1773
* Fix MockTokenizer in the unit tests by merrymercy in https://github.com/sgl-project/sglang/pull/1774
* [Bug] Catch any errors caused by parsing json schema by zolinthecow in https://github.com/sgl-project/sglang/pull/1776
* [Fix] Fix NaN issues by fixing the cuda graph padding values for flashinfer by merrymercy in https://github.com/sgl-project/sglang/pull/1779
* [Fix] Fix cuda graph padding for triton attention backend by merrymercy in https://github.com/sgl-project/sglang/pull/1782
* check user-specified model_max_len with hf derived max_model_len by BBuf in https://github.com/sgl-project/sglang/pull/1778
* Re-introduce `get_cuda_graph_seq_len_fill_value` by merrymercy in https://github.com/sgl-project/sglang/pull/1783
* Enhance the test case for chunked prefill and check memory leak by merrymercy in https://github.com/sgl-project/sglang/pull/1785
* Fix seq_lens_sum for cuda graph runner in padded cases by merrymercy in https://github.com/sgl-project/sglang/pull/1789
* Qwen2vl support cuda graph and disable radix cache by yizhang2077 in https://github.com/sgl-project/sglang/pull/1780
* Fix log parsing in the chunked prefill unit tests by merrymercy in https://github.com/sgl-project/sglang/pull/1793
* Fix memory leak when doing chunked prefill by hnyls2002 in https://github.com/sgl-project/sglang/pull/1787
* [Fix] Fix the log parsing in chunked prefill unit tests by merrymercy in https://github.com/sgl-project/sglang/pull/1794
* Revert "Fix memory leak when doing chunked prefill" by merrymercy in https://github.com/sgl-project/sglang/pull/1797
* Fix logprob in the overlapped mode by merrymercy in https://github.com/sgl-project/sglang/pull/1795

0.3.5.post2

* fix a small typo in docs by BBuf in https://github.com/sgl-project/sglang/pull/2047
* Fix core (MI300X) with --enable-overlap by HaiShaw in https://github.com/sgl-project/sglang/pull/2048
* Add Tensor Parallel to torch_native_llama by kwen2501 in https://github.com/sgl-project/sglang/pull/1876
* Add get_amdgpu_memory_capacity() by HaiShaw in https://github.com/sgl-project/sglang/pull/2049
* Fix weight update for data parallelism by merrymercy in https://github.com/sgl-project/sglang/pull/2050
* Support DP MLA by ispobock in https://github.com/sgl-project/sglang/pull/1970
* Fix illegal memory access in overlap mode & Use more fused triton kernels for building meta data by merrymercy in https://github.com/sgl-project/sglang/pull/2051
* chore: update torch v2.5.1 by zhyncs in https://github.com/sgl-project/sglang/pull/1849
* Revert "chore: update torch v2.5.1" by merrymercy in https://github.com/sgl-project/sglang/pull/2063
* Remove monkey_patch_vllm_dummy_weight_loader by merrymercy in https://github.com/sgl-project/sglang/pull/2064
* Deprecate --disable-flashinfer and --disable-flashinfer-sampling by merrymercy in https://github.com/sgl-project/sglang/pull/2065
* Support cuda graph for DP attention by ispobock in https://github.com/sgl-project/sglang/pull/2061
* Rename arguments `--disable-nan-detection` to `--enable-nan-detection` by merrymercy in https://github.com/sgl-project/sglang/pull/2066
* [Performance] Update xgrammar-related constrained decoding by DarkSharpness in https://github.com/sgl-project/sglang/pull/2056
* add phi-3 small support by Tushar-ml in https://github.com/sgl-project/sglang/pull/2062
* [Minor] Fix styles for overlap mode by merrymercy in https://github.com/sgl-project/sglang/pull/2068
* Fix cuda illegal memory access in overlap mode by merrymercy in https://github.com/sgl-project/sglang/pull/2070
* Tune the threshold for accuracy tests in CI by merrymercy in https://github.com/sgl-project/sglang/pull/2071
* Crash the CI jobs on model import errors by merrymercy in https://github.com/sgl-project/sglang/pull/2072
* support set role as 'tool' by yukavio in https://github.com/sgl-project/sglang/pull/2075
* feat: update torch 2.5.1 by zhyncs in https://github.com/sgl-project/sglang/pull/2069
* Rename layer_idx to layer_id for consistency by janimo in https://github.com/sgl-project/sglang/pull/2078
* Fix chunked prefill with output logprob by merrymercy in https://github.com/sgl-project/sglang/pull/2083
* Allow passing extra request body to bench_offline_throughput.py by merrymercy in https://github.com/sgl-project/sglang/pull/2085
* Simplify logits penalizer by merrymercy in https://github.com/sgl-project/sglang/pull/2086
* Use cuda event wait and synchronization instead of busy waiting by merrymercy in https://github.com/sgl-project/sglang/pull/2089
* Fix: incorrect top_logprobs in chat completion by ajwaitz in https://github.com/sgl-project/sglang/pull/2088
* minor: update gsm8k eval by zhyncs in https://github.com/sgl-project/sglang/pull/2091
* Use native fp8 format on MI300X by HaiShaw in https://github.com/sgl-project/sglang/pull/2094
* minor: add dataset dump and questions shuffle by zhyncs in https://github.com/sgl-project/sglang/pull/2093
* Make constrained decoding work for overlap scheduler by merrymercy in https://github.com/sgl-project/sglang/pull/2095
* Set schedule policy more conservative for DP attention by ispobock in https://github.com/sgl-project/sglang/pull/2096
* Enable overlap by default by merrymercy in https://github.com/sgl-project/sglang/pull/2067
* Update nightly-eval.yml by merrymercy in https://github.com/sgl-project/sglang/pull/2100
* [feat] Add session control by Ying1123 in https://github.com/sgl-project/sglang/pull/2073
* Allow skipping warmup in bench_offline_throughput.py by merrymercy in https://github.com/sgl-project/sglang/pull/2103
* Move test_session_id.py to playground by merrymercy in https://github.com/sgl-project/sglang/pull/2104
* Enable overlap scheduler by default for the triton attention backend by merrymercy in https://github.com/sgl-project/sglang/pull/2105
* Error out when torchao-config option is not recognized by jerryzh168 in https://github.com/sgl-project/sglang/pull/2107
* Turn off autotune for scaled mm for fp8 dynamic quant in torchao by jerryzh168 in https://github.com/sgl-project/sglang/pull/2116
* ROCm: Fix MoE padding for non-FP8 cases by HaiShaw in https://github.com/sgl-project/sglang/pull/2111
* Add support for Qwen2-VL-based embedding models by james-p-xu in https://github.com/sgl-project/sglang/pull/2055
* [router] add base_gpu_id server args & merged radix tree python reference by ByronHsu in https://github.com/sgl-project/sglang/pull/2115
* Fix 2037 - Context length check does not take into account pad tokens for visual models by jakep-allenai in https://github.com/sgl-project/sglang/pull/2106
* Rename sglang.bench_latency to sglang.bench_one_batch by merrymercy in https://github.com/sgl-project/sglang/pull/2118
* Benchmark with Pytorch Profiler easily by bjmsong in https://github.com/sgl-project/sglang/pull/2110
* [minor] Clean up unused imports by merrymercy in https://github.com/sgl-project/sglang/pull/2122
* minor: update gsm8k threshold by zhyncs in https://github.com/sgl-project/sglang/pull/2125
* chore: bump v0.3.6 by zhyncs in https://github.com/sgl-project/sglang/pull/2120

New Contributors
* zolinthecow made their first contribution in https://github.com/sgl-project/sglang/pull/1776
* BBuf made their first contribution in https://github.com/sgl-project/sglang/pull/1778
* DarkSharpness made their first contribution in https://github.com/sgl-project/sglang/pull/1752
* hliuca made their first contribution in https://github.com/sgl-project/sglang/pull/1799
* liuyanyi made their first contribution in https://github.com/sgl-project/sglang/pull/1823
* DanielC12321 made their first contribution in https://github.com/sgl-project/sglang/pull/1833
* geeker-smallwhite made their first contribution in https://github.com/sgl-project/sglang/pull/1855
* yichiche made their first contribution in https://github.com/sgl-project/sglang/pull/1871
* inakineitor made their first contribution in https://github.com/sgl-project/sglang/pull/1902
* Lzhang-hub made their first contribution in https://github.com/sgl-project/sglang/pull/1853
* XuehaiPan made their first contribution in https://github.com/sgl-project/sglang/pull/1926
* austin362667 made their first contribution in https://github.com/sgl-project/sglang/pull/1891
* binarycrayon made their first contribution in https://github.com/sgl-project/sglang/pull/1933
* aqweteddy made their first contribution in https://github.com/sgl-project/sglang/pull/1954
* leishaoSC made their first contribution in https://github.com/sgl-project/sglang/pull/1966
* kursataktas made their first contribution in https://github.com/sgl-project/sglang/pull/1745
* HuanzhiMao made their first contribution in https://github.com/sgl-project/sglang/pull/1982
* james-p-xu made their first contribution in https://github.com/sgl-project/sglang/pull/1995
* RangiLyu made their first contribution in https://github.com/sgl-project/sglang/pull/1994
* chottolabs made their first contribution in https://github.com/sgl-project/sglang/pull/2026
* ethe made their first contribution in https://github.com/sgl-project/sglang/pull/2028
* w1ndseeker made their first contribution in https://github.com/sgl-project/sglang/pull/2038
* kwen2501 made their first contribution in https://github.com/sgl-project/sglang/pull/1876
* Tushar-ml made their first contribution in https://github.com/sgl-project/sglang/pull/2062
* yukavio made their first contribution in https://github.com/sgl-project/sglang/pull/2075
* ajwaitz made their first contribution in https://github.com/sgl-project/sglang/pull/2088
* jakep-allenai made their first contribution in https://github.com/sgl-project/sglang/pull/2106
* bjmsong made their first contribution in https://github.com/sgl-project/sglang/pull/2110

**Full Changelog**: https://github.com/sgl-project/sglang/compare/v0.3.4.post1...v0.3.6

0.3.5.post1

* Do not let invalid grammar crash the server by merrymercy in https://github.com/sgl-project/sglang/pull/2023
* Fix dependency and error message for xgrammar by merrymercy in https://github.com/sgl-project/sglang/pull/2024
* set content to empty string by chottolabs in https://github.com/sgl-project/sglang/pull/2026
* chore: open lto and optimization in release profile by ethe in https://github.com/sgl-project/sglang/pull/2028
* Add download_dir ServerArgs property by pjyi2147 in https://github.com/sgl-project/sglang/pull/2027
* Github runner instructions for AMD by HaiShaw in https://github.com/sgl-project/sglang/pull/2031
* Fix torch.compile for MoE by merrymercy in https://github.com/sgl-project/sglang/pull/2033
* Fix unit tests by merrymercy in https://github.com/sgl-project/sglang/pull/2034
* Fix outlines version by merrymercy in https://github.com/sgl-project/sglang/pull/2036
* Expose no_stop_trim and skip_special_tokens in openai api by merrymercy in https://github.com/sgl-project/sglang/pull/2039
* Offline LLM Engine Benchmark Throughput by zolinthecow in https://github.com/sgl-project/sglang/pull/1968
* fix: align enable_overlap_scheduler naming between code and docs by w1ndseeker in https://github.com/sgl-project/sglang/pull/2038
* Fix the default arguments of bench_offline_throughput.py & simplify detokenizer manager by merrymercy in https://github.com/sgl-project/sglang/pull/2042
* benchmark json schema by DarkSharpness in https://github.com/sgl-project/sglang/pull/2030
* Fix json benchmark by merrymercy in https://github.com/sgl-project/sglang/pull/2043
* [Fix] Adjust default chunked prefill size and cuda graph max bs according to GPU memory capacity by merrymercy in https://github.com/sgl-project/sglang/pull/2044

0.3.5

* Fix regex docs by merrymercy in https://github.com/sgl-project/sglang/pull/1909
* Add Reward API Docs etc by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1910
* [Docs, ROCm] update install to cover ROCm with MI GPUs by HaiShaw in https://github.com/sgl-project/sglang/pull/1915
* [router] Impl radix tree and set up CI by ByronHsu in https://github.com/sgl-project/sglang/pull/1893
* Update CODEOWNERS by ByronHsu in https://github.com/sgl-project/sglang/pull/1916
* Change judge to classify & Modify make file by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1920
* [Doc] improve relative links and structure by merrymercy in https://github.com/sgl-project/sglang/pull/1924
* support prometheus metrics by Lzhang-hub in https://github.com/sgl-project/sglang/pull/1853
* [rust] refactor server and router by ByronHsu in https://github.com/sgl-project/sglang/pull/1922
* minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces by XuehaiPan in https://github.com/sgl-project/sglang/pull/1926
* Add Rust Router Python Binding by austin362667 in https://github.com/sgl-project/sglang/pull/1891
* [Docs] fix 404 - Contributor Guide by HaiShaw in https://github.com/sgl-project/sglang/pull/1942
* fix black in pre-commit by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1940
* [Doc] fix docs by merrymercy in https://github.com/sgl-project/sglang/pull/1949
* [Performance, Triton Kernel Args] extend_attention, optimize kern args to _fwd_kernel by HaiShaw in https://github.com/sgl-project/sglang/pull/1941
* [ENV, ROCm] update environment settings by HaiShaw in https://github.com/sgl-project/sglang/pull/1939
* Add a timeout for execute-notebook.yml by merrymercy in https://github.com/sgl-project/sglang/pull/1951
* Update setup_github_runner.md by merrymercy in https://github.com/sgl-project/sglang/pull/1952
* Monitoring documentation by binarycrayon in https://github.com/sgl-project/sglang/pull/1933
* Gemma2 reward model support by aqweteddy in https://github.com/sgl-project/sglang/pull/1954
* Remove the useless to_srt_kwargs by merrymercy in https://github.com/sgl-project/sglang/pull/1955
* Adjust reward model's score module and pooler module order for reducing computation by aqweteddy in https://github.com/sgl-project/sglang/pull/1956
* [Release, ROCm] release ROCm docker build for AMD MI GPUs by HaiShaw in https://github.com/sgl-project/sglang/pull/1957
* Add sentence_transformers to CI dependency by merrymercy in https://github.com/sgl-project/sglang/pull/1958
* [minor] Improve code style and compatibility by merrymercy in https://github.com/sgl-project/sglang/pull/1961
* Update README.md's Slack invitation link by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1962
* Updated Instructions on Profiling SGLang Infer System with AMD GPUs by leishaoSC in https://github.com/sgl-project/sglang/pull/1966
* Fix metrics by binarycrayon in https://github.com/sgl-project/sglang/pull/1963
* Initialize model_worker_batch variable by qeternity in https://github.com/sgl-project/sglang/pull/1973
* Introducing SGLang Guru on Gurubase.io by kursataktas in https://github.com/sgl-project/sglang/pull/1745
* Update README.md by merrymercy in https://github.com/sgl-project/sglang/pull/1974
* Update pr-test-rust.yml to add a "finish" step by merrymercy in https://github.com/sgl-project/sglang/pull/1975
* [Minor] Fix a typo in test_torchao.py by merrymercy in https://github.com/sgl-project/sglang/pull/1976
* Clean up metrics code by merrymercy in https://github.com/sgl-project/sglang/pull/1972
* [CI] balance unit tests by merrymercy in https://github.com/sgl-project/sglang/pull/1977
* Specify `zmq` Version Requirement by HuanzhiMao in https://github.com/sgl-project/sglang/pull/1982
* Simplify prometheus metrics by merrymercy in https://github.com/sgl-project/sglang/pull/1981
* fix: update pyzmq version by zhyncs in https://github.com/sgl-project/sglang/pull/1983
* docs: add shm size for docker run by zhyncs in https://github.com/sgl-project/sglang/pull/1986
* qwen2vl fix bug for 1971 1897 by yizhang2077 in https://github.com/sgl-project/sglang/pull/1984
* [CI] Balance unit tests by merrymercy in https://github.com/sgl-project/sglang/pull/1988
* Add gen-shared-prefix dataset in bench_serving by ByronHsu in https://github.com/sgl-project/sglang/pull/1990
* [Performance, Triton] Optimize over mask compute to tl.load in fused_moe_kernel by HaiShaw in https://github.com/sgl-project/sglang/pull/1980
* [rust] cache-aware DP - approx tree by ByronHsu in https://github.com/sgl-project/sglang/pull/1934
* docs: add slides link in README by zhyncs in https://github.com/sgl-project/sglang/pull/1997
* Add engine encode by james-p-xu in https://github.com/sgl-project/sglang/pull/1995
* setup router python binding ci by ByronHsu in https://github.com/sgl-project/sglang/pull/1999
* Add Engine::encode example by james-p-xu in https://github.com/sgl-project/sglang/pull/2000
* Fix rust unit test and pypi token by ByronHsu in https://github.com/sgl-project/sglang/pull/2001
* release router from py38 to py312 by ByronHsu in https://github.com/sgl-project/sglang/pull/2002
* Bump router to 0.0.3 by ByronHsu in https://github.com/sgl-project/sglang/pull/2004
* run rust test on ubuntu instead of 1-gpu-runner by ByronHsu in https://github.com/sgl-project/sglang/pull/2003
* support internlm2-reward by RangiLyu in https://github.com/sgl-project/sglang/pull/1994
* fix sglang_router not found by ByronHsu in https://github.com/sgl-project/sglang/pull/2005
* [Minor] Remove unused imports by merrymercy in https://github.com/sgl-project/sglang/pull/2006
* Fix a typo in io_struct.py by merrymercy in https://github.com/sgl-project/sglang/pull/2008
* Fix weight loading for tied word embedding when TP > 1 by merrymercy in https://github.com/sgl-project/sglang/pull/2009
* cleanup rust folder by ByronHsu in https://github.com/sgl-project/sglang/pull/2010
* Filter empty prompt in random bench serving by ispobock in https://github.com/sgl-project/sglang/pull/2011
* support echo=true and logprobs in openai api when logprobs=1 in lm-evaluation-harness by BBuf in https://github.com/sgl-project/sglang/pull/1998
* Fix finish reason by merrymercy in https://github.com/sgl-project/sglang/pull/2013
* fix a bug in v1_embeeding_request by BBuf in https://github.com/sgl-project/sglang/pull/2014
* fix test_embedding_models prompt length too long's bug by BBuf in https://github.com/sgl-project/sglang/pull/2015
* support parallel grammar preprocessing by DarkSharpness in https://github.com/sgl-project/sglang/pull/1996
* Refactor grammar backend by merrymercy in https://github.com/sgl-project/sglang/pull/2018
* Fix grammar backend for tensor parallelism by merrymercy in https://github.com/sgl-project/sglang/pull/2020

0.3.4.post2

* [Performance] Support both xgrammar and outlines for constrained decoding by DarkSharpness in https://github.com/sgl-project/sglang/pull/1752
* [Fix] Fix --skip-tokenizer-init by merrymercy in https://github.com/sgl-project/sglang/pull/1798
* move max_position_embeddings to the last by hliuca in https://github.com/sgl-project/sglang/pull/1799
* add support for ipynb by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1786
* Fix possible ZMQ hanging by hnyls2002 in https://github.com/sgl-project/sglang/pull/1800
* Set `ZMQ` buffer size heuristic by hnyls2002 in https://github.com/sgl-project/sglang/pull/1801
* Allow consecutive ports when launching multiple sglang servers. by hnyls2002 in https://github.com/sgl-project/sglang/pull/1802
* fix int conversion for `SGLANG_CPU_COUNT` by ByronHsu in https://github.com/sgl-project/sglang/pull/1803
* Update ci workflows by merrymercy in https://github.com/sgl-project/sglang/pull/1804
* Update links by merrymercy in https://github.com/sgl-project/sglang/pull/1805
* Simplify our docs with complicated functions into utils by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1807
* Fix docs ci by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1808
* Provide an argument to set the maximum batch size for cuda graph by merrymercy in https://github.com/sgl-project/sglang/pull/1809
* Improve the user control of new_token_ratio by merrymercy in https://github.com/sgl-project/sglang/pull/1811
* Update hyperparameter_tuning.md by merrymercy in https://github.com/sgl-project/sglang/pull/1813
* Add a watch dog thread by merrymercy in https://github.com/sgl-project/sglang/pull/1816
* Fix unit tests by merrymercy in https://github.com/sgl-project/sglang/pull/1817
* Add openAI compatible API by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1810
* Fix Triton decode kernel & ut by ispobock in https://github.com/sgl-project/sglang/pull/1819
* support token ids in `engine.generate` by ByronHsu in https://github.com/sgl-project/sglang/pull/1820
* Fix docs deploy ci by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1821
* [router] rust-based router by ByronHsu in https://github.com/sgl-project/sglang/pull/1790
* Fix update_weights deadlock for DP by ByronHsu in https://github.com/sgl-project/sglang/pull/1825
* fix get_memory_pool_size deadlock for DP by ByronHsu in https://github.com/sgl-project/sglang/pull/1830
* Support setting `use_thread` in the `run_program` for easier debugging. by liuyanyi in https://github.com/sgl-project/sglang/pull/1823
* [3rdparty, document] Add 3rdparty/amd, with profiling and tuning instructions to be added by HaiShaw in https://github.com/sgl-project/sglang/pull/1822
* stop_str of qwen2-vl template should be a tuple not a str by yizhang2077 in https://github.com/sgl-project/sglang/pull/1834
* [FP8 KV Cache, Mixtral] Avoid KeyError at loading pre-quantized FP8 m… by HaiShaw in https://github.com/sgl-project/sglang/pull/1835
* Gpt2 by DanielC12321 in https://github.com/sgl-project/sglang/pull/1833
* Improve openai api documents by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1827
* Update docs by merrymercy in https://github.com/sgl-project/sglang/pull/1839
* Update README.md by merrymercy in https://github.com/sgl-project/sglang/pull/1840
* [Production] Drain requests before exit when receive SIGTERM by Ying1123 in https://github.com/sgl-project/sglang/pull/1838
* [Performance, Hardware] MoE weights padding to AMD MI300x GPUs by HaiShaw in https://github.com/sgl-project/sglang/pull/1836
* Fix suggest edit by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1842
* [Performance, Triton Kernel Args] _decode_grouped_softmax_reducev_fwd… by HaiShaw in https://github.com/sgl-project/sglang/pull/1845
* Make decode log interval configurable by ByronHsu in https://github.com/sgl-project/sglang/pull/1847
* Fix mixed chunked prefill by merrymercy in https://github.com/sgl-project/sglang/pull/1850
* Refactor tokenizer manager by ByronHsu in https://github.com/sgl-project/sglang/pull/1846
* Simplify documentation by merrymercy in https://github.com/sgl-project/sglang/pull/1851
* Fix warnings in doc build by merrymercy in https://github.com/sgl-project/sglang/pull/1852
* delete unused character by geeker-smallwhite in https://github.com/sgl-project/sglang/pull/1855
* Fix memory leak for chunked prefill 2 by merrymercy in https://github.com/sgl-project/sglang/pull/1858
* [Build, ROCm] Dockerfile.rocm for Instinct GPUs, with package updates by HaiShaw in https://github.com/sgl-project/sglang/pull/1861
* Fix retraction + overlap by hnyls2002 in https://github.com/sgl-project/sglang/pull/1860
* change file tree by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1859
* Update vocab embedding deps and add TP switch by ispobock in https://github.com/sgl-project/sglang/pull/1856
* minor: add human eval by zhyncs in https://github.com/sgl-project/sglang/pull/1754
* Add vlm document by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1866
* minor: update nightly eval by zhyncs in https://github.com/sgl-project/sglang/pull/1867
* [3rdparty, document] Updated Documentation that covers performance tuning techniques for AMD Instinct GPUs. by yichiche in https://github.com/sgl-project/sglang/pull/1871
* Improve docs and fix the broken links by merrymercy in https://github.com/sgl-project/sglang/pull/1875
* Add a FAQ documentation by merrymercy in https://github.com/sgl-project/sglang/pull/1877
* Update docs title by merrymercy in https://github.com/sgl-project/sglang/pull/1879
* Update docs and workflow by merrymercy in https://github.com/sgl-project/sglang/pull/1881
* Fix doc links by merrymercy in https://github.com/sgl-project/sglang/pull/1882
* Fix incorrect context length for llama3.2-11b by rchen19 in https://github.com/sgl-project/sglang/pull/1873
* add native api docs by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1883
* Update index.rst to improve the order of docs by merrymercy in https://github.com/sgl-project/sglang/pull/1885
* Native api by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1886
* Fix docs by merrymercy in https://github.com/sgl-project/sglang/pull/1889
* Fix docs ci by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1888
* Fix docs by merrymercy in https://github.com/sgl-project/sglang/pull/1890
* Fix ci and link error by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1892
* Add engine api by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1894
* turn off log for the offline engine by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1895
* Do not use longest prefix matching when queue-req is large by merrymercy in https://github.com/sgl-project/sglang/pull/1896
* Simplify tokenizer manager by merrymercy in https://github.com/sgl-project/sglang/pull/1899
* Allow passing dtype and max_new_tokens to HF reference script by janimo in https://github.com/sgl-project/sglang/pull/1903
* Simplify tokenizer manager by merrymercy in https://github.com/sgl-project/sglang/pull/1904
* Unify the model type checking by merrymercy in https://github.com/sgl-project/sglang/pull/1905
* Escape backwards slash by inakineitor in https://github.com/sgl-project/sglang/pull/1902
* feat: support truss endpoint for benchmark serving by zhyncs in https://github.com/sgl-project/sglang/pull/1906
* Let reward model take text inputs instead of message lists by merrymercy in https://github.com/sgl-project/sglang/pull/1907

0.3.4.post1

New Contributors
* du00cs made their first contribution in https://github.com/sgl-project/sglang/pull/1521
* KylinMountain made their first contribution in https://github.com/sgl-project/sglang/pull/1520
* jeffrey-fong made their first contribution in https://github.com/sgl-project/sglang/pull/1495
* cauyxy made their first contribution in https://github.com/sgl-project/sglang/pull/1537
* kkHuang-amd made their first contribution in https://github.com/sgl-project/sglang/pull/1554
* tbarton16 made their first contribution in https://github.com/sgl-project/sglang/pull/1553
* mssongit made their first contribution in https://github.com/sgl-project/sglang/pull/1536
* FredericOdermatt made their first contribution in https://github.com/sgl-project/sglang/pull/1569
* kushal34712 made their first contribution in https://github.com/sgl-project/sglang/pull/1625
* liangan1 made their first contribution in https://github.com/sgl-project/sglang/pull/1607
* glen-amd made their first contribution in https://github.com/sgl-project/sglang/pull/1611
* OBJECT907 made their first contribution in https://github.com/sgl-project/sglang/pull/1579
* abatom made their first contribution in https://github.com/sgl-project/sglang/pull/1626
* JanumalaAkhilendra made their first contribution in https://github.com/sgl-project/sglang/pull/1633
* learninmou made their first contribution in https://github.com/sgl-project/sglang/pull/1642
* pjyi2147 made their first contribution in https://github.com/sgl-project/sglang/pull/1653
* andy-yang-1 made their first contribution in https://github.com/sgl-project/sglang/pull/1459
* michaelfeil made their first contribution in https://github.com/sgl-project/sglang/pull/1688
* zeng-zc made their first contribution in https://github.com/sgl-project/sglang/pull/1679
* wxsms made their first contribution in https://github.com/sgl-project/sglang/pull/1697
* g-drozdov made their first contribution in https://github.com/sgl-project/sglang/pull/1684
* sixsixcoder made their first contribution in https://github.com/sgl-project/sglang/pull/1736

**Full Changelog**: https://github.com/sgl-project/sglang/compare/v0.3.2...v0.3.4.post1
