SGLang

Latest version: v0.4.4.post3

0.3.5

* Fix regex docs by merrymercy in https://github.com/sgl-project/sglang/pull/1909
* Add Reward API Docs etc by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1910
* [Docs, ROCm] update install to cover ROCm with MI GPUs by HaiShaw in https://github.com/sgl-project/sglang/pull/1915
* [router] Impl radix tree and set up CI by ByronHsu in https://github.com/sgl-project/sglang/pull/1893
* Update CODEOWNERS by ByronHsu in https://github.com/sgl-project/sglang/pull/1916
* Change judge to classify & Modify make file by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1920
* [Doc] improve relative links and structure by merrymercy in https://github.com/sgl-project/sglang/pull/1924
* support prometheus metrics by Lzhang-hub in https://github.com/sgl-project/sglang/pull/1853
* [rust] refactor server and router by ByronHsu in https://github.com/sgl-project/sglang/pull/1922
* minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces by XuehaiPan in https://github.com/sgl-project/sglang/pull/1926
* Add Rust Router Python Binding by austin362667 in https://github.com/sgl-project/sglang/pull/1891
* [Docs] fix 404 - Contributor Guide by HaiShaw in https://github.com/sgl-project/sglang/pull/1942
* fix black in pre-commit by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1940
* [Doc] fix docs by merrymercy in https://github.com/sgl-project/sglang/pull/1949
* [Performance, Triton Kernel Args] extend_attention, optimize kern args to _fwd_kernel by HaiShaw in https://github.com/sgl-project/sglang/pull/1941
* [ENV, ROCm] update environment settings by HaiShaw in https://github.com/sgl-project/sglang/pull/1939
* Add a timeout for execute-notebook.yml by merrymercy in https://github.com/sgl-project/sglang/pull/1951
* Update setup_github_runner.md by merrymercy in https://github.com/sgl-project/sglang/pull/1952
* Monitoring documentation by binarycrayon in https://github.com/sgl-project/sglang/pull/1933
* Gemma2 reward model support by aqweteddy in https://github.com/sgl-project/sglang/pull/1954
* Remove the useless to_srt_kwargs by merrymercy in https://github.com/sgl-project/sglang/pull/1955
* Adjust reward model's score module and pooler module order for reducing computation by aqweteddy in https://github.com/sgl-project/sglang/pull/1956
* [Release, ROCm] release ROCm docker build for AMD MI GPUs by HaiShaw in https://github.com/sgl-project/sglang/pull/1957
* Add sentence_transformers to CI dependency by merrymercy in https://github.com/sgl-project/sglang/pull/1958
* [minor] Improve code style and compatibility by merrymercy in https://github.com/sgl-project/sglang/pull/1961
* Update README.md's Slack invitation link by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1962
* Updated Instructions on Profiling SGLang Infer System with AMD GPUs by leishaoSC in https://github.com/sgl-project/sglang/pull/1966
* Fix metrics by binarycrayon in https://github.com/sgl-project/sglang/pull/1963
* Initialize model_worker_batch variable by qeternity in https://github.com/sgl-project/sglang/pull/1973
* Introducing SGLang Guru on Gurubase.io by kursataktas in https://github.com/sgl-project/sglang/pull/1745
* Update README.md by merrymercy in https://github.com/sgl-project/sglang/pull/1974
* Update pr-test-rust.yml to add a "finish" step by merrymercy in https://github.com/sgl-project/sglang/pull/1975
* [Minor] Fix a typo in test_torchao.py by merrymercy in https://github.com/sgl-project/sglang/pull/1976
* Clean up metrics code by merrymercy in https://github.com/sgl-project/sglang/pull/1972
* [CI] balance unit tests by merrymercy in https://github.com/sgl-project/sglang/pull/1977
* Specify `zmq` Version Requirement by HuanzhiMao in https://github.com/sgl-project/sglang/pull/1982
* Simplify prometheus metrics by merrymercy in https://github.com/sgl-project/sglang/pull/1981
* fix: update pyzmq version by zhyncs in https://github.com/sgl-project/sglang/pull/1983
* docs: add shm size for docker run by zhyncs in https://github.com/sgl-project/sglang/pull/1986
* qwen2vl fix bug for 1971 1897 by yizhang2077 in https://github.com/sgl-project/sglang/pull/1984
* [CI] Balance unit tests by merrymercy in https://github.com/sgl-project/sglang/pull/1988
* Add gen-shared-prefix dataset in bench_serving by ByronHsu in https://github.com/sgl-project/sglang/pull/1990
* [Performance, Triton] Optimize over mask compute to tl.load in fused_moe_kernel by HaiShaw in https://github.com/sgl-project/sglang/pull/1980
* [rust] cache-aware DP - approx tree by ByronHsu in https://github.com/sgl-project/sglang/pull/1934
* docs: add slides link in README by zhyncs in https://github.com/sgl-project/sglang/pull/1997
* Add engine encode by james-p-xu in https://github.com/sgl-project/sglang/pull/1995
* setup router python binding ci by ByronHsu in https://github.com/sgl-project/sglang/pull/1999
* Add Engine::encode example by james-p-xu in https://github.com/sgl-project/sglang/pull/2000
* Fix rust unit test and pypi token by ByronHsu in https://github.com/sgl-project/sglang/pull/2001
* release router from py38 to py312 by ByronHsu in https://github.com/sgl-project/sglang/pull/2002
* Bump router to 0.0.3 by ByronHsu in https://github.com/sgl-project/sglang/pull/2004
* run rust test on ubuntu instead of 1-gpu-runner by ByronHsu in https://github.com/sgl-project/sglang/pull/2003
* support internlm2-reward by RangiLyu in https://github.com/sgl-project/sglang/pull/1994
* fix sglang_router not found by ByronHsu in https://github.com/sgl-project/sglang/pull/2005
* [Minor] Remove unused imports by merrymercy in https://github.com/sgl-project/sglang/pull/2006
* Fix a typo in io_struct.py by merrymercy in https://github.com/sgl-project/sglang/pull/2008
* Fix weight loading for tied word embedding when TP > 1 by merrymercy in https://github.com/sgl-project/sglang/pull/2009
* cleanup rust folder by ByronHsu in https://github.com/sgl-project/sglang/pull/2010
* Filter empty prompt in random bench serving by ispobock in https://github.com/sgl-project/sglang/pull/2011
* support echo=true and logprobs in openai api when logprobs=1 in lm-evaluation-harness by BBuf in https://github.com/sgl-project/sglang/pull/1998
* Fix finish reason by merrymercy in https://github.com/sgl-project/sglang/pull/2013
* fix a bug in v1_embedding_request by BBuf in https://github.com/sgl-project/sglang/pull/2014
* fix the overly long prompt length bug in test_embedding_models by BBuf in https://github.com/sgl-project/sglang/pull/2015
* support parallel grammar preprocessing by DarkSharpness in https://github.com/sgl-project/sglang/pull/1996
* Refactor grammar backend by merrymercy in https://github.com/sgl-project/sglang/pull/2018
* Fix grammar backend for tensor parallelism by merrymercy in https://github.com/sgl-project/sglang/pull/2020

0.3.4.post2

* [Performance] Support both xgrammar and outlines for constrained decoding by DarkSharpness in https://github.com/sgl-project/sglang/pull/1752
* [Fix] Fix --skip-tokenizer-init by merrymercy in https://github.com/sgl-project/sglang/pull/1798
* move max_position_embeddings to the last by hliuca in https://github.com/sgl-project/sglang/pull/1799
* add support for ipynb by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1786
* Fix possible ZMQ hanging by hnyls2002 in https://github.com/sgl-project/sglang/pull/1800
* Set `ZMQ` buffer size heuristic by hnyls2002 in https://github.com/sgl-project/sglang/pull/1801
* Allow consecutive ports when launching multiple sglang servers. by hnyls2002 in https://github.com/sgl-project/sglang/pull/1802
* fix int conversion for `SGLANG_CPU_COUNT` by ByronHsu in https://github.com/sgl-project/sglang/pull/1803
* Update ci workflows by merrymercy in https://github.com/sgl-project/sglang/pull/1804
* Update links by merrymercy in https://github.com/sgl-project/sglang/pull/1805
* Simplify our docs with complicated functions into utils by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1807
* Fix docs ci by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1808
* Provide an argument to set the maximum batch size for cuda graph by merrymercy in https://github.com/sgl-project/sglang/pull/1809
* Improve the user control of new_token_ratio by merrymercy in https://github.com/sgl-project/sglang/pull/1811
* Update hyperparameter_tuning.md by merrymercy in https://github.com/sgl-project/sglang/pull/1813
* Add a watch dog thread by merrymercy in https://github.com/sgl-project/sglang/pull/1816
* Fix unit tests by merrymercy in https://github.com/sgl-project/sglang/pull/1817
* Add openAI compatible API by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1810
* Fix Triton decode kernel & ut by ispobock in https://github.com/sgl-project/sglang/pull/1819
* support token ids in `engine.generate` by ByronHsu in https://github.com/sgl-project/sglang/pull/1820
* Fix docs deploy ci by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1821
* [router] rust-based router by ByronHsu in https://github.com/sgl-project/sglang/pull/1790
* Fix update_weights deadlock for DP by ByronHsu in https://github.com/sgl-project/sglang/pull/1825
* fix get_memory_pool_size deadlock for DP by ByronHsu in https://github.com/sgl-project/sglang/pull/1830
* Support setting `use_thread` in the `run_program` for easier debugging. by liuyanyi in https://github.com/sgl-project/sglang/pull/1823
* [3rdparty, document] Add 3rdparty/amd, with profiling and tuning instructions to be added by HaiShaw in https://github.com/sgl-project/sglang/pull/1822
* stop_str of qwen2-vl template should be a tuple not a str by yizhang2077 in https://github.com/sgl-project/sglang/pull/1834
* [FP8 KV Cache, Mixtral] Avoid KeyError at loading pre-quantized FP8 m… by HaiShaw in https://github.com/sgl-project/sglang/pull/1835
* Gpt2 by DanielC12321 in https://github.com/sgl-project/sglang/pull/1833
* Improve openai api documents by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1827
* Update docs by merrymercy in https://github.com/sgl-project/sglang/pull/1839
* Update README.md by merrymercy in https://github.com/sgl-project/sglang/pull/1840
* [Production] Drain requests before exit when receive SIGTERM by Ying1123 in https://github.com/sgl-project/sglang/pull/1838
* [Performance, Hardware] MoE weights padding to AMD MI300x GPUs by HaiShaw in https://github.com/sgl-project/sglang/pull/1836
* Fix suggest edit by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1842
* [Performance, Triton Kernel Args] _decode_grouped_softmax_reducev_fwd… by HaiShaw in https://github.com/sgl-project/sglang/pull/1845
* Make decode log interval configurable by ByronHsu in https://github.com/sgl-project/sglang/pull/1847
* Fix mixed chunked prefill by merrymercy in https://github.com/sgl-project/sglang/pull/1850
* Refactor tokenizer manager by ByronHsu in https://github.com/sgl-project/sglang/pull/1846
* Simplify documentation by merrymercy in https://github.com/sgl-project/sglang/pull/1851
* Fix warnings in doc build by merrymercy in https://github.com/sgl-project/sglang/pull/1852
* delete unused character by geeker-smallwhite in https://github.com/sgl-project/sglang/pull/1855
* Fix memory leak for chunked prefill 2 by merrymercy in https://github.com/sgl-project/sglang/pull/1858
* [Build, ROCm] Dockerfile.rocm for Instinct GPUs, with package updates by HaiShaw in https://github.com/sgl-project/sglang/pull/1861
* Fix retraction + overlap by hnyls2002 in https://github.com/sgl-project/sglang/pull/1860
* change file tree by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1859
* Update vocab embedding deps and add TP switch by ispobock in https://github.com/sgl-project/sglang/pull/1856
* minor: add human eval by zhyncs in https://github.com/sgl-project/sglang/pull/1754
* Add vlm document by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1866
* minor: update nightly eval by zhyncs in https://github.com/sgl-project/sglang/pull/1867
* [3rdparty, document] Updated Documentation that covers performance tuning techniques for AMD Instinct GPUs. by yichiche in https://github.com/sgl-project/sglang/pull/1871
* Improve docs and fix the broken links by merrymercy in https://github.com/sgl-project/sglang/pull/1875
* Add a FAQ documentation by merrymercy in https://github.com/sgl-project/sglang/pull/1877
* Update docs title by merrymercy in https://github.com/sgl-project/sglang/pull/1879
* Update docs and workflow by merrymercy in https://github.com/sgl-project/sglang/pull/1881
* Fix doc links by merrymercy in https://github.com/sgl-project/sglang/pull/1882
* Fix incorrect context length for llama3.2-11b by rchen19 in https://github.com/sgl-project/sglang/pull/1873
* add native api docs by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1883
* Update index.rst to improve the order of docs by merrymercy in https://github.com/sgl-project/sglang/pull/1885
* Native api by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1886
* Fix docs by merrymercy in https://github.com/sgl-project/sglang/pull/1889
* Fix docs ci by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1888
* Fix docs by merrymercy in https://github.com/sgl-project/sglang/pull/1890
* Fix ci and link error by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1892
* Add engine api by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1894
* turn off log for the offline engine by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1895
* Do not use longest prefix matching when queue-req is large by merrymercy in https://github.com/sgl-project/sglang/pull/1896
* Simplify tokenizer manager by merrymercy in https://github.com/sgl-project/sglang/pull/1899
* Allow passing dtype and max_new_tokens to HF reference script by janimo in https://github.com/sgl-project/sglang/pull/1903
* Simplify tokenizer manager by merrymercy in https://github.com/sgl-project/sglang/pull/1904
* Unify the model type checking by merrymercy in https://github.com/sgl-project/sglang/pull/1905
* Escape backwards slash by inakineitor in https://github.com/sgl-project/sglang/pull/1902
* feat: support truss endpoint for benchmark serving by zhyncs in https://github.com/sgl-project/sglang/pull/1906
* Let reward model take text inputs instead of message lists by merrymercy in https://github.com/sgl-project/sglang/pull/1907

0.3.4.post1

New Contributors
* du00cs made their first contribution in https://github.com/sgl-project/sglang/pull/1521
* KylinMountain made their first contribution in https://github.com/sgl-project/sglang/pull/1520
* jeffrey-fong made their first contribution in https://github.com/sgl-project/sglang/pull/1495
* cauyxy made their first contribution in https://github.com/sgl-project/sglang/pull/1537
* kkHuang-amd made their first contribution in https://github.com/sgl-project/sglang/pull/1554
* tbarton16 made their first contribution in https://github.com/sgl-project/sglang/pull/1553
* mssongit made their first contribution in https://github.com/sgl-project/sglang/pull/1536
* FredericOdermatt made their first contribution in https://github.com/sgl-project/sglang/pull/1569
* kushal34712 made their first contribution in https://github.com/sgl-project/sglang/pull/1625
* liangan1 made their first contribution in https://github.com/sgl-project/sglang/pull/1607
* glen-amd made their first contribution in https://github.com/sgl-project/sglang/pull/1611
* OBJECT907 made their first contribution in https://github.com/sgl-project/sglang/pull/1579
* abatom made their first contribution in https://github.com/sgl-project/sglang/pull/1626
* JanumalaAkhilendra made their first contribution in https://github.com/sgl-project/sglang/pull/1633
* learninmou made their first contribution in https://github.com/sgl-project/sglang/pull/1642
* pjyi2147 made their first contribution in https://github.com/sgl-project/sglang/pull/1653
* andy-yang-1 made their first contribution in https://github.com/sgl-project/sglang/pull/1459
* michaelfeil made their first contribution in https://github.com/sgl-project/sglang/pull/1688
* zeng-zc made their first contribution in https://github.com/sgl-project/sglang/pull/1679
* wxsms made their first contribution in https://github.com/sgl-project/sglang/pull/1697
* g-drozdov made their first contribution in https://github.com/sgl-project/sglang/pull/1684
* sixsixcoder made their first contribution in https://github.com/sgl-project/sglang/pull/1736

**Full Changelog**: https://github.com/sgl-project/sglang/compare/v0.3.2...v0.3.4.post1

0.3.4

* Simplify the interface of tp_worker by merrymercy in https://github.com/sgl-project/sglang/pull/1718
* Update vllm to 0.6.3 (1711) by zhyncs in https://github.com/sgl-project/sglang/pull/1720
* Support qwen2 vl model by zhyncs in https://github.com/sgl-project/sglang/pull/1721
* Update README.md by Ying1123 in https://github.com/sgl-project/sglang/pull/1722
* Unify the memory pool api and tp worker API by merrymercy in https://github.com/sgl-project/sglang/pull/1724
* Temporarily skip the test_mixed_batch for QWen2VL by merrymercy in https://github.com/sgl-project/sglang/pull/1725
* Split the overlapped version of TpModelWorkerClient into a separate file by merrymercy in https://github.com/sgl-project/sglang/pull/1726
* [Bugfix] qwen2vl forward_extend by yizhang2077 in https://github.com/sgl-project/sglang/pull/1727
* Simplify the usage of device by merrymercy in https://github.com/sgl-project/sglang/pull/1734
* Simplify batch result resolution by merrymercy in https://github.com/sgl-project/sglang/pull/1735
* Add GLM-4 TextGeneration Model support for SGLang by sixsixcoder in https://github.com/sgl-project/sglang/pull/1736
* Make token mapping non-blocking in the overlapped mode by merrymercy in https://github.com/sgl-project/sglang/pull/1740
* Maintain seq_lens_sum to make more FlashInfer operations non-blocking by merrymercy in https://github.com/sgl-project/sglang/pull/1741
* Fix prefill oom by hnyls2002 in https://github.com/sgl-project/sglang/pull/1743
* Faster overlap mode scheduler by merrymercy in https://github.com/sgl-project/sglang/pull/1738
* misc: add CODEOWNERS by zhyncs in https://github.com/sgl-project/sglang/pull/1737
* Fix sliding window attention and gemma-2 unit tests in CI by merrymercy in https://github.com/sgl-project/sglang/pull/1746
* Llama3.2 vision model support by hnyls2002 in https://github.com/sgl-project/sglang/pull/1551
* Update `max_req_len` and `max_req_input_len` by hnyls2002 in https://github.com/sgl-project/sglang/pull/1748

0.3.3.post1

* [engine] support async and streaming by ByronHsu in https://github.com/sgl-project/sglang/pull/1614
* [Fix] Fix the style of test_large_max_new_tokens.py by merrymercy in https://github.com/sgl-project/sglang/pull/1638
* fix missing ignore_eos in v1/chat/completions by learninmou in https://github.com/sgl-project/sglang/pull/1642
* Fix ignore_eos in the OpenAI ChatCompletions API by merrymercy in https://github.com/sgl-project/sglang/pull/1645
* [Feature, Hardware] Enable SGLang on XPU GPUs via PyTorch by liangan1 in https://github.com/sgl-project/sglang/pull/1480
* Fix unit tests and type annotations by merrymercy in https://github.com/sgl-project/sglang/pull/1648
* Add an option to disable penalizer by merrymercy in https://github.com/sgl-project/sglang/pull/1651
* Add get_tokenizer function for Engine class by pjyi2147 in https://github.com/sgl-project/sglang/pull/1653
* Fix the batch_is_full check for jump-forward decoding by merrymercy in https://github.com/sgl-project/sglang/pull/1654
* Simplify the event loop and expose `--num-continuous-decode-steps` as an argument by merrymercy in https://github.com/sgl-project/sglang/pull/1652
* [doc] Add engine section in backend.md by ByronHsu in https://github.com/sgl-project/sglang/pull/1656
* [Fix] fix eos trim inconsistency by Ying1123 in https://github.com/sgl-project/sglang/pull/1650
* Add output_ids into ScheduleBatch by merrymercy in https://github.com/sgl-project/sglang/pull/1659
* [Minor] Rename no_eos_trim to no_stop_trim by Ying1123 in https://github.com/sgl-project/sglang/pull/1661
* Add a test case to test retract by merrymercy in https://github.com/sgl-project/sglang/pull/1662
* Move filter_batch out of stream_output by merrymercy in https://github.com/sgl-project/sglang/pull/1663
* Support double sparsity by andy-yang-1 in https://github.com/sgl-project/sglang/pull/1459
* Fix unit test order to balance the tasks in CI by merrymercy in https://github.com/sgl-project/sglang/pull/1665
* [Minor] Improve style by merrymercy in https://github.com/sgl-project/sglang/pull/1666
* Simplify chunked prefill by merrymercy in https://github.com/sgl-project/sglang/pull/1667
* [1/N] Remove `CacheConfig` import in all model files by ByronHsu in https://github.com/sgl-project/sglang/pull/1658
* [doc] improve engine doc and add to readme by ByronHsu in https://github.com/sgl-project/sglang/pull/1670
* [Minor] Add some utility functions by merrymercy in https://github.com/sgl-project/sglang/pull/1671
* Improve benchmark scripts by merrymercy in https://github.com/sgl-project/sglang/pull/1672
* Fix memory leak during abort by merrymercy in https://github.com/sgl-project/sglang/pull/1674
* Fix filter_batch function call by hnyls2002 in https://github.com/sgl-project/sglang/pull/1681
* Add OLMo model by janimo in https://github.com/sgl-project/sglang/pull/1676
* Add a new event loop by merrymercy in https://github.com/sgl-project/sglang/pull/1677
* Fix srt dependency by ispobock in https://github.com/sgl-project/sglang/pull/1685
* [Event] Add online meetup meeting link by Ying1123 in https://github.com/sgl-project/sglang/pull/1686
* Launch a thread to overlap CPU and GPU by merrymercy in https://github.com/sgl-project/sglang/pull/1687
* Returning a per request metric for number of cached_tokens read by havetc in https://github.com/sgl-project/sglang/pull/1599
* add orjson for jsonresponse by michaelfeil in https://github.com/sgl-project/sglang/pull/1688
* Update README.md by merrymercy in https://github.com/sgl-project/sglang/pull/1689
* Add date to logging messages (1623) by zeng-zc in https://github.com/sgl-project/sglang/pull/1679
* Update the transformers version in CI by merrymercy in https://github.com/sgl-project/sglang/pull/1690
* Use SGLang imports for linear layer by janimo in https://github.com/sgl-project/sglang/pull/1696
* feat: radix tree code optimize by wxsms in https://github.com/sgl-project/sglang/pull/1697
* ORJson. Faster Json serialization by michaelfeil in https://github.com/sgl-project/sglang/pull/1694
* Fix the failed unit tests by merrymercy in https://github.com/sgl-project/sglang/pull/1699
* Fix failed ci tests on long prompts; Better error messages for embedding models by merrymercy in https://github.com/sgl-project/sglang/pull/1700
* Fix engine unit test by merrymercy in https://github.com/sgl-project/sglang/pull/1701
* Fix mixed batch for multi modal models by merrymercy in https://github.com/sgl-project/sglang/pull/1702
* Add matched_stop token or str to distinguish between eos or stop str finish_reason generation by g-drozdov in https://github.com/sgl-project/sglang/pull/1684
* Fix regex and logprob conflicts when chunked prefilling by hnyls2002 in https://github.com/sgl-project/sglang/pull/1703
* Simplify flashinfer utilities by merrymercy in https://github.com/sgl-project/sglang/pull/1704
* Add dtype for more operations by merrymercy in https://github.com/sgl-project/sglang/pull/1705
* Add grouped free operations by merrymercy in https://github.com/sgl-project/sglang/pull/1706
* Skip unnecessary penalizer by merrymercy in https://github.com/sgl-project/sglang/pull/1707
* Simplify the nan detection and greedy check in sampler by merrymercy in https://github.com/sgl-project/sglang/pull/1709
* Fix `is_all_ready` for overlap copy by merrymercy in https://github.com/sgl-project/sglang/pull/1710
* Fix the race condition in overlap mode by merrymercy in https://github.com/sgl-project/sglang/pull/1712
* Update README.md by merrymercy in https://github.com/sgl-project/sglang/pull/1713

0.3.3

* [Minor] Fix logging typo by amosyou in https://github.com/sgl-project/sglang/pull/1615
* Fix test_vision_openai_server on CI by ByronHsu in https://github.com/sgl-project/sglang/pull/1620
* [Performance, hardware] MoE tuning update to AMD MI300x GPUs by HaiShaw in https://github.com/sgl-project/sglang/pull/1619
* Update README.md by kushal34712 in https://github.com/sgl-project/sglang/pull/1625
* Update README.md by merrymercy in https://github.com/sgl-project/sglang/pull/1629
* Add device support by liangan1 in https://github.com/sgl-project/sglang/pull/1607
* Nit about the decorator of `PortArgs.init_new` by glen-amd in https://github.com/sgl-project/sglang/pull/1611
* [Bug] Fix the Image Input of Batch Generation by OBJECT907 in https://github.com/sgl-project/sglang/pull/1579
* Add the ability to enable and disable the Profiler via HTTP API. by abatom in https://github.com/sgl-project/sglang/pull/1626
* Fix the correctness test in bench_latency.py when tp > 1 and test_generation_models.py by merrymercy in https://github.com/sgl-project/sglang/pull/1631
* Add image_token in conversation.py by merrymercy in https://github.com/sgl-project/sglang/pull/1632
* Added a "Back To Top" Button by JanumalaAkhilendra in https://github.com/sgl-project/sglang/pull/1633
* Fix constrained decoding by merrymercy in https://github.com/sgl-project/sglang/pull/1634
* Add back data parallelism by merrymercy in https://github.com/sgl-project/sglang/pull/1635
