* [Performance] Support both xgrammar and outlines for constrained decoding by DarkSharpness in https://github.com/sgl-project/sglang/pull/1752
* [Fix] Fix --skip-tokenizer-init by merrymercy in https://github.com/sgl-project/sglang/pull/1798
* move max_position_embeddings to the last by hliuca in https://github.com/sgl-project/sglang/pull/1799
* add support for ipynb by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1786
* Fix possible ZMQ hanging by hnyls2002 in https://github.com/sgl-project/sglang/pull/1800
* Set `ZMQ` buffer size heuristic by hnyls2002 in https://github.com/sgl-project/sglang/pull/1801
* Allow consecutive ports when launching multiple sglang servers. by hnyls2002 in https://github.com/sgl-project/sglang/pull/1802
* fix int conversion for `SGLANG_CPU_COUNT` by ByronHsu in https://github.com/sgl-project/sglang/pull/1803
* Update ci workflows by merrymercy in https://github.com/sgl-project/sglang/pull/1804
* Update links by merrymercy in https://github.com/sgl-project/sglang/pull/1805
* Simplify our docs with complicated functions into utils by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1807
* Fix docs ci by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1808
* Provide an argument to set the maximum batch size for cuda graph by merrymercy in https://github.com/sgl-project/sglang/pull/1809
* Improve the user control of new_token_ratio by merrymercy in https://github.com/sgl-project/sglang/pull/1811
* Update hyperparameter_tuning.md by merrymercy in https://github.com/sgl-project/sglang/pull/1813
* Add a watch dog thread by merrymercy in https://github.com/sgl-project/sglang/pull/1816
* Fix unit tests by merrymercy in https://github.com/sgl-project/sglang/pull/1817
* Add OpenAI compatible API by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1810
* Fix Triton decode kernel & ut by ispobock in https://github.com/sgl-project/sglang/pull/1819
* support token ids in `engine.generate` by ByronHsu in https://github.com/sgl-project/sglang/pull/1820
* Fix docs deploy ci by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1821
* [router] rust-based router by ByronHsu in https://github.com/sgl-project/sglang/pull/1790
* Fix update_weights deadlock for DP by ByronHsu in https://github.com/sgl-project/sglang/pull/1825
* fix get_memory_pool_size deadlock for DP by ByronHsu in https://github.com/sgl-project/sglang/pull/1830
* Support setting `use_thread` in the `run_program` for easier debugging. by liuyanyi in https://github.com/sgl-project/sglang/pull/1823
* [3rdparty, document] Add 3rdparty/amd, with profiling and tuning instructions to be added by HaiShaw in https://github.com/sgl-project/sglang/pull/1822
* stop_str of qwen2-vl template should be a tuple not a str by yizhang2077 in https://github.com/sgl-project/sglang/pull/1834
* [FP8 KV Cache, Mixtral] Avoid KeyError at loading pre-quantized FP8 m… by HaiShaw in https://github.com/sgl-project/sglang/pull/1835
* Gpt2 by DanielC12321 in https://github.com/sgl-project/sglang/pull/1833
* Improve OpenAI API documents by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1827
* Update docs by merrymercy in https://github.com/sgl-project/sglang/pull/1839
* Update README.md by merrymercy in https://github.com/sgl-project/sglang/pull/1840
* [Production] Drain requests before exit when receive SIGTERM by Ying1123 in https://github.com/sgl-project/sglang/pull/1838
* [Performance, Hardware] MoE weights padding to AMD MI300x GPUs by HaiShaw in https://github.com/sgl-project/sglang/pull/1836
* Fix suggest edit by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1842
* [Performance, Triton Kernel Args] _decode_grouped_softmax_reducev_fwd… by HaiShaw in https://github.com/sgl-project/sglang/pull/1845
* Make decode log interval configurable by ByronHsu in https://github.com/sgl-project/sglang/pull/1847
* Fix mixed chunked prefill by merrymercy in https://github.com/sgl-project/sglang/pull/1850
* Refactor tokenizer manager by ByronHsu in https://github.com/sgl-project/sglang/pull/1846
* Simplify documentation by merrymercy in https://github.com/sgl-project/sglang/pull/1851
* Fix warnings in doc build by merrymercy in https://github.com/sgl-project/sglang/pull/1852
* delete unused character by geeker-smallwhite in https://github.com/sgl-project/sglang/pull/1855
* Fix memory leak for chunked prefill 2 by merrymercy in https://github.com/sgl-project/sglang/pull/1858
* [Build, ROCm] Dockerfile.rocm for Instinct GPUs, with package updates by HaiShaw in https://github.com/sgl-project/sglang/pull/1861
* Fix retraction + overlap by hnyls2002 in https://github.com/sgl-project/sglang/pull/1860
* change file tree by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1859
* Update vocab embedding deps and add TP switch by ispobock in https://github.com/sgl-project/sglang/pull/1856
* minor: add human eval by zhyncs in https://github.com/sgl-project/sglang/pull/1754
* Add vlm document by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1866
* minor: update nightly eval by zhyncs in https://github.com/sgl-project/sglang/pull/1867
* [3rdparty, document] Updated Documentation that covers performance tuning techniques for AMD Instinct GPUs. by yichiche in https://github.com/sgl-project/sglang/pull/1871
* Improve docs and fix the broken links by merrymercy in https://github.com/sgl-project/sglang/pull/1875
* Add a FAQ documentation by merrymercy in https://github.com/sgl-project/sglang/pull/1877
* Update docs title by merrymercy in https://github.com/sgl-project/sglang/pull/1879
* Update docs and workflow by merrymercy in https://github.com/sgl-project/sglang/pull/1881
* Fix doc links by merrymercy in https://github.com/sgl-project/sglang/pull/1882
* Fix incorrect context length for llama3.2-11b by rchen19 in https://github.com/sgl-project/sglang/pull/1873
* add native api docs by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1883
* Update index.rst to improve the order of docs by merrymercy in https://github.com/sgl-project/sglang/pull/1885
* Native api by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1886
* Fix docs by merrymercy in https://github.com/sgl-project/sglang/pull/1889
* Fix docs ci by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1888
* Fix docs by merrymercy in https://github.com/sgl-project/sglang/pull/1890
* Fix ci and link error by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1892
* Add engine api by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1894
* turn off log for the offline engine by zhaochenyang20 in https://github.com/sgl-project/sglang/pull/1895
* Do not use longest prefix matching when queue-req is large by merrymercy in https://github.com/sgl-project/sglang/pull/1896
* Simplify tokenizer manager by merrymercy in https://github.com/sgl-project/sglang/pull/1899
* Allow passing dtype and max_new_tokens to HF reference script by janimo in https://github.com/sgl-project/sglang/pull/1903
* Simplify tokenizer manager by merrymercy in https://github.com/sgl-project/sglang/pull/1904
* Unify the model type checking by merrymercy in https://github.com/sgl-project/sglang/pull/1905
* Escape backslash by inakineitor in https://github.com/sgl-project/sglang/pull/1902
* feat: support truss endpoint for benchmark serving by zhyncs in https://github.com/sgl-project/sglang/pull/1906
* Let reward model take text inputs instead of message lists by merrymercy in https://github.com/sgl-project/sglang/pull/1907