## Major changes
* Refactor Worker, InputMetadata, and Attention
* Fix tensor parallelism (TP) support for AWQ models
* Support Prometheus metrics (see the example after this list)
* Fix Baichuan & Baichuan 2 (tokenizer error and weight normalization)
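As a quick illustration of the new metrics support, the sketch below scrapes a locally running OpenAI-compatible server; the port and the `/metrics` path are assumed defaults, and filtering on a `vllm` prefix is an assumption about how the metric names are spelled.

```python
# Minimal sketch: scrape the Prometheus endpoint of a running vLLM
# OpenAI-compatible server. Port 8000 and the /metrics path are assumed
# defaults; the "vllm" metric-name prefix is an assumption.
import requests

resp = requests.get("http://localhost:8000/metrics", timeout=5)
resp.raise_for_status()

for line in resp.text.splitlines():
    if line.startswith("vllm"):  # keep only the vLLM-specific series
        print(line)
```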
## What's Changed
* Add instructions to install vllm+cu118 by WoosukKwon in https://github.com/vllm-project/vllm/pull/1717
* Documentation about official docker image by simon-mo in https://github.com/vllm-project/vllm/pull/1709
* Fix the code block's format in deploying_with_docker page by HermitSun in https://github.com/vllm-project/vllm/pull/1722
* Migrate linter from `pylint` to `ruff` by simon-mo in https://github.com/vllm-project/vllm/pull/1665
* [FIX] Update the doc link in README.md by zhuohan123 in https://github.com/vllm-project/vllm/pull/1730
* [BugFix] Fix a bug in loading safetensors by WoosukKwon in https://github.com/vllm-project/vllm/pull/1732
* Fix hanging in the scheduler caused by long prompts by chenxu2048 in https://github.com/vllm-project/vllm/pull/1534
* [Fix] Fix bugs in scheduler by linotfan in https://github.com/vllm-project/vllm/pull/1727
* Rewrite torch.repeat_interleave to remove cpu synchronization by beginlner in https://github.com/vllm-project/vllm/pull/1599
* Fix RAM OOM when loading large models in tensor parallel mode by boydfd in https://github.com/vllm-project/vllm/pull/1395
* [BugFix] Fix TP support for AWQ by WoosukKwon in https://github.com/vllm-project/vllm/pull/1731
* [FIX] Fix the case when `input_is_parallel=False` for `ScaledActivation` by zhuohan123 in https://github.com/vllm-project/vllm/pull/1737
* Add stop_token_ids in SamplingParams.__repr__ by chenxu2048 in https://github.com/vllm-project/vllm/pull/1745
* [DOCS] Add engine args documentation by casper-hansen in https://github.com/vllm-project/vllm/pull/1741
* Set top_p=0 and top_k=-1 in greedy sampling by beginlner in https://github.com/vllm-project/vllm/pull/1748
* Fix repetition penalty aligned with huggingface by beginlner in https://github.com/vllm-project/vllm/pull/1577
* [build] Avoid building too many extensions by ymwangg in https://github.com/vllm-project/vllm/pull/1624
* [Minor] Fix model docstrings by WoosukKwon in https://github.com/vllm-project/vllm/pull/1764
* Add `echo` support to the OpenAI API server by wanmok in https://github.com/vllm-project/vllm/pull/1504
* Init model on GPU to reduce CPU memory footprint by beginlner in https://github.com/vllm-project/vllm/pull/1796
* Correct comments in parallel_state.py by explainerauthors in https://github.com/vllm-project/vllm/pull/1818
* Fix OPT weight loading by WoosukKwon in https://github.com/vllm-project/vllm/pull/1819
* [FIX] Fix class naming by zhuohan123 in https://github.com/vllm-project/vllm/pull/1803
* Move the definition of `BlockTable` above `BlockAllocator` so it can be used there by explainerauthors in https://github.com/vllm-project/vllm/pull/1791
* [FIX] Fix formatting error in main branch by zhuohan123 in https://github.com/vllm-project/vllm/pull/1822
* [Fix] Fix RoPE in ChatGLM-32K by WoosukKwon in https://github.com/vllm-project/vllm/pull/1841
* Better integration with Ray Serve by FlorianJoncour in https://github.com/vllm-project/vllm/pull/1821
* Refactor Attention by WoosukKwon in https://github.com/vllm-project/vllm/pull/1840
* [Docs] Add information about using shared memory in docker by simon-mo in https://github.com/vllm-project/vllm/pull/1845
* Make disabling log requests actually disable logging of requests by MichaelMcCulloch in https://github.com/vllm-project/vllm/pull/1779
* Refactor worker & InputMetadata by WoosukKwon in https://github.com/vllm-project/vllm/pull/1843
* Avoid multiple instantiations of the RoPE class by jeejeeli in https://github.com/vllm-project/vllm/pull/1828
* [FIX] Fix docker build error (1831) by allenhaozi in https://github.com/vllm-project/vllm/pull/1832
* Add profile option to latency benchmark by WoosukKwon in https://github.com/vllm-project/vllm/pull/1839
* Remove `max_num_seqs` in latency benchmark by WoosukKwon in https://github.com/vllm-project/vllm/pull/1855
* Support max-model-len argument for throughput benchmark by aisensiy in https://github.com/vllm-project/vllm/pull/1858
* Fix RoPE cache key error by esmeetu in https://github.com/vllm-project/vllm/pull/1867
* docs: add instructions for LangChain by mspronesti in https://github.com/vllm-project/vllm/pull/1162
* Support chat template and `echo` for chat API (see the sketch after this list) by Tostino in https://github.com/vllm-project/vllm/pull/1756
* Fix Baichuan tokenizer error by WoosukKwon in https://github.com/vllm-project/vllm/pull/1874
* Add weight normalization for Baichuan 2 by WoosukKwon in https://github.com/vllm-project/vllm/pull/1876
* Fix a typo in the `SamplingParams` docstring by xukp20 in https://github.com/vllm-project/vllm/pull/1886
* [Docs] Update the AWQ documentation to highlight performance issue by simon-mo in https://github.com/vllm-project/vllm/pull/1883
* Fix the broken sampler tests by WoosukKwon in https://github.com/vllm-project/vllm/pull/1896
* Add Production Metrics in Prometheus format by simon-mo in https://github.com/vllm-project/vllm/pull/1890
* Add PyTorch-native implementation of custom layers by WoosukKwon in https://github.com/vllm-project/vllm/pull/1898
* Fix broken worker test by WoosukKwon in https://github.com/vllm-project/vllm/pull/1900
* chore(examples-docs): upgrade to OpenAI V1 by mspronesti in https://github.com/vllm-project/vllm/pull/1785
* Fix num_gpus when TP > 1 by WoosukKwon in https://github.com/vllm-project/vllm/pull/1852
* Bump up to v0.2.3 by WoosukKwon in https://github.com/vllm-project/vllm/pull/1903
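Two of the entries above pair naturally: the chat API now applies the model's chat template and accepts an `echo` option (#1756), and the example docs moved to the OpenAI v1 client (#1785). The sketch below is a hypothetical combined invocation, not code from either PR: the server address and model name are placeholders, and passing `echo` through the client's `extra_body` parameter is an assumption about how vLLM-specific options reach the server.

```python
# Hypothetical usage sketch (not from the PRs): query a locally running
# vLLM OpenAI-compatible server with the openai v1 client.
from openai import OpenAI

# Placeholder address and key; the key is not validated by the local server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
    # `echo` is a vLLM-specific extension, so it is passed via extra_body;
    # assumed to return the templated prompt along with the completion.
    extra_body={"echo": True},
)
print(resp.choices[0].message.content)
```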
## New Contributors
* boydfd made their first contribution in https://github.com/vllm-project/vllm/pull/1395
* explainerauthors made their first contribution in https://github.com/vllm-project/vllm/pull/1818
* FlorianJoncour made their first contribution in https://github.com/vllm-project/vllm/pull/1821
* MichaelMcCulloch made their first contribution in https://github.com/vllm-project/vllm/pull/1779
* jeejeeli made their first contribution in https://github.com/vllm-project/vllm/pull/1828
* allenhaozi made their first contribution in https://github.com/vllm-project/vllm/pull/1832
* aisensiy made their first contribution in https://github.com/vllm-project/vllm/pull/1858
* xukp20 made their first contribution in https://github.com/vllm-project/vllm/pull/1886
**Full Changelog**: https://github.com/vllm-project/vllm/compare/v0.2.2...v0.2.3