## Major changes
* Refactor Worker, InputMetadata, and Attention
* Fix tensor parallelism (TP) support for AWQ models
* Support Prometheus metrics (see the example after this list)
* Fix Baichuan & Baichuan 2 (tokenizer error and weight normalization)
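As a quick illustration of the new metrics support, the sketch below scrapes a locally running OpenAI-compatible server; the port and the `/metrics` path are assumed defaults, and filtering on a `vllm` prefix is an assumption about how the metric names are spelled.

```python
# Minimal sketch: scrape the Prometheus endpoint of a running vLLM
# OpenAI-compatible server. Port 8000 and the /metrics path are assumed
# defaults; the "vllm" metric-name prefix is an assumption.
import requests

resp = requests.get("http://localhost:8000/metrics", timeout=5)
resp.raise_for_status()

for line in resp.text.splitlines():
    if line.startswith("vllm"):  # keep only the vLLM-specific series
        print(line)
```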
## What's Changed
* Add instructions to install vllm+cu118 by WoosukKwon in https://github.com/vllm-project/vllm/pull/1717
* Documentation about official docker image by simon-mo in https://github.com/vllm-project/vllm/pull/1709
* Fix the code block's format in deploying_with_docker page by HermitSun in https://github.com/vllm-project/vllm/pull/1722
* Migrate linter from `pylint` to `ruff` by simon-mo in https://github.com/vllm-project/vllm/pull/1665
* [FIX] Update the doc link in README.md by zhuohan123 in https://github.com/vllm-project/vllm/pull/1730
* [BugFix] Fix a bug in loading safetensors by WoosukKwon in https://github.com/vllm-project/vllm/pull/1732
* Fix hanging in the scheduler caused by long prompts by chenxu2048 in https://github.com/vllm-project/vllm/pull/1534
* [Fix] Fix bugs in scheduler by linotfan in https://github.com/vllm-project/vllm/pull/1727
* Rewrite torch.repeat_interleave to remove cpu synchronization by beginlner in https://github.com/vllm-project/vllm/pull/1599
* Fix RAM OOM when loading large models in tensor parallel mode by boydfd in https://github.com/vllm-project/vllm/pull/1395
* [BugFix] Fix TP support for AWQ by WoosukKwon in https://github.com/vllm-project/vllm/pull/1731
* [FIX] Fix the case when `input_is_parallel=False` for `ScaledActivation` by zhuohan123 in https://github.com/vllm-project/vllm/pull/1737
* Add stop_token_ids in SamplingParams.__repr__ by chenxu2048 in https://github.com/vllm-project/vllm/pull/1745
* [DOCS] Add engine args documentation by casper-hansen in https://github.com/vllm-project/vllm/pull/1741
* Set top_p=0 and top_k=-1 in greedy sampling by beginlner in https://github.com/vllm-project/vllm/pull/1748
* Fix repetition penalty aligned with huggingface by beginlner in https://github.com/vllm-project/vllm/pull/1577
* [build] Avoid building too many extensions by ymwangg in https://github.com/vllm-project/vllm/pull/1624
* [Minor] Fix model docstrings by WoosukKwon in https://github.com/vllm-project/vllm/pull/1764
* Add `echo` support to the OpenAI API server by wanmok in https://github.com/vllm-project/vllm/pull/1504
* Init model on GPU to reduce CPU memory footprint by beginlner in https://github.com/vllm-project/vllm/pull/1796
* Correct comments in parallel_state.py by explainerauthors in https://github.com/vllm-project/vllm/pull/1818
* Fix OPT weight loading by WoosukKwon in https://github.com/vllm-project/vllm/pull/1819
* [FIX] Fix class naming by zhuohan123 in https://github.com/vllm-project/vllm/pull/1803
* Move the definition of `BlockTable` above `BlockAllocator` so it can be used there by explainerauthors in https://github.com/vllm-project/vllm/pull/1791
* [FIX] Fix formatting error in main branch by zhuohan123 in https://github.com/vllm-project/vllm/pull/1822
* [Fix] Fix RoPE in ChatGLM-32K by WoosukKwon in https://github.com/vllm-project/vllm/pull/1841
* Better integration with Ray Serve by FlorianJoncour in https://github.com/vllm-project/vllm/pull/1821
* Refactor Attention by WoosukKwon in https://github.com/vllm-project/vllm/pull/1840
* [Docs] Add information about using shared memory in docker by simon-mo in https://github.com/vllm-project/vllm/pull/1845
* Make disabling log requests actually disable logging of requests by MichaelMcCulloch in https://github.com/vllm-project/vllm/pull/1779
* Refactor worker & InputMetadata by WoosukKwon in https://github.com/vllm-project/vllm/pull/1843
* Avoid multiple instantiations of the RoPE class by jeejeeli in https://github.com/vllm-project/vllm/pull/1828
* [FIX] Fix docker build error (1831) by allenhaozi in https://github.com/vllm-project/vllm/pull/1832
* Add profile option to latency benchmark by WoosukKwon in https://github.com/vllm-project/vllm/pull/1839
* Remove `max_num_seqs` in latency benchmark by WoosukKwon in https://github.com/vllm-project/vllm/pull/1855
* Support max-model-len argument for throughput benchmark by aisensiy in https://github.com/vllm-project/vllm/pull/1858
* Fix RoPE cache key error by esmeetu in https://github.com/vllm-project/vllm/pull/1867
* docs: add instructions for LangChain by mspronesti in https://github.com/vllm-project/vllm/pull/1162
* Support chat template and `echo` for chat API (see the sketch after this list) by Tostino in https://github.com/vllm-project/vllm/pull/1756
* Fix Baichuan tokenizer error by WoosukKwon in https://github.com/vllm-project/vllm/pull/1874
* Add weight normalization for Baichuan 2 by WoosukKwon in https://github.com/vllm-project/vllm/pull/1876
* Fix a typo in the `SamplingParams` docstring by xukp20 in https://github.com/vllm-project/vllm/pull/1886
* [Docs] Update the AWQ documentation to highlight performance issue by simon-mo in https://github.com/vllm-project/vllm/pull/1883
* Fix the broken sampler tests by WoosukKwon in https://github.com/vllm-project/vllm/pull/1896
* Add Production Metrics in Prometheus format by simon-mo in https://github.com/vllm-project/vllm/pull/1890
* Add PyTorch-native implementation of custom layers by WoosukKwon in https://github.com/vllm-project/vllm/pull/1898
* Fix broken worker test by WoosukKwon in https://github.com/vllm-project/vllm/pull/1900
* chore(examples-docs): upgrade to OpenAI V1 by mspronesti in https://github.com/vllm-project/vllm/pull/1785
* Fix num_gpus when TP > 1 by WoosukKwon in https://github.com/vllm-project/vllm/pull/1852
* Bump up to v0.2.3 by WoosukKwon in https://github.com/vllm-project/vllm/pull/1903
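Two of the entries above pair naturally: the chat API now applies the model's chat template and accepts an `echo` option (#1756), and the example docs moved to the OpenAI v1 client (#1785). The sketch below is a hypothetical combined invocation, not code from either PR: the server address and model name are placeholders, and passing `echo` through the client's `extra_body` parameter is an assumption about how vLLM-specific options reach the server.

```python
# Hypothetical usage sketch (not from the PRs): query a locally running
# vLLM OpenAI-compatible server with the openai v1 client.
from openai import OpenAI

# Placeholder address and key; the key is not validated by the local server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
    # `echo` is a vLLM-specific extension, so it is passed via extra_body;
    # assumed to return the templated prompt along with the completion.
    extra_body={"echo": True},
)
print(resp.choices[0].message.content)
```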
## New Contributors
* boydfd made their first contribution in https://github.com/vllm-project/vllm/pull/1395
* explainerauthors made their first contribution in https://github.com/vllm-project/vllm/pull/1818
* FlorianJoncour made their first contribution in https://github.com/vllm-project/vllm/pull/1821
* MichaelMcCulloch made their first contribution in https://github.com/vllm-project/vllm/pull/1779
* jeejeeli made their first contribution in https://github.com/vllm-project/vllm/pull/1828
* allenhaozi made their first contribution in https://github.com/vllm-project/vllm/pull/1832
* aisensiy made their first contribution in https://github.com/vllm-project/vllm/pull/1858
* xukp20 made their first contribution in https://github.com/vllm-project/vllm/pull/1886
**Full Changelog**: https://github.com/vllm-project/vllm/compare/v0.2.2...v0.2.3