## What's Changed
### Major changes
* Added support for more models: LLaMA 2, Falcon, GPT-J, Baichuan, etc. (see the example after this list).
* Efficient support for multi-query attention (MQA) and grouped-query attention (GQA).
* Changes in the scheduling algorithm: vLLM now uses TGI-style continuous batching.
* And many bug fixes.
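To illustrate the expanded model support, here is a minimal sketch using vLLM's offline `LLM` API. The model name is just one example of a newly supported LLaMA 2 checkpoint; any of the other new architectures can be substituted by its Hugging Face model name.

```python
from vllm import LLM, SamplingParams

# Any newly supported architecture (LLaMA 2, Falcon, GPT-J, Baichuan, ...)
# can be loaded by its Hugging Face model name.
llm = LLM(model="meta-llama/Llama-2-7b-hf")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)
outputs = llm.generate(["Hello, my name is"], sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```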
### All changes
* fix: respond with [DONE] only once when streaming responses. by gesanqiu in https://github.com/vllm-project/vllm/pull/378
* [Fix] Change /generate response-type to json for non-streaming by nicolasf in https://github.com/vllm-project/vllm/pull/374
* Add trust-remote-code flag to handle remote tokenizers (see the usage sketch after this list) by codethazine in https://github.com/vllm-project/vllm/pull/364
* avoid python list copy in sequence initialization by LiuXiaoxuanPKU in https://github.com/vllm-project/vllm/pull/401
* [Fix] Sort LLM outputs by request ID before return by WoosukKwon in https://github.com/vllm-project/vllm/pull/402
* Add trust_remote_code arg to get_config by WoosukKwon in https://github.com/vllm-project/vllm/pull/405
* Don't try to load training_args.bin by lpfhs in https://github.com/vllm-project/vllm/pull/373
* [Model] Add support for GPT-J by AndreSlavescu in https://github.com/vllm-project/vllm/pull/226
* fix: freeze pydantic to v1 by kemingy in https://github.com/vllm-project/vllm/pull/429
* Fix handling of special tokens in decoding. by xcnick in https://github.com/vllm-project/vllm/pull/418
* Add vocab padding for LLaMA (supports WizardLM) by esmeetu in https://github.com/vllm-project/vllm/pull/411
* Fix the `KeyError` when loading bloom-based models by HermitSun in https://github.com/vllm-project/vllm/pull/441
* Optimize MQA Kernel by zhuohan123 in https://github.com/vllm-project/vllm/pull/452
* Offload port selection to OS by zhangir-azerbayev in https://github.com/vllm-project/vllm/pull/467
* [Doc] Add doc for running vLLM on the cloud by Michaelvll in https://github.com/vllm-project/vllm/pull/426
* [Fix] Fix the condition of max_seq_len by zhuohan123 in https://github.com/vllm-project/vllm/pull/477
* Add support for baichuan by codethazine in https://github.com/vllm-project/vllm/pull/365
* fix max seq len by LiuXiaoxuanPKU in https://github.com/vllm-project/vllm/pull/489
* Fixed old name reference for max_seq_len by MoeedDar in https://github.com/vllm-project/vllm/pull/498
* Hotfix: attention ALiBi without head mapping by Oliver-ss in https://github.com/vllm-project/vllm/pull/496
* fix(ray_utils): ignore re-init error by mspronesti in https://github.com/vllm-project/vllm/pull/465
* Support `trust_remote_code` in benchmark by wangruohui in https://github.com/vllm-project/vllm/pull/518
* fix: enable trust-remote-code in api server & benchmark. by gesanqiu in https://github.com/vllm-project/vllm/pull/509
* Ray placement group support by Yard1 in https://github.com/vllm-project/vllm/pull/397
* Fix bad assert in initialize_cluster if PG already exists by Yard1 in https://github.com/vllm-project/vllm/pull/526
* Add support for LLaMA-2 by zhuohan123 in https://github.com/vllm-project/vllm/pull/505
* Fix `GPTJConfig has no attribute rotary` error by leegohi04517 in https://github.com/vllm-project/vllm/pull/532
* [Fix] Fix GPTBigcoder for distributed execution by zhuohan123 in https://github.com/vllm-project/vllm/pull/503
* Fix paged attention testing. by shanshanpt in https://github.com/vllm-project/vllm/pull/495
* Fix `tensor parallel is not defined` error by MoeedDar in https://github.com/vllm-project/vllm/pull/564
* Add Baichuan-7B to README by zhuohan123 in https://github.com/vllm-project/vllm/pull/494
* [Fix] Add chat completion Example and simplify dependencies by zhuohan123 in https://github.com/vllm-project/vllm/pull/576
* [Fix] Add model sequence length into model config by zhuohan123 in https://github.com/vllm-project/vllm/pull/575
* [Fix] Fix import error of RayWorker (#604) by zxdvd in https://github.com/vllm-project/vllm/pull/605
* fix ModuleNotFoundError by mklf in https://github.com/vllm-project/vllm/pull/599
* [Doc] Change old max_seq_len to max_model_len in docs by SiriusNEO in https://github.com/vllm-project/vllm/pull/622
* Fix Baichuan-7B tensor parallelism by Sanster in https://github.com/vllm-project/vllm/pull/598
* [Model] support baichuan-13b based on baichuan-7b by Oliver-ss in https://github.com/vllm-project/vllm/pull/643
* Fix log message in scheduler by LiuXiaoxuanPKU in https://github.com/vllm-project/vllm/pull/652
* Add Falcon support (new) by zhuohan123 in https://github.com/vllm-project/vllm/pull/592
* [BUG FIX] upgrade fschat version to 0.2.23 by YHPeter in https://github.com/vllm-project/vllm/pull/650
* Refactor scheduler by WoosukKwon in https://github.com/vllm-project/vllm/pull/658
* [Doc] Add Baichuan 13B to supported models by zhuohan123 in https://github.com/vllm-project/vllm/pull/656
* Bump up version to 0.1.3 by zhuohan123 in https://github.com/vllm-project/vllm/pull/657
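Several entries above concern loading models whose repositories ship custom code. A minimal sketch of how the `trust_remote_code` flag is used with the Python API (the model name is only an illustration):

```python
from vllm import LLM

# Models whose config/tokenizer ship custom code on the Hugging Face Hub
# (e.g. Baichuan) require trust_remote_code=True to load.
llm = LLM(model="baichuan-inc/Baichuan-7B", trust_remote_code=True)
```

The API server and benchmark scripts accept the equivalent `--trust-remote-code` command-line flag.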
## New Contributors
* nicolasf made their first contribution in https://github.com/vllm-project/vllm/pull/374
* codethazine made their first contribution in https://github.com/vllm-project/vllm/pull/364
* lpfhs made their first contribution in https://github.com/vllm-project/vllm/pull/373
* AndreSlavescu made their first contribution in https://github.com/vllm-project/vllm/pull/226
* kemingy made their first contribution in https://github.com/vllm-project/vllm/pull/429
* xcnick made their first contribution in https://github.com/vllm-project/vllm/pull/418
* esmeetu made their first contribution in https://github.com/vllm-project/vllm/pull/411
* HermitSun made their first contribution in https://github.com/vllm-project/vllm/pull/441
* zhangir-azerbayev made their first contribution in https://github.com/vllm-project/vllm/pull/467
* MoeedDar made their first contribution in https://github.com/vllm-project/vllm/pull/498
* Oliver-ss made their first contribution in https://github.com/vllm-project/vllm/pull/496
* mspronesti made their first contribution in https://github.com/vllm-project/vllm/pull/465
* wangruohui made their first contribution in https://github.com/vllm-project/vllm/pull/518
* Yard1 made their first contribution in https://github.com/vllm-project/vllm/pull/397
* leegohi04517 made their first contribution in https://github.com/vllm-project/vllm/pull/532
* shanshanpt made their first contribution in https://github.com/vllm-project/vllm/pull/495
* zxdvd made their first contribution in https://github.com/vllm-project/vllm/pull/605
* mklf made their first contribution in https://github.com/vllm-project/vllm/pull/599
* SiriusNEO made their first contribution in https://github.com/vllm-project/vllm/pull/622
* Sanster made their first contribution in https://github.com/vllm-project/vllm/pull/598
* YHPeter made their first contribution in https://github.com/vllm-project/vllm/pull/650
**Full Changelog**: https://github.com/vllm-project/vllm/compare/v0.1.2...v0.1.3