## What's Changed
### Major changes
* Added support for more models: LLaMA 2, Falcon, GPT-J, Baichuan, etc. (see the example after this list).
* Efficient support for multi-query attention (MQA) and grouped-query attention (GQA).
* Changes in the scheduling algorithm: vLLM now uses TGI-style continuous batching.
* And many bug fixes.
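To illustrate the expanded model support, here is a minimal sketch using vLLM's offline `LLM` API. The model name is just one example of a newly supported LLaMA 2 checkpoint; any of the other new architectures can be substituted by its Hugging Face model name.

```python
from vllm import LLM, SamplingParams

# Any newly supported architecture (LLaMA 2, Falcon, GPT-J, Baichuan, ...)
# can be loaded by its Hugging Face model name.
llm = LLM(model="meta-llama/Llama-2-7b-hf")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)
outputs = llm.generate(["Hello, my name is"], sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```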
### All changes
* fix: respond with [DONE] only once when streaming responses. by gesanqiu in https://github.com/vllm-project/vllm/pull/378
* [Fix] Change /generate response-type to json for non-streaming by nicolasf in https://github.com/vllm-project/vllm/pull/374
* Add trust-remote-code flag to handle remote tokenizers (see the usage sketch after this list) by codethazine in https://github.com/vllm-project/vllm/pull/364
* avoid python list copy in sequence initialization by LiuXiaoxuanPKU in https://github.com/vllm-project/vllm/pull/401
* [Fix] Sort LLM outputs by request ID before return by WoosukKwon in https://github.com/vllm-project/vllm/pull/402
* Add trust_remote_code arg to get_config by WoosukKwon in https://github.com/vllm-project/vllm/pull/405
* Don't try to load training_args.bin by lpfhs in https://github.com/vllm-project/vllm/pull/373
* [Model] Add support for GPT-J by AndreSlavescu in https://github.com/vllm-project/vllm/pull/226
* fix: freeze pydantic to v1 by kemingy in https://github.com/vllm-project/vllm/pull/429
* Fix handling of special tokens in decoding. by xcnick in https://github.com/vllm-project/vllm/pull/418
* Add vocab padding for LLaMA (supports WizardLM) by esmeetu in https://github.com/vllm-project/vllm/pull/411
* Fix the `KeyError` when loading bloom-based models by HermitSun in https://github.com/vllm-project/vllm/pull/441
* Optimize MQA Kernel by zhuohan123 in https://github.com/vllm-project/vllm/pull/452
* Offload port selection to OS by zhangir-azerbayev in https://github.com/vllm-project/vllm/pull/467
* [Doc] Add doc for running vLLM on the cloud by Michaelvll in https://github.com/vllm-project/vllm/pull/426
* [Fix] Fix the condition of max_seq_len by zhuohan123 in https://github.com/vllm-project/vllm/pull/477
* Add support for baichuan by codethazine in https://github.com/vllm-project/vllm/pull/365
* fix max seq len by LiuXiaoxuanPKU in https://github.com/vllm-project/vllm/pull/489
* Fixed old name reference for max_seq_len by MoeedDar in https://github.com/vllm-project/vllm/pull/498
* Hotfix: attention ALiBi without head mapping by Oliver-ss in https://github.com/vllm-project/vllm/pull/496
* fix(ray_utils): ignore re-init error by mspronesti in https://github.com/vllm-project/vllm/pull/465
* Support `trust_remote_code` in benchmark by wangruohui in https://github.com/vllm-project/vllm/pull/518
* fix: enable trust-remote-code in api server & benchmark. by gesanqiu in https://github.com/vllm-project/vllm/pull/509
* Ray placement group support by Yard1 in https://github.com/vllm-project/vllm/pull/397
* Fix bad assert in initialize_cluster if PG already exists by Yard1 in https://github.com/vllm-project/vllm/pull/526
* Add support for LLaMA-2 by zhuohan123 in https://github.com/vllm-project/vllm/pull/505
* Fix `GPTJConfig has no attribute rotary` error by leegohi04517 in https://github.com/vllm-project/vllm/pull/532
* [Fix] Fix GPTBigcoder for distributed execution by zhuohan123 in https://github.com/vllm-project/vllm/pull/503
* Fix paged attention testing. by shanshanpt in https://github.com/vllm-project/vllm/pull/495
* Fix `tensor parallel is not defined` error by MoeedDar in https://github.com/vllm-project/vllm/pull/564
* Add Baichuan-7B to README by zhuohan123 in https://github.com/vllm-project/vllm/pull/494
* [Fix] Add chat completion Example and simplify dependencies by zhuohan123 in https://github.com/vllm-project/vllm/pull/576
* [Fix] Add model sequence length into model config by zhuohan123 in https://github.com/vllm-project/vllm/pull/575
* [Fix] Fix import error of RayWorker (#604) by zxdvd in https://github.com/vllm-project/vllm/pull/605
* fix ModuleNotFoundError by mklf in https://github.com/vllm-project/vllm/pull/599
* [Doc] Change old max_seq_len to max_model_len in docs by SiriusNEO in https://github.com/vllm-project/vllm/pull/622
* Fix Baichuan-7B tensor parallelism by Sanster in https://github.com/vllm-project/vllm/pull/598
* [Model] support baichuan-13b based on baichuan-7b by Oliver-ss in https://github.com/vllm-project/vllm/pull/643
* Fix log message in scheduler by LiuXiaoxuanPKU in https://github.com/vllm-project/vllm/pull/652
* Add Falcon support (new) by zhuohan123 in https://github.com/vllm-project/vllm/pull/592
* [BUG FIX] upgrade fschat version to 0.2.23 by YHPeter in https://github.com/vllm-project/vllm/pull/650
* Refactor scheduler by WoosukKwon in https://github.com/vllm-project/vllm/pull/658
* [Doc] Add Baichuan 13B to supported models by zhuohan123 in https://github.com/vllm-project/vllm/pull/656
* Bump up version to 0.1.3 by zhuohan123 in https://github.com/vllm-project/vllm/pull/657
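Several entries above concern loading models whose repositories ship custom code. A minimal sketch of how the `trust_remote_code` flag is used with the Python API (the model name is only an illustration):

```python
from vllm import LLM

# Models whose config/tokenizer ship custom code on the Hugging Face Hub
# (e.g. Baichuan) require trust_remote_code=True to load.
llm = LLM(model="baichuan-inc/Baichuan-7B", trust_remote_code=True)
```

The API server and benchmark scripts accept the equivalent `--trust-remote-code` command-line flag.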
## New Contributors
* nicolasf made their first contribution in https://github.com/vllm-project/vllm/pull/374
* codethazine made their first contribution in https://github.com/vllm-project/vllm/pull/364
* lpfhs made their first contribution in https://github.com/vllm-project/vllm/pull/373
* AndreSlavescu made their first contribution in https://github.com/vllm-project/vllm/pull/226
* kemingy made their first contribution in https://github.com/vllm-project/vllm/pull/429
* xcnick made their first contribution in https://github.com/vllm-project/vllm/pull/418
* esmeetu made their first contribution in https://github.com/vllm-project/vllm/pull/411
* HermitSun made their first contribution in https://github.com/vllm-project/vllm/pull/441
* zhangir-azerbayev made their first contribution in https://github.com/vllm-project/vllm/pull/467
* MoeedDar made their first contribution in https://github.com/vllm-project/vllm/pull/498
* Oliver-ss made their first contribution in https://github.com/vllm-project/vllm/pull/496
* mspronesti made their first contribution in https://github.com/vllm-project/vllm/pull/465
* wangruohui made their first contribution in https://github.com/vllm-project/vllm/pull/518
* Yard1 made their first contribution in https://github.com/vllm-project/vllm/pull/397
* leegohi04517 made their first contribution in https://github.com/vllm-project/vllm/pull/532
* shanshanpt made their first contribution in https://github.com/vllm-project/vllm/pull/495
* zxdvd made their first contribution in https://github.com/vllm-project/vllm/pull/605
* mklf made their first contribution in https://github.com/vllm-project/vllm/pull/599
* SiriusNEO made their first contribution in https://github.com/vllm-project/vllm/pull/622
* Sanster made their first contribution in https://github.com/vllm-project/vllm/pull/598
* YHPeter made their first contribution in https://github.com/vllm-project/vllm/pull/650
**Full Changelog**: https://github.com/vllm-project/vllm/compare/v0.1.2...v0.1.3