To run OpenLLM within a container environment (requires GPUs): `docker run --gpus all -it --entrypoint=/bin/bash -P ghcr.io/bentoml/openllm:0.2.20 openllm --help`
Find more information about this release in the [CHANGELOG.md](https://github.com/bentoml/OpenLLM/blob/main/CHANGELOG.md)
To run OpenLLM within a container environment (requires GPUs): `docker run --gpus all -it --entrypoint=/bin/bash -P ghcr.io/bentoml/openllm:0.2.18 openllm --help`
Find more information about this release in the [CHANGELOG.md](https://github.com/bentoml/OpenLLM/blob/main/CHANGELOG.md)
What's Changed
* feat(strategy): only spawn up one runner by aarnphm in https://github.com/bentoml/OpenLLM/pull/189
* feat: homebrew tap by aarnphm in https://github.com/bentoml/OpenLLM/pull/190
* refactor(cli): compiled wheels and extension modules by aarnphm in https://github.com/bentoml/OpenLLM/pull/191
To run OpenLLM within a container environment (requires GPUs): `docker run --gpus all -it --entrypoint=/bin/bash -P ghcr.io/bentoml/openllm:0.2.17 openllm --help`
Find more information about this release in the [CHANGELOG.md](https://github.com/bentoml/OpenLLM/blob/main/CHANGELOG.md)
What's Changed
* feat: optimize model saving and loading on single GPU by aarnphm in https://github.com/bentoml/OpenLLM/pull/183
* fix(ci): update version correctly [skip ci] by aarnphm in https://github.com/bentoml/OpenLLM/pull/184
* fix(models): setup xformers in base container and loading PyTorch meta weights by aarnphm in https://github.com/bentoml/OpenLLM/pull/185
* infra(generation): initial work for generating tokens by aarnphm in https://github.com/bentoml/OpenLLM/pull/186
* ci: pre-commit autoupdate [pre-commit.ci] by pre-commit-ci in https://github.com/bentoml/OpenLLM/pull/187
* feat: --force-push to allow force push to bentocloud by aarnphm in https://github.com/bentoml/OpenLLM/pull/188
OpenLLM now releases a base container containing all compiled kernels, removing the need to build kernels with `openllm build` when using vLLM or auto-gptq.
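A minimal sketch of pulling the prebuilt base container and checking that the compiled kernels are present, assuming the `ghcr.io/bentoml/openllm` image shown above and that the `vllm` and `auto_gptq` Python packages are what ship in it (the verification command is illustrative, not an official check):

```bash
# Pull the prebuilt base image so `openllm build` can reuse its compiled kernels.
docker pull ghcr.io/bentoml/openllm:0.2.20

# Illustrative sanity check: import the prebuilt vLLM and auto-gptq packages inside the container.
docker run --gpus all --rm --entrypoint=/bin/bash ghcr.io/bentoml/openllm:0.2.20 \
  -c "python -c 'import vllm, auto_gptq; print(vllm.__version__)'"
```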
vLLM support (experimental)
Currently, only OPT and Llama 2 support vLLM. Simply set `OPENLLM_LLAMA_FRAMEWORK=vllm` to start up OpenLLM runners with vLLM.
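A minimal sketch of starting a Llama 2 server with the vLLM backend via the environment variable above; the `openllm start llama` invocation assumes the 0.2.x CLI and the model id is only an example:

```bash
# Select the vLLM backend for the Llama runner, then start a server (model id is illustrative).
OPENLLM_LLAMA_FRAMEWORK=vllm openllm start llama --model-id meta-llama/Llama-2-7b-hf
```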