OpenLLM

Latest version: v0.6.23


0.2.20

Usage

List all available models: `openllm models`

To start an LLM: `python -m openllm start opt`

To run OpenLLM within a container environment (requires GPUs): `docker run --gpus all -it --entrypoint=/bin/bash -P ghcr.io/bentoml/openllm:0.2.20 openllm --help`
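For a quick end-to-end check, the sketch below starts an OPT server and sends it a prompt. The default port 3000 and the `/v1/generate` endpoint are assumptions based on OpenLLM 0.2.x defaults, not something stated in these notes:

```bash
# Terminal 1: start an OPT server; OpenLLM 0.2.x binds to port 3000
# by default (an assumption, not stated in these notes)
python -m openllm start opt

# Terminal 2: send a prompt; the /v1/generate endpoint and JSON body
# shape are likewise assumptions based on OpenLLM 0.2.x
curl -X POST http://localhost:3000/v1/generate \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "What is a large language model?"}'
```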

Find more information about this release in the [CHANGELOG.md](https://github.com/bentoml/OpenLLM/blob/main/CHANGELOG.md)



**Full Changelog**: https://github.com/bentoml/OpenLLM/compare/v0.2.18...v0.2.20

0.2.18

Usage

List all available models: `openllm models`

To start an LLM: `python -m openllm start opt`

To run OpenLLM within a container environment (requires GPUs): `docker run --gpus all -it --entrypoint=/bin/bash -P ghcr.io/bentoml/openllm:0.2.18 openllm --help`

Find more information about this release in the [CHANGELOG.md](https://github.com/bentoml/OpenLLM/blob/main/CHANGELOG.md)



What's Changed
* feat(strategy): only spawn up one runner by aarnphm in https://github.com/bentoml/OpenLLM/pull/189
* feat: homebrew tap by aarnphm in https://github.com/bentoml/OpenLLM/pull/190
* refactor(cli): compiled wheels and extension modules by aarnphm in https://github.com/bentoml/OpenLLM/pull/191


**Full Changelog**: https://github.com/bentoml/OpenLLM/compare/v0.2.17...v0.2.18

0.2.17

Usage

List all available models: `openllm models`

To start an LLM: `python -m openllm start opt`

To run OpenLLM within a container environment (requires GPUs): `docker run --gpus all -it --entrypoint=/bin/bash -P ghcr.io/bentoml/openllm:0.2.17 openllm --help`

Find more information about this release in the [CHANGELOG.md](https://github.com/bentoml/OpenLLM/blob/main/CHANGELOG.md)



What's Changed
* feat: optimize model saving and loading on single GPU by aarnphm in https://github.com/bentoml/OpenLLM/pull/183
* fix(ci): update version correctly [skip ci] by aarnphm in https://github.com/bentoml/OpenLLM/pull/184
* fix(models): setup xformers in base container and loading PyTorch meta weights by aarnphm in https://github.com/bentoml/OpenLLM/pull/185
* infra(generation): initial work for generating tokens by aarnphm in https://github.com/bentoml/OpenLLM/pull/186
* ci: pre-commit autoupdate [pre-commit.ci] by pre-commit-ci in https://github.com/bentoml/OpenLLM/pull/187
* feat: --force-push to allow force push to bentocloud by aarnphm in https://github.com/bentoml/OpenLLM/pull/188 (see the sketch after this list)
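The new `--force-push` flag targets the BentoCloud push flow. A hypothetical invocation is sketched below; that it attaches to `openllm build` is an assumption, since only the flag itself is named in the changelog:

```bash
# Build a bento for OPT and force-push it to BentoCloud, overwriting
# any existing bento with the same tag. Pairing the flag with
# `openllm build` is an assumption; only --force-push is named above.
openllm build opt --force-push
```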


**Full Changelog**: https://github.com/bentoml/OpenLLM/compare/v0.2.16...v0.2.17

0.2.16

Fixes a regression introduced between 0.2.13 and 0.2.15 that prevented vLLM from running correctly within the Docker container.

**Full Changelog**: https://github.com/bentoml/OpenLLM/compare/v0.2.13...v0.2.16

0.2.13

What's Changed

* Fixes the auto-gptq CUDA kernel within the base container.
* Adds support for all vLLM models and updates vLLM to the latest stable commit.


**Full Changelog**: https://github.com/bentoml/OpenLLM/compare/v0.2.12...v0.2.13

0.2.12

News

OpenLLM now releases a base container containing all compiled kernels, removing the need to build kernels with `openllm build` when using vLLM or auto-gptq.
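Since the base container ships the compiled kernels, a plausible workflow is to pull the published image and use the bundled CLI directly, mirroring the container command in the Usage sections above. The `0.2.12` tag follows the pattern shown in those sections; whether a base image was published under exactly this tag is an assumption:

```bash
# Pull the prebuilt image with compiled kernels (tag pattern from the
# Usage sections above; exact tag for this release is an assumption)
docker pull ghcr.io/bentoml/openllm:0.2.12

# Drop into the container and inspect the bundled CLI (GPUs required)
docker run --gpus all -it --entrypoint=/bin/bash -P \
  ghcr.io/bentoml/openllm:0.2.12 openllm --help
```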

vLLM support (experimental)

Currently, only OPT and Llama 2 support vLLM. Set `OPENLLM_LLAMA_FRAMEWORK=vllm` to start OpenLLM runners with vLLM.
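For example, a minimal sketch combining the documented environment variable with the `start` command from the Usage sections (the `llama` model name is inferred from the variable's naming, not stated here):

```bash
# Route the Llama runners through the vLLM backend via the documented
# env var; the `llama` model name is an inference from the var's name
OPENLLM_LLAMA_FRAMEWORK=vllm python -m openllm start llama
```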

Installation

```bash
pip install openllm
```
