Llama-cpp-cffi

Latest version: v0.1.21

0.1.9

Added:
- Support for default CPU tinyBLAS (llamafile, sgemm) builds.
- Support for CPU OpenBLAS (GGML_OPENBLAS) builds.

Changed:
- Build scripts now have a separate step/function, `cuda_12_5_1_setup`, which sets up the CUDA 12.5.1 environment at build time.

Fixed:
- Stop the generation thread in `llama_generate` on `GeneratorExit` (see the sketch after this entry).

Removed:
- `callback` parameter in `llama_generate` and dependent functions.
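
Taken together, these two changes mean streaming is consumed purely through the generator interface: breaking out of the loop raises `GeneratorExit` inside `llama_generate`, which now stops the generation thread, and the removed `callback` parameter is no longer needed for cancellation. A minimal sketch of that usage; the import path, the repo ID, and `llama_generate`'s exact signature are assumptions, since only `Model`, `Model.creator_hf_repo`, and `llama_generate` are named in this changelog:

```python
# Hedged sketch: only Model, Model.creator_hf_repo and llama_generate appear
# in this changelog; the import path, repo ID and call signature are assumptions.
from llama import Model, llama_generate

model = Model(
    creator_hf_repo='some-org/some-chat-model',  # assumed example repo
)

messages = [{'role': 'user', 'content': 'Name three prime numbers.'}]

for chunk in llama_generate(model, messages=messages):
    print(chunk, end='', flush=True)

    if '5' in chunk:
        # Breaking out of the loop raises GeneratorExit inside llama_generate;
        # since 0.1.9 that stops the generation thread cleanly, with no
        # callback parameter needed to signal cancellation.
        break
```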

0.1.8

Added:
- `Model.tokenizer_hf_repo` as an optional field for cases where `Model.creator_hf_repo` cannot be used to tokenize or format prompts/messages.
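
A hedged sketch of the new field: when the creator repo ships no usable tokenizer or chat template, a separate tokenizer repo can be named. Only the field names `creator_hf_repo` and `tokenizer_hf_repo` come from this changelog; the import path and repo IDs are assumptions.

```python
# Hedged sketch: import path and repo IDs are illustrative assumptions.
from llama import Model

model = Model(
    creator_hf_repo='some-org/some-model',        # repo whose tokenizer/template cannot be used here
    tokenizer_hf_repo='some-org/some-tokenizer',  # optional: repo used instead to tokenize/format messages
)
```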

0.1.7

Added:
- Support for `stop` tokens/words (see the sketch after this entry).

Changed:
- Unified the CPU and CUDA 12.5 modules into a single module, `llama/llama_cli.py`.

Removed:
- Separate examples for CPU and CUDA 12.5 modules.
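
A minimal sketch of the `stop` feature added in this release. Whether the list is passed as a keyword argument to `llama_generate` or through an options object is an assumption, as are the import path and repo ID; only the `stop` name itself comes from this changelog.

```python
# Hedged sketch: everything except the `stop` feature name is an assumption.
from llama import Model, llama_generate

model = Model(creator_hf_repo='some-org/some-chat-model')  # assumed example repo

messages = [{'role': 'user', 'content': 'List the planets, one per line.'}]

# Generation halts as soon as any of these stop tokens/words appears in the
# output (parameter placement is an assumption).
for chunk in llama_generate(model, messages=messages, stop=['\n\n', 'Pluto']):
    print(chunk, end='', flush=True)
```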

0.1.6

Changed:
- Updated `huggingface-hub`.

Fixed:
- `llama.__init__` now correctly imports submodules and handles CPU and CUDA backends.
- OpenAI: `ctx_size: int = config.max_position_embeddings if max_tokens is None else max_tokens`.
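
The fixed expression means the OpenAI-compatible code sizes the context from the model config when the client sends no `max_tokens`. A self-contained sketch of that fallback logic, using a stand-in config object rather than the real one; names other than `max_position_embeddings`, `max_tokens`, and `ctx_size` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HfConfig:
    # Stand-in for the Hugging Face model config that is actually read.
    max_position_embeddings: int

def resolve_ctx_size(config: HfConfig, max_tokens: Optional[int]) -> int:
    # Mirrors the fixed expression: fall back to the model's maximum
    # context length when the request does not specify max_tokens.
    ctx_size: int = config.max_position_embeddings if max_tokens is None else max_tokens
    return ctx_size

print(resolve_ctx_size(HfConfig(max_position_embeddings=4096), None))  # -> 4096
print(resolve_ctx_size(HfConfig(max_position_embeddings=4096), 512))   # -> 512
```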

0.1.5

Fixed:
- Linux build: UPX now uses its best compression option, and 7z uses more aggressive compression.
- UPX is no longer used for shared/dynamic library compression.

0.1.4

Added:
- README: documented supported GPU Compute Capability for CUDA.

Fixed:
- Cleaned up `build.py`.
- Type annotations in OpenAI-related code.
