Adds the ability to extend the context length of models via the `RoPE_scaling` parameter.
0.2.14
Simplified the interaction with other GGML-based repos, such as [TheBloke/Llama-2-7B-GGML](https://huggingface.co/TheBloke/Llama-2-7B-GGML) created by [TheBloke](https://huggingface.co/TheBloke).
0.2.13
Fixed many GPU acceleration bugs in `rustformers/llm` and improved performance to match native `ggml`.
0.2.12
Adds support for Metal, CUDA, and OpenCL acceleration for LLaMA-based models.
Adds CI for the different acceleration backends to create prebuilt binaries.
0.2.11
- Added support for the haystack library
- Added support for "BigCode"-like models (e.g. WizardCoder) via the `gpt2` architecture