Changes
- Add support for LLaMA and MPT models
- Change `repetition_penalty` default value from `1.0` to `1.1`
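
To illustrate what the new `repetition_penalty` default means in practice, here is a minimal sketch of the commonly used formulation of the penalty, in which the logits of already-generated tokens are divided by the penalty when positive and multiplied by it when negative. This is an assumption about the general technique, not a copy of this library's internals; the function name `apply_repetition_penalty` and its arguments are hypothetical.

```python
def apply_repetition_penalty(logits, previous_tokens, penalty=1.1):
    """Return a copy of `logits` with already-generated tokens made less likely.

    Hypothetical helper for illustration only; the library's actual
    implementation may differ.
    """
    adjusted = list(logits)
    for token_id in set(previous_tokens):
        if adjusted[token_id] > 0:
            # Positive logit: dividing by penalty > 1 lowers it.
            adjusted[token_id] /= penalty
        else:
            # Zero or negative logit: multiplying by penalty > 1 pushes it lower.
            adjusted[token_id] *= penalty
    return adjusted


if __name__ == "__main__":
    logits = [2.2, -1.0, 0.5]
    seen = [0, 1]
    print(apply_repetition_penalty(logits, seen, penalty=1.0))  # old default
    print(apply_repetition_penalty(logits, seen, penalty=1.1))  # new default
```

Under this formulation, the old default of `1.0` leaves the logits unchanged, while `1.1` mildly discourages repeating tokens that already appear in the output. To keep the previous behavior, pass `repetition_penalty=1.0` explicitly.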
Breaking Changes
- Update the GGML library, which introduces breaking changes to the quantization formats. Models quantized with the old formats have to be re-quantized.
Some models quantized with the latest formats are available from [TheBloke on Hugging Face](https://huggingface.co/TheBloke).