Changes
- Skip evaluating tokens that have already been evaluated. This can significantly speed up prompt processing in chat applications that prepend previous messages to the prompt.
- Deprecate the `LLM.reset()` method. Use the high-level API instead.
- Add support for batching and beam search to the 🤗 model.
- Remove the universal binary option when building for AVX2 or AVX on macOS.
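
The first change above is a prefix-reuse optimization: when a new prompt shares a leading run of tokens with the previously evaluated prompt, only the unshared suffix needs to be fed to the model. A minimal sketch of the idea follows; the names (`TokenCache`, `tokens_to_eval`) are illustrative, not the project's actual API.

```python
def common_prefix_len(a, b):
    """Length of the longest common leading run of two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

class TokenCache:
    """Tracks tokens already evaluated so repeated prefixes can be skipped.

    Hypothetical helper used only to illustrate the optimization.
    """

    def __init__(self):
        self.tokens = []  # tokens the model has already processed

    def tokens_to_eval(self, prompt_tokens):
        """Return only the suffix of prompt_tokens not already evaluated."""
        keep = common_prefix_len(self.tokens, prompt_tokens)
        self.tokens = list(prompt_tokens)
        return prompt_tokens[keep:]

cache = TokenCache()
first = cache.tokens_to_eval([1, 2, 3, 4])         # cold cache: all 4 tokens
suffix = cache.tokens_to_eval([1, 2, 3, 4, 5, 6])  # only the 2 new tokens
```

In a chat loop where each turn re-sends the full conversation, the shared prefix grows with every message, so the evaluated suffix stays roughly the size of the newest message instead of the whole history.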