1. Fix a crash when truncating a long KV cache;
2. Fix a crash when chatting with an image that has an alpha channel;
3. Fix VRAM usage when offloading zero layers with `--mmproj`;
4. Add compatibility with GGUF files that declare an incorrect `kv_count`, e.g. [CompendiumLabs/bge-large-zh-v1.5-gguf/FP16](https://huggingface.co/CompendiumLabs/bge-large-zh-v1.5-gguf/tree/main?show_file_info=bge-large-zh-v1.5-f16.gguf).
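
For context on item 4, `kv_count` is the metadata key-value count stored in the GGUF file header. The sketch below shows where that field sits, assuming the GGUF v2/v3 little-endian layout (4-byte magic, `uint32` version, `uint64` tensor count, `uint64` metadata KV count). The helper name `read_gguf_header` is hypothetical, and this is only an illustration of the header field, not the way the fix itself is implemented.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(stream):
    """Return (version, tensor_count, kv_count) from a GGUF v2/v3 header.

    Hypothetical helper for illustration; assumes little-endian encoding.
    """
    magic = stream.read(4)
    if magic != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    (version,) = struct.unpack("<I", stream.read(4))
    if version < 2:
        # GGUF v1 used 32-bit counts; this sketch does not handle it.
        raise ValueError("unsupported GGUF version")
    # Two consecutive uint64 fields: tensor count, then metadata KV count.
    tensor_count, kv_count = struct.unpack("<QQ", stream.read(16))
    return version, tensor_count, kv_count


# Synthetic in-memory header: version 3, 2 tensors, 5 metadata pairs.
header = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
version, tensor_count, kv_count = read_gguf_header(io.BytesIO(header))
```

A reader that trusts `kv_count` blindly will misparse a file whose declared count does not match the metadata actually present, which is the kind of file this item refers to.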