Changed
- Model names no longer include backend and quantization info
- Default to CPU inference unless GPU use is enabled via `lm.config["device"] = "auto"` (see the sketch below)
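
A minimal usage sketch of the new device setting. It assumes the `languagemodels`-style API implied by these entries (the import name and the `do` helper are assumptions, not confirmed by this changelog):

```python
import languagemodels as lm

# CPU is now the default; opt in to GPU inference explicitly.
lm.config["device"] = "auto"  # use a GPU when available, else fall back to CPU

print(lm.do("What color is the sky?"))
```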
Added
- Add quantization info to the config and use it when calculating memory usage
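
A sketch of how quantization info can feed a memory estimate. The helper name, bit-width mapping, and figures here are illustrative assumptions, not the library's actual code:

```python
# Hypothetical helper: approximate the RAM needed for model weights from
# the parameter count and the quantization recorded in the config.
BITS_PER_PARAM = {"fp32": 32, "fp16": 16, "int8": 8, "int4": 4}  # assumed mapping

def estimate_memory_gb(num_params: int, quantization: str) -> float:
    bits = BITS_PER_PARAM[quantization]
    return num_params * bits / 8 / 1e9  # bits -> bytes -> GB

# e.g. an int8 1.1B-parameter model needs roughly 1.1 GB for weights alone
print(f"{estimate_memory_gb(1_100_000_000, 'int8'):.1f} GB")
```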
Fixed
- Increase repetition penalty from 1.2 to 1.3 to help avoid repetition in smaller models
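
For context, a minimal sketch of the common repetition-penalty convention (as used by CTranslate2 and Hugging Face transformers); this illustrates what the parameter does and is not the library's internal code:

```python
def apply_repetition_penalty(logits: list[float], generated_ids: list[int],
                             penalty: float = 1.3) -> list[float]:
    """Make already-generated tokens less likely.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so repeated tokens lose probability either way.
    A higher penalty (1.3 vs. 1.2) suppresses repetition more strongly.
    """
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= penalty
        else:
            logits[tok] *= penalty
    return logits
```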