Changed
- Pin Llama 3.2 model versions
- Decrease repetition penalty for Llama 3.2 models
Added
- Support SmolLM2
- Add `embed` function
- Support Llama 3.1 8B Instruct
- Use models directly from Hugging Face with `config.use_hf_model()`
- Add `echo` config option to stream tokens to stdout as they are generated