Summary
This is a major release that introduces an OpenAI-compatible server in a completely new `serve` tool, support for Quark quantization in the new `quark` tool, and many other fixes/improvements.
Breaking Changes
New OpenAI-Compatible Server
The previous `serve` `Tool` has been replaced by a new standalone serving command. This new server has OpenAI API compatibility and will add Ollama compatibility in the near future.
- Old usage: `lemoande -i CHECKPOINT oga-load --args serve`
- New usage: `lemonade serve`, then use REST APIs to control model loading, completions, etc. See https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md to learn more.
The server can also be installed and used with no-code by running `Lemonade_Server_Installer.exe`, which is provided as a release asset in this and all future releases.
The server code was also moved out of tools/chat.py into its own file in tools/serve.py. We also renamed chat.py to prompt.py for clarity, since that file now only contains the prompting tool.
The LEAP name has been deprecated
In the interest of reducing naming confusion, the "LEAP API" is now simply the "high-level lemonade API".
- Old usage: `from lemonade.leap import from_pretrained`
- New usage: `from lemonade.api import from_pretrained`
Summary of Contributions
- The base checkpoint for models is retrieved from the Hugging Face API at loading time (ramkrishna2910)
- The benchmarking tools (huggingface-bench, oga-bench, and llamacpp-bench) have been refactored to reduce code duplication and improve maintainability. They now also support a list of prompts (or prompt lengths) to be benchmarked: `--prompts 128 256 512` (amd-pworfolk)
- The `avg_accuracy` stats has been renamed to `average_mmlu_accuracy` for clarity with respect to non-MMLU accuracy tests (jeremyfowers), (attn apsonawane)
- Introduce `Lemonade_Server_Installer.exe` (jeremyfowers)
- Implement an OpenAI-compatible server and remove the old `serve` tool (danielholanda)
- Rename `chat` module to `prompt` (jeremyfowers)
- Improved lemonade getting started documentation and remove the "LEAP" branding (jeremyfowers)
- OGA 0.6.0 is the default package for CPU, CUDA, and DML (jeremyfowers)
- Add support for Quark quantization with a new `quark-quantize` tool (iswaryaalex)
- Clean up the lemonade getting started docs and remove some deprecated tools (jeremyfowers)
New Contributors
- iswaryaalex made their first contribution in 290
**Full Changelog**: https://github.com/onnx/turnkeyml/compare/v5.1.1...v6.0.0