Release Notes
We are excited to announce the release of `onnxruntime-genai` version 0.6.0. Below are the key updates included in this release:
1. Support for continuous (contextual) decoding, which lets users carry out multi-turn, conversation-style generation.
2. Support for new models such as DeepSeek R1, AMD OLMo, IBM Granite, and others.
3. Python 3.13 wheels are now available.
4. Support for generation with models sourced from [Qualcomm's AI Hub](https://aihub.qualcomm.com/mobile/models). This work also includes publishing a NuGet package, `Microsoft.ML.OnnxRuntimeGenAI.QNN`, for the QNN execution provider (EP).
5. Support for the WebGPU EP.
This release also includes performance improvements to memory usage and speed, along with several fixes for bugs reported by users.