Release Notes - February 28, 2025
Overview
This release introduces a fully functional MLX-Audio package with text-to-speech capabilities, complete with testing infrastructure and CI/CD integration via GitHub Actions.
New Features
- **Text-to-Speech Generation**: Added complete generation pipeline with audio output functionality
- **Audio Joining**: New functionality to join multiple audio segments
- **Model Quantization**: Added support for model quantization to improve performance
- **GitHub Actions**: Implemented CI/CD workflows for automated testing and deployment
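
As a rough illustration of what audio joining involves, the sketch below concatenates mono segments with an optional silent gap between them. The function name `join_audio`, its parameters, and the 24 kHz default sample rate are assumptions made for this example; this is not the package's actual API, and a real implementation would likely operate on MLX or NumPy arrays rather than Python lists.

```python
def join_audio(segments, sample_rate=24000, gap_seconds=0.0):
    """Concatenate mono audio segments into one list of samples.

    segments:     iterable of lists of float samples (hypothetical input format)
    sample_rate:  samples per second; 24000 is an assumed default
    gap_seconds:  silence inserted between consecutive segments
    """
    if gap_seconds < 0:
        raise ValueError("gap_seconds must be non-negative")

    # Silence is a run of zero-valued samples of the requested duration.
    gap = [0.0] * int(round(sample_rate * gap_seconds))

    joined = []
    for index, segment in enumerate(segments):
        if index > 0:
            joined.extend(gap)  # only between segments, never before the first
        joined.extend(segment)
    return joined
```

Inserting silence between segments, rather than appending it to each one, keeps the output free of trailing padding.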
Improvements
- **Kokoro MLX port**: Completed porting of the entire model to the MLX framework:
  - Text encoder with BERT implementation
  - Decoder with improved audio quality
  - Duration, indices, and alignment target prediction
  - Custom bidirectional LSTM, weight normalization for CNNs, AdaLayerNorm, and Generator layers
- **SafeTensors Support**: Added working implementation for SafeTensors format
- **Pipeline Structure**: Restructured the generation pipeline for better maintainability
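
For readers unfamiliar with the weight normalization mentioned in the port, here is a minimal sketch of the standard reparameterization, `w = g * v / ||v||`, applied per output channel. It uses plain Python lists so it runs without dependencies; the function name `weight_norm` and the row-per-channel layout are assumptions for illustration and do not reflect how the release implements it.

```python
import math


def weight_norm(v, g):
    """Return weights w where each row is g[i] * v[i] / ||v[i]||.

    v: list of rows, one per output channel (hypothetical layout)
    g: list of per-channel gains, same length as v
    """
    if len(v) != len(g):
        raise ValueError("v and g must have the same number of channels")

    weights = []
    for row, gain in zip(v, g):
        norm = math.sqrt(sum(x * x for x in row))  # L2 norm of the direction vector
        if norm == 0.0:
            raise ValueError("cannot normalize an all-zero row")
        weights.append([gain * x / norm for x in row])
    return weights
```

With this parameterization, the magnitude of each channel's weights is controlled by `g` alone, while `v` only determines the direction.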
Bug Fixes
- Fixed model loading mechanism
- Resolved issues with text encoder LayerNorm operation
- Fixed generator functionality
- Addressed issues in LSTM and AdaLayerNorm implementations
- Refactored and fixed ConvWeight component
**Full Changelog**: https://github.com/Blaizzy/mlx-audio/commits/v0.0.1