Features
- Added multi-core support for LLM decoding to improve throughput.
- Introduced `fx2mlir`, a new conversion path from `torch.fx` graphs to MLIR.
- Implemented `nnvlc2.0` for local activations and `nnvlc1.0` for weights, improving neural network performance.
- Added `TPULANG` support for `sort`, `argsort`, and additional operations, extending the language's functionality and flexibility.
- Added `cv186x` support to `run_sensitive_layer.py` and the TDB, expanding compatibility and debugging coverage.
- Introduced new ops and features, including `Watchpoint` in the TDB and scale & zero_point support in activation ops, broadening the functionality of `tpu-mlir`.
- Added support for `BM1690`.
- Enabled `L2mem` as an intermediate data-exchange buffer for activation tensors.
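The scale & zero_point support for activation ops refers to asymmetric integer quantization, where a float value x maps to an integer q via q = round(x / scale) + zero_point. A minimal NumPy sketch of the idea (function names are illustrative, not `tpu-mlir` APIs); note that a ReLU can then run directly on int8 data by clamping at the zero point:

```python
import numpy as np

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map float values to int8: q = round(x / scale) + zero_point, then clamp."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Recover approximate float values: x ~ (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale

def relu_int8(q, zero_point):
    """Quantized ReLU: clamping at the zero point equals max(x, 0) in float."""
    return np.maximum(q, zero_point).astype(np.int8)

x = np.array([-1.0, -0.25, 0.0, 0.5, 1.0], dtype=np.float32)
scale, zp = 0.01, 10
y = dequantize(relu_int8(quantize(x, scale, zp), zp), scale, zp)
# Negative inputs collapse to 0.0; positive inputs round-trip to ~0.5 and ~1.0.
```

Carrying the zero point through the op this way avoids dequantizing to float and re-quantizing around every activation, which is the usual motivation for scale/zero_point-aware activation kernels.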
Bug Fixes
- Resolved a variety of backend bugs, including issues in the `1684x` backend, `permutefuse2`, and `permutemulconstswap`, among others, improving overall stability and performance.
- Fixed several critical issues in `tpulang`, including errors in the `sort_by_key`, `reshape`, and `where` operations, among others, improving the language's reliability for developers.
- Addressed bugs in model processing, including fixes to `concat` logic, `scale2conv`, `scale2conv3d`, and `instance norm`, among others, ensuring smoother model optimization and execution.
- Corrected documentation errors, providing clearer and more accurate information for users and developers.
Documentation Updates
- Updated the `tpulang` documentation to cover new functionality and optimizations, making the language easier to understand and use effectively.
Performance Improvements
- Optimized the TDB and `bmodel_checker` for `1684x` PCIe mode, significantly reducing processing time for model analysis.
- Improved DMA efficiency in flash attention operations for faster data movement.
- Enabled IO tag mode and refined the address mode for better memory management and operational flexibility.