tpu-mlir

1.7

Change Log

New Features
- Added support for new operations, including flash attention, dynamic compilation of custom ops, and additional tpulang ops.
- Enabled AttnReorder and added support for dynamic indices in ops such as OneHot, ScatterElements, and CumSum (see the sketch after this list).
- Added a `--dump_dataframe` option to `bmodel_checker` and support for transpose with order `[1, 2, 3, 0]`.
- Introduced the Watchpoint feature in TDB and added support for mixed-precision networks.
- Improved DMA efficiency for flash attention and optimized the backend for various models.
- Added support for local memory dump in PCIe mode and various quantization features such as eva quant, swin quant, and detr quant.
- Enhanced multi-core support, including LayerNorm and GroupNorm in coreParallel and multi-core data slicing in tensorLocation.
- Added new patterns for Cswin and Einsum operations.
- Improved support for LLMs (large language models) on bm1688.
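
As a rough illustration of what "dynamic indices" means for an op like OneHot, here is a minimal numpy sketch of the semantics. It is not tpu-mlir code, and the `one_hot` helper is purely illustrative; the point is that both the indices and the depth arrive as runtime values rather than compile-time constants.

```python
import numpy as np

def one_hot(indices: np.ndarray, depth: int) -> np.ndarray:
    # Compare each index against the range [0, depth); matches become 1.0.
    # Neither value is fixed at compile time, which is the "dynamic" case.
    return (indices[..., None] == np.arange(depth)).astype(np.float32)

idx = np.array([1, 0, 3])   # indices supplied at runtime
print(one_hot(idx, depth=4))
# [[0. 1. 0. 0.]
#  [1. 0. 0. 0.]
#  [0. 0. 0. 1.]]
```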

Bug Fixes
- Fixed various bugs, including the kernel_module msg_id issue, a SAM-ViT-encoder regression, and attention accuracy problems.
- Addressed logic issues in the AddToScale pattern and issues in fp_forward.
- Resolved bugs in the model info core dump, op liveRange in coreParallel, and DevParallel.
- Fixed issues when combining models with io alone, and bugs in ops such as interp, RotaryPosEmbPattern, and the efficient-lite4 permute.

Performance Improvements
- Improved the performance of TDB and bmodel_checker for 1684x PCIe.
- Optimized facenet and fixed multi-core performance issues on 1688.
- Enabled single-core mode optimizations where necessary.

Documentation and Testing
- Updated documentation, refined the custom-op chapters, and ensured consistency in the quick start docs.
- Added test cases for custom tpulang ops, multi-core with subnets, and custom cpuop.
- Fixed various documentation errors and updated the release note.

Other Changes
- Added restrictions to tpulang ops and new net test cases.
- Adjusted descriptions and refined interfaces for a better user experience.
- Updated backend .so files and replaced sensitive words in the codebase.
- Added support for the int4 dtype in tpu_profile (see the packing sketch below) and ensured tools and scripts work in Python virtual environments.
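
For context on the int4 dtype, a common representation packs two signed 4-bit values into each byte. The layout below (low nibble first) is an assumption for illustration, not necessarily the one tpu_profile uses:

```python
import numpy as np

def pack_int4(values: np.ndarray) -> np.ndarray:
    # Pack signed int4 values (range [-8, 7]) two per byte, low nibble first.
    # NOTE: the nibble order here is an assumed layout for illustration.
    assert values.size % 2 == 0
    nibbles = (values.astype(np.int8) & 0x0F).astype(np.uint8)
    return nibbles[0::2] | (nibbles[1::2] << 4)

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    # Inverse of pack_int4: split each byte and sign-extend the 4-bit values.
    lo = (packed & 0x0F).astype(np.int8)
    hi = ((packed >> 4) & 0x0F).astype(np.int8)
    out = np.empty(packed.size * 2, dtype=np.int8)
    out[0::2], out[1::2] = lo, hi
    return np.where(out >= 8, out - 16, out).astype(np.int8)

vals = np.array([-3, 7, 0, -8], dtype=np.int8)
assert (unpack_int4(pack_int4(vals)) == vals).all()
```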

1.7beta.0

Features
- Added multi-core support for LLM decoding to improve processing efficiency.
- Introduced `fx2mlir`, a new path for converting PyTorch FX graphs to MLIR.
- Implemented `nnvlc2.0` for local activations and `nnvlc1.0` for weights, improving neural network performance.
- Added `TPULANG` support for sort, argsort, and additional ops, extending the language's functionality and flexibility.
- Added `cv186x` support in `run_sensitive_layer.py` and in TDB, expanding compatibility and debugging capabilities.
- Introduced new features such as `Watchpoint` in TDB and scale & zero_point support for activation ops (see the quantization sketch after this list).
- Added support for `BM1690`.
- Used L2 memory for intermediate data exchange of active tensors.
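
The scale & zero_point support mentioned above follows the standard affine quantization scheme. A minimal sketch of that arithmetic (plain numpy, not the TpuLang API):

```python
import numpy as np

def quantize(x: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # Affine quantization: q = clamp(round(x / scale) + zero_point, int8 range).
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # Inverse mapping: x is approximately (q - zero_point) * scale.
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q = quantize(x, scale=1 / 127, zero_point=0)
print(dequantize(q, scale=1 / 127, zero_point=0))  # close to the original x
```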

Bug Fixes
- Resolved a variety of backend bugs, including issues with the `1684x` backend, `permutefuse2`, and `permutemulconstswap`, improving overall stability and performance.
- Fixed several critical `tpulang` issues, including errors in the `sort_by_key`, `reshape`, and `where` operations, making the language more reliable for developers.
- Addressed bugs in model processing, including fixes for `concat` logic, `scale2conv`, `scale2conv3d`, and `instance norm`, ensuring smoother model optimization and execution.
- Corrected documentation errors, providing clearer and more accurate information for users and developers.

Documentation Updates
- Updated `tpulang` documentation to cover new functionality and optimizations, making the language easier to understand and use effectively.

Performance Improvements
- Optimized TDB and `bmodel_checker` for `1684x pcie` mode, significantly reducing processing time during model analysis.
- Improved DMA efficiency in flash attention operations for faster data handling.
- Enabled IO tag mode and refined address modes for better memory management and operational flexibility.

1.6.1

**Full Changelog**: https://github.com/sophgo/tpu-mlir/compare/v1.6...v1.6.1

1.6

Change Log

Bug Fixes
- Fixed documentation errors and added checks for them during the build.
- Set the workaround for the `ar.copy` cycle issue to 0, avoiding potential data overwriting in in-place operations.
- Addressed a bug in `Caffe DetectionOutput` and fixed a hang on `cv186x`.
- Corrected `Mul` buffer size alignment and various other buffer-size issues.
- Fixed issues with attention accuracy, `RotaryPosEmbPattern`, and op status validation before pattern matching.
- Addressed a series of backend bugs, including daily-build errors, performance regressions, and incorrect return values.
- Fixed `data_checker` issues, an `api_conv` bug, and a local slice calculation bug.
- Resolved an incorrect affineMap for the Pooling buffer and fixed a reshape bug for inner products.
- Corrected `Mul&Div` dynamic support for local operations and fixed `Conv2d` buffer size calculations.
- Addressed various MatMul bugs, including fp8 support issues and quantization inconsistencies.

Features
- Enabled multi-core optimizations and added multi-core model tests.
- Updated `libbackend_1688.so` and other backend libraries for better performance and compatibility.
- Introduced the `groupParallel` operation and support for dynamic input data generation.
- Added new patterns such as the `Permute fuse pattern` and the `splitQuantizedMLP pattern`.
- Implemented an `npz compare visualizer` tool and added `bm1688 backend` support.
- Added a `MatMul weight split case` and improved permute performance.
- Added support for the `img2col pattern`, an attention interface (see the sketch after this list), and several dialects for SG2260 operations.
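
For readers unfamiliar with the computation such an attention interface targets, here is a generic numpy sketch of the einsum-style attention pattern. It is illustrative only; the shapes and names are assumptions, not tpu-mlir internals:

```python
import numpy as np

rng = np.random.default_rng(0)
B, T, D = 2, 8, 16                      # batch, sequence length, head dim
q, k, v = (rng.random((B, T, D)) for _ in range(3))

scores = np.einsum('bqd,bkd->bqk', q, k) / np.sqrt(D)   # Q @ K^T, scaled
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)               # softmax over keys
out = np.einsum('bqk,bkd->bqd', weights, v)             # weighted sum of V
print(out.shape)                                        # (2, 8, 16)
```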

Documentation Updates
- Updated release notes and resolved issues with document formatting.
- Standardized terminology and replaced sensitive words in the documentation.

Performance Improvements
- Improved local softmax performance and optimized dataflow checking in coreMatch.
- Enhanced performance for 4-batch int8 ViT-L operations and refined multi-core conv handling.
- Optimized ViT-B concurrency and addressed performance issues with `MaxPool` buffer sizes.

1.6beta.0

New Features

- Implemented the SG2260 structureOp interface and structured transform, including a solver for finding transforms.
- Added a OneHot converter and fp8 support in the debugger.
- Supported MatMulOp broadcast over batch dims as a special case and added an interface for attention (see the numpy sketch after this list).
- Provided a "decompose linalg op" and "tile+fuse" pass; MatMul parallelization now supports more batch patterns.
- Added a UNet single-block test.
- Implemented fp8 support for MatMul and other ops, including addconst, subconst, mul, add, sub, and abs.
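
The batch-dim broadcast special case for MatMulOp mirrors numpy's matmul semantics, which can serve as a quick reference (the shapes below are just an example):

```python
import numpy as np

# Leading (batch) dimensions broadcast: size-1 dims expand to match,
# while the trailing two dims follow ordinary matrix multiplication.
a = np.ones((1, 8, 4, 16))   # shared across batch, one slice per head
b = np.ones((2, 1, 16, 5))   # one slice per batch, shared across heads
c = np.matmul(a, b)          # batch dims (1, 8) and (2, 1) -> (2, 8)
print(c.shape)               # (2, 8, 4, 5)
```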

Performance Improvements

- Improved MatMul fp8 performance with new backend support.
- Enabled distributed MLP and attention, and improved handling of cascade_net input/output names and order.
- Refactored TDB to improve disassembler serialization and resolve a BM1688 decoding issue.
- Improved weight reorder for ConvOp and optimized the permute of attention MatMul.

Bug Fixes

- Resolved various bugs in MatMul, Conv, and other ops across multiple chips, including SG2260, BM1688, and CV18xx.
- Fixed bugs related to ReduceOp, ArgOp, SliceOp, and others for better operation and tensor handling.
- Addressed issues in SAM, the daily test, and TDB related to core operations and functionality.
- Fixed memory and data handling bugs for more accurate and stable model execution.

Documentation Updates

- Updated documentation to remove sensitive words and improve clarity and comprehensiveness.

Miscellaneous

- Enhanced various backend libraries and supported new ops and patterns for more efficient and versatile model handling.
- Improved dynamic shape_value handling for ScatterElements and Reduce for better model optimization.
- Refined graph optimization, permute parallel indexMapping, and related areas for improved model processing.

1.5beta.0

TPU-MLIR Project Update

Bug Fixes and Dependency Updates
- **Fix Dependency**: Fixed the MLIRInputConversion dependency.
- **SDK Release Workflow**: Fixed the tpu-mlir tag used for building and added a workflow file for SDK releases.
- **Softplus LoweringINT8**: Fixed a 1684 Softplus LoweringINT8 issue.
- **Slice Begin Index**: Fixed the bm1684 slice begin_index problem.
- **Mul Conflict Resolution**: Partially fixed the output data sign of Mul conflicting with a chip restriction.

Feature Enhancements and Support
- **Subgraph Split Support**: Enhanced support for subgraph split.
- **Quant IO List Note**: Added a quant IO list note for better quantization handling.
- **New Full Operation**: Added support for the aten::new_full operation (see the PyTorch sketch after this list).
- **Torch Flip for bm1684x**: Added torch.flip support for bm1684x.
- **Weight Input Shape Bind**: Supported shape binding for weight inputs.
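
Both ops come straight from PyTorch, so their expected behavior can be checked against the PyTorch reference. A small example:

```python
import torch

x = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)

# torch.flip reverses a tensor along the given dimensions.
flipped = torch.flip(x, dims=[2])     # reverse the last axis
print(flipped[0, 0])                  # tensor([3., 2., 1., 0.])

# aten::new_full builds a tensor of the given shape filled with one value,
# inheriting dtype and device from the source tensor.
filled = x.new_full((3, 2), 7.0)
print(filled.dtype, filled.shape)     # torch.float32 torch.Size([3, 2])
```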

Updates and Implementations for Specific Operations
- **Backend Update for sg2260**: Updated the sg2260 backend for tag31.
- **ScatterElements Implementation**: Implemented ScatterElements for any axis (see the sketch after this list).
- **Unary Indexing Map**: Added unary indexing map.
- **Binary Indexing Map**: Added binary (add/sub/mul/div/min/max) indexing map.
- **Dynamic NMS Support**: Added dynamic NMS support for bm1684x.
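
For reference, ScatterElements along an arbitrary axis has the same semantics as numpy's put_along_axis applied to a copy of the input. A minimal sketch (not the backend implementation):

```python
import numpy as np

def scatter_elements(data, indices, updates, axis=0):
    # Copy `data`, then write each element of `updates` at the position
    # named by `indices` along `axis` (ONNX ScatterElements, reduction=none).
    out = data.copy()
    np.put_along_axis(out, indices, updates, axis=axis)
    return out

data = np.zeros((3, 4), dtype=np.float32)
idx = np.array([[0], [2], [1]])                 # one target column per row
upd = np.array([[1.0], [2.0], [3.0]], dtype=np.float32)
print(scatter_elements(data, idx, upd, axis=1))
# [[1. 0. 0. 0.]
#  [0. 0. 2. 0.]
#  [0. 3. 0. 0.]]
```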

Codebase and Documentation Refinements
- **Cleanup**: Removed test/sg2260 dialect.
- **Documentation Update**: Updated nntoolchain README and lib.
- **Codegen Documentation**: Added documentation for codegen.
- **Template Format Update**: Updated import mlir file template format.
- **Quick Start Docs Modification**: Modified quick start docs for tpu-mlir.

Optimizations and Performance Improvements
- **Kernel Module Usage**: Reverted to using the old kernel module.
- **MLIR Conv2D Optimization**: Improved 1684 MLIR conv2d with the 3ic optimization.
- **SWINT Quantization**: Added swint quant for better performance.
- **Opt Parameter Addition**: Added an optimization parameter.
- **Loop and Fusion Enhancements**: Supported inner-loop interchange, padOp transform, tensor op collapse, fusion on linalg-on-tensors, and more.
