Major release with a range of changes to the Python interface, search implementation, transpiler, and documentation.
What's Changed
* [Triton CodeGen] Fix an issue when generating Triton programs from mugraphs
* [LoRA demo] Add the checkpoint file for the LoRA demo
* [DeviceMemoryManager] Use offsets instead of pointers to locate tensors and fingerprints in device memory
* [Graph Generator] Parallelize the generation algorithm
* Improve parallel search performance
* [Accumulator] Decouple the accumulator from the output saver in threadblock graphs
* Update the setup workflow for packaging
* Add more element_unary & element_binary operators at the kernel and threadblock levels
* [CUDA Transpiler] Support JIT transpilation and compilation
* [Search] Range-based pruning
* Fix some existing issues by xinhaoc in https://github.com/mirage-project/mirage/pull/63
* [Transpiler] Support threadblock matmul using CuTe when the input/output stensors have more than 2 dimensions
* Include header files for JIT compilation. MIRAGE_ROOT is no longer required.
* [Python] Update the Python interface to support search
* [Search] Adjust the expansion phase of search
* [Search] Improve the display of search statistics
* Set default max_num_threadblock_graphs to 1
New Contributors
* wmdi made their first contribution in https://github.com/mirage-project/mirage/pull/3
* geohotstan made their first contribution in https://github.com/mirage-project/mirage/pull/14
* jiakunw made their first contribution in https://github.com/mirage-project/mirage/pull/20
* interestingLSY made their first contribution in https://github.com/mirage-project/mirage/pull/36
**Full Changelog**: https://github.com/mirage-project/mirage/commits/v0.2.0