- Implement flat_tensors: a complete solution for converting arbitrary Python objects to a list of PyTorch tensors, making JIT tracing more flexible.
- Add more fusion passes to improve performance on ComfyUI.
0.0.6
Fix an attempt to acquire an unreachable GIL when the process exits
0.0.5
Disable CUDA Graph for SDXL
0.0.4
Many bug fixes and improvements:
- Support SDXL
- Support CUDA Graph with dynamic shape
- Support development version of Triton
- Fix crash when process exits because of missing GIL
0.0.3
Bug fixes:
- Fix compilation failure when Triton is not enabled.
- Fix wrong output in Triton NCHW GroupNorm kernel.