What's Changed
- [Fix] Fixing an error triggered by the operator `any` (369) by **Bolin Sun** 6a4c2e54
- [Fix] Added `torch.t` support for the mobilebert-uncased model (353) by **zhumakhan** 95d95a4c
- [CI] Use same image for tests and publishing test execution (463) by **c-fteixeira** 49fd3325
- [BUG] Fix bug in disallowing unsupported functions in FX graph (464) by **Vadim Gimpelson** d84f2c5b
- [CI] Move Publish workflow to internal ARC runners (461) by **c-fteixeira** b5d6aafd
- [CI] Update container for CI (460) by **Vadim Gimpelson** b9735910
- [Bug] Rename test_arithmetic.py -> test_arithmetic2.py (459) by **Vadim Gimpelson** 6aa6cf82
- Update requirements-dev.txt to use pytorch version >= 2.3.0 (458) by **Vadim Gimpelson** 6b322953
- [CI] Repeat start_instance (361) by **vadiklyutiy** cf5caddf
- [Operators] Adding `leaky_relu` support (360) by **Bolin Sun** 7401cccb
- [Fix] Fixing an error triggered while compiling the `torch.nn.Upsample` module with `align_corners=True` (344) by **Bolin Sun** 2c34cfc0
- [PERF] Remove workaround for loops in `add_hints_pass` (356) by **vadiklyutiy** 3195be5b
- [Operators] Registering tensor methods whose PyTorch function equivalents are supported by Hidet (347) by **Bolin Sun** 44ab5ad3
- [PERF] Introduce add_hint_pass (355) by **vadiklyutiy** c014dab1
- [CI] Promote nvidia docker container to version 24.4 (354) by **vadiklyutiy** cb809b99
- [Fix] Type casting for attention mask from fp32 -> fp16 (323) by **zhumakhan** 9a10dc01
- [Fix] Added missing torch.multiply and torch.nn.functional.unfold ops for conv-bert-base model (351) by **zhumakhan** 18842eeb
- [Fix] Fixing a bug in `register_methods` (331) by **Bolin Sun** c87c5153
- [Fix] Handling special cases in `setitem` regarding dtype and device (332) by **Bolin Sun** ff9445e2
- [BUG] Fixed search_space bug in `bench_op.py` (348) by **vadiklyutiy** 29e4c0e8
- [OPS] Disallow unsupported functions in FX graph (317) by **vadiklyutiy** 984cf75e
- [OPTIONS] Remove dynamo_config['search_space'] (342) by **vadiklyutiy** 0814bd8e
- [Operator] Adding support for `torch.Tensor.view_as` (334) by **Bolin Sun** 5f19dd05
- [Operators] Adding support for `torch.nn.TransformerEncoder` (327) by **Bolin Sun** d625146e
- [OPTIONS] Inherit `options` from `torch.compile()` (260) by **vadiklyutiy** 3638a0b5
- [Operator] Adding `__ge__` method for the `Tensor` class (330) by **Bolin Sun** ed5fefff
- [Fix] Fixing an error triggered by `ClampOp` (329) by **Bolin Sun** 05984cb8
- [Fix] Handling hidet errors caused by device difference in `getitem` (322) by **Bolin Sun** 5a908205
- [Fix] Fixing a RuntimeError triggered by `tensor_reshape` function in `register_functions.py` (328) by **Bolin Sun** 0cd2f838
- [Operators] Adding PyTorch operators encountered while compiling `DALLE2_pytorch` (319) by **Bolin Sun** ecb99b1d
- [Fix] Fix the bug in `tensor_expand` caused by attempting to modify `immutable_list` (320) by **Bolin Sun** bb89e227
- [Chore] replace copyrights with citations (315) by **xiaocenxiaocen** 3fba0919
- [Operator] Extending the functionality support for `einsum` (312) by **Bolin Sun** 703e92aa
- Handle dtype and device in hidet.ones_like op (316) by **zhumakhan** f031eb30
- [PERF] Reduce fixed overhead for model run (310) by **vadiklyutiy** fadf67d3
- Increase batch size for bert to decrease fluctuations (236) by **vadiklyutiy** a8db40cf
- Setitem with tensor values, and boolean type promotion (290) by **zhumakhan** 60e75ca4
- [BUG] Fix `device_from_torch` to return 'cpu' by default when device is None (311) by **zhumakhan** d0474402
- [Graph][Ops] fp32 accumulation for cute matmul (292) by **xiaocenxiaocen** a8136059
- [Perf] support vectorized epilogue fusion (220) by **xiaocenxiaocen** ddacf36b
- Removing constant tensors that are not needed after subgraph rewrite pass (252) by **zhumakhan** db49f688
- [Fix] Handling `Tensor.to(..., device=...)` on symbolic tensors (284) by **Bolin Sun** 63578804
- [Operator] torch.any (287) by **zhumakhan** 8a42a65f
- [Graph][Ops] fp32 accumulation for matmul_f16 (268) by **xiaocenxiaocen** 5bf255ad
- Adding support for `torch.any` (277) by **zhumakhan** 2c4c672e
- fix: handle race condition on parallel config directory creation (285) by **c-fteixeira** b465dd34
- [SCRIPTS] Adapt our scripts to use `mode` from `torch.compile` (274) by **vadiklyutiy** 0f825b38
- [Fix] Handling `getitem` special case (281) by **Bolin Sun** 564561ec
- [Operator] Added advanced tensor indexing (251) by **zhumakhan** 018ca2ce
- [Operator] Adding support to `repeat_interleave` and more (270) by **Bolin Sun** b52bc889
- [PERF] Increase accuracy of picking the best candidate (269) by **vadiklyutiy** 3834643f
- [Operator] Registering `torch.Tensor.copy_` (259) by **Bolin Sun** af5c8933
- [OPTIONS] Use Attention by default (261) by **vadiklyutiy** 33ad85bd
- [Operator] Registering torch.sigmoid_ (258) by **Bolin Sun** c9fb801d
- [Operator] Adding support for `torch.Tensor.div` (249) by **Bolin Sun** c8d46638
- [Operator] Adding `torch.Tensor.expand_as` support (250) by **Bolin Sun** 923f0781
- [Operator] Adding support to operators `torch.Tensor.max` and `torch.Tensor.new_full` (238) by **Bolin Sun** c5912a4b
- Delete options `use_fp16` and `use_fp16_reduction` (239) by **vadiklyutiy** e7fe23b6
- Inherit `mode` argument from `torch.compile` and set corresponding options (237) by **vadiklyutiy** 91f666ea
- [Operators] Registering `torch.as_tensor` (235) by **Bolin Sun** 540367ba
- [Operator] Registering `torch.Tensor.argmax` (234) by **Bolin Sun** bdd7acde
- [Ir][CuTE] lower cute dialect (109) (230) by **xiaocenxiaocen** 783a5495
- Expose more ld/st instructions (216) by **xiaocenxiaocen** 8f03f9e3
- Fix `steal_weight` option and issues with the Mistral model (209) by **zhumakhan** 9728c219
- Fix issues related to mistral model (213) by **zhumakhan** 68e801b7
- [BENCHs] Refactor transformers tests. Add llama2, mistral, gemma, gpt2 to script (210) by **vadiklyutiy** 59028d8f
- [BUGFIX] Init cuda info before run forks for IR generation (208) by **vadiklyutiy** 30125463
- [Ir] add utilities for CuTe (107) by **xiaocenxiaocen** 423e1122
- [BUG] Clear `_job_queue` in `parallel_imap` for tests (204) by **vadiklyutiy** bf39bd64
- [OPTIONS] Don't create hidet config if it doesn't exist (203) by **vadiklyutiy** 294d2613
- feat: parallel job execution for tests (147) by **c-fteixeira** db588f99
- `__getitem__` with N-dimensional index tensor (185) by **zhumakhan** f46a184f
- [Fix] Remove YOLOv7 from tests/benchmarks/run_configs.json (187) by **Bolin Sun** 5fc4271e
- [Operator] Adding meshgrid operator support (183) by **Bolin Sun** d8158a9a
- [Bug] Fix number of groups under certain case (181) by **Max Hu** 8a6cbfdd
- [COMPTIME] Reduce the number of `fork` in `multithreading.Pool` (180) by **vadiklyutiy** 9e576dc2
- [COMPTIME] Add `chunksize` arg to `pool.imap` (178) by **vadiklyutiy** 7c50af6f
- Optimize grouping method (174) by **Max Hu** 9b9a22bb
- [App] SyncLLM + AsyncLLM interface (166) by **Jack Lee** e51f0c00
- [Ir][Primitives] add hopper instructions (83) by **xiaocenxiaocen** 42252980
- [OPS] Add `torch.Tensor.sin`, `torch.Tensor.cos` and `torch._C._nn.pad` (175) by **vadiklyutiy** 90a6231a
- [App] ResNet Compiled App (2/2) - Pipeline (165) by **Kevin Tong** d308f8f8
- Revive dynamic shape support with `torch.compile` (162) by **vadiklyutiy** cf343ab2
- [Models] Gemma implementation (132) by **Jack Lee** 3a848202
- Support Transpose2D (77) by **zhiwei-fang** dd2e9d2e
- [App] Cleanup SD Implementation (143) by **Kevin Tong** 359763ef
- [Fixbug] Set _is_exiting correctly (163) by **Jack Lee** 1c8b31fa
- [App] Fix LLM app tracing (158) by **Jack Lee** f618977b
- [Operator] triu + tril operators (146) by **Jack Lee** 70894fa5
- Gemma + torch.compile fixes (autocast, rtruediv) (159) by **vadiklyutiy** 710ac501
- [IR] [Primitives] Add thread cluster on sm_90 (145) by **Kevin Tong** ccc28d65
- [App] Minor bugfixes for LLM app (157) by **Jack Lee** 179f0583
- [COMPTIME] Specialize `Constant._binary()` for compilation speedup (148) by **vadiklyutiy** 8a1eab4f
- [Operator] Fix symbolic broadcasting (131) by **Jack Lee** 12522203
- [Operator] Register missing math primitives (134) by **Jack Lee** 61b00523
- [Ir][Primitives] fix __shfl_xor_sync (155) by **xiaocenxiaocen** 37c75a6d
- [COMPTIME] Parallelize `apply_prologue_epilog`(fusion) and IR generation(`implement*`) (127) by **vadiklyutiy** 9e96c457
- [Graph] Enhance forward debug instrument (130) by **Jack Lee** 4267686b
- Stable Diffusion App Infra (103) by **Kevin Tong** 8f03f9e4
- [LLM App] LLM Application initial support (121) by **Yaoyao Ding** fc61f48d
- [Models] Support for tokenizers in C++ runtime (69) by **Jack Lee** c14de4e2
- [Graph] Add major UNet building components (97) by **Kevin Tong** 364ba9c3
- [CI] Add clang-format script/action (120) by **Jack Lee** cdff99af
- [Graph] Stable Diffusion Rope Module (95) by **Kevin Tong** 6fa58030
- [App] Complete UNet Definition (99) by **Kevin Tong** 805620e5
- [FFI] Refactor CompiledFunction interface with ctypes (79) by **Jack Lee** a8c9d945
- [STYLE] Format cpp/h files (454) by **vadiklyutiy** 1f1b011e
- [cuDNN] Add cudnn conv2d (453) by **vadiklyutiy** bc5a6df2
Contributors
* yaoyaoding
* xiaocenxiaocen
* vadiklyutiy
* maxyanghu
* BolinSNLHM
* zhumakhan
* c-fteixeira
* jacklee1792
* KTong821
* zhiwei-fang
**Full Changelog**: https://github.com/hidet-org/hidet/compare/v0.3.1...v0.4.0