Notes
This release adds more support for large language model inference, distributed inference, and quantization. It also makes Hidet Script more stable and adds more documentation for it. More operators and models are supported. See below for details.
Frontend
* [Frontend] Dynamic shape fx trace by Aalanli in https://github.com/hidet-org/hidet/pull/294
* [Torch] Steal PyTorch weights by hjjq in https://github.com/hidet-org/hidet/pull/310
* [Dynamo Frontend] Refactor the dynamic shape support by yaoyaoding in https://github.com/hidet-org/hidet/pull/319
* [Torch][Graph][Operator] Add and fix various items for torchvision model support by hjjq in https://github.com/hidet-org/hidet/pull/347
* [Dynamo] minor enhancements to attention and register a few functions by xinli-git in https://github.com/hidet-org/hidet/pull/345
Operators and models
* [Operator] Further performance enhancements for conv2D by Aalanli in https://github.com/hidet-org/hidet/pull/290
* [Operator] Refactoring matrix multiplication implementation by yaoyaoding in https://github.com/hidet-org/hidet/pull/296
* [Model Support] Add support for wav2vec by yaoyaoding in https://github.com/hidet-org/hidet/pull/303
* [Operator] Update attention for dynamic shape by hjjq in https://github.com/hidet-org/hidet/pull/307
* [Operator] Resolve Adaptive Pool to reduce by hjjq in https://github.com/hidet-org/hidet/pull/308
* [Reduce] optimize and unify reduce operator to a single place by xinli-git in https://github.com/hidet-org/hidet/pull/311
* [Operator] optimize normalize op with vectorized load, dynamic shape and more by xinli-git in https://github.com/hidet-org/hidet/pull/316
* [Model] Add missing operators for T5 by yaoyaoding in https://github.com/hidet-org/hidet/pull/322
* [Fixbug] Reduce should perform syncthread after initializing shared memory to zero by xinli-git in https://github.com/hidet-org/hidet/pull/325
* [Models] Llama 2 support by Aalanli in https://github.com/hidet-org/hidet/pull/324
* [Models] Llama2 fix by Aalanli in https://github.com/hidet-org/hidet/pull/333
* [Operator] Composite Elementwise Operation by hjjq in https://github.com/hidet-org/hidet/pull/337
* [Operator] Add clamp/isinf/any/all op, enhance where op by yaoyaoding in https://github.com/hidet-org/hidet/pull/343
* [Torch][Operator] More torchvision model support by hjjq in https://github.com/hidet-org/hidet/pull/348
* [Operator] Add einsum by hjjq in https://github.com/hidet-org/hidet/pull/349
* [Operator][Graph][Regression] CNN optimizations by hjjq in https://github.com/hidet-org/hidet/pull/356
* [Graph] Minor bug fixes by hjjq in https://github.com/hidet-org/hidet/pull/358
Distributed inference
* [Distributed] all_reduce op and distributed info in graphs by soodoshll in https://github.com/hidet-org/hidet/pull/284
* [Distributed] Add more runtime distributed communication functions by soodoshll in https://github.com/hidet-org/hidet/pull/314
* [Fixbug] group_start and group_end should be importable without nccl by soodoshll in https://github.com/hidet-org/hidet/pull/317
Quantization
* [Operators] preliminary symmetric weight quantization by Aalanli in https://github.com/hidet-org/hidet/pull/298
* [Quantization] Quantization API by Aalanli in https://github.com/hidet-org/hidet/pull/309
* [Quantization] fix quantization pass bug by Aalanli in https://github.com/hidet-org/hidet/pull/355
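
For readers unfamiliar with the term, the idea behind symmetric weight quantization can be sketched in a few lines of plain Python. This is only an illustration of the general technique under simple assumptions (per-tensor scale, signed 8-bit range); the function names `quantize_symmetric` and `dequantize` are hypothetical and are not Hidet's API or the implementation in the PRs above.

```python
def quantize_symmetric(weights, num_bits=8):
    """Map floats to signed integers using one scale and no zero point."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits; range is [-qmax, qmax]
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Fall back to a scale of 1.0 when all weights are zero to avoid dividing by zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)
```

Because the range is symmetric around zero, a weight of zero maps exactly to integer zero, and each restored value differs from the original by at most about half the scale.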
IR and passes
* [FixBug] Don't instantiate symbol for primitive functions by hjjq in https://github.com/hidet-org/hidet/pull/291
* [Fix] NCCL API mismatch and NCCL primitive fix by soodoshll in https://github.com/hidet-org/hidet/pull/301
* [Fixbug] Prevent allreduce op from being fused by soodoshll in https://github.com/hidet-org/hidet/pull/304
* [Enhancements] add a vcuda device to help mitigate compile-time GPU memory usage by xinli-git in https://github.com/hidet-org/hidet/pull/302
* [Task] More descriptive kernel names for nsys/ncu by Aalanli in https://github.com/hidet-org/hidet/pull/315
* [Fixbug][Hidet Script] Fix a bug that hidet script does not recognize return type by yaoyaoding in https://github.com/hidet-org/hidet/pull/329
* [Hidet script] Add `hidet.lang.types` submodule by yaoyaoding in https://github.com/hidet-org/hidet/pull/340
* [IR][Parser] Hidet IR grammar, parser and ir reconstructor by Aalanli in https://github.com/hidet-org/hidet/pull/354
Runtime
* [Runtime] Check for input tensor device by hjjq in https://github.com/hidet-org/hidet/pull/287
* [Fixbug] Is exiting fix by xinli-git in https://github.com/hidet-org/hidet/pull/293
Backends
* [Fixbug] Fix the c++ standard to c++11 for both nvcc and gcc compilers by yaoyaoding in https://github.com/hidet-org/hidet/pull/327
* [CPU][Scheduler] Use multi-threads for auto-scheduler by yaoyaoding in https://github.com/hidet-org/hidet/pull/341
Documentation
* [Document] fix installation guide by soodoshll in https://github.com/hidet-org/hidet/pull/288
* [Docs] Update the documentation for the coming release by yaoyaoding in https://github.com/hidet-org/hidet/pull/360
Others
* [Version] Bump version to 0.3.0.dev by yaoyaoding in https://github.com/hidet-org/hidet/pull/286
* [Tools] simple benchmarking utility by Aalanli in https://github.com/hidet-org/hidet/pull/292
* [Compile Server] Support remote compilation via compilation server by yaoyaoding in https://github.com/hidet-org/hidet/pull/297
* [Compile Server] Allow the user to specify the repo and branch/tag to use by yaoyaoding in https://github.com/hidet-org/hidet/pull/300
* [Compile Server] Add a new option to specify the cuda arch by yaoyaoding in https://github.com/hidet-org/hidet/pull/305
* [Fixbug] Fix a bug in compile server by yaoyaoding in https://github.com/hidet-org/hidet/pull/306
* [Graph] Minor graph benchmark fix by Aalanli in https://github.com/hidet-org/hidet/pull/313
* [Regression] Local performance regression by hjjq in https://github.com/hidet-org/hidet/pull/321
* [Regression] Increase benchmark iters and update perf data by hjjq in https://github.com/hidet-org/hidet/pull/328
* [CI] List package versions in ci by yaoyaoding in https://github.com/hidet-org/hidet/pull/334
* [Fixbug] Clear the intermediate object files for kernel tuning by yaoyaoding in https://github.com/hidet-org/hidet/pull/339
* [Config] Add configuration file by Aalanli in https://github.com/hidet-org/hidet/pull/359
**Full Changelog**: https://github.com/hidet-org/hidet/compare/v0.2.4...v0.3.0