Added
* Add Ampere support: faster fp16, faster tf32, and much faster int8 kernels on Ampere GPUs.
* Add NVRTC support for conv kernels.
Removed
* Drop Python 3.6 support.
Changed
* BREAKING CHANGE: change dtype enum values for some important reason.
0.2.8
Fixed
* Fix missing sm37 in supported archs.
0.2.7
Added
* Add sm37 for cu102.
* Add compile info (CUDA arch) for better error messages.
0.2.6
Fixed
* Fix a small bug that incorrectly limited the arch of SIMT kernels to sm52.
0.2.4
Added
* Add CPU support for CUDAKernelTimer.
* Add non-contiguous support for tv::Tensor.
* Add tsl hash map; refine the CUDA hash implementation.
Changed
* Raise an error instead of exiting the program when a CUDA error occurs.
* GEMM kernels now use strides, which lets us perform GEMM on non-contiguous tensors.
Fixed
* Fix bugs in GEMM kernels when using non-contiguous operands.
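The stride-based GEMM change can be illustrated with a minimal pure-Python sketch. This is not cumm's implementation: `StridedView` and `strided_gemm` are hypothetical names chosen for illustration, and real GEMM kernels tile these loops across GPU threads instead of iterating scalar by scalar. The sketch only shows the idea that an operand addressed through (offset, row stride, column stride) can be a transposed or sliced view of another buffer, so no contiguous copy is needed before the multiply.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StridedView:
    """Hypothetical 2-D view over a flat buffer, addressed only via strides."""
    data: List[float]   # flat backing storage, shared between views
    rows: int
    cols: int
    row_stride: int     # elements between consecutive rows
    col_stride: int     # elements between consecutive columns
    offset: int = 0     # index of element (0, 0) in `data`

    def _index(self, i: int, j: int) -> int:
        return self.offset + i * self.row_stride + j * self.col_stride

    def at(self, i: int, j: int) -> float:
        return self.data[self._index(i, j)]

    def set(self, i: int, j: int, v: float) -> None:
        self.data[self._index(i, j)] = v

    def transpose(self) -> "StridedView":
        # A transpose only swaps shape and strides; the buffer is not copied,
        # so the result is a non-contiguous view of the same data.
        return StridedView(self.data, self.cols, self.rows,
                           self.col_stride, self.row_stride, self.offset)


def strided_gemm(a: StridedView, b: StridedView, c: StridedView) -> None:
    """Compute C = A @ B, touching every operand only through its strides."""
    if a.cols != b.rows or c.rows != a.rows or c.cols != b.cols:
        raise ValueError("shape mismatch")
    for i in range(c.rows):
        for j in range(c.cols):
            acc = 0.0
            for k in range(a.cols):
                acc += a.at(i, k) * b.at(k, j)
            c.set(i, j, acc)
```

For example, `b` can be the `transpose()` of a contiguous row-major matrix, or `a` can be a column slice of a wider buffer (nonzero `offset`, `row_stride` larger than `cols`), and `strided_gemm` handles both without materializing a contiguous copy.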