Gemlite

Latest version: v0.4.4

Page 2 of 2

0.3.0

* New GEMV RevSplitK algorithm that outperforms both GEMM Split-K and GEMV at batch-size=1
* Add support for channel-wise scaling (weights, activations, weights + activations)
* Add support for FP8 x FP8 / FP8 x Wn
* Add support for INT8 x Wn
* Improved autotune speed
* Improved base configs for RTX 4090, A100 and H100
* Better control for autotune via `set_autotune`
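
To make the channel-wise scaling entry concrete, here is a minimal NumPy sketch of an INT8 x INT8 matmul with one scale per activation row and one scale per weight output channel. It is an illustration of the general technique only, not gemlite's kernels or API; the function names `quantize_per_channel` and `int8_matmul_channelwise` are hypothetical, and symmetric quantization is an assumption.

```python
import numpy as np

def quantize_per_channel(x, axis):
    """Symmetric INT8 quantization with one scale per slice along `axis`."""
    max_abs = np.max(np.abs(x), axis=axis, keepdims=True)
    # Guard against all-zero slices, which would otherwise divide by zero.
    scale = np.where(max_abs == 0, 1.0, max_abs / 127.0).astype(np.float32)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul_channelwise(x, w):
    """Approximate x @ w.T using INT8 operands and channel-wise rescaling.

    x: (batch, in_features) activations
    w: (out_features, in_features) weights
    """
    xq, x_scale = quantize_per_channel(x, axis=1)  # scales: (batch, 1)
    wq, w_scale = quantize_per_channel(w, axis=1)  # scales: (out_features, 1)
    # Accumulate in INT32, as integer matmul hardware typically does.
    acc = xq.astype(np.int32) @ wq.astype(np.int32).T
    # Undo both scalings: rows by the activation scale, columns by the weight scale.
    return acc.astype(np.float32) * x_scale * w_scale.T

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)
w = rng.standard_normal((8, 64)).astype(np.float32)
approx = int8_matmul_channelwise(x, w)
exact = x @ w.T
print("max abs error:", float(np.max(np.abs(approx - exact))))
```

Scaling only the weights or only the activations corresponds to replacing one of the two scale tensors with ones and leaving that operand unquantized.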

0.2.1

- Adds GEMM Split-K Support
- `torch.compile()` support
- Tunable A loading order, eviction policies and atomic add mode
- Overall performance improvement
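
As background for the Split-K entry: a Split-K GEMM divides the shared K dimension into slices, computes a partial product per slice, and sums the partial results. On a GPU the slices run in parallel and are commonly combined with atomic adds, which helps when the batch is small and there are too few output tiles to occupy the device. The NumPy sketch below shows only the arithmetic; `gemm_split_k` and its `split_k` parameter are hypothetical names, not gemlite's API.

```python
import numpy as np

def gemm_split_k(a, b, split_k=4):
    """Compute a @ b by splitting the shared K dimension into `split_k` slices."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=np.float32)
    # np.array_split also handles K values that are not divisible by split_k.
    for k_idx in np.array_split(np.arange(k), split_k):
        # Each slice yields a full-size (m, n) partial result. The in-place
        # accumulation stands in for the atomic add a GPU kernel would use.
        out += a[:, k_idx] @ b[k_idx, :]
    return out

rng = np.random.default_rng(0)
a = rng.standard_normal((1, 256)).astype(np.float32)   # batch-size=1 activations
b = rng.standard_normal((256, 16)).astype(np.float32)  # weights
print(np.allclose(gemm_split_k(a, b, split_k=4), a @ b, atol=1e-3))
```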

0.1.0

Triton Kernels
- A16W8 (GEMV + GEMM) - with grouping
- A16W4 (GEMV + GEMM) - with grouping
- A16W2 (GEMV + GEMM) - with grouping
- A16W1 (GEMV + GEMM) - with grouping
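
For readers unfamiliar with the notation, A16W4 means 16-bit activations with 4-bit weights. The sketch below assumes that "with grouping" refers to group-wise quantization, where each group of consecutive weights shares one scale and one offset; that reading, the helper names, the `group_size` default, and the two-values-per-byte packing layout are all assumptions for illustration and do not describe gemlite's actual storage format.

```python
import numpy as np

def quantize_w4_grouped(w, group_size=64):
    """Asymmetric 4-bit quantization with one (scale, min) pair per group.

    w: (out_features, in_features), in_features divisible by group_size.
    """
    out_f, in_f = w.shape
    g = w.reshape(out_f, in_f // group_size, group_size)
    w_min = g.min(axis=-1, keepdims=True).astype(np.float32)
    w_max = g.max(axis=-1, keepdims=True).astype(np.float32)
    # 4 bits give 16 levels (0..15); the floor avoids division by zero.
    scale = (np.maximum(w_max - w_min, 1e-8) / 15.0).astype(np.float32)
    q = np.clip(np.round((g - w_min) / scale), 0, 15).astype(np.uint8)
    q = q.reshape(out_f, in_f)
    # Pack two 4-bit values into each byte: even columns low, odd columns high.
    packed = q[:, 0::2] | (q[:, 1::2] << 4)
    return packed, scale, w_min

def dequantize_w4_grouped(packed, scale, w_min, group_size=64):
    """Unpack the 4-bit values and map them back to floating point."""
    out_f = packed.shape[0]
    q = np.empty((out_f, packed.shape[1] * 2), dtype=np.uint8)
    q[:, 0::2] = packed & 0x0F
    q[:, 1::2] = packed >> 4
    g = q.reshape(out_f, -1, group_size).astype(np.float32)
    return (g * scale + w_min).reshape(out_f, -1)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 128)).astype(np.float32)
packed, scale, w_min = quantize_w4_grouped(w)
w_hat = dequantize_w4_grouped(packed, scale, w_min)
print(packed.shape, float(np.max(np.abs(w - w_hat))))
```

The reconstruction error per weight is bounded by half of its group's scale, which is why smaller groups generally give better accuracy at the cost of more metadata.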

CUDA Kernels
- A16W8 (GEMV - batch-size=1) - no grouping
- A16W4 (GEMV - batch-size=1) - no grouping
- A16W2 (GEMV - batch-size=1) - no grouping
- A8W8 (GEMV - batch-size=1) - no grouping
- A8W4 (GEMV - batch-size=1) - no grouping
- A8W2 (GEMV - batch-size=1) - no grouping
