Gemlite

Latest version: v0.4.4

Page 1 of 2

0.4.4

* Add bfloat16 support to gemlite kernels by mobicham in https://github.com/mobiusml/gemlite/pull/24

0.4.3

- Add faster packing / unpacking utils
- Set MIN_SIZE = 64 for Gemma 3
- Update caches

0.4.2.post1

- Avoid recompilation when the batch-size `M` changes: https://github.com/mobiusml/gemlite/commit/dcc2455d6187ec58338d1746e4c780f0718f70c3
- Expose autotune `M` logic via `set_autotune_setting()`: https://github.com/mobiusml/gemlite/commit/37dab275d95bbfa67aa7dac718b123dbfad054a4
- Fix bug related to config caching that was ignoring the pre-loaded cache: https://github.com/mobiusml/gemlite/commit/3c4ab53c54b55e00f94223a5eadedfcee1815f1f

0.4.2

* Auto-load pre-warmed caches for A100, H100, 4090, A6000 Ada.
* Auto-set the FP16 accumulation dtype on consumer GPUs.
* Enable/disable cache overwriting while loading.
* Fix splitK_gemv bug with large block-sizes (Flux)
* Force `M` to powers of 2 to avoid recompilation during the prefill phase.
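
The idea behind forcing `M` to powers of 2 can be illustrated with a minimal sketch. It assumes that a kernel is compiled once per distinct `M` it sees, so rounding `M` up to the next power of 2 limits how many variants are needed. The helper name `next_power_of_2` is hypothetical and is not taken from gemlite's code.

```python
def next_power_of_2(m: int) -> int:
    """Round a positive batch size up to the nearest power of 2 (hypothetical helper)."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    # (m - 1).bit_length() is the exponent of the smallest power of 2 that is >= m.
    return 1 << (m - 1).bit_length()


# Every prefill length from 1 to 1024 falls into one of a few buckets,
# so only that many kernel variants would need to be compiled.
buckets = sorted({next_power_of_2(m) for m in range(1, 1025)})
print(buckets)
```

With this bucketing, 1024 possible values of `M` collapse into 11 buckets, at the cost of padding or over-provisioning for sizes that are not already powers of 2.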

0.4.1

Fix bugs related to config caching.

0.4.0

* Improved performance on the A100 and H100.
* Flexible bitpacking support (32-bit / 8-bit, over cols or rows).
* Caching of the best config across all kernels.
* Helper functions for easier usage.
* `GEMV_SPLITK` kernel for better performance at batch-size=1 with non-packed data.
* Improved accuracy via dumping for 8-bit weights with GEMV kernels.
* Max-autotuning.
* Avoid out-of-shared-memory by limiting `num_stages` based on the GPU device.
* Various bug fixes.
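
To make the bitpacking item above more concrete, here is a minimal pure-Python sketch of packing low-bit values into 32-bit or 8-bit words. It only illustrates the general technique; the function names `pack_bits` and `unpack_bits`, the little-end-first ordering, and the flat-list layout are assumptions and do not reflect gemlite's actual packing format or API.

```python
def pack_bits(values, nbits, word_bits=32):
    """Pack unsigned `nbits`-wide integers into `word_bits`-wide words (hypothetical helper)."""
    per_word = word_bits // nbits  # how many values fit in one word
    mask = (1 << nbits) - 1
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for slot, value in enumerate(values[start:start + per_word]):
            # Place each value in its own nbits-wide slot, lowest slot first.
            word |= (value & mask) << (slot * nbits)
        words.append(word)
    return words


def unpack_bits(words, nbits, count, word_bits=32):
    """Recover the first `count` values from words produced by pack_bits."""
    per_word = word_bits // nbits
    mask = (1 << nbits) - 1
    values = []
    for word in words:
        for slot in range(per_word):
            values.append((word >> (slot * nbits)) & mask)
    return values[:count]  # drop the zero padding in the last word


weights = [1, 15, 0, 7, 8, 3, 12, 5, 9]       # 4-bit values
packed32 = pack_bits(weights, nbits=4)          # 8 values per 32-bit word
packed8 = pack_bits(weights, nbits=4, word_bits=8)  # 2 values per 8-bit word
```

Packing along rows versus columns would correspond to choosing which axis of the weight matrix is traversed when forming each word; the sketch above works on a flat list and leaves that choice out.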

