* vLLM support via patching: GemLite backend and on-the-fly quantization (example below)
* Add support for Aria
* Add support for loading quantized SequenceClassification models (example below)
* Faster decoding (custom CUDA graphs, SDPA math backend, etc.; example below)
* Fix `torch.compile` and `hf_generator` bugs with newer transformers versions
* Fix bugs when saving quantized models with no grouping (save/load example below)
* Fix bugs when saving large quantized models
* Update examples
* Add support for `HQQLinear.to(device)` (example below)
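
A minimal sketch of on-the-fly quantization with the GemLite backend, assuming hqq's `AutoHQQHFModel.quantize_model` and `hqq.utils.patching.prepare_for_inference` APIs; the `"gemlite"` backend name, model id, and quantization settings are assumptions, and the vLLM patching entry point itself is not shown here:

```python
import torch
from transformers import AutoModelForCausalLM
from hqq.core.quantize import BaseQuantizeConfig
from hqq.models.hf.base import AutoHQQHFModel
from hqq.utils.patching import prepare_for_inference

# Load a full-precision model (placeholder model id)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B", torch_dtype=torch.float16
)

# Quantize on the fly: 4-bit weights, group size 64 (assumed settings)
quant_config = BaseQuantizeConfig(nbits=4, group_size=64)
AutoHQQHFModel.quantize_model(
    model, quant_config=quant_config, compute_dtype=torch.float16, device="cuda"
)

# Swap the quantized linear layers' matmul kernels to the GemLite backend
prepare_for_inference(model, backend="gemlite")
```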
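
For quantized SequenceClassification loading, a minimal sketch assuming transformers' `HqqConfig` integration applies to `AutoModelForSequenceClassification` the same way it does to the causal LM classes; the model id and quantization settings are placeholders:

```python
import torch
from transformers import AutoModelForSequenceClassification, HqqConfig

quant_config = HqqConfig(nbits=4, group_size=64)

# Quantize the classification model's linear layers on load
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",  # placeholder model id
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant_config,
)
```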
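
For the decoding speedups, a minimal sketch of forcing PyTorch's SDPA math backend; `sdpa_kernel`/`SDPBackend` are standard PyTorch APIs (torch >= 2.3), and the tensor shapes are arbitrary:

```python
import torch
from torch.nn.attention import sdpa_kernel, SDPBackend

# Dummy query/key/value tensors: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Force the math backend, which composes well with CUDA graph capture
with sdpa_kernel(SDPBackend.MATH):
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
```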
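
For the saving fixes, a minimal save/reload sketch assuming hqq's `AutoHQQHFModel.save_quantized`/`from_quantized` API; `group_size=None` for the no-grouping case, the model id, and the output path are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM
from hqq.core.quantize import BaseQuantizeConfig
from hqq.models.hf.base import AutoHQQHFModel

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B", torch_dtype=torch.float16  # placeholder model id
)

# No grouping: quantize per output channel instead of per group (assumed setting)
quant_config = BaseQuantizeConfig(nbits=8, group_size=None)
AutoHQQHFModel.quantize_model(
    model, quant_config=quant_config, compute_dtype=torch.float16, device="cuda"
)

# Save, then reload the quantized weights
AutoHQQHFModel.save_quantized(model, "quantized_model/")
model = AutoHQQHFModel.from_quantized("quantized_model/")
```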
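
For `HQQLinear.to(device)`, a minimal sketch of quantizing a single linear layer and moving it between devices; the layer sizes and quantization settings are arbitrary:

```python
import torch
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

layer = torch.nn.Linear(1024, 1024, bias=False)
quant_config = BaseQuantizeConfig(nbits=4, group_size=64)

# Quantize the fp16 layer into an HQQLinear module
hqq_layer = HQQLinear(
    layer, quant_config=quant_config, compute_dtype=torch.float16,
    device="cuda", del_orig=True
)

hqq_layer = hqq_layer.to("cpu")   # offload
hqq_layer = hqq_layer.to("cuda")  # move back for inference

x = torch.randn(2, 1024, dtype=torch.float16, device="cuda")
y = hqq_layer(x)
```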