t3 implemented: reverse conversion from gguf back to safetensors (optional; requires torch)
1.1.1
phi4 (trained by microsoft) added to the lazy list for dry running
1.1.0
cutter (bf16/f16 to q2-q8 quantization) implemented; launch the cutter with: ggc u
1.0.9
t2 implemented: customized model without boundaries (optional; requires torch)
1.0.8
quant2 decoder for torch tensors implemented (optional; requires torch)
1.0.7
safetensors quantizor implemented (quantizes bf16 to fp8; roughly 50% reduction in file size); llama-cpp-python (llama_cpp engine) made optional, like the bulky torch package, for wider audience coverage: users without a c/c++ compiler can still benefit from most of the generic functions
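The roughly 50% figure follows directly from the element widths: bf16 stores 2 bytes per value, fp8 stores 1. A minimal sketch of that arithmetic (the constants and helper below are illustrative, not the package's actual API):

```python
# Per-element sizes in bytes (assumption: the quantizor recasts each
# bf16 element to an 8-bit float format such as e4m3 or e5m2).
BF16_BYTES = 2  # bfloat16: 1 sign + 8 exponent + 7 mantissa bits
FP8_BYTES = 1   # fp8: 8 bits total

def quantized_size(n_elements: int) -> tuple[int, int, float]:
    """Return (bf16 bytes, fp8 bytes, fractional saving) for a tensor."""
    before = n_elements * BF16_BYTES
    after = n_elements * FP8_BYTES
    return before, after, 1 - after / before

# Example: a 7B-parameter model drops from 14 GB (bf16) to 7 GB (fp8).
before, after, saving = quantized_size(7_000_000_000)
```

Real savings land near, not exactly at, 50%, since file headers and any tensors kept in higher precision add a small constant overhead.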