1. Add xformers memory-efficient attention. 2. Use PyTorch 2.0's fast attention automatically; dispatch attention_fn by PyTorch version. 3. Add LLaMA and ChatGLM2. 4. Add model splitting for model parallelism in inference mode. 5. Add R2 download.
0.3.7
1. Update ViT. 2. Add qlora/lora2.
0.3.6
1. Support model-only mode without DeepSpeed. 2. Test CPU inference. 3. Test Windows support.
0.3.5
1. Add repetition penalty. 2. Add quantization.
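The repetition-penalty idea can be sketched as follows. This is a minimal illustration of the common CTRL-style rule, not necessarily this library's exact implementation; the function name and signature are hypothetical.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Hypothetical sketch: down-weight tokens already generated.

    logits: per-token scores (list of floats over the vocabulary).
    generated_ids: token ids produced so far.
    penalty: values > 1.0 discourage repetition.
    """
    logits = list(logits)
    for token_id in set(generated_ids):
        score = logits[token_id]
        # Divide positive scores and multiply negative ones, so the
        # repeated token becomes less likely in both cases.
        logits[token_id] = score / penalty if score > 0 else score * penalty
    return logits
```

With `penalty=1.0` the logits are unchanged; larger values push previously emitted tokens further down the distribution before sampling.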
0.3.4
1. Add an example for nested Transformer models. 2. Move all print statements to logging; set `SAT_LOGLEVEL` to control the log level.
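For example, the log level can be set per run through the environment; the accepted level names are assumed here to follow Python's `logging` module:

```shell
# Control log verbosity via the SAT_LOGLEVEL environment variable
# (level names such as DEBUG/INFO/WARNING assumed to match Python's
# logging module).
export SAT_LOGLEVEL=DEBUG
```

It can also be set inline for a single invocation, e.g. `SAT_LOGLEVEL=WARNING python your_script.py` (script name hypothetical).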