* Add gradient checkpointing and FullyShardedDataParallel (FSDP) support
* Model releases (vision encoder / language model)
  * CLIP ViT-L-14 / MPT-1B
  * CLIP ViT-L-14 / MPT-1B Dolly
  * CLIP ViT-L-14 / RedPajama-3B
  * CLIP ViT-L-14 / RedPajama-3B Instruct
  * CLIP ViT-L-14 / MPT-7B
* Remove color jitter augmentation during training
* Fix cross-attention bug when calling generate()