What's Changed
* `SFTTrainer` is now available.
* `VideoCausalLanguageModelTrainer` is now available.
* New models such as Grok-1, Qwen2Moe, Mamba, Rwkv, and Whisper are available.
* MoE models received some speed improvements.
* Training is now 18%~42% faster.
* Normal attention is now 12%~30% faster (#131).
* `DPOTrainer` bugs are fixed.
* `CausalLanguageModelTrainer` is now more customizable.
* WANDB logging has improved.
* Performance Mode is added to Training Arguments.
* Model configs pass attributes to PretrainedConfig to prevent override… by yhavinga in https://github.com/erfanzar/EasyDeL/pull/122
* Ignore token label smooth z loss by yhavinga in https://github.com/erfanzar/EasyDeL/pull/123
* Time the whole train loop instead of only call to train step function by yhavinga in https://github.com/erfanzar/EasyDeL/pull/124
* Add save_total_limit argument to delete older checkpoints by yhavinga in https://github.com/erfanzar/EasyDeL/pull/127
* Add gradient norm logging, fix metric collection on multi-worker setup by yhavinga in https://github.com/erfanzar/EasyDeL/pull/135
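
The `save_total_limit` argument listed above caps how many checkpoints are kept by deleting older ones. As a rough illustration of that idea only (this is not EasyDeL's implementation; the `prune_checkpoints` helper and the `checkpoint-<step>` naming are assumptions made for this sketch), the pruning step could look like:

```python
import re
import shutil
from pathlib import Path

# Hypothetical naming scheme: "checkpoint-<step>", e.g. "checkpoint-2000".
_CHECKPOINT_RE = re.compile(r"checkpoint-(\d+)")


def prune_checkpoints(checkpoint_dir, save_total_limit):
    """Delete the oldest checkpoints so that at most `save_total_limit` remain.

    Returns the names of the removed checkpoints, oldest first.
    """
    if save_total_limit is None or save_total_limit < 1:
        return []  # No limit configured: keep everything.

    # Collect (step, path) pairs for entries that match the naming scheme.
    found = []
    for path in Path(checkpoint_dir).iterdir():
        match = _CHECKPOINT_RE.fullmatch(path.name)
        if match:
            found.append((int(match.group(1)), path))

    # Sort by numeric step so "checkpoint-1000" counts as newer than "checkpoint-200".
    found.sort(key=lambda item: item[0])
    stale = [path for _, path in found[:-save_total_limit]]

    for path in stale:
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
    return [path.name for path in stale]
```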

**Full Changelog**: https://github.com/erfanzar/EasyDeL/compare/0.0.55...0.0.60