This patch mainly fixes #2983.
In commit 9bec3c98a22c91b1c28fda757db51eb780291641, we built the optimizer and scheduler inside the trainers, which inadvertently introduced a bug: when DeepSpeed was enabled, the trainers in transformers built an optimizer and scheduler before calling the `create_optimizer_and_scheduler` method [1]. The optimizer created by our method then overwrote the original one, but the scheduler was not replaced, so it stayed bound to the discarded optimizer. Consequently, the scheduler no longer affected the learning rate of the optimizer actually used for training, leading to a regression in training results. We have fixed this bug in 3bcd41b639899e72bcabc51d59bac8967af19899 and 8c77b1091296e204dc3c8c1f157c288ca5b236bd. Thanks to HideLord for helping us identify this critical bug.
[1] https://github.com/huggingface/transformers/blob/v4.39.1/src/transformers/trainer.py#L1877-L1881
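For readers unfamiliar with this failure mode, here is a minimal, self-contained sketch of how a scheduler can end up detached from the optimizer in use. It does not use transformers or DeepSpeed: `ToyOptimizer`, `ToyLinearDecay`, `ToyTrainer`, `run_buggy`, and `run_synced` are hypothetical names made up for illustration, and `run_synced` shows only one possible way to keep the two objects consistent, not necessarily what the fix commits do.

```python
class ToyOptimizer:
    """Hypothetical stand-in for an optimizer: it only holds a learning rate."""

    def __init__(self, lr):
        self.lr = lr


class ToyLinearDecay:
    """Hypothetical stand-in for a scheduler bound to exactly one optimizer."""

    def __init__(self, optimizer, total_steps):
        self.optimizer = optimizer
        self.base_lr = optimizer.lr
        self.total_steps = total_steps
        self.step_count = 0

    def step(self):
        # Linearly decay the learning rate of the bound optimizer.
        self.step_count += 1
        remaining = max(0, self.total_steps - self.step_count)
        self.optimizer.lr = self.base_lr * remaining / self.total_steps


class ToyTrainer:
    """Hypothetical trainer that stores an optimizer and a scheduler."""

    def __init__(self):
        self.optimizer = None
        self.lr_scheduler = None


def run_buggy(steps=5):
    trainer = ToyTrainer()
    # Both objects are built first (the DeepSpeed code path in this story).
    trainer.optimizer = ToyOptimizer(lr=1.0)
    trainer.lr_scheduler = ToyLinearDecay(trainer.optimizer, total_steps=10)
    # A later call replaces only the optimizer; the scheduler still
    # points at the old, now unused optimizer.
    trainer.optimizer = ToyOptimizer(lr=1.0)
    for _ in range(steps):
        trainer.lr_scheduler.step()
    return trainer.optimizer.lr


def run_synced(steps=5):
    trainer = ToyTrainer()
    trainer.optimizer = ToyOptimizer(lr=1.0)
    trainer.lr_scheduler = ToyLinearDecay(trainer.optimizer, total_steps=10)
    # Replace the optimizer and rebuild the scheduler on top of it,
    # so the two stay bound to each other.
    trainer.optimizer = ToyOptimizer(lr=1.0)
    trainer.lr_scheduler = ToyLinearDecay(trainer.optimizer, total_steps=10)
    for _ in range(steps):
        trainer.lr_scheduler.step()
    return trainer.optimizer.lr


if __name__ == "__main__":
    print("buggy lr after 5 steps:", run_buggy())
    print("synced lr after 5 steps:", run_synced())
```

In the buggy variant the learning rate of the optimizer in use never changes, which matches the symptom described above: the schedule runs, but on an object that training no longer touches.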
We have also fixed #2961, #2981, #2982, #2983, #2991, and #3010.