Changes
- Default `tp_size` in `batch_inference` to `torch.cuda.device_count()` by tongyx361
- Improved `tqdm` progress-bar descriptions by tongyx361
- Fixed loading datasets from local text files by tongyx361
- Added support for Llama 3.1 by xiaoxigua999
- Added `--packing_samples` support for all HF models in SFT, DPO, and RM training by xiaoxigua999
- Added `--nll_loss_coef` support for DPO (NLL loss on the chosen response) by xiaoxigua999