## Dispatch batches from main DataLoader

This release introduces support for iterating through a `DataLoader` only on the main process, which then dispatches the batches to all processes.
The motivation behind this comes from dataset streaming, which introduces two difficulties:
- there might be some timeouts for some elements of the dataset, which may then differ across the launched processes, making it impossible to guarantee the data is iterated through the same way on each process
- when using an `IterableDataset`, each process goes through the dataset and therefore applies the preprocessing to every element, which can slow down training
This new feature is activated by default for all `IterableDataset`.
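Conceptually, dispatching means only the main process reads from the underlying dataloader; each batch is then sliced into per-process shards that are scattered to the other processes. A minimal, dependency-free sketch of that idea (the function names `dispatch` and `iterate_dispatched` are illustrative, not Accelerate's actual internals; real code would broadcast shards over the process group rather than re-read the data on every rank):

```python
def dispatch(batch, num_processes):
    """Split one batch (a list of samples) read on the main process
    into equal shards, one per process."""
    shard_size = len(batch) // num_processes
    return [batch[i * shard_size:(i + 1) * shard_size]
            for i in range(num_processes)]

def iterate_dispatched(dataloader, num_processes, rank):
    """In real distributed code, only rank 0 would iterate `dataloader`
    and scatter the shards; here every rank simulates that locally."""
    for batch in dataloader:          # real code: read on rank 0 only
        shards = dispatch(batch, num_processes)
        yield shards[rank]            # real code: scatter/broadcast

# Example: a "dataloader" of two batches of 4 samples, 2 processes.
batches = [[0, 1, 2, 3], [4, 5, 6, 7]]
print(list(iterate_dispatched(batches, num_processes=2, rank=0)))
# [[0, 1], [4, 5]]
print(list(iterate_dispatched(batches, num_processes=2, rank=1)))
# [[2, 3], [6, 7]]
```

This contrasts with the previous behavior, where every process iterated the full dataset and ran the preprocessing redundantly.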
- Central dataloader #164 (@sgugger)
- Dynamic default for `dispatch_batches` #168 (@sgugger)
## Various fixes
- fix fp16 convert back to fp32 for issue: unsupported operand type(s) for /: 'dict' and 'int' #149 (@Doragd)
- [Docs] Machine config is yaml not json #151 (@patrickvonplaten)
- Fix gather for 0d tensor #152 (@sgugger)
- [DeepSpeed] allow untested optimizers deepspeed #150 (@patrickvonplaten)
- Raise errors instead of warnings with better tests #170 (@sgugger)
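For context on the first fix: code that converts fp16 values back to fp32 assumed flat tensors, so model outputs packaged as a `dict` raised `unsupported operand type(s) for /: 'dict' and 'int'`. The general remedy is to recurse into containers before touching the leaves. A sketch with plain floats (the helper name `convert_to_fp32` is illustrative; real code would call `tensor.float()` on the leaves instead of `float()`):

```python
def convert_to_fp32(value):
    """Recursively walk dicts, lists, and tuples, converting only the
    leaves; float() stands in for tensor.float() to keep the sketch
    dependency-free."""
    if isinstance(value, dict):
        return {k: convert_to_fp32(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(convert_to_fp32(v) for v in value)
    return float(value)

nested = {"loss": 2, "logits": [1, 2, 3]}
print(convert_to_fp32(nested))
# {'loss': 2.0, 'logits': [1.0, 2.0, 3.0]}
```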