We are excited to share Composer v0.5, a library of speed-up methods for efficient neural network training. This release features:
* Revamped checkpointing API based on community feedback
* New baselines: ResNet34-SSD, GPT-3, and Vision Transformers
* Additional improvements to our [documentation](https://docs.mosaicml.com/en/latest/)
* Support for `bfloat16`
* Streaming dataset support
* Unified functional API for our algorithms
## Highlights
### Checkpointing API
Checkpointing is now implemented as a Callback, so users can easily write and add their own checkpointing callbacks. The callback is automatically added if a `save_folder` is provided to the Trainer:
```python
trainer = Trainer(
    model=model,
    algorithms=algorithms,
    save_folder="checkpoints",
    save_interval="1ep"
)
```
Alternatively, `CheckpointSaver` can be directly added as a callback:
```python
trainer = Trainer(..., callbacks=[
    CheckpointSaver(
        save_folder='checkpoints',
        name_format="ep{epoch}-ba{batch}/rank_{rank}",
        save_latest_format="latest/rank_{rank}",
        save_interval="1ep",
        weights_only=False,
    )
])
```
Subclass `CheckpointSaver` to add your own logic, such as saving only the best model or saving at specific intervals. Thanks to mansheej, siriuslee, and other users for their feedback.
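As a minimal sketch of that pattern, the subclass below only writes a checkpoint when a monitored eval metric improves. The import path, the `epoch_checkpoint` hook, and the way metrics are read from `state` are assumptions about Composer's internals, so adapt them to your version:

```python
from composer.callbacks import CheckpointSaver  # import path may differ by version


class BestCheckpointSaver(CheckpointSaver):
    """Sketch: only keep checkpoints when a monitored eval metric improves."""

    def __init__(self, metric_name="accuracy", **kwargs):
        super().__init__(**kwargs)
        self.metric_name = metric_name
        self.best_value = float("-inf")

    def epoch_checkpoint(self, state, logger):
        # Assumption: eval metrics are readable from `state` like this, and the
        # parent class performs its save when this event hook fires.
        current = getattr(state, "current_metrics", {}).get("eval", {}).get(self.metric_name)
        if current is not None and current <= self.best_value:
            return  # skip saving; the model has not improved
        if current is not None:
            self.best_value = current
        super().epoch_checkpoint(state, logger)  # defer to the parent's save logic
```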
### `bfloat16`
We've added experimental support for `bfloat16`, which can be provided via the `precision` argument to the Trainer:
```python
trainer = Trainer(
    ...,
    precision="bfloat16"
)
```
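For orientation only (this is plain PyTorch, not the Composer API), `bfloat16` mixed precision on CUDA corresponds roughly to wrapping the forward pass in autocast with `dtype=torch.bfloat16`, which requires PyTorch 1.10+ and hardware with native `bfloat16` support such as A100s:

```python
import torch
from torch import nn

# Toy model and batch; requires a GPU with native bfloat16 support (e.g. A100).
model = nn.Linear(16, 4).cuda()
inputs = torch.randn(8, 16, device="cuda")
targets = torch.randint(0, 4, (8,), device="cuda")

# The forward pass runs in bfloat16 where safe; gradients are computed as usual.
with torch.cuda.amp.autocast(dtype=torch.bfloat16):
    loss = nn.functional.cross_entropy(model(inputs), targets)
loss.backward()
```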
### Streaming datasets
We've added support for fast streaming datasets. For NLP datasets such as C4, we use the HuggingFace `datasets` backend and add dataset-specific shuffling, tokenization, and grouping on the fly. To support data-parallel training, we added dedicated sharding logic for efficiency. See `C4Datasets` for more details.
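Conceptually, the pipeline resembles streaming C4 directly with the HuggingFace `datasets` library, as in the illustration below; the tokenizer choice, buffer size, and sequence length are arbitrary placeholders, not Composer's settings:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Stream C4 without downloading the full corpus; shuffle and tokenize on the fly.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
stream = load_dataset("c4", "en", split="train", streaming=True)
stream = stream.shuffle(buffer_size=10_000, seed=17)
stream = stream.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

for sample in stream.take(2):
    print(len(sample["input_ids"]))
```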
Vision streaming datasets are supported via a patched version of the `webdataset` package, with added support for sharding data across workers for fast augmentations. See `composer.datasets.webdataset` for more details.
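For reference, streaming a sharded vision dataset with the upstream `webdataset` package looks like the sketch below; the shard URL pattern is a placeholder, and Composer's patched version adds the per-worker sharding described above:

```python
import webdataset as wds

# Placeholder shard pattern; each worker streams its own subset of shards.
urls = "https://example.com/shards/train-{000000..000146}.tar"

dataset = (
    wds.WebDataset(urls)
    .shuffle(1000)               # shuffle within a streaming buffer
    .decode("pil")               # decode stored images to PIL
    .to_tuple("jpg;png", "cls")  # yield (image, label) pairs
)

image, label = next(iter(dataset))
```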
### Baseline GPT-3, ResNet34-SSD, and Vision Transformer benchmarks
Configurations for GPT-3-like models ranging from 125m to 760m parameters are now released, and use DeepSpeed ZeRO Stage 0 for memory-efficient training (see the sketch after this list):
* [GPT3-125m](https://github.com/mosaicml/composer/blob/v0.5.0/composer/yamls/models/gpt3_125m.yaml)
* [GPT3-350m](https://github.com/mosaicml/composer/blob/v0.5.0/composer/yamls/models/gpt3_350m.yaml)
* [GPT3-760m](https://github.com/mosaicml/composer/blob/v0.5.0/composer/yamls/models/gpt3_760m.yaml)
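For readers unfamiliar with ZeRO stages, Stage 0 means DeepSpeed's optimizer-state partitioning is turned off, i.e. plain data-parallel training; in a standalone DeepSpeed config it is just the `zero_optimization` block below. The batch-size values are placeholders, and the released YAMLs handle this wiring for you:

```python
# Minimal DeepSpeed-style config dict illustrating ZeRO Stage 0.
deepspeed_config = {
    "train_micro_batch_size_per_gpu": 8,  # placeholder
    "gradient_accumulation_steps": 1,     # placeholder
    "zero_optimization": {"stage": 0},    # Stage 0 = no ZeRO partitioning
}
```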
We've also added the Single Shot MultiBox Detector (SSD) model ([Liu et al., 2016](https://arxiv.org/abs/1512.02325)) with a ResNet34 backbone, based on the MLPerf reference implementation.
Our first Vision Transformer benchmark is the ViT-S/16 model from [Touvron et al., 2021](https://arxiv.org/pdf/2012.12877.pdf), and is based on the `vit-pytorch` package.
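For context, a ViT-S/16-style model in the `vit-pytorch` package looks roughly like the following (16x16 patches, width 384, 12 blocks, 6 heads); the exact hyperparameters of the benchmark may differ:

```python
import torch
from vit_pytorch import ViT

# ViT-S/16-style configuration; hyperparameters are illustrative.
model = ViT(
    image_size=224,
    patch_size=16,
    num_classes=1000,
    dim=384,
    depth=12,
    heads=6,
    mlp_dim=1536,
)

logits = model(torch.randn(2, 3, 224, 224))  # -> shape (2, 1000)
```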
See below for the full details:
## What's Changed
* Export Transforms in `composer.algorithms` by ajaysaini725 in https://github.com/mosaicml/composer/pull/603
* Make batchnorm default for UNet by dskhudia in https://github.com/mosaicml/composer/pull/535
* Fix no_op_model algorithm by dskhudia in https://github.com/mosaicml/composer/pull/614
* Pin pre-1.0 packages by bandish-shah in https://github.com/mosaicml/composer/pull/595
* Updated dark mode composer logo, and graph by nqn in https://github.com/mosaicml/composer/pull/617
* Jenkins + Docker Improvements by ravi-mosaicml in https://github.com/mosaicml/composer/pull/621
* update README links by hanlint in https://github.com/mosaicml/composer/pull/628
* Remove all old timing calls by ravi-mosaicml in https://github.com/mosaicml/composer/pull/594
* Remove state shorthand by mvpatel2000 in https://github.com/mosaicml/composer/pull/629
* add bfloat16 support by nikhilsardana in https://github.com/mosaicml/composer/pull/433
* v0.4.0 Hotfix: Docker documentation updates by bandish-shah in https://github.com/mosaicml/composer/pull/631
* Fix wrong icons in the method cards by hanlint in https://github.com/mosaicml/composer/pull/636
* fix autocast for pytorch < 1.10 by nikhilsardana in https://github.com/mosaicml/composer/pull/639
* Add tutorial notebooks to the README by moinnadeem in https://github.com/mosaicml/composer/pull/630
* Converted Stateless Schedulers to Classes by ravi-mosaicml in https://github.com/mosaicml/composer/pull/632
* Jenkinsfile Fixes Part 2 by ravi-mosaicml in https://github.com/mosaicml/composer/pull/627
* Add C4 Streaming dataset by abhi-mosaic in https://github.com/mosaicml/composer/pull/489
* CONTRIBUTING.md additions by kobindra in https://github.com/mosaicml/composer/pull/648
* Hide showing `object` as a base class; fix skipping documentation of `forward`; fixed docutils dependency. by ravi-mosaicml in https://github.com/mosaicml/composer/pull/643
* Matthew/functional docstrings update by growlix in https://github.com/mosaicml/composer/pull/622
* docstrings improvements for core modules by dskhudia in https://github.com/mosaicml/composer/pull/598
* ssd-resnet34 on COCO map 0.23 by florescl in https://github.com/mosaicml/composer/pull/646
* Fix broken "best practices" link by growlix in https://github.com/mosaicml/composer/pull/649
* Update progressive resizing to work for semantic segmentation by coryMosaicML in https://github.com/mosaicml/composer/pull/604
* Let C4 Dataset overwrite `num_workers` if set incorrectly by abhi-mosaic in https://github.com/mosaicml/composer/pull/655
* Lazy imports for `pycocotools` by abhi-mosaic in https://github.com/mosaicml/composer/pull/656
* W&B excludes final eval metrics when plotted as a fxn of epoch or trainer/global_step by growlix in https://github.com/mosaicml/composer/pull/633
* Update GPT3-yamls for default 8xA100-40GB by abhi-mosaic in https://github.com/mosaicml/composer/pull/663
* Set WandB default to log rank zero only by abhi-mosaic in https://github.com/mosaicml/composer/pull/461
* Update schedulers guide by hanlint in https://github.com/mosaicml/composer/pull/661
* [XS] Fix a TQDM deserialization bug by jbloxham in https://github.com/mosaicml/composer/pull/665
* Add defaults to the docstrings for algorithms by hanlint in https://github.com/mosaicml/composer/pull/662
* Fix ZeRO config by jbloxham in https://github.com/mosaicml/composer/pull/667
* [XS] fix formatting for colout by hanlint in https://github.com/mosaicml/composer/pull/666
* Composer.core docstring touch-up by ravi-mosaicml in https://github.com/mosaicml/composer/pull/657
* Add Uniform bounding box sampling option for CutOut and CutMix by coryMosaicML in https://github.com/mosaicml/composer/pull/634
* Update README.md by ravi-mosaicml in https://github.com/mosaicml/composer/pull/678
* Fix bug in trainer test by hanlint in https://github.com/mosaicml/composer/pull/651
* InMemoryLogger has get_timeseries() method by growlix in https://github.com/mosaicml/composer/pull/644
* Batchwise resolution for SWA by growlix in https://github.com/mosaicml/composer/pull/654
* Fixed the conda build script so it runs on jenkins by ravi-mosaicml in https://github.com/mosaicml/composer/pull/676
* Yahp version update to 0.1.0 by Averylamp in https://github.com/mosaicml/composer/pull/674
* Streaming vision datasets by knighton in https://github.com/mosaicml/composer/pull/284
* Fix DeepSpeed checkpointing by jbloxham in https://github.com/mosaicml/composer/pull/686
* Vit by A-Jacobson in https://github.com/mosaicml/composer/pull/243
* [S] cleanup tldr; standardize `__all__` by hanlint in https://github.com/mosaicml/composer/pull/688
* Unify algorithms part 2: mixup, cutmix, label smoothing by dblalock in https://github.com/mosaicml/composer/pull/658
* `composer.optim` docstrings by jbloxham in https://github.com/mosaicml/composer/pull/653
* Fix DatasetHparams, WebDatasetHparams docstring by growlix in https://github.com/mosaicml/composer/pull/697
* Models docstrings by A-Jacobson in https://github.com/mosaicml/composer/pull/469
* docstrings improvements for composer.datasets by dskhudia in https://github.com/mosaicml/composer/pull/694
* Updated contributing.md and the style guide by ravi-mosaicml in https://github.com/mosaicml/composer/pull/670
* Ability to retry ADE20k crop transform by Landanjs in https://github.com/mosaicml/composer/pull/702
* Add mmsegmentation DeepLabv3(+) by Landanjs in https://github.com/mosaicml/composer/pull/684
* Unify functional API part 3 by dblalock in https://github.com/mosaicml/composer/pull/715
* Update example notebooks by coryMosaicML in https://github.com/mosaicml/composer/pull/707
* [Checkpointing - PR1] Store the `rank_zero_seed` on state by ravi-mosaicml in https://github.com/mosaicml/composer/pull/680
* [Checkpointing - PR2] Added in new Checkpointing Events by ravi-mosaicml in https://github.com/mosaicml/composer/pull/690
* [Checkpointing - PR3] Clean up RNG and State serialization by ravi-mosaicml in https://github.com/mosaicml/composer/pull/692
* [Checkpointing - PR4] Refactored the `CheckpointLoader` into a `load_checkpoint` function by ravi-mosaicml in https://github.com/mosaicml/composer/pull/693
* Update {blurpool,factorize,ghostbn} method cards by dblalock in https://github.com/mosaicml/composer/pull/711
* [Checkpointing - PR 5] Move the `CheckpointSaver` to a callback. by ravi-mosaicml in https://github.com/mosaicml/composer/pull/687
* Update datasets docstrings by growlix in https://github.com/mosaicml/composer/pull/709
* add notebooks and functional api by hanlint in https://github.com/mosaicml/composer/pull/714
* Migrating from PTL notebook by florescl in https://github.com/mosaicml/composer/pull/436
* Docs 0.4.1: Profiler section and tutorials by bandish-shah in https://github.com/mosaicml/composer/pull/696
* Improve datasets docstrings by knighton in https://github.com/mosaicml/composer/pull/695
* Update `C4Dataset` to repeat, handle `max_samples` safely by abhi-mosaic in https://github.com/mosaicml/composer/pull/722
* Fix docs build by ravi-mosaicml in https://github.com/mosaicml/composer/pull/773