New Features
1. **🆕 PyTorch 2.0 Support (2172)**
We're thrilled to announce official support for PyTorch 2.0! All initial unit tests pass, and we've run PyTorch 2.0 through our [examples](https://github.com/mosaicml/examples). We've also made some updates to start taking advantage of all the great new features.
Initial support also includes:
* Support for [torch.compile](https://pytorch.org/get-started/pytorch-2.0/#pytorch-2x-faster-more-pythonic-and-as-dynamic-as-ever)
| Model | Dataset | Throughput without compile (samples/sec) | Throughput with compile (samples/sec) | Performance % |
| ------------ | -------- | ----------------------------------------- | -------------------------------------- | ------------- |
| ResNet50 | ImageNet | 5557 | 7424 | 33.60% |
| DeepLab V3 | ADE20K | 81.60 | 98.82 | 21.10% |
| HF BERT | C4 | 3360 | 4259 | 26.75% |
| HF Causal LM | C4 | 50.61 | 103.29 | 100.05% |
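The Performance % column appears to be the relative throughput gain from enabling compile, i.e. `(with / without - 1) * 100`. A minimal sketch of that arithmetic, assuming this definition (the helper name `speedup_pct` is hypothetical, not part of Composer):

```python
def speedup_pct(without_compile: float, with_compile: float) -> float:
    """Relative throughput gain (in %) of the compiled run over the eager baseline."""
    if without_compile <= 0:
        raise ValueError("baseline throughput must be positive")
    return (with_compile / without_compile - 1.0) * 100.0


# ResNet50 / ImageNet row from the table above (samples/sec)
print(f"{speedup_pct(5557, 7424):.2f}%")  # 33.60%
```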
To start using it, simply add the `compile_config` argument to the `Trainer`:

```python
from composer import Trainer

# To use the default `torch.compile` config
trainer = Trainer(
    ...,
    compile_config={},
)

# To use a custom `torch.compile` config, provide the arguments as a dictionary, for example:
trainer = Trainer(
    ...,
    compile_config={'mode': 'reduce-overhead'},
)
```
The `Trainer` also supports pre-compiled models passed via the `model` argument. If the model has already been compiled, the `compile_config` argument is ignored if provided.
**Note**: We recommend baselining your model with and without `torch.compile`, because in some scenarios enabling compile yields no throughput improvement, and in some cases it can even cause a regression.
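One way to do that baselining is to time the same training step with the eager model and with the compiled model. The sketch below is a framework-agnostic timing harness; `measure_throughput` and its parameters are hypothetical names, not a Composer API. It excludes warmup steps from the measurement, since the first calls to a `torch.compile`d model trigger compilation and would otherwise dominate the result.

```python
import time
from typing import Callable


def measure_throughput(
    step_fn: Callable[[], None],
    batch_size: int,
    num_steps: int = 50,
    warmup_steps: int = 5,
) -> float:
    """Return samples/sec for `step_fn`, excluding warmup steps from the timing."""
    # Warmup: for a compiled model these calls absorb the compilation cost.
    for _ in range(warmup_steps):
        step_fn()
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    # Each step processes one batch, so throughput = total samples / wall time.
    return num_steps * batch_size / elapsed
```

In practice `step_fn` would run one forward/backward pass, once with the eager model and once with `torch.compile(model)`, and you would compare the two numbers. For GPU work, call `torch.cuda.synchronize()` inside `step_fn` so that asynchronous kernels are included in the timing.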
* PyTorch 2.0 Docker Images
We've added the following new official [MosaicML Docker Images](https://hub.docker.com/u/mosaicml) with PyTorch 2.0 support:
| Linux Distro | Flavor | PyTorch Version | CUDA Version | Python Version | Docker Tags |
|----------------|----------|-------------------|---------------------|------------------|---------------------------------------------------------------------------------------------------|
| Ubuntu 20.04 | Base | 2.0.0 | 11.7.1 (Infiniband) | 3.10 | `mosaicml/pytorch:2.0.0_cu117-python3.10-ubuntu20.04` |
| Ubuntu 20.04 | Base | 2.0.0 | 11.7.1 (EFA) | 3.10 | `mosaicml/pytorch:2.0.0_cu117-python3.10-ubuntu20.04-aws` |
| Ubuntu 20.04 | Base | 2.0.0 | cpu | 3.10 | `mosaicml/pytorch:2.0.0_cpu-python3.10-ubuntu20.04` |
| Ubuntu 20.04 | Vision | 2.0.0 | 11.7.1 (Infiniband) | 3.10 | `mosaicml/pytorch_vision:2.0.0_cu117-python3.10-ubuntu20.04` |
| Ubuntu 20.04 | Vision | 2.0.0 | cpu | 3.10 | `mosaicml/pytorch_vision:2.0.0_cpu-python3.10-ubuntu20.04` |
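The tags above follow a regular pattern, `<repo>:<torch version>_<cu117 or cpu>-python<version>-<distro>`, with an `-aws` suffix for the EFA variant. A small sketch that rebuilds them, with the pattern inferred only from the rows in this table (the helper `docker_tag` is hypothetical, not part of Composer):

```python
from typing import Optional


def docker_tag(
    flavor: str,
    torch_version: str = "2.0.0",
    cuda: Optional[str] = "cu117",
    python_version: str = "3.10",
    distro: str = "ubuntu20.04",
    aws: bool = False,
) -> str:
    """Build an image tag following the naming pattern in the table above."""
    # "base" images live in mosaicml/pytorch; other flavors get a suffix.
    repo = "mosaicml/pytorch" if flavor == "base" else f"mosaicml/pytorch_{flavor}"
    device = cuda if cuda is not None else "cpu"
    tag = f"{repo}:{torch_version}_{device}-python{python_version}-{distro}"
    # The EFA (AWS) variant appends "-aws" to the otherwise identical tag.
    return tag + "-aws" if aws else tag


print(docker_tag("vision", cuda=None))
```

The resulting string can be passed directly to `docker pull`.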
1. **🦾 New Callbacks**
* Activation monitor (2066)
Monitors activations in the network. At each logging interval (measured in batches), it attaches a forward hook and logs the max, average, L2 norm, and kurtosis of the input and output activations. To enable:
```python
from composer import Trainer
from composer.callbacks import ActivationMonitor

# Construct Trainer
trainer = Trainer(
    ...,
    callbacks=[ActivationMonitor()],
)

# Train!
trainer.fit()
```
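For intuition about what gets logged, the sketch below computes the four statistics for a flat list of activation values in plain Python. It is an illustration only: the callback itself operates on tensors inside forward hooks, and the kurtosis here uses the standard (non-excess) definition, which is an assumption and may differ from the callback's exact formula. `activation_stats` is a hypothetical helper.

```python
import math
from typing import Dict, Sequence


def activation_stats(values: Sequence[float]) -> Dict[str, float]:
    """Max, average, L2 norm, and (non-excess) kurtosis of a set of activations."""
    n = len(values)
    mean = sum(values) / n
    # Population variance and fourth central moment.
    var = sum((v - mean) ** 2 for v in values) / n
    fourth = sum((v - mean) ** 4 for v in values) / n
    # Kurtosis = E[(x - mean)^4] / Var^2; undefined for constant inputs.
    kurtosis = fourth / var ** 2 if var > 0 else float("nan")
    return {
        "max": max(values),
        "average": mean,
        "l2_norm": math.sqrt(sum(v * v for v in values)),
        "kurtosis": kurtosis,
    }


print(activation_stats([1.0, 2.0, 3.0, 4.0]))
```

A high kurtosis flags heavy-tailed activations (a few very large outliers), which is often what you are looking for when diagnosing training instability.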
* Slack Logger (2133)
You can now send custom training metrics to Slack! To enable:

```python
from composer import Trainer
from composer.loggers import SlackLogger

trainer = Trainer(
    ...,
    loggers=[
        SlackLogger(
            log_interval="10ba",  # or "1ep", "2ep"
            include_keys=["algorithm_traces*", "loss*"],
            # Format each logged metric as a Slack "section" block
            formatter_func=(lambda data, **kwargs: [
                {
                    "type": "section",
                    "text": {"type": "mrkdwn", "text": f"*{k}:* {v}"},
                }
                for k, v in data.items()
            ]),
        ),
    ],
)

trainer.fit()
```
Please see PR 2133 for additional details.
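To see what `include_keys` and `formatter_func` combine to produce, here is a plain-Python sketch that filters a metrics dictionary and formats it the same way as the lambda above. The glob-style matching via `fnmatch` is an assumption about how `include_keys` patterns such as `"loss*"` are interpreted, and `format_for_slack` is a hypothetical helper, not part of `SlackLogger`.

```python
import fnmatch
from typing import Any, Dict, List, Sequence


def format_for_slack(data: Dict[str, Any], include_keys: Sequence[str]) -> List[Dict[str, Any]]:
    """Keep metrics whose names match any glob in `include_keys`, then format
    each as a Slack Block Kit "section" block with mrkdwn text."""
    selected = {
        k: v for k, v in data.items()
        if any(fnmatch.fnmatch(k, pattern) for pattern in include_keys)
    }
    return [
        {"type": "section", "text": {"type": "mrkdwn", "text": f"*{k}:* {v}"}}
        for k, v in selected.items()
    ]


blocks = format_for_slack({"loss/train/total": 0.25, "metrics/train/Accuracy": 0.9}, ["loss*"])
print(blocks)
# [{'type': 'section', 'text': {'type': 'mrkdwn', 'text': '*loss/train/total:* 0.25'}}]
```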
API changes
* The `grad_accum` argument has been removed from `Trainer`; use `device_train_microbatch_size` instead (2040)
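When migrating, an old `grad_accum` value can be translated into a microbatch size. A minimal sketch, assuming the old setting split each device's batch evenly into `grad_accum` microbatches; `microbatch_size_from_grad_accum` is a hypothetical helper, not a Composer function:

```python
def microbatch_size_from_grad_accum(device_train_batch_size: int, grad_accum: int) -> int:
    """Translate an old `grad_accum` setting into a `device_train_microbatch_size`.

    Each device's batch was split into `grad_accum` equal microbatches, so the
    microbatch size is the per-device batch size divided by `grad_accum`.
    """
    if grad_accum < 1:
        raise ValueError("grad_accum must be >= 1")
    if device_train_batch_size % grad_accum != 0:
        raise ValueError("per-device batch size must be divisible by grad_accum")
    return device_train_batch_size // grad_accum


# A per-device batch of 64 with grad_accum=4 corresponds to microbatches of 16
print(microbatch_size_from_grad_accum(64, 4))  # 16
```

The result would then be passed as `Trainer(..., device_train_microbatch_size=16)`.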
Deprecations
* We no longer support PyTorch 1.11 and 1.12 due to security vulnerabilities. New features will not be tested against these versions.
Bug Fixes
* Eval subset num batches bug fix (2028)
* Protect against missing `slack_sdk` import (2031)
* Adjust HuggingFaceModel token embedding resizing to only occur when necessary (2027)
* Update FSDP meta weight tying tests to include precision testing (2050)
* Backward compatibility with Torchmetrics (2046)
* Busy wait for local rank 0 download to avoid timeout on large file download (2054)
* Fix OCIObjectStore save_overwrite=False bug (2053)
* Busy wait so that non local rank zeros don't timeout while local rank zero downloads a monolithic checkpoint (2071)
* Skip extra downloads when not using a format string (2073)
* Fix `name_or_path` usage in HF save/load (2075)
* Fix EMA resumption issue with calling trainer.eval() before trainer.fit() (2088)
* Patch EMA with FSDP (2091)
* Updating gradient clipping to be torch 2.0 compatible (2089)
* Add weight tying checks so that `None` attributes are not treated as weight tied (2103)
* Gate the extra forward call specifically for FSDP (2102)
* Allow user to set ONNX opset version when Exporting for Inference (2101)
* Runtime estimator (2124)
* Use state_dict Torchmetrics Serialization (2116)
* Fix filelock in checkpoint download (2184)
What's Changed
* Eval subset num batches bug fix by mvpatel2000 in https://github.com/mosaicml/composer/pull/2028
* Protect for missing `slack_sdk` import by hanlint in https://github.com/mosaicml/composer/pull/2031
* switch code quality workflow to dev target and smoketest by dakinggg in https://github.com/mosaicml/composer/pull/2032
* Generate composer PyPi package by bandish-shah in https://github.com/mosaicml/composer/pull/2034
* HealthChecker should only send test message on global rank zero by hanlint in https://github.com/mosaicml/composer/pull/2035
* Bump version to 0.13.1 by bandish-shah in https://github.com/mosaicml/composer/pull/2033
* Use follow in mcp script by mvpatel2000 in https://github.com/mosaicml/composer/pull/2022
* Bump pytest from 7.2.1 to 7.2.2 by dependabot in https://github.com/mosaicml/composer/pull/2039
* Bump pypandoc from 1.10 to 1.11 by dependabot in https://github.com/mosaicml/composer/pull/2038
* Adds a PR guidelines section to contributing.md by dakinggg in https://github.com/mosaicml/composer/pull/1993
* Adjust HuggingFaceModel token embedding resizing to only occur when necessary by dakinggg in https://github.com/mosaicml/composer/pull/2027
* Remove deprecated code by mvpatel2000 in https://github.com/mosaicml/composer/pull/2026
* test and fix composer package name usage in composer_collect_env by dakinggg in https://github.com/mosaicml/composer/pull/2049
* Log nodename information in composer by eracah in https://github.com/mosaicml/composer/pull/2043
* Update FSDP meta weight tying tests to include precision testing by bcui19 in https://github.com/mosaicml/composer/pull/2050
* Backward Compat with Torchmetrics by mvpatel2000 in https://github.com/mosaicml/composer/pull/2046
* update fsdp mixed precision by vchiley in https://github.com/mosaicml/composer/pull/2047
* Checkpoints Simplified by mvpatel2000 in https://github.com/mosaicml/composer/pull/2041
* Add composer PyPI package tests to daily workflow by bandish-shah in https://github.com/mosaicml/composer/pull/2052
* Delete composer package GPU workflow by dakinggg in https://github.com/mosaicml/composer/pull/2055
* Revert "Checkpoints Simplified (2041)" by dakinggg in https://github.com/mosaicml/composer/pull/2056
* Raise error if attempting to export FSDP model by hanlint in https://github.com/mosaicml/composer/pull/2051
* Busy wait for local rank 0 download to avoid timeout on large file download by dakinggg in https://github.com/mosaicml/composer/pull/2054
* Fix OCIObjectStore save_overwrite=False bug by eracah in https://github.com/mosaicml/composer/pull/2053
* Update docs with non-rank zero logs instructions by hanlint in https://github.com/mosaicml/composer/pull/2058
* Pin torchmetrics by mvpatel2000 in https://github.com/mosaicml/composer/pull/2065
* Add `NO_REENTRANT` activation checkpointing by bmosaicml in https://github.com/mosaicml/composer/pull/2042
* Allow `LPLayerNorm` and `LPGroupNorm` to support `self.bias` or `self.weight` = None by abhi-mosaic in https://github.com/mosaicml/composer/pull/2044
* Checkpoints Simplified by mvpatel2000 in https://github.com/mosaicml/composer/pull/2059
* Add `device` and `dtype` back to `LPLayerNorm` by abhi-mosaic in https://github.com/mosaicml/composer/pull/2067
* Revert "Checkpoints Simplified (2059)" by dakinggg in https://github.com/mosaicml/composer/pull/2070
* Busy wait so that non local rank zeros don't timeout while local rank zero downloads a monolithic checkpoint by dakinggg in https://github.com/mosaicml/composer/pull/2071
* Add support + test for autoresume with FSDP sharded checkpoints by dakinggg in https://github.com/mosaicml/composer/pull/2072
* Skip extra downloads when not using a format string by dakinggg in https://github.com/mosaicml/composer/pull/2073
* Bump version to v0.13.2 by bandish-shah in https://github.com/mosaicml/composer/pull/2068
* Pin transformers package to <4.27 by dakinggg in https://github.com/mosaicml/composer/pull/2076
* Bump coverage[toml] from 7.2.1 to 7.2.2 by dependabot in https://github.com/mosaicml/composer/pull/2082
* Update datasets CODEOWNERS by dakinggg in https://github.com/mosaicml/composer/pull/2084
* fix name_or_path usage in HF save/load usage by dakinggg in https://github.com/mosaicml/composer/pull/2075
* Remove grad accum by mvpatel2000 in https://github.com/mosaicml/composer/pull/2040
* Add support for ICL QA tasks and generation during evaluation with `HuggingFaceModel` by dakinggg in https://github.com/mosaicml/composer/pull/2045
* make composer fsdp work with latest torch by dskhudia in https://github.com/mosaicml/composer/pull/2078
* Fix EMA resumption issue with calling trainer.eval() before trainer.fit() by coryMosaicML in https://github.com/mosaicml/composer/pull/2088
* Disable wrapping for fsdp if specified by mvpatel2000 in https://github.com/mosaicml/composer/pull/2086
* skip fsdp tests for <1.13 by dakinggg in https://github.com/mosaicml/composer/pull/2090
* Patch EMA with FSDP by mvpatel2000 in https://github.com/mosaicml/composer/pull/2091
* Update Wandb docs with incorrect default by mvpatel2000 in https://github.com/mosaicml/composer/pull/2092
* Fix typo by nik-mosaic in https://github.com/mosaicml/composer/pull/2098
* Replace broken explorer link by nik-mosaic in https://github.com/mosaicml/composer/pull/2099
* Updating gradient clipping to be torch 2.0 compatible by bcui19 in https://github.com/mosaicml/composer/pull/2089
* Adding checks for weight tying s.t. we don't think `None` attributes are weight tied by bcui19 in https://github.com/mosaicml/composer/pull/2103
* gate the extra forward call specifically for fsdp by dakinggg in https://github.com/mosaicml/composer/pull/2102
* Allow user to set ONNX opset version when Exporting for Inference by nik-mosaic in https://github.com/mosaicml/composer/pull/2101
* Seed the fewshot sampling in the ICL datasets by dakinggg in https://github.com/mosaicml/composer/pull/2100
* pin mcp by dakinggg in https://github.com/mosaicml/composer/pull/2111
* adjust decoding for eval forward by dakinggg in https://github.com/mosaicml/composer/pull/2107
* Add sentencepiece support to `HuggingFaceModel` by dakinggg in https://github.com/mosaicml/composer/pull/2093
* Bump yamllint from 1.28.0 to 1.30.0 by dependabot in https://github.com/mosaicml/composer/pull/2094
* update transformers to latest version by dakinggg in https://github.com/mosaicml/composer/pull/2109
* Bump version to 0.13.3 by bandish-shah in https://github.com/mosaicml/composer/pull/2115
* update numpy by dakinggg in https://github.com/mosaicml/composer/pull/2108
* Update Export NLP tests by nik-mosaic in https://github.com/mosaicml/composer/pull/1904
* Activation monitor by bcui19 in https://github.com/mosaicml/composer/pull/2066
* Relax streaming package version check to major version by karan6181 in https://github.com/mosaicml/composer/pull/2119
* Bump to 13.4 by mvpatel2000 in https://github.com/mosaicml/composer/pull/2121
* Auto Microbatching -- The Final Form by mvpatel2000 in https://github.com/mosaicml/composer/pull/2117
* add logic for direct instantiation by dakinggg in https://github.com/mosaicml/composer/pull/2122
* Runtime estimator by mvpatel2000 in https://github.com/mosaicml/composer/pull/2124
* Fix early stopper docs links by mvpatel2000 in https://github.com/mosaicml/composer/pull/2126
* Removes MCLI pin by mvpatel2000 in https://github.com/mosaicml/composer/pull/2127
* Bump pytest from 7.2.2 to 7.3.0 by dependabot in https://github.com/mosaicml/composer/pull/2128
* Bump nbsphinx from 0.8.12 to 0.9.1 by dependabot in https://github.com/mosaicml/composer/pull/2129
* Bump ipykernel from 6.20.1 to 6.22.0 by dependabot in https://github.com/mosaicml/composer/pull/2130
* Add batch log interval to optimizer monitor by dakinggg in https://github.com/mosaicml/composer/pull/2132
* Flush checkpoint on kill by mvpatel2000 in https://github.com/mosaicml/composer/pull/2125
* Bump deepspeed from 0.7.7 to 0.8.3 by dependabot in https://github.com/mosaicml/composer/pull/2131
* Add flexibility for FSDP Auto Wrap in Composer by bcui19 in https://github.com/mosaicml/composer/pull/2134
* Mcloud logger dest by mvpatel2000 in https://github.com/mosaicml/composer/pull/2136
* Better defaults for `get_num_tokens_in_batch` by dakinggg in https://github.com/mosaicml/composer/pull/2139
* Adding sharded grad scaler by bcui19 in https://github.com/mosaicml/composer/pull/2138
* Bump pytest from 7.3.0 to 7.3.1 by dependabot in https://github.com/mosaicml/composer/pull/2144
* Make sure the timestamps of the checkpoints are the same when loading by eracah in https://github.com/mosaicml/composer/pull/2146
* Add torch.compile support for torch 2.0 by karan6181 in https://github.com/mosaicml/composer/pull/2118
* Fix broken URLs due to docs site refactor by bandish-shah in https://github.com/mosaicml/composer/pull/2150
* Ece icl by bmosaicml in https://github.com/mosaicml/composer/pull/2135
* Update wandb requirement from <0.14,>=0.13.2 to >=0.13.2,<0.15 by dependabot in https://github.com/mosaicml/composer/pull/2097
* Add support for `eval_interval` and `save_interval` in tokens by dakinggg in https://github.com/mosaicml/composer/pull/2149
* Upgrade to transformers 4.28 by dakinggg in https://github.com/mosaicml/composer/pull/2152
* Add PyTorch 2.0.0 image, deprecate PyTorch 1.10 and 1.11 images by bandish-shah in https://github.com/mosaicml/composer/pull/2077
* Log Time Attrs by mvpatel2000 in https://github.com/mosaicml/composer/pull/2155
* EMA + FSDP support by mvpatel2000 in https://github.com/mosaicml/composer/pull/2157
* Mvpatel2000/ema fix final by mvpatel2000 in https://github.com/mosaicml/composer/pull/2158
* Bump sphinx-copybutton from 0.5.0 to 0.5.2 by dependabot in https://github.com/mosaicml/composer/pull/2159
* Bump junitparser from 2.8.0 to 3.0.0 by dependabot in https://github.com/mosaicml/composer/pull/2160
* Update wandb requirement from <0.15,>=0.13.2 to >=0.13.2,<0.16 by dependabot in https://github.com/mosaicml/composer/pull/2161
* Bump yamllint from 1.30.0 to 1.31.0 by dependabot in https://github.com/mosaicml/composer/pull/2163
* Bump sphinxext-opengraph from 0.7.4 to 0.8.2 by dependabot in https://github.com/mosaicml/composer/pull/2162
* Bump version to v0.13.5 by mvpatel2000 in https://github.com/mosaicml/composer/pull/2166
* Icl subcategories by bmosaicml in https://github.com/mosaicml/composer/pull/2145
* Add SlackLogger w/ custom formatting to composer/logger by waiwuc in https://github.com/mosaicml/composer/pull/2133
* Use state_dict Torchmetrics Serialization by nik-mosaic in https://github.com/mosaicml/composer/pull/2116
* Adding in deprecation warning for min_params by bcui19 in https://github.com/mosaicml/composer/pull/2167
* Update auto microbatching warning by mvpatel2000 in https://github.com/mosaicml/composer/pull/2123
* Add support for torch 2.0 by dakinggg in https://github.com/mosaicml/composer/pull/2172
* Fix the daily tests by dakinggg in https://github.com/mosaicml/composer/pull/2173
* Fix remote path in daily test by dakinggg in https://github.com/mosaicml/composer/pull/2177
* Template icl by bmosaicml in https://github.com/mosaicml/composer/pull/2137
* Fix ICL eval for sentencepiece tokenizers by dakinggg in https://github.com/mosaicml/composer/pull/2178
* bump flash attention version by dakinggg in https://github.com/mosaicml/composer/pull/2180
* Another attempt to fix the daily tests by dakinggg in https://github.com/mosaicml/composer/pull/2181
* Skip backward compatible checkpointing test on older torch versions by dakinggg in https://github.com/mosaicml/composer/pull/2182
* Fix space continuation issue for few shot ICL by dakinggg in https://github.com/mosaicml/composer/pull/2183
* Bump coverage[toml] from 7.2.2 to 7.2.5 by dependabot in https://github.com/mosaicml/composer/pull/2188
* Bump sentencepiece from 0.1.97 to 0.1.98 by dependabot in https://github.com/mosaicml/composer/pull/2186
* Fix filelock in checkpoint download by mvpatel2000 in https://github.com/mosaicml/composer/pull/2184
* Update warning->info for number of tokens by mvpatel2000 in https://github.com/mosaicml/composer/pull/2192
* Bump version to 0.14.0 by bandish-shah in https://github.com/mosaicml/composer/pull/2190
New Contributors
* waiwuc made their first contribution in https://github.com/mosaicml/composer/pull/2133
**Full Changelog**: https://github.com/mosaicml/composer/compare/v0.13.5...v0.14.0