You can leverage the [Habana](https://pytorch-lightning.readthedocs.io/en/stable/accelerators/hpu.html) hardware to accelerate your Deep Learning training workloads simply by passing:
```python
trainer = pl.Trainer(accelerator="hpu")

# single Gaudi training
trainer = pl.Trainer(accelerator="hpu", devices=1)

# distributed training with 8 Gaudi
trainer = pl.Trainer(accelerator="hpu", devices=8)
```
### The Bagua Strategy
The [Bagua Strategy](https://pytorch-lightning.readthedocs.io/en/stable/accelerators/gpu.html#bagua) integrates [Bagua](https://github.com/BaguaSys/bagua), a deep learning training acceleration framework that supports multiple advanced distributed training algorithms with state-of-the-art system relaxation techniques. Enabling Bagua, which can be considerably faster than vanilla PyTorch DDP, is as simple as:
```python
from pytorch_lightning.strategies import BaguaStrategy

trainer = pl.Trainer(strategy="bagua")

# or to choose a custom algorithm
trainer = pl.Trainer(strategy=BaguaStrategy(algorithm="gradient_allreduce"))  # the default
```
### Towards stable Accelerator, Strategy, and Plugin APIs
The `Accelerator`, `Strategy`, and `Plugin` APIs are a core part of PyTorch Lightning. They're where all the distributed boilerplate lives, and we're constantly working to improve both them and the overall PyTorch Lightning platform experience.
In this release, we've made some large changes to achieve that goal. Not to worry, though! The only users affected by these changes are those who use custom implementations of Accelerator and Strategy (`TrainingTypePlugin`) as well as certain Plugins. In particular, we want to highlight the following changes:
- All `TrainingTypePlugin`s have been renamed to `Strategy` ([11120](https://github.com/PyTorchLightning/pytorch-lightning/pull/11120)). Strategy is a more appropriate name because it encompasses more than simply training communication. This change is now aligned with the changes we implemented in 1.5, which introduced the new [`strategy` and `devices` flags to the Trainer](https://github.com/PyTorchLightning/pytorch-lightning/releases/tag/1.5.0#strategy-and-devices).
  ```python
  # Before
  from pytorch_lightning.plugins import DDPPlugin

  # New
  from pytorch_lightning.strategies import DDPStrategy
  ```
- The `Accelerator` and the `PrecisionPlugin` have moved into the `Strategy`. All strategies now take optional `accelerator` and `precision_plugin` parameters ([11022](https://github.com/PyTorchLightning/pytorch-lightning/pull/11022), [#10570](https://github.com/PyTorchLightning/pytorch-lightning/pull/10570)); see the sketch after this list.
- Custom Accelerator implementations must now implement two new abstract methods: `is_available()` ([11797](https://github.com/PyTorchLightning/pytorch-lightning/pull/11797)) and `auto_device_count()` ([#10222](https://github.com/PyTorchLightning/pytorch-lightning/pull/10222)). The latter determines how many devices get used by default when specifying `Trainer(accelerator=..., devices="auto")`.
- We redesigned the process creation for spawn-based strategies such as `DDPSpawnStrategy` and `TPUSpawnStrategy` ([10896](https://github.com/PyTorchLightning/pytorch-lightning/pull/10896)). All spawn-based strategies now spawn processes immediately upon calling `Trainer.{fit,validate,test,predict}`, which means the hooks/callbacks `prepare_data`, `setup`, `configure_sharded_model` and `teardown` all run under an initialized process group. These changes align the spawn-based strategies with their non-spawn counterparts (such as `DDPStrategy`).
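As a quick illustration of the new ownership described in the list above, strategies can now be constructed with an explicit accelerator (and, optionally, a precision plugin). A minimal sketch, assuming a single-node GPU setup; adapt the exact combination of arguments to your configuration:

```python
import pytorch_lightning as pl
from pytorch_lightning.accelerators import GPUAccelerator
from pytorch_lightning.strategies import DDPStrategy

# The strategy now owns the accelerator (and can also receive a precision_plugin)
strategy = DDPStrategy(accelerator=GPUAccelerator())
trainer = pl.Trainer(strategy=strategy, devices=2)
```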
We've also made the process group backend configurable. For example, you can now easily enable [`fairring`](https://github.com/facebookresearch/fairring) like this:
```python
# Explicitly specify the process group backend if you choose to
ddp = pl.strategies.DDPStrategy(process_group_backend="fairring")
trainer = pl.Trainer(strategy=ddp, accelerator="gpu", devices=8)
```
Similarly, if you are running `torch>=1.11`, you can enable [DDP static graph](https://pytorch.org/blog/pytorch-1.11-released/#stable-ddp-static-graph) to apply special runtime optimizations:
```python
from pytorch_lightning.strategies import DDPStrategy

trainer = pl.Trainer(devices=4, strategy=DDPStrategy(static_graph=True))
```
### `LightningCLI` improvements
In the previous release, we added shorthand notation support for registered components. In this release, we added a flag to [automatically register](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_cli.html#subclass-registration) all available components:
```python
from pytorch_lightning.utilities.cli import LightningCLI

LightningCLI(auto_registry=True)
```
We have also added support for the `ReduceLROnPlateau` scheduler with shorthand notation:
```bash
$ python script.py fit --optimizer=Adam --lr_scheduler=ReduceLROnPlateau --lr_scheduler.monitor=metric_to_track
```
If you need to [customize the learning rate scheduler configuration](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_cli.html#optimizers-and-learning-rate-schedulers), you can do so by overriding:
```python
class MyLightningCLI(LightningCLI):
    @staticmethod
    def configure_optimizers(lightning_module, optimizer, lr_scheduler=None):
        return {"optimizer": optimizer, "lr_scheduler": {"scheduler": lr_scheduler, ...}}
```
Finally, loggers are also now configurable with shorthand:
```bash
$ python script.py fit --trainer.logger=WandbLogger --trainer.logger.name="my_lightning_run"
```
### Control SLURM's re-queueing
We've added the ability to turn the [automatic resubmission](https://pytorch-lightning.readthedocs.io/en/stable/clouds/cluster.html#wall-time-auto-resubmit) on or off when a job gets interrupted by the SLURM controller (via signal handling). Users who prefer to let their code handle the resubmission (for example, when submitit is used) can now pass:
```python
from pytorch_lightning.plugins.environments import SLURMEnvironment

trainer = pl.Trainer(plugins=SLURMEnvironment(auto_requeue=False))
```
### Fault-tolerance improvements
[Fault-tolerant training](https://pytorch-lightning.readthedocs.io/en/stable/advanced/fault_tolerant_training.html) under manual optimization now tracks optimization progress. We also changed the graceful exit signal from `SIGUSR1` to `SIGTERM` for better support inside cloud instances.
An additional feature we're excited to announce is support for consecutive `trainer.fit()` calls.
```python
trainer = pl.Trainer(max_epochs=2)
trainer.fit(model)

# now, run 2 more epochs
trainer.fit_loop.max_epochs = 4
trainer.fit(model)
```
### Loop customization improvements
The [`Loop`](https://pytorch-lightning.readthedocs.io/en/stable/extensions/loops.html)'s state is now included as part of the checkpoints saved by the library. This enables finer restoration of custom loops.
We've also made it easier to replace Lightning's loops with your own. For example:
```python
class MyCustomLoop(pl.loops.TrainingEpochLoop):
    ...


trainer = pl.Trainer(...)
trainer.fit_loop.replace(epoch_loop=MyCustomLoop)

# Trainer runs the fit loop with your new epoch loop!
trainer.fit(model)
```
### Data-Loading improvements
In previous versions, Lightning required that the `DataLoader` instance set its input arguments as instance attributes. This meant that custom `DataLoader`s also had this hidden requirement. In this release, we do this automatically for the user, making it easier to pass custom loaders:
```diff
 class MyDataLoader(torch.utils.data.DataLoader):
     def __init__(self, a=123, *args, **kwargs):
-        # this was required before
-        self.a = a
         super().__init__(*args, **kwargs)

 trainer.fit(model, train_dataloaders=MyDataLoader())
```
As of this release, Lightning no longer pre-fetches 1 extra batch if it doesn't need to. Previously, doing so would conflict with the internal pre-fetching done by optimized data loaders such as [FFCV's](https://ffcv.io/). You can now define your own pre-fetching value like this:
```python
class MyCustomLoop(pl.loops.FitLoop):
    @property
    def prefetch_batches(self):
        return 7  # lucky number 7


trainer = pl.Trainer(...)
trainer.fit_loop = MyCustomLoop(min_epochs=trainer.min_epochs, max_epochs=trainer.max_epochs)
```
### New Hooks
#### `LightningModule.lr_scheduler_step`
Lightning now allows the use of [custom learning rate schedulers](https://pytorch-lightning.readthedocs.io/en/stable/common/optimization.html#bring-your-own-custom-learning-rate-schedulers) that aren't natively available in [PyTorch](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate). A great example of this is [Timm Schedulers](https://github.com/rwightman/pytorch-image-models/blob/master/timm/scheduler/scheduler.py).
When using custom learning rate schedulers that rely on an API other than PyTorch's, you can now define `LightningModule.lr_scheduler_step` with your desired logic.
```python
from timm.scheduler import TanhLRScheduler


class MyLightningModule(pl.LightningModule):
    def configure_optimizers(self):
        optimizer = ...
        scheduler = TanhLRScheduler(optimizer, ...)
        return {"optimizer": optimizer, "lr_scheduler": {"scheduler": scheduler, "interval": "epoch"}}

    def lr_scheduler_step(self, scheduler, optimizer_idx, metric):
        scheduler.step(epoch=self.current_epoch)  # timm's schedulers need the epoch value
```
#### A new stateful API
This release introduces new hooks to standardize all stateful components to use `state_dict` and `load_state_dict`, mimicking the PyTorch API. The new hooks receive their own component's state and replace most usages of the previous `on_save_checkpoint` and `on_load_checkpoint` hooks.
```diff
 class MyCallback(pl.Callback):
-    def on_save_checkpoint(self, trainer, pl_module, checkpoint):
-        return {'x': self.x}
-
-    def on_load_checkpoint(self, trainer, pl_module, checkpoint):
-        self.x = checkpoint['x']
+    def state_dict(self):
+        return {'x': self.x}
+
+    def load_state_dict(self, state_dict):
+        self.x = state_dict['x']
```
### New properties
#### `Trainer.estimated_stepping_batches`
You can use the built-in `Trainer.estimated_stepping_batches` property to compute the total number of stepping batches needed for the complete training run.
The property takes the gradient accumulation factor and the distributed setting into consideration when performing this computation, so that you don't have to derive it manually:
```python
class MyLightningModule(pl.LightningModule):
    def configure_optimizers(self):
        optimizer = ...
        scheduler = torch.optim.lr_scheduler.OneCycleLR(
            optimizer, max_lr=1e-3, total_steps=self.trainer.estimated_stepping_batches
        )
        return {"optimizer": optimizer, "lr_scheduler": scheduler}
```
#### `Trainer.num_devices` and `Trainer.device_ids`
In the past, retrieving the number of devices used, or their IDs, posed a considerable challenge: you had to know which property to access based on the current `Trainer` configuration.
To simplify this, we've deprecated the per-accelerator properties in favor of accelerator-agnostic ones. For example:
```diff
- num_devices = max(1, trainer.num_gpus, trainer.num_processes)
- if trainer.tpu_cores:
-     num_devices = max(num_devices, trainer.tpu_cores)
+ num_devices = trainer.num_devices
```
### Experimental Features
#### Manual Fault-tolerance
[Fault Tolerance](https://pytorch-lightning.readthedocs.io/en/latest/advanced/fault_tolerant_training.html) has limitations that require specific information about your data-loading structure.
You can now work around those limitations by enabling manual fault tolerance, writing your own logic, and specifying exactly how to checkpoint your datasets and samplers. Enable it with this environment flag:
```shell
$ PL_FAULT_TOLERANT_TRAINING=MANUAL python script.py
```
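What "your own logic" can look like: below is a minimal sketch of an iterable dataset that exposes `state_dict`/`load_state_dict`, following the stateful protocol mentioned in the changelog below. Exactly how Lightning invokes these hooks for your loader components is an assumption to verify against the fault-tolerance docs.

```python
from torch.utils.data import IterableDataset


class MyStatefulDataset(IterableDataset):
    """Hypothetical dataset that checkpoints how far it has iterated."""

    def __init__(self, size=1000):
        self.size = size
        self.index = 0

    def __iter__(self):
        while self.index < self.size:
            yield self.index
            self.index += 1

    def state_dict(self):
        # State captured when a fault-tolerance checkpoint is created
        return {"index": self.index}

    def load_state_dict(self, state_dict):
        # State restored when training resumes after an interruption
        self.index = state_dict["index"]
```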
Check out [this video](https://www.youtube.com/watch?v=-HRh_szyuhE) for a dive into the internals of this flag.
#### Customizing the layer synchronization
We introduced a new plugin class for wrapping layers of a model with synchronization logic for multiprocessing.
```python
class MyLayerSync(pl.plugins.LayerSync):
    ...


layer_sync = MyLayerSync(...)
trainer = pl.Trainer(sync_batchnorm=True, plugins=layer_sync, strategy="ddp")
```
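For a rough idea of what such a plugin can do, here is a minimal sketch that wraps BatchNorm layers with `torch.nn.SyncBatchNorm`, mirroring what the built-in `NativeSyncBatchNorm` plugin provides. The `apply`/`revert` hook names are an assumption; check the `LayerSync` base class for the exact methods to override.

```python
import torch
from pytorch_lightning.plugins import LayerSync


class MySyncBatchNorm(LayerSync):
    def apply(self, model: torch.nn.Module) -> torch.nn.Module:
        # Assumed hook: wrap BatchNorm layers so their statistics sync across processes
        return torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

    def revert(self, model: torch.nn.Module) -> torch.nn.Module:
        # Assumed hook: undoing the wrapping is omitted from this sketch
        return model
```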
#### Registering Custom Accelerators
There has been much progress in the field of ML Accelerators, and the list of accelerators is constantly expanding.
We've made it easier for users to try out new accelerators by enabling support for registering custom `Accelerator` classes in Lightning.
```python
from pytorch_lightning.accelerators import Accelerator, AcceleratorRegistry


class SOTAAccelerator(Accelerator):
    def __init__(self, x):
        ...


AcceleratorRegistry.register("sota_accelerator", SOTAAccelerator, x=123)

# the following works now:
trainer = pl.Trainer(accelerator="sota_accelerator")
```
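As noted in the API changes above, custom accelerators must also implement the new `is_available()` and `auto_device_count()` methods. A minimal sketch of those two hooks for the hypothetical accelerator, assuming they can be declared as static methods (the remaining abstract `Accelerator` methods are elided):

```python
from pytorch_lightning.accelerators import Accelerator


class SOTAAccelerator(Accelerator):
    @staticmethod
    def is_available() -> bool:
        # Report whether the (hypothetical) device can be used on this machine
        return True

    @staticmethod
    def auto_device_count() -> int:
        # How many devices Trainer(accelerator=..., devices="auto") selects by default
        return 1

    # ... the other abstract Accelerator methods still need implementations
```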
<a name="bc-changes"></a>
### Backward Incompatible Changes
Here is a selection of notable changes that are not backward compatible with previous versions. The full list of changes and removals can be found in the CHANGELOG below.
#### Drop PyTorch 1.7 support
Following our support window of the four latest PyTorch releases, this release supports PyTorch 1.8 to 1.11. Support for PyTorch 1.7 has been removed.
#### Drop Python 3.6 support
Following [Python's end-of-life](https://endoflife.date/python), support for Python 3.6 has been removed.
#### `AcceleratorConnector` rewrite
To support the new accelerator and strategy features, we completely rewrote our internal `AcceleratorConnector` class. No backward compatibility was maintained, so if your code relied on this internal class, it is likely to break.
#### Re-define the `current_epoch` boundary
To resolve fault-tolerance issues, we changed where the current epoch value gets increased.
`trainer.current_epoch` is now increased by 1 during `on_train_end`. This means that if a model is trained for 3 epochs (0, 1, 2), `trainer.current_epoch` will now return 3 instead of 2 after `trainer.fit()`. This can also impact custom callbacks that access this property inside this hook.
This also impacts checkpoints saved during an epoch (e.g. `on_train_epoch_end`). For example, a `Trainer(max_epochs=1, limit_train_batches=1)` instance that saves a checkpoint will have the `current_epoch=0` value saved instead of `current_epoch=1`.
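For illustration, using the three-epoch example from above:

```python
trainer = pl.Trainer(max_epochs=3)
trainer.fit(model)  # runs epochs 0, 1 and 2

# Previously: trainer.current_epoch == 2 after `fit`
# Now:        trainer.current_epoch == 3
```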
#### Re-define the `global_step` boundary
To resolve fault-tolerance issues, we changed where the global step value gets increased.
Access to `trainer.global_step` during an intra-training validation hook will now correctly return the number of optimizer steps taken already. In pseudocode:
```diff
  training_step()
+ global_step += 1
  validation_if_necessary()
- global_step += 1
```
Saved checkpoints that use the global step value as part of the filename are now increased by 1 for the same reason. A checkpoint saved after 1 step will now be named `step=1.ckpt` instead of `step=0.ckpt`.
The `trainer.global_step` value will now account for TBPTT or multiple optimizers. Users setting `Trainer({min,max}_steps=...)` under these circumstances will need to adjust their values.
#### Removed automatic reduction of outputs in `training_step` when using DataParallel
When using `Trainer(strategy="dp")`, *all* the tensors returned by `training_step` were previously reduced to a scalar ([11594](https://github.com/PyTorchLightning/pytorch-lightning/pull/11594)). This behavior was especially confusing when outputs needed to be collected in the `training_epoch_end` hook.
From now on, outputs are no longer reduced except for the `loss` tensor, unless you implement `training_step_end`, in which case the loss won't get reduced either.
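If your code relied on the removed reduction, one option is to reduce the gathered per-GPU outputs yourself in `training_step_end`. A minimal sketch, where `compute_loss` is a hypothetical helper:

```python
class MyModel(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper
        return {"loss": loss}

    def training_step_end(self, step_output):
        # Under "dp", `loss` holds one entry per GPU; reduce it manually
        step_output["loss"] = step_output["loss"].mean()
        return step_output
```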
#### No automatic fallback to CPU when no devices are available
Previous versions were lenient in that, when no GPU devices were found, training silently fell back to the CPU. This meant that users' code could be running much slower without them ever noticing that it was running on the CPU.
We suggest passing `Trainer(accelerator="auto")` when this leniency is desired.
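For example, automatic selection with a CPU fallback looks like this:

```python
# Selects an available accelerator (GPU, TPU, ...) and otherwise falls back to CPU
trainer = pl.Trainer(accelerator="auto", devices="auto")
```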
<a name="changelog"></a>
### CHANGELOG
<details><summary>Added</summary>
- Allow logging to an existing run ID in MLflow with `MLFlowLogger` ([12290](https://github.com/PyTorchLightning/pytorch-lightning/pull/12290))
- Enable gradient accumulation using Horovod's `backward_passes_per_step` ([11911](https://github.com/PyTorchLightning/pytorch-lightning/pull/11911))
- Add new `DETAIL` log level to provide useful logs for improving monitoring and debugging of batch jobs ([11008](https://github.com/PyTorchLightning/pytorch-lightning/pull/11008))
- Added a flag `SLURMEnvironment(auto_requeue=True|False)` to control whether Lightning handles the requeuing ([10601](https://github.com/PyTorchLightning/pytorch-lightning/pull/10601))
- Fault Tolerant Manual
* Add `_Stateful` protocol to detect if classes are stateful ([10646](https://github.com/PyTorchLightning/pytorch-lightning/pull/10646))
* Add `_FaultTolerantMode` enum used to track different supported fault tolerant modes ([10645](https://github.com/PyTorchLightning/pytorch-lightning/pull/10645))
* Add a `_rotate_worker_indices` utility to reload the state according to the latest worker ([10647](https://github.com/PyTorchLightning/pytorch-lightning/pull/10647))
* Add stateful workers ([10674](https://github.com/PyTorchLightning/pytorch-lightning/pull/10674))
* Add a utility to collect the states across processes ([10639](https://github.com/PyTorchLightning/pytorch-lightning/pull/10639))
* Add logic to reload the states across data loading components ([10699](https://github.com/PyTorchLightning/pytorch-lightning/pull/10699))
* Cleanup some fault tolerant utilities ([10703](https://github.com/PyTorchLightning/pytorch-lightning/pull/10703))
* Enable Fault Tolerant Manual Training ([10707](https://github.com/PyTorchLightning/pytorch-lightning/pull/10707))
* Broadcast the `_terminate_gracefully` to all processes and add support for DDP ([10638](https://github.com/PyTorchLightning/pytorch-lightning/pull/10638))
- Added support for re-instantiation of custom (subclasses of) `DataLoaders` returned in the `*_dataloader()` methods, i.e., automatic replacement of samplers now works with custom types of `DataLoader` ([10680](https://github.com/PyTorchLightning/pytorch-lightning/pull/10680))
- Added a function to validate if fault tolerant training is supported. ([10465](https://github.com/PyTorchLightning/pytorch-lightning/pull/10465))
- Added a private callback to manage the creation and deletion of fault-tolerance checkpoints ([11862](https://github.com/PyTorchLightning/pytorch-lightning/pull/11862))
- Show a better error message when a custom `DataLoader` implementation is not well implemented and we need to reconstruct it ([10719](https://github.com/PyTorchLightning/pytorch-lightning/pull/10719))
- Show a better error message when frozen dataclass is used as a batch ([10927](https://github.com/PyTorchLightning/pytorch-lightning/pull/10927))
- Save the `Loop`'s state by default in the checkpoint ([10784](https://github.com/PyTorchLightning/pytorch-lightning/pull/10784))
- Added `Loop.replace` to easily switch one loop for another ([10324](https://github.com/PyTorchLightning/pytorch-lightning/pull/10324))
- Added support for `--lr_scheduler=ReduceLROnPlateau` to the `LightningCLI` ([10860](https://github.com/PyTorchLightning/pytorch-lightning/pull/10860))
- Added `LightningCLI.configure_optimizers` to override the `configure_optimizers` return value ([10860](https://github.com/PyTorchLightning/pytorch-lightning/pull/10860))
- Added `LightningCLI(auto_registry)` flag to register all subclasses of the registerable components automatically ([12108](https://github.com/PyTorchLightning/pytorch-lightning/pull/12108))
- Added a warning that shows when `max_epochs` in the `Trainer` is not set ([10700](https://github.com/PyTorchLightning/pytorch-lightning/pull/10700))
- Added support for returning a single Callback from `LightningModule.configure_callbacks` without wrapping it into a list ([11060](https://github.com/PyTorchLightning/pytorch-lightning/pull/11060))
- Added `console_kwargs` for `RichProgressBar` to initialize inner Console ([10875](https://github.com/PyTorchLightning/pytorch-lightning/pull/10875))
- Added support for shorthand notation to instantiate loggers with the `LightningCLI` ([11533](https://github.com/PyTorchLightning/pytorch-lightning/pull/11533))
- Added a `LOGGER_REGISTRY` instance to register custom loggers to the `LightningCLI` ([11533](https://github.com/PyTorchLightning/pytorch-lightning/pull/11533))
- Added info message when the `Trainer` arguments `limit_*_batches`, `overfit_batches`, or `val_check_interval` are set to `1` or `1.0` ([11950](https://github.com/PyTorchLightning/pytorch-lightning/pull/11950))
- Added a `PrecisionPlugin.teardown` method ([10990](https://github.com/PyTorchLightning/pytorch-lightning/pull/10990))
- Added `LightningModule.lr_scheduler_step` ([10249](https://github.com/PyTorchLightning/pytorch-lightning/pull/10249))
- Added support for no pre-fetching to `DataFetcher` ([11606](https://github.com/PyTorchLightning/pytorch-lightning/pull/11606))
- Added support for optimizer step progress tracking with manual optimization ([11848](https://github.com/PyTorchLightning/pytorch-lightning/pull/11848))
- Return the output of the `optimizer.step`. This can be useful for `LightningLite` users, manual optimization users, or users overriding `LightningModule.optimizer_step` ([11711](https://github.com/PyTorchLightning/pytorch-lightning/pull/11711))
- Teardown the active loop and strategy on exception ([11620](https://github.com/PyTorchLightning/pytorch-lightning/pull/11620))
- Added a `MisconfigurationException` if user provided `opt_idx` in scheduler config doesn't match with actual optimizer index of its respective optimizer ([11247](https://github.com/PyTorchLightning/pytorch-lightning/pull/11247))
- Added a `loggers` property to `Trainer` which returns a list of loggers provided by the user ([11683](https://github.com/PyTorchLightning/pytorch-lightning/pull/11683))
- Added a `loggers` property to `LightningModule` which retrieves the `loggers` property from `Trainer` ([11683](https://github.com/PyTorchLightning/pytorch-lightning/pull/11683))
- Added support for DDP when using a `CombinedLoader` for the training data ([11648](https://github.com/PyTorchLightning/pytorch-lightning/pull/11648))
- Added a warning when using `DistributedSampler` during validation/testing ([11479](https://github.com/PyTorchLightning/pytorch-lightning/pull/11479))
- Added support for `Bagua` training strategy ([11146](https://github.com/PyTorchLightning/pytorch-lightning/pull/11146))
- Added support for manually returning a `poptorch.DataLoader` in a `*_dataloader` hook ([12116](https://github.com/PyTorchLightning/pytorch-lightning/pull/12116))
- Added `rank_zero` module to centralize utilities ([11747](https://github.com/PyTorchLightning/pytorch-lightning/pull/11747))
- Added a `_Stateful` support for `LightningDataModule` ([11637](https://github.com/PyTorchLightning/pytorch-lightning/pull/11637))
- Added `_Stateful` support for `PrecisionPlugin` ([11638](https://github.com/PyTorchLightning/pytorch-lightning/pull/11638))
- Added `Accelerator.is_available` to check device availability ([11797](https://github.com/PyTorchLightning/pytorch-lightning/pull/11797))
- Enabled static type-checking on the signature of `Trainer` ([11888](https://github.com/PyTorchLightning/pytorch-lightning/pull/11888))
- Added utility functions for moving optimizers to devices ([11758](https://github.com/PyTorchLightning/pytorch-lightning/pull/11758))
- Added a warning when saving an instance of `nn.Module` with `save_hyperparameters()` ([12068](https://github.com/PyTorchLightning/pytorch-lightning/pull/12068))
- Added `estimated_stepping_batches` property to `Trainer` ([11599](https://github.com/PyTorchLightning/pytorch-lightning/pull/11599))
- Added support for pluggable Accelerators ([12030](https://github.com/PyTorchLightning/pytorch-lightning/pull/12030))
- Added profiling for `on_load_checkpoint`/`on_save_checkpoint` callback and LightningModule hooks ([12149](https://github.com/PyTorchLightning/pytorch-lightning/pull/12149))
- Added `LayerSync` and `NativeSyncBatchNorm` plugins ([11754](https://github.com/PyTorchLightning/pytorch-lightning/pull/11754))
- Added optional `storage_options` argument to `Trainer.save_checkpoint()` to pass to custom `CheckpointIO` implementations ([11891](https://github.com/PyTorchLightning/pytorch-lightning/pull/11891))
- Added support to explicitly specify the process group backend for parallel strategies ([11745](https://github.com/PyTorchLightning/pytorch-lightning/pull/11745))
- Added `device_ids` and `num_devices` property to `Trainer` ([12151](https://github.com/PyTorchLightning/pytorch-lightning/pull/12151))
- Added `Callback.state_dict()` and `Callback.load_state_dict()` methods ([12232](https://github.com/PyTorchLightning/pytorch-lightning/pull/12232))
- Added `AcceleratorRegistry` ([12180](https://github.com/PyTorchLightning/pytorch-lightning/pull/12180))
- Added support for Habana Accelerator (HPU) ([11808](https://github.com/PyTorchLightning/pytorch-lightning/pull/11808))
- Added support for dataclasses in `apply_to_collections` ([11889](https://github.com/PyTorchLightning/pytorch-lightning/pull/11889))
</details>
<details><summary>Changed</summary>
- Drop PyTorch 1.7 support ([12191](https://github.com/PyTorchLightning/pytorch-lightning/pull/12191)), ([#12432](https://github.com/PyTorchLightning/pytorch-lightning/pull/12432))
- Make `benchmark` flag optional and set its value based on the deterministic flag ([11944](https://github.com/PyTorchLightning/pytorch-lightning/pull/11944))
- Implemented a new native and rich format in `_print_results` method of the `EvaluationLoop` ([11332](https://github.com/PyTorchLightning/pytorch-lightning/pull/11332))
- Do not print an empty table at the end of the `EvaluationLoop` ([12427](https://github.com/PyTorchLightning/pytorch-lightning/pull/12427))
- Set the `prog_bar` flag to False in `LightningModule.log_grad_norm` ([11472](https://github.com/PyTorchLightning/pytorch-lightning/pull/11472))
- Raised exception in `init_dist_connection()` when torch distributed is not available ([10418](https://github.com/PyTorchLightning/pytorch-lightning/pull/10418))
- The `monitor` argument in the `EarlyStopping` callback is no longer optional ([10328](https://github.com/PyTorchLightning/pytorch-lightning/pull/10328))
- Do not fail if batch size could not be inferred for logging when using DeepSpeed ([10438](https://github.com/PyTorchLightning/pytorch-lightning/pull/10438))
- Raised `MisconfigurationException` when `enable_progress_bar=False` and a progress bar instance has been passed in the callback list ([10520](https://github.com/PyTorchLightning/pytorch-lightning/pull/10520))
- Moved `trainer.connectors.env_vars_connector._defaults_from_env_vars` to `utilities.argsparse._defaults_from_env_vars` ([10501](https://github.com/PyTorchLightning/pytorch-lightning/pull/10501))
- Changes in `LightningCLI` required for the new major release of jsonargparse v4.0.0 ([10426](https://github.com/PyTorchLightning/pytorch-lightning/pull/10426))
- Renamed `refresh_rate_per_second` parameter to `refresh_rate` for `RichProgressBar` signature ([10497](https://github.com/PyTorchLightning/pytorch-lightning/pull/10497))
- Moved ownership of the `PrecisionPlugin` into `TrainingTypePlugin` and updated all references ([10570](https://github.com/PyTorchLightning/pytorch-lightning/pull/10570))
- Fault Tolerant relies on `signal.SIGTERM` to gracefully exit instead of `signal.SIGUSR1` ([10605](https://github.com/PyTorchLightning/pytorch-lightning/pull/10605))
- `Loop.restarting=...` now sets the value recursively for all subloops ([11442](https://github.com/PyTorchLightning/pytorch-lightning/pull/11442))
- Raised an error if the `batch_size` cannot be inferred from the current batch if it contained a string or was a custom batch object ([10541](https://github.com/PyTorchLightning/pytorch-lightning/pull/10541))
- The validation loop is now disabled when `overfit_batches > 0` is set in the Trainer ([9709](https://github.com/PyTorchLightning/pytorch-lightning/pull/9709))
- Moved optimizer related logics from `Accelerator` to `TrainingTypePlugin` ([10596](https://github.com/PyTorchLightning/pytorch-lightning/pull/10596))
- Moved ownership of the lightning optimizers from the `Trainer` to the `Strategy` ([11444](https://github.com/PyTorchLightning/pytorch-lightning/pull/11444))
- Moved ownership of the data fetchers from the DataConnector to the Loops ([11621](https://github.com/PyTorchLightning/pytorch-lightning/pull/11621))
- Moved `batch_to_device` method from `Accelerator` to `TrainingTypePlugin` ([10649](https://github.com/PyTorchLightning/pytorch-lightning/pull/10649))
- The `DDPSpawnPlugin` no longer overrides the `post_dispatch` plugin hook ([10034](https://github.com/PyTorchLightning/pytorch-lightning/pull/10034))
- Integrate the progress bar implementation with progress tracking ([11213](https://github.com/PyTorchLightning/pytorch-lightning/pull/11213))
- The `LightningModule.{add_to_queue,get_from_queue}` hooks no longer get a `torch.multiprocessing.SimpleQueue` and instead receive a list based queue ([10034](https://github.com/PyTorchLightning/pytorch-lightning/pull/10034))
- Changed `training_step`, `validation_step`, `test_step` and `predict_step` method signatures in `Accelerator` and updated input from caller side ([10908](https://github.com/PyTorchLightning/pytorch-lightning/pull/10908))
- Changed the name of the temporary checkpoint that the `DDPSpawnPlugin` and related plugins save ([10934](https://github.com/PyTorchLightning/pytorch-lightning/pull/10934))
- `LoggerCollection` returns only unique logger names and versions ([10976](https://github.com/PyTorchLightning/pytorch-lightning/pull/10976))
- Redesigned process creation for spawn-based plugins (`DDPSpawnPlugin`, `TPUSpawnPlugin`, etc.) ([10896](https://github.com/PyTorchLightning/pytorch-lightning/pull/10896))
* All spawn-based plugins now spawn processes immediately upon calling `Trainer.{fit,validate,test,predict}`
* The hooks/callbacks `prepare_data`, `setup`, `configure_sharded_model` and `teardown` now run under initialized process group for spawn-based plugins just like their non-spawn counterparts
* Some configuration errors that were previously raised as `MisconfigurationException`s will now be raised as `ProcessRaisedException` (torch>=1.8) or as `Exception` (torch<1.8)
* Removed the `TrainingTypePlugin.pre_dispatch()` method and merged it with `TrainingTypePlugin.setup()` ([11137](https://github.com/PyTorchLightning/pytorch-lightning/pull/11137))
- Changed profiler to index and display the names of the hooks with a new pattern `[<base class>]<class>.<hook name>` ([11026](https://github.com/PyTorchLightning/pytorch-lightning/pull/11026))
- Changed `batch_to_device` entry in profiling from stage-specific to generic, to match profiling of other hooks ([11031](https://github.com/PyTorchLightning/pytorch-lightning/pull/11031))
- Changed the info message for finalizing ddp-spawn worker processes to a debug-level message ([10864](https://github.com/PyTorchLightning/pytorch-lightning/pull/10864))
- Removed duplicated file extension when uploading model checkpoints with `NeptuneLogger` ([11015](https://github.com/PyTorchLightning/pytorch-lightning/pull/11015))
- Removed `__getstate__` and `__setstate__` of `RichProgressBar` ([11100](https://github.com/PyTorchLightning/pytorch-lightning/pull/11100))
- The `DDPPlugin` and `DDPSpawnPlugin` and their subclasses now remove the `SyncBatchNorm` wrappers in `teardown()` to enable proper support at inference after fitting ([11078](https://github.com/PyTorchLightning/pytorch-lightning/pull/11078))
- Moved ownership of the `Accelerator` instance to the `TrainingTypePlugin`; all training-type plugins now take an optional parameter `accelerator` ([11022](https://github.com/PyTorchLightning/pytorch-lightning/pull/11022))
- Renamed the `TrainingTypePlugin` to `Strategy` ([11120](https://github.com/PyTorchLightning/pytorch-lightning/pull/11120))
* Renamed the `ParallelPlugin` to `ParallelStrategy` ([11123](https://github.com/PyTorchLightning/pytorch-lightning/pull/11123))
* Renamed the `DataParallelPlugin` to `DataParallelStrategy` ([11183](https://github.com/PyTorchLightning/pytorch-lightning/pull/11183))
* Renamed the `DDPPlugin` to `DDPStrategy` ([11142](https://github.com/PyTorchLightning/pytorch-lightning/pull/11142))
* Renamed the `DDP2Plugin` to `DDP2Strategy` ([11185](https://github.com/PyTorchLightning/pytorch-lightning/pull/11185))
* Renamed the `DDPShardedPlugin` to `DDPShardedStrategy` ([11186](https://github.com/PyTorchLightning/pytorch-lightning/pull/11186))
* Renamed the `DDPFullyShardedPlugin` to `DDPFullyShardedStrategy` ([11143](https://github.com/PyTorchLightning/pytorch-lightning/pull/11143))
* Renamed the `DDPSpawnPlugin` to `DDPSpawnStrategy` ([11145](https://github.com/PyTorchLightning/pytorch-lightning/pull/11145))
* Renamed the `DDPSpawnShardedPlugin` to `DDPSpawnShardedStrategy` ([11210](https://github.com/PyTorchLightning/pytorch-lightning/pull/11210))
* Renamed the `DeepSpeedPlugin` to `DeepSpeedStrategy` ([11194](https://github.com/PyTorchLightning/pytorch-lightning/pull/11194))
* Renamed the `HorovodPlugin` to `HorovodStrategy` ([11195](https://github.com/PyTorchLightning/pytorch-lightning/pull/11195))
* Renamed the `TPUSpawnPlugin` to `TPUSpawnStrategy` ([11190](https://github.com/PyTorchLightning/pytorch-lightning/pull/11190))
* Renamed the `IPUPlugin` to `IPUStrategy` ([11193](https://github.com/PyTorchLightning/pytorch-lightning/pull/11193))
* Renamed the `SingleDevicePlugin` to `SingleDeviceStrategy` ([11182](https://github.com/PyTorchLightning/pytorch-lightning/pull/11182))
* Renamed the `SingleTPUPlugin` to `SingleTPUStrategy` ([11182](https://github.com/PyTorchLightning/pytorch-lightning/pull/11182))
* Renamed the `TrainingTypePluginsRegistry` to `StrategyRegistry` ([11233](https://github.com/PyTorchLightning/pytorch-lightning/pull/11233))
- Marked the `ResultCollection`, `ResultMetric`, and `ResultMetricCollection` classes as protected ([11130](https://github.com/PyTorchLightning/pytorch-lightning/pull/11130))
- Marked `trainer.checkpoint_connector` as protected ([11550](https://github.com/PyTorchLightning/pytorch-lightning/pull/11550))
- The epoch start/end hooks are now called by the `FitLoop` instead of the `TrainingEpochLoop` ([11201](https://github.com/PyTorchLightning/pytorch-lightning/pull/11201))
- DeepSpeed does not require lightning module zero 3 partitioning ([10655](https://github.com/PyTorchLightning/pytorch-lightning/pull/10655))
- Moved `Strategy` classes to the `strategies` directory ([11226](https://github.com/PyTorchLightning/pytorch-lightning/pull/11226))
- Renamed `training_type_plugin` file to `strategy` ([11239](https://github.com/PyTorchLightning/pytorch-lightning/pull/11239))
- Changed `DeviceStatsMonitor` to group metrics based on the logger's `group_separator` ([11254](https://github.com/PyTorchLightning/pytorch-lightning/pull/11254))
- Raised `UserWarning` if evaluation is triggered with `best` ckpt and trainer is configured with multiple checkpoint callbacks ([11274](https://github.com/PyTorchLightning/pytorch-lightning/pull/11274))
- `Trainer.logged_metrics` now always contains scalar tensors, even when a Python scalar was logged ([11270](https://github.com/PyTorchLightning/pytorch-lightning/pull/11270))
- The tuner now uses the checkpoint connector to copy and restore its state ([11518](https://github.com/PyTorchLightning/pytorch-lightning/pull/11518))
- Changed `MisconfigurationException` to `ModuleNotFoundError` when `rich` isn't available ([11360](https://github.com/PyTorchLightning/pytorch-lightning/pull/11360))
- The `trainer.current_epoch` value is now increased by 1 during and after `on_train_end` ([8578](https://github.com/PyTorchLightning/pytorch-lightning/pull/8578))
- The `trainer.global_step` value now accounts for multiple optimizers and TBPTT splits ([11805](https://github.com/PyTorchLightning/pytorch-lightning/pull/11805))
- The `trainer.global_step` value is now increased right after the `optimizer.step()` call which will impact users who access it during an intra-training validation hook ([11805](https://github.com/PyTorchLightning/pytorch-lightning/pull/11805))
- The filename of checkpoints created with `ModelCheckpoint(filename='{step}')` is different compared to previous versions. A checkpoint saved after 1 step will be named `step=1.ckpt` instead of `step=0.ckpt` ([11805](https://github.com/PyTorchLightning/pytorch-lightning/pull/11805))
- Inherit from `ABC` for `Accelerator`: Users need to implement `auto_device_count` ([11521](https://github.com/PyTorchLightning/pytorch-lightning/pull/11521))
- Changed `parallel_devices` property in `ParallelStrategy` to be lazy initialized ([11572](https://github.com/PyTorchLightning/pytorch-lightning/pull/11572))
- Updated `TQDMProgressBar` to run a separate progress bar for each eval dataloader ([11657](https://github.com/PyTorchLightning/pytorch-lightning/pull/11657))
- Sorted `SimpleProfiler(extended=False)` summary based on mean duration for each hook ([11671](https://github.com/PyTorchLightning/pytorch-lightning/pull/11671))
- Avoid enforcing `shuffle=False` for eval dataloaders ([11575](https://github.com/PyTorchLightning/pytorch-lightning/pull/11575))
- When using DP (data-parallel), Lightning will no longer automatically reduce all tensors returned in training_step; it will only reduce the loss unless `training_step_end` is overridden ([11594](https://github.com/PyTorchLightning/pytorch-lightning/pull/11594))
- When using DP (data-parallel), the `training_epoch_end` hook will no longer receive reduced outputs from `training_step` and instead get the full tensor of results from all GPUs ([11594](https://github.com/PyTorchLightning/pytorch-lightning/pull/11594))
- Changed default logger name to `lightning_logs` for consistency ([11762](https://github.com/PyTorchLightning/pytorch-lightning/pull/11762))
- Rewrote `accelerator_connector` ([11448](https://github.com/PyTorchLightning/pytorch-lightning/pull/11448))
- When manual optimization is used with DDP, we no longer force `find_unused_parameters=True` ([12425](https://github.com/PyTorchLightning/pytorch-lightning/pull/12425))
- Disable loading dataloaders if corresponding `limit_batches=0` ([11576](https://github.com/PyTorchLightning/pytorch-lightning/pull/11576))
- Removed `is_global_zero` check in `training_epoch_loop` before `logger.save`. If you have a custom logger that implements `save` the Trainer will now call `save` on all ranks by default. To change this behavior add `rank_zero_only` to your `save` implementation ([12134](https://github.com/PyTorchLightning/pytorch-lightning/pull/12134))
- Disabled tuner with distributed strategies ([12179](https://github.com/PyTorchLightning/pytorch-lightning/pull/12179))
- Marked `trainer.logger_connector` as protected ([12195](https://github.com/PyTorchLightning/pytorch-lightning/pull/12195))
- Move `Strategy.process_dataloader` function call from `fit/evaluation/predict_loop.py` to `data_connector.py` ([12251](https://github.com/PyTorchLightning/pytorch-lightning/pull/12251))
- `ModelCheckpoint(save_last=True, every_n_epochs=N)` now saves a "last" checkpoint every epoch (disregarding `every_n_epochs`) instead of only once at the end of training ([12418](https://github.com/PyTorchLightning/pytorch-lightning/pull/12418))
- The strategies that support `sync_batchnorm` now only apply it when fitting ([11919](https://github.com/PyTorchLightning/pytorch-lightning/pull/11919))
- Avoided fallback on CPU if no devices are provided for other accelerators ([12410](https://github.com/PyTorchLightning/pytorch-lightning/pull/12410))
- Modified `supporters.py` so that the accumulator element (for loss) is created directly on the device ([12430](https://github.com/PyTorchLightning/pytorch-lightning/pull/12430))
- Removed `EarlyStopping.on_save_checkpoint` and `EarlyStopping.on_load_checkpoint` in favor of `EarlyStopping.state_dict` and `EarlyStopping.load_state_dict` ([11887](https://github.com/PyTorchLightning/pytorch-lightning/pull/11887))
- Removed `BaseFinetuning.on_save_checkpoint` and `BaseFinetuning.on_load_checkpoint` in favor of `BaseFinetuning.state_dict` and `BaseFinetuning.load_state_dict` ([11887](https://github.com/PyTorchLightning/pytorch-lightning/pull/11887))
- Removed `BackboneFinetuning.on_save_checkpoint` and `BackboneFinetuning.on_load_checkpoint` in favor of `BackboneFinetuning.state_dict` and `BackboneFinetuning.load_state_dict` ([11887](https://github.com/PyTorchLightning/pytorch-lightning/pull/11887))
- Removed `ModelCheckpoint.on_save_checkpoint` and `ModelCheckpoint.on_load_checkpoint` in favor of `ModelCheckpoint.state_dict` and `ModelCheckpoint.load_state_dict` ([11887](https://github.com/PyTorchLightning/pytorch-lightning/pull/11887))
- Removed `Timer.on_save_checkpoint` and `Timer.on_load_checkpoint` in favor of `Timer.state_dict` and `Timer.load_state_dict` ([11887](https://github.com/PyTorchLightning/pytorch-lightning/pull/11887))
- Replaced PostLocalSGDOptimizer with a dedicated model averaging component ([12378](https://github.com/PyTorchLightning/pytorch-lightning/pull/12378))
</details>
<details><summary>Deprecated</summary>
- Deprecated `training_type_plugin` property in favor of `strategy` in `Trainer` and updated the references ([11141](https://github.com/PyTorchLightning/pytorch-lightning/pull/11141))
- Deprecated `Trainer.{validated,tested,predicted}_ckpt_path` and replaced with read-only property `Trainer.ckpt_path` set when checkpoints loaded via `Trainer.{fit,validate,test,predict}` ([11696](https://github.com/PyTorchLightning/pytorch-lightning/pull/11696))
- Deprecated `ClusterEnvironment.master_{address,port}` in favor of `ClusterEnvironment.main_{address,port}` ([10103](https://github.com/PyTorchLightning/pytorch-lightning/pull/10103))
- Deprecated `DistributedType` in favor of `_StrategyType` ([10505](https://github.com/PyTorchLightning/pytorch-lightning/pull/10505))
- Deprecated the `precision_plugin` constructor argument from `Accelerator` ([10570](https://github.com/PyTorchLightning/pytorch-lightning/pull/10570))
- Deprecated `DeviceType` in favor of `_AcceleratorType` ([10503](https://github.com/PyTorchLightning/pytorch-lightning/pull/10503))
- Deprecated the property `Trainer.slurm_job_id` in favor of the new `SLURMEnvironment.job_id()` method ([10622](https://github.com/PyTorchLightning/pytorch-lightning/pull/10622))
- Deprecated the access to the attribute `IndexBatchSamplerWrapper.batch_indices` in favor of `IndexBatchSamplerWrapper.seen_batch_indices` ([10870](https://github.com/PyTorchLightning/pytorch-lightning/pull/10870))
- Deprecated `on_init_start` and `on_init_end` callback hooks ([10940](https://github.com/PyTorchLightning/pytorch-lightning/pull/10940))
- Deprecated `Trainer.call_hook` in favor of `Trainer._call_callback_hooks`, `Trainer._call_lightning_module_hook`, `Trainer._call_ttp_hook`, and `Trainer._call_accelerator_hook` ([10979](https://github.com/PyTorchLightning/pytorch-lightning/pull/10979))
- Deprecated `TrainingTypePlugin.post_dispatch` in favor of `TrainingTypePlugin.teardown` ([10939](https://github.com/PyTorchLightning/pytorch-lightning/pull/10939))
- Deprecated `ModelIO.on_hpc_{save/load}` in favor of `CheckpointHooks.on_{save/load}_checkpoint` ([10911](https://github.com/PyTorchLightning/pytorch-lightning/pull/10911))
- Deprecated `Trainer.run_stage` in favor of `Trainer.{fit,validate,test,predict}` ([11000](https://github.com/PyTorchLightning/pytorch-lightning/pull/11000))
- Deprecated `Trainer.lr_schedulers` in favor of `Trainer.lr_scheduler_configs` which returns a list of dataclasses instead of dictionaries ([11443](https://github.com/PyTorchLightning/pytorch-lightning/pull/11443))
- Deprecated `Trainer.verbose_evaluate` in favor of `EvaluationLoop(verbose=...)` ([10931](https://github.com/PyTorchLightning/pytorch-lightning/pull/10931))
- Deprecated `Trainer.should_rank_save_checkpoint` Trainer property ([11068](https://github.com/PyTorchLightning/pytorch-lightning/pull/11068))
- Deprecated `Trainer.lightning_optimizers` ([11444](https://github.com/PyTorchLightning/pytorch-lightning/pull/11444))
- Deprecated `TrainerOptimizersMixin` and moved functionality to `core/optimizer.py` ([11155](https://github.com/PyTorchLightning/pytorch-lightning/pull/11155))
- Deprecated the `on_train_batch_end(outputs)` format when multiple optimizers are used and TBPTT is enabled ([12182](https://github.com/PyTorchLightning/pytorch-lightning/pull/12182))
- Deprecated the `training_epoch_end(outputs)` format when multiple optimizers are used and TBPTT is enabled ([12182](https://github.com/PyTorchLightning/pytorch-lightning/pull/12182))
- Deprecated `TrainerCallbackHookMixin` ([11148](https://github.com/PyTorchLightning/pytorch-lightning/pull/11148))
- Deprecated `TrainerDataLoadingMixin` and moved functionality to `Trainer` and `DataConnector` ([11282](https://github.com/PyTorchLightning/pytorch-lightning/pull/11282))
- Deprecated function `pytorch_lightning.callbacks.device_stats_monitor.prefix_metric_keys` ([11254](https://github.com/PyTorchLightning/pytorch-lightning/pull/11254))
- Deprecated `Callback.on_epoch_start` hook in favour of `Callback.on_{train/val/test}_epoch_start` ([11578](https://github.com/PyTorchLightning/pytorch-lightning/pull/11578))
- Deprecated `Callback.on_epoch_end` hook in favour of `Callback.on_{train/val/test}_epoch_end` ([11578](https://github.com/PyTorchLightning/pytorch-lightning/pull/11578))
- Deprecated `LightningModule.on_epoch_start` hook in favor of `LightningModule.on_{train/val/test}_epoch_start` ([11578](https://github.com/PyTorchLightning/pytorch-lightning/pull/11578))
- Deprecated `LightningModule.on_epoch_end` hook in favor of `LightningModule.on_{train/val/test}_epoch_end` ([11578](https://github.com/PyTorchLightning/pytorch-lightning/pull/11578))
- Deprecated `on_before_accelerator_backend_setup` callback hook in favour of `setup` ([11568](https://github.com/PyTorchLightning/pytorch-lightning/pull/11568))
- Deprecated `on_batch_start` and `on_batch_end` callback hooks in favor of `on_train_batch_start` and `on_train_batch_end` ([11577](https://github.com/PyTorchLightning/pytorch-lightning/pull/11577))
- Deprecated `on_configure_sharded_model` callback hook in favor of `setup` ([11627](https://github.com/PyTorchLightning/pytorch-lightning/pull/11627))
- Deprecated `pytorch_lightning.utilities.distributed.rank_zero_only` in favor of `pytorch_lightning.utilities.rank_zero.rank_zero_only` ([11747](https://github.com/PyTorchLightning/pytorch-lightning/pull/11747))
- Deprecated `pytorch_lightning.utilities.distributed.rank_zero_debug` in favor of `pytorch_lightning.utilities.rank_zero.rank_zero_debug` ([11747](https://github.com/PyTorchLightning/pytorch-lightning/pull/11747))
- Deprecated `pytorch_lightning.utilities.distributed.rank_zero_info` in favor of `pytorch_lightning.utilities.rank_zero.rank_zero_info` ([11747](https://github.com/PyTorchLightning/pytorch-lightning/pull/11747))
- Deprecated `pytorch_lightning.utilities.warnings.rank_zero_warn` in favor of `pytorch_lightning.utilities.rank_zero.rank_zero_warn` ([11747](https://github.com/PyTorchLightning/pytorch-lightning/pull/11747))
- Deprecated `pytorch_lightning.utilities.warnings.rank_zero_deprecation` in favor of `pytorch_lightning.utilities.rank_zero.rank_zero_deprecation` ([11747](https://github.com/PyTorchLightning/pytorch-lightning/pull/11747))
- Deprecated `pytorch_lightning.utilities.warnings.LightningDeprecationWarning` in favor of `pytorch_lightning.utilities.rank_zero.LightningDeprecationWarning`
- Deprecated `on_pretrain_routine_start` and `on_pretrain_routine_end` callback hooks in favor of `on_fit_start` ([11794](https://github.com/PyTorchLightning/pytorch-lightning/pull/11794))
- Deprecated `LightningModule.on_pretrain_routine_start` and `LightningModule.on_pretrain_routine_end` hooks in favor of `on_fit_start` ([12122](https://github.com/PyTorchLightning/pytorch-lightning/pull/12122))
- Deprecated `agg_key_funcs` and `agg_default_func` parameters from `LightningLoggerBase` ([11871](https://github.com/PyTorchLightning/pytorch-lightning/pull/11871))
- Deprecated `LightningLoggerBase.update_agg_funcs` ([11871](https://github.com/PyTorchLightning/pytorch-lightning/pull/11871))
- Deprecated `LightningLoggerBase.agg_and_log_metrics` in favor of `LightningLoggerBase.log_metrics` ([11832](https://github.com/PyTorchLightning/pytorch-lightning/pull/11832))
- Deprecated passing `weights_save_path` to the `Trainer` constructor in favor of adding the `ModelCheckpoint` callback with `dirpath` directly to the list of callbacks ([12084](https://github.com/PyTorchLightning/pytorch-lightning/pull/12084))
- Deprecated `pytorch_lightning.profiler.AbstractProfiler` in favor of `pytorch_lightning.profiler.Profiler` ([12106](https://github.com/PyTorchLightning/pytorch-lightning/pull/12106))
- Deprecated `pytorch_lightning.profiler.BaseProfiler` in favor of `pytorch_lightning.profiler.Profiler` ([12150](https://github.com/PyTorchLightning/pytorch-lightning/pull/12150))
- Deprecated `BaseProfiler.profile_iterable` ([12102](https://github.com/PyTorchLightning/pytorch-lightning/pull/12102))
- Deprecated `LoggerCollection` in favor of `trainer.loggers` ([12147](https://github.com/PyTorchLightning/pytorch-lightning/pull/12147))
- Deprecated `PrecisionPlugin.on_{save,load}_checkpoint` in favor of `PrecisionPlugin.{state_dict,load_state_dict}` ([11978](https://github.com/PyTorchLightning/pytorch-lightning/pull/11978))
- Deprecated `LightningDataModule.on_save/load_checkpoint` in favor of `state_dict/load_state_dict` ([11893](https://github.com/PyTorchLightning/pytorch-lightning/pull/11893))
- Deprecated `Trainer.use_amp` in favor of `Trainer.amp_backend` ([12312](https://github.com/PyTorchLightning/pytorch-lightning/pull/12312))
- Deprecated `LightningModule.use_amp` in favor of `Trainer.amp_backend` ([12315](https://github.com/PyTorchLightning/pytorch-lightning/pull/12315))
- Deprecated specifying the process group backend through the environment variable `PL_TORCH_DISTRIBUTED_BACKEND` ([11745](https://github.com/PyTorchLightning/pytorch-lightning/pull/11745))
- Deprecated `ParallelPlugin.torch_distributed_backend` in favor of `DDPStrategy.process_group_backend` property ([11745](https://github.com/PyTorchLightning/pytorch-lightning/pull/11745))
- Deprecated `ModelCheckpoint.save_checkpoint` in favor of `Trainer.save_checkpoint` ([12456](https://github.com/PyTorchLightning/pytorch-lightning/pull/12456))
- Deprecated `Trainer.devices` in favor of `Trainer.num_devices` and `Trainer.device_ids` ([12151](https://github.com/PyTorchLightning/pytorch-lightning/pull/12151))
- Deprecated `Trainer.root_gpu` in favor of `Trainer.strategy.root_device.index` when GPU is used ([12262](https://github.com/PyTorchLightning/pytorch-lightning/pull/12262))
- Deprecated `Trainer.num_gpus` in favor of `Trainer.num_devices` when GPU is used ([12384](https://github.com/PyTorchLightning/pytorch-lightning/pull/12384))
- Deprecated `Trainer.ipus` in favor of `Trainer.num_devices` when IPU is used ([12386](https://github.com/PyTorchLightning/pytorch-lightning/pull/12386))
- Deprecated `Trainer.num_processes` in favor of `Trainer.num_devices` ([12388](https://github.com/PyTorchLightning/pytorch-lightning/pull/12388))
- Deprecated `Trainer.data_parallel_device_ids` in favor of `Trainer.device_ids` ([12072](https://github.com/PyTorchLightning/pytorch-lightning/pull/12072))
- Deprecated returning state from `Callback.on_save_checkpoint` in favor of returning state in `Callback.state_dict` for checkpointing ([11887](https://github.com/PyTorchLightning/pytorch-lightning/pull/11887))
- Deprecated passing only the callback state to `Callback.on_load_checkpoint(callback_state)` in favor of passing the callback state to `Callback.load_state_dict` and in 1.8, passing the entire checkpoint dictionary to `Callback.on_load_checkpoint(checkpoint)` ([11887](https://github.com/PyTorchLightning/pytorch-lightning/pull/11887))
- Deprecated `Trainer.gpus` in favor of `Trainer.device_ids` or `Trainer.num_devices` ([12436](https://github.com/PyTorchLightning/pytorch-lightning/pull/12436))
- Deprecated `Trainer.tpu_cores` in favor of `Trainer.num_devices` ([12437](https://github.com/PyTorchLightning/pytorch-lightning/pull/12437))
</details>
<details><summary>Removed</summary>
- Removed deprecated parameter `method` in `pytorch_lightning.utilities.model_helpers.is_overridden` ([10507](https://github.com/PyTorchLightning/pytorch-lightning/pull/10507))
- Remove deprecated method `ClusterEnvironment.creates_children` ([10339](https://github.com/PyTorchLightning/pytorch-lightning/pull/10339))
- Removed deprecated `TrainerModelHooksMixin.is_function_implemented` and `TrainerModelHooksMixin.has_arg` ([10322](https://github.com/PyTorchLightning/pytorch-lightning/pull/10322))
- Removed deprecated `pytorch_lightning.utilities.device_dtype_mixin.DeviceDtypeModuleMixin` in favor of `pytorch_lightning.core.mixins.device_dtype_mixin.DeviceDtypeModuleMixin` ([10442](https://github.com/PyTorchLightning/pytorch-lightning/pull/10442))
- Removed deprecated `LightningModule.loaded_optimizer_states_dict` property ([10346](https://github.com/PyTorchLightning/pytorch-lightning/pull/10346))
- Removed deprecated `Trainer.fit(train_dataloader=)`, `Trainer.validate(val_dataloaders=)`, and `Trainer.test(test_dataloader=)` ([10325](https://github.com/PyTorchLightning/pytorch-lightning/pull/10325))
- Removed deprecated `has_prepared_data`, `has_setup_fit`, `has_setup_validate`, `has_setup_test`, `has_setup_predict`, `has_teardown_fit`, `has_teardown_validate`, `has_teardown_test` and `has_teardown_predict` datamodule lifecycle properties ([10350](https://github.com/PyTorchLightning/pytorch-lightning/pull/10350))
- Removed deprecated `every_n_val_epochs` parameter of ModelCheckpoint ([10366](https://github.com/PyTorchLightning/pytorch-lightning/pull/10366))
- Removed deprecated `import pytorch_lightning.profiler.profilers` in favor of `import pytorch_lightning.profiler` ([10443](https://github.com/PyTorchLightning/pytorch-lightning/pull/10443))
- Removed deprecated property `configure_slurm_dpp` from accelerator connector ([10370](https://github.com/PyTorchLightning/pytorch-lightning/pull/10370))
- Removed deprecated arguments `num_nodes` and `sync_batchnorm` from `DDPPlugin`, `DDPSpawnPlugin`, `DeepSpeedPlugin` ([10357](https://github.com/PyTorchLightning/pytorch-lightning/pull/10357))
- Removed deprecated property `is_slurm_managing_tasks` from AcceleratorConnector ([10353](https://github.com/PyTorchLightning/pytorch-lightning/pull/10353))
- Removed deprecated `LightningModule.log(tbptt_reduce_fx, tbptt_reduce_token, sync_dist_op)` ([10423](https://github.com/PyTorchLightning/pytorch-lightning/pull/10423))
- Removed deprecated `Plugin.task_idx` ([10441](https://github.com/PyTorchLightning/pytorch-lightning/pull/10441))
- Removed deprecated method `master_params` from PrecisionPlugin ([10372](https://github.com/PyTorchLightning/pytorch-lightning/pull/10372))
- Removed the automatic detachment of "extras" returned from `training_step`. For example, `return {'loss': ..., 'foo': foo.detach()}` will now be necessary if `foo` has gradients which you do not want to store ([10424](https://github.com/PyTorchLightning/pytorch-lightning/pull/10424))
- Removed deprecated passthrough methods and properties from `Accelerator` base class:
* ([10403](https://github.com/PyTorchLightning/pytorch-lightning/pull/10403))
* ([10448](https://github.com/PyTorchLightning/pytorch-lightning/pull/10448))
- Removed deprecated signature for `transfer_batch_to_device` hook. The new argument `dataloader_idx` is now required ([10480](https://github.com/PyTorchLightning/pytorch-lightning/pull/10480))
- Removed deprecated `utilities.distributed.rank_zero_{warn/deprecation}` ([10451](https://github.com/PyTorchLightning/pytorch-lightning/pull/10451))
- Removed deprecated `mode` argument from `ModelSummary` class ([10449](https://github.com/PyTorchLightning/pytorch-lightning/pull/10449))
- Removed deprecated `Trainer.train_loop` property in favor of `Trainer.fit_loop` ([10482](https://github.com/PyTorchLightning/pytorch-lightning/pull/10482))
- Removed deprecated `disable_validation` property from Trainer ([10450](https://github.com/PyTorchLightning/pytorch-lightning/pull/10450))
- Removed deprecated `CheckpointConnector.hpc_load` property in favor of `CheckpointConnector.restore` ([10525](https://github.com/PyTorchLightning/pytorch-lightning/pull/10525))
- Removed deprecated `reload_dataloaders_every_epoch` from `Trainer` in favour of `reload_dataloaders_every_n_epochs` ([10481](https://github.com/PyTorchLightning/pytorch-lightning/pull/10481))
- Removed the `precision_plugin` attribute from `Accelerator` in favor of its equivalent attribute `precision_plugin` in the `TrainingTypePlugin` ([10570](https://github.com/PyTorchLightning/pytorch-lightning/pull/10570))
- Removed `DeepSpeedPlugin.{precision,amp_type,amp_level}` properties ([10657](https://github.com/PyTorchLightning/pytorch-lightning/pull/10657))
- Removed patching of `on_before_batch_transfer`, `transfer_batch_to_device` and `on_after_batch_transfer` hooks in `LightningModule` ([10603](https://github.com/PyTorchLightning/pytorch-lightning/pull/10603))
- Removed argument `return_result` from the `DDPSpawnPlugin.spawn()` method ([10867](https://github.com/PyTorchLightning/pytorch-lightning/pull/10867))
- Removed the property `TrainingTypePlugin.results` and corresponding properties in subclasses ([10034](https://github.com/PyTorchLightning/pytorch-lightning/pull/10034))
- Removed the `mp_queue` attribute from `DDPSpawnPlugin` and `TPUSpawnPlugin` ([10034](https://github.com/PyTorchLightning/pytorch-lightning/pull/10034))
- Removed unnecessary `_move_optimizer_state` method overrides from `TPUSpawnPlugin` and `SingleTPUPlugin` ([10849](https://github.com/PyTorchLightning/pytorch-lightning/pull/10849))
- Removed `should_rank_save_checkpoint` property from `TrainingTypePlugin` ([11070](https://github.com/PyTorchLightning/pytorch-lightning/pull/11070))
- Removed `model_sharded_context` method from `Accelerator` ([10886](https://github.com/PyTorchLightning/pytorch-lightning/pull/10886))
- Removed method `pre_dispatch` from the `PrecisionPlugin` ([10887](https://github.com/PyTorchLightning/pytorch-lightning/pull/10887))
- Removed the `setup_optimizers_in_pre_dispatch` method from the strategies; the same logic is now handled in the `setup` and `pre_dispatch` methods ([10906](https://github.com/PyTorchLightning/pytorch-lightning/pull/10906))
- Removed methods `pre_dispatch`, `dispatch` and `post_dispatch` from the `Accelerator` ([10885](https://github.com/PyTorchLightning/pytorch-lightning/pull/10885))
- Removed the `training_step`, `test_step`, `validation_step` and `predict_step` methods from the `Accelerator` ([10890](https://github.com/PyTorchLightning/pytorch-lightning/pull/10890))
- Removed the `TrainingTypePlugin.start_{training,evaluating,predicting}` hooks and their counterparts in all subclasses ([10989](https://github.com/PyTorchLightning/pytorch-lightning/pull/10989), [#10896](https://github.com/PyTorchLightning/pytorch-lightning/pull/10896))
- Removed `Accelerator.on_train_start` ([10999](https://github.com/PyTorchLightning/pytorch-lightning/pull/10999))
- Removed support for Python 3.6 ([11117](https://github.com/PyTorchLightning/pytorch-lightning/pull/11117))
- Removed `Strategy.init_optimizers` in favor of `Strategy.setup_optimizers` ([11236](https://github.com/PyTorchLightning/pytorch-lightning/pull/11236))
- Removed `profile("training_step_and_backward")` in the `Closure` class, since the `training_step` and `backward` calls are already profiled ([11222](https://github.com/PyTorchLightning/pytorch-lightning/pull/11222))
- Removed `Strategy.optimizer_zero_grad` ([11246](https://github.com/PyTorchLightning/pytorch-lightning/pull/11246))
- Removed `Strategy.on_gpu` ([11537](https://github.com/PyTorchLightning/pytorch-lightning/pull/11537))
- Removed `Strategy.on_tpu` property ([11536](https://github.com/PyTorchLightning/pytorch-lightning/pull/11536))
- Removed the abstract property `LightningLoggerBase.experiment` ([11603](https://github.com/PyTorchLightning/pytorch-lightning/pull/11603))
- Removed `FitLoop.current_epoch` getter and setter ([11562](https://github.com/PyTorchLightning/pytorch-lightning/pull/11562))
- Removed access to `_short_id` in `NeptuneLogger` ([11517](https://github.com/PyTorchLightning/pytorch-lightning/pull/11517))
- Removed `log_text` and `log_image` from the `LightningLoggerBase` API ([11857](https://github.com/PyTorchLightning/pytorch-lightning/pull/11857))
- Removed calls to `profile("model_forward")` in favor of profiling `training_step` ([12032](https://github.com/PyTorchLightning/pytorch-lightning/pull/12032))
- Removed `get_mp_spawn_kwargs` from `DDPSpawnStrategy` and `TPUSpawnStrategy` in favor of configuration in the `_SpawnLauncher` ([11966](https://github.com/PyTorchLightning/pytorch-lightning/pull/11966))
- Removed `_aggregate_metrics`, `_reduce_agg_metrics`, and `_finalize_agg_metrics` from `LightningLoggerBase` ([12053](https://github.com/PyTorchLightning/pytorch-lightning/pull/12053))
- Removed the `AcceleratorConnector.device_type` property ([12081](https://github.com/PyTorchLightning/pytorch-lightning/pull/12081))
- Removed `AcceleratorConnector.num_nodes` ([12107](https://github.com/PyTorchLightning/pytorch-lightning/pull/12107))
- Removed `AcceleratorConnector.has_ipu` property ([12111](https://github.com/PyTorchLightning/pytorch-lightning/pull/12111))
- Removed `AcceleratorConnector.use_ipu` property ([12110](https://github.com/PyTorchLightning/pytorch-lightning/pull/12110))
- Removed `AcceleratorConnector.has_tpu` property ([12109](https://github.com/PyTorchLightning/pytorch-lightning/pull/12109))
- Removed `AcceleratorConnector.use_dp` property ([12112](https://github.com/PyTorchLightning/pytorch-lightning/pull/12112))
- Removed `configure_sync_batchnorm` from `ParallelStrategy` and all other strategies that inherit from it ([11754](https://github.com/PyTorchLightning/pytorch-lightning/pull/11754))
- Removed public attribute `sync_batchnorm` from strategies ([11754](https://github.com/PyTorchLightning/pytorch-lightning/pull/11754))
- Removed `AcceleratorConnector.root_gpu` property ([12262](https://github.com/PyTorchLightning/pytorch-lightning/pull/12262))
- Removed `AcceleratorConnector.tpu_id` property ([12387](https://github.com/PyTorchLightning/pytorch-lightning/pull/12387))
- Removed `AcceleratorConnector.num_gpus` property ([12384](https://github.com/PyTorchLightning/pytorch-lightning/pull/12384))
- Removed `AcceleratorConnector.num_ipus` property ([12386](https://github.com/PyTorchLightning/pytorch-lightning/pull/12386))
- Removed `AcceleratorConnector.num_processes` property ([12388](https://github.com/PyTorchLightning/pytorch-lightning/pull/12388))
- Removed `AcceleratorConnector.parallel_device_ids` property ([12072](https://github.com/PyTorchLightning/pytorch-lightning/pull/12072))
- Removed `AcceleratorConnector.devices` property ([12435](https://github.com/PyTorchLightning/pytorch-lightning/pull/12435))
- Removed `AcceleratorConnector.parallel_devices` property ([12075](https://github.com/PyTorchLightning/pytorch-lightning/pull/12075))
- Removed `AcceleratorConnector.tpu_cores` property ([12437](https://github.com/PyTorchLightning/pytorch-lightning/pull/12437))
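To make the `training_step` "extras" change above concrete, here is a minimal sketch; the module, layer sizes, and optimizer are illustrative assumptions and not part of this release:
```python
import torch
from torch import nn
import pytorch_lightning as pl


class LitClassifier(pl.LightningModule):
    """Hypothetical module showing that extras must now be detached manually."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self.layer(x)
        loss = nn.functional.cross_entropy(logits, y)
        # Extras returned alongside the loss are no longer detached for you:
        # detach anything whose computation graph you do not want to keep.
        return {"loss": loss, "logits": logits.detach()}

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```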
</details>
<details><summary>Fixed</summary>
- Fixed an issue where `ModelCheckpoint` could delete last checkpoint from the old directory when `dirpath` has changed during resumed training ([12225](https://github.com/PyTorchLightning/pytorch-lightning/pull/12225))
- Fixed an issue where `ModelCheckpoint` could delete older checkpoints when `dirpath` has changed during resumed training ([12045](https://github.com/PyTorchLightning/pytorch-lightning/pull/12045))
- Fixed an issue where `HorovodStrategy.teardown()` did not complete gracefully if an exception was thrown during callback setup ([11752](https://github.com/PyTorchLightning/pytorch-lightning/pull/11752))
- Fixed security vulnerabilities CVE-2020-1747 and CVE-2020-14343 caused by the `PyYAML` dependency ([11099](https://github.com/PyTorchLightning/pytorch-lightning/pull/11099))
- Fixed security vulnerability "CWE-94: Improper Control of Generation of Code (Code Injection)" ([12212](https://github.com/PyTorchLightning/pytorch-lightning/pull/12212))
- Fixed logging on `{test,validation}_epoch_end` with multiple dataloaders ([11132](https://github.com/PyTorchLightning/pytorch-lightning/pull/11132)); see the sketch after this list
- Reset the validation progress tracking state after sanity checking ([11218](https://github.com/PyTorchLightning/pytorch-lightning/pull/11218))
- Fixed double evaluation bug with fault-tolerance enabled where the second call was completely skipped ([11119](https://github.com/PyTorchLightning/pytorch-lightning/pull/11119))
- Fixed an issue with the `TPUSpawnPlugin` handling the `XLA_USE_BF16` environment variable incorrectly ([10990](https://github.com/PyTorchLightning/pytorch-lightning/pull/10990))
- Fixed wrong typehint for `Trainer.lightning_optimizers` ([11155](https://github.com/PyTorchLightning/pytorch-lightning/pull/11155))
- Fixed the lr-scheduler state not being dumped to checkpoint when using the deepspeed strategy ([11307](https://github.com/PyTorchLightning/pytorch-lightning/pull/11307))
- Fixed bug that forced overriding `configure_optimizers` with the CLI ([11672](https://github.com/PyTorchLightning/pytorch-lightning/pull/11672))
- Fixed type promotion when tensors of higher category than float are logged ([11401](https://github.com/PyTorchLightning/pytorch-lightning/pull/11401))
- Fixed `SimpleProfiler` summary ([11414](https://github.com/PyTorchLightning/pytorch-lightning/pull/11414))
- No longer sets a `DistributedSampler` on the `poptorch.DataLoader` when IPUs are used ([12114](https://github.com/PyTorchLightning/pytorch-lightning/pull/12114))
- Fixed bug where progress bar was not being disabled when not in rank zero during predict ([11377](https://github.com/PyTorchLightning/pytorch-lightning/pull/11377))
- Fixed the mid-epoch warning call while resuming training ([11556](https://github.com/PyTorchLightning/pytorch-lightning/pull/11556))
- Fixed `LightningModule.{un,}toggle_model` when only 1 optimizer is used ([12088](https://github.com/PyTorchLightning/pytorch-lightning/pull/12088))
- Fixed an issue in `RichProgressBar` so that logged metrics are displayed only on the main progress bar ([11690](https://github.com/PyTorchLightning/pytorch-lightning/pull/11690))
- Fixed `RichProgressBar` progress when refresh rate does not evenly divide the total counter ([11668](https://github.com/PyTorchLightning/pytorch-lightning/pull/11668))
- Fixed `RichProgressBar` progress validation bar total when using multiple validation runs within a single training epoch ([11668](https://github.com/PyTorchLightning/pytorch-lightning/pull/11668))
- Configure native DeepSpeed schedulers with `interval='step'` ([11788](https://github.com/PyTorchLightning/pytorch-lightning/pull/11788), [#12031](https://github.com/PyTorchLightning/pytorch-lightning/pull/12031))
- Update `RichProgressBarTheme` styles after detecting light theme on colab ([10993](https://github.com/PyTorchLightning/pytorch-lightning/pull/10993))
- Fixed passing `_ddp_params_and_buffers_to_ignore` ([11949](https://github.com/PyTorchLightning/pytorch-lightning/pull/11949))
- Fixed an `AttributeError` when calling `save_hyperparameters` and no parameters need saving ([11827](https://github.com/PyTorchLightning/pytorch-lightning/pull/11827))
- Fixed environment variable priority for global rank determination ([11406](https://github.com/PyTorchLightning/pytorch-lightning/pull/11406))
- Fixed an issue that caused the Trainer to produce identical results on subsequent runs without explicit re-seeding ([11870](https://github.com/PyTorchLightning/pytorch-lightning/pull/11870))
- Fixed an issue that caused the Tuner to affect the random state ([11870](https://github.com/PyTorchLightning/pytorch-lightning/pull/11870))
- Fixed the common-hook warning so that it is no longer emitted when no hook is overridden ([12131](https://github.com/PyTorchLightning/pytorch-lightning/pull/12131))
- Fixed DeepSpeed keeping old sub-folders in the same checkpoint path ([12194](https://github.com/PyTorchLightning/pytorch-lightning/pull/12194))
- Fixed returning logged metrics instead of callback metrics during evaluation ([12224](https://github.com/PyTorchLightning/pytorch-lightning/pull/12224))
- Fixed the case where `logger=None` is passed to the Trainer ([12249](https://github.com/PyTorchLightning/pytorch-lightning/pull/12249))
- Fixed bug where the global step tracked by `ModelCheckpoint` was still set even if no checkpoint was saved ([12418](https://github.com/PyTorchLightning/pytorch-lightning/pull/12418))
- Fixed bug where `ModelCheckpoint` was overriding the `epoch` and `step` logged values ([12418](https://github.com/PyTorchLightning/pytorch-lightning/pull/12418))
- Fixed bug where monitoring the default `epoch` and `step` values with `ModelCheckpoint` would fail ([12418](https://github.com/PyTorchLightning/pytorch-lightning/pull/12418))
- Fixed initializing optimizers unnecessarily in `DDPFullyShardedStrategy` ([12267](https://github.com/PyTorchLightning/pytorch-lightning/pull/12267))
- Fixed check for horovod module ([12377](https://github.com/PyTorchLightning/pytorch-lightning/pull/12377))
- Fixed logging to loggers with multiple eval dataloaders ([12454](https://github.com/PyTorchLightning/pytorch-lightning/pull/12454))
- Fixed an issue with resuming from a checkpoint trained with QAT ([11346](https://github.com/PyTorchLightning/pytorch-lightning/pull/11346))
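As a usage note for the multiple-dataloader logging fixes above, the sketch below logs a single epoch-level metric across two validation dataloaders; the datasets, batch sizes, and loss function are illustrative assumptions:
```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class LitEval(pl.LightningModule):
    """Hypothetical module logging one metric across two validation dataloaders."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def val_dataloader(self):
        ds_a = TensorDataset(torch.randn(64, 8), torch.randn(64, 1))
        ds_b = TensorDataset(torch.randn(64, 8), torch.randn(64, 1))
        return [DataLoader(ds_a, batch_size=16), DataLoader(ds_b, batch_size=16)]

    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def validation_epoch_end(self, outputs):
        # With multiple dataloaders, `outputs` contains one list of per-batch
        # results per dataloader.
        per_loader = [torch.stack(loader_outputs).mean() for loader_outputs in outputs]
        self.log("val_loss", torch.stack(per_loader).mean())
```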
</details>
**Full commit list**: https://github.com/PyTorchLightning/pytorch-lightning/compare/1.5.0...1.6.0
<a name="contributors"></a>
Contributors
Veteran
akihironitta ananthsub awaelchli Borda borisdayma carmocca daniellepintz edward-io ethanwharris four4fish jjenniferdai kaushikb11 kingyiusuen kragniz mauvilsa ninginthecloud popfido rohitgr7 SeanNaren speediedan tchaton tshu-w twsl williamFalcon
New
a-gardner1 abhi-rf abhinavarora adamreeve adamviola AJSVB akashkw amin-nejad AndresAlgaba ant0nsc armanal bhadreshpsavani CAIQT catalys1 chaddy1004 chunyang-wen circlecrystal Code-Cornelius Cyber-Machine dennisbappert DuYicong515 edpizzi franp9am ftorres16 ggare-cmu guyang3532 Honzys idiomaticrefactoring isvogor-foi jerome-habana jgibson2 jlhbaseball15 jona-0 JoostvDoorn josafatburmeister konstantinjdobler Kr4is krishnakalyan3 krshrimali lemairecarl lucmos manangoel99 mathemusician mayeroa mbortolon97 NathanGodey Nesqulck nithinraok ORippler os1ma peterdudfield Piyush-97 puhuk qqueing quancs Raahul-Singh Raalsky Rajathbharadwaj rasbt rharish101 rhjohnstone rjkilpatrick RobertLaurella roschly rsokl rusty1s SauravMaheshkar sethvargo shabie shivammehta007 srb-cv ThomVett wangraying whokilleddb zredeaux65
_If we forgot someone or have any suggestion, let us know in [Slack](https://join.slack.com/t/pytorch-lightning/shared_invite/zt-12iz3cds1-uyyyBYJLiaL2bqVmMN7n~A) :zap:_