Fault-tolerant Training
[Fault-tolerant Training](https://pytorch-lightning.readthedocs.io/en/1.5.0/advanced/fault_tolerant_training.html) is a new internal mechanism that enables PyTorch Lightning to recover from a hardware or software failure. This is particularly useful when training in the cloud on preemptible instances, which can shut down at any time. When a Lightning experiment exits unexpectedly, a temporary checkpoint is saved that contains the exact state of all loops and the model. With this new experimental feature, you can restore your training mid-epoch on the exact batch and continue as if it had never been interrupted. Enable it with the `PL_FAULT_TOLERANT_TRAINING` environment variable:
```bash
PL_FAULT_TOLERANT_TRAINING=1 python train.py
```
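The key idea behind "resume on the exact batch" is to record loop progress (epoch and batch index) and to skip ahead on restart. The following pure-Python sketch illustrates that idea only. It is not Lightning's implementation, which tracks loop progress and sampler state internally. The `train` function, the `state` dict standing in for the temporary checkpoint, and the `failure_at` crash simulation are all hypothetical.

```python
def train(data, num_epochs, state=None, failure_at=None):
    """Toy training loop that can stop and resume mid-epoch.

    `state` stands in for the temporary checkpoint: it stores the next epoch
    and batch index to process. `failure_at` (hypothetical) simulates a crash
    just before the given (epoch, batch_idx) pair is processed.
    """
    state = dict(state or {"epoch": 0, "batch_idx": 0})
    processed = []
    for epoch in range(state["epoch"], num_epochs):
        # Only the interrupted epoch skips ahead; later epochs start at batch 0.
        start = state["batch_idx"] if epoch == state["epoch"] else 0
        for batch_idx in range(start, len(data)):
            if (epoch, batch_idx) == failure_at:
                # "Save" the exact position before exiting.
                return processed, {"epoch": epoch, "batch_idx": batch_idx}
            processed.append((epoch, data[batch_idx]))
    return processed, None


data = ["a", "b", "c"]
before_crash, checkpoint = train(data, num_epochs=2, failure_at=(1, 1))
after_resume, _ = train(data, num_epochs=2, state=checkpoint)
# The two partial runs together match one uninterrupted run.
assert before_crash + after_resume == train(data, num_epochs=2)[0]
```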
LightningLite
LightningLite enables pure PyTorch users to scale their existing code to any kind of hardware while retaining full control over their own loops and optimization logic.
With just a few lines of code and no large refactoring, you get support for multi-device, multi-node, running on different accelerators (CPU, GPU, TPU), native automatic mixed precision (`half` and `bfloat16`), and double precision, in just a few seconds. And no special launcher required! Check out our [documentation](https://pytorch-lightning.readthedocs.io/en/1.5.0/starter/lightning_lite.html) to find out how you can get one step closer to boilerplate-free research!
```python
import torch
import torch.nn.functional as F
from torch import optim
from pytorch_lightning.lite import LightningLite


class Lite(LightningLite):
    def run(self):
        # Let Lite set up your dataloader(s)
        train_loader = self.setup_dataloaders(torch.utils.data.DataLoader(...))

        model = Net()  # your own nn.Module; .to() not needed
        optimizer = optim.Adam(model.parameters())
        # Let Lite set up your model and optimizer
        model, optimizer = self.setup(model, optimizer)

        for epoch in range(5):
            for data, target in train_loader:
                optimizer.zero_grad()
                output = model(data)  # data is already on the device
                loss = F.nll_loss(output, target)
                self.backward(loss)  # instead of loss.backward()
                optimizer.step()


Lite(accelerator="gpu", devices="auto").run()
```
Loop Customization
The new Loop API lets advanced users swap out the default gradient descent optimization loop at the core of Lightning with a different optimization paradigm. This is part of our effort to make Lightning the simplest, most flexible framework to take any kind of deep learning research to production.
[Read our comprehensive introduction to loops](https://pytorch-lightning.readthedocs.io/en/1.5.0/extensions/loops.html?highlight=loops)
New Rich Progress Bar
We integrated with [Rich](https://github.com/willmcgugan/rich) and created a new and improved progress bar for Lightning.
Try it out:
```bash
pip install rich
```

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import RichProgressBar

trainer = Trainer(callbacks=[RichProgressBar()])
```
<a name="strategy-and-devices"></a>
New Trainer Arguments: Strategy and Devices
With the new `strategy` and `devices` arguments in the Trainer, it is now easier to switch from one type of hardware to another.
<div align="center">
| Before | After |
|-------------------------------------------------------|-----------------------------------------------------------------|
| `Trainer(accelerator="ddp", gpus=2)` | `Trainer(accelerator="gpu", devices=2, strategy="ddp")` |
| `Trainer(accelerator="ddp_cpu", num_processes=2)` | `Trainer(accelerator="cpu", devices=2, strategy="ddp")` |
| `Trainer(accelerator="tpu_spawn", tpu_cores=8)` | `Trainer(accelerator="tpu", devices=8)` |
</div>
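The table can be read as a mechanical rewrite rule. The helper below is a hypothetical sketch, not part of Lightning. It maps only the three legacy combinations shown in the table onto the new `accelerator`/`devices`/`strategy` arguments, and it leaves any other keyword arguments untouched.

```python
# Hypothetical mapping of the legacy `accelerator` values from the table
# to (new accelerator, new strategy). Only the three rows shown are covered.
_LEGACY_ACCELERATORS = {
    "ddp": ("gpu", "ddp"),
    "ddp_cpu": ("cpu", "ddp"),
    "tpu_spawn": ("tpu", None),
}

# Legacy argument that carried the device count for each old accelerator.
_LEGACY_COUNT_ARG = {"ddp": "gpus", "ddp_cpu": "num_processes", "tpu_spawn": "tpu_cores"}


def translate_trainer_kwargs(old_kwargs):
    """Rewrite legacy Trainer kwargs into accelerator/devices/strategy kwargs."""
    old = dict(old_kwargs)
    legacy = old.pop("accelerator")
    if legacy not in _LEGACY_ACCELERATORS:
        raise ValueError(f"unsupported legacy accelerator: {legacy!r}")
    accelerator, strategy = _LEGACY_ACCELERATORS[legacy]
    new = {"accelerator": accelerator, "devices": old.pop(_LEGACY_COUNT_ARG[legacy])}
    if strategy is not None:
        new["strategy"] = strategy
    new.update(old)  # keep any unrelated kwargs as they were
    return new


print(translate_trainer_kwargs({"accelerator": "ddp", "gpus": 2}))
```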
The new `devices` argument is now agnostic to all accelerators, but the previous arguments `gpus`, `tpu_cores`, `ipus` are still available and work the same as before. In addition, it is now also possible to set `devices="auto"` or `accelerator="auto"` to select the best accelerator available on the hardware.
```python
from pytorch_lightning import Trainer

trainer = Trainer(accelerator="auto", devices="auto")
```
LightningCLI V2
This release adds support for running not just `Trainer.fit` but any of the `Trainer` entry points!
```bash
python script.py fit
python script.py test
```
LightningCLI now supports registries for callbacks, optimizers, learning rate schedulers, LightningModules and LightningDataModules. This greatly improves the command line experience as only the class names and arguments are required as follows:
```bash
python script.py \
    --trainer.callbacks=EarlyStopping \
    --trainer.callbacks.patience=5 \
    --trainer.callbacks=LearningRateMonitor \
    --trainer.callbacks.logging_interval=epoch \
    --optimizer=Adam \
    --optimizer.lr=0.01 \
    --lr_scheduler=OneCycleLR \
    --lr_scheduler.anneal_strategy=linear
```
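The class-name shorthand works because the CLI can look classes up by name in a registry. The plain-Python sketch below shows the general registry idea only. LightningCLI's real parser is built on jsonargparse and is not reproduced here, and `Registry`, `OPTIMIZERS`, `parse_shorthand` and the stand-in `Adam` class are all hypothetical.

```python
class Registry:
    """Maps class names to classes so they can be selected by name."""

    def __init__(self):
        self._classes = {}

    def register(self, cls):
        self._classes[cls.__name__] = cls
        return cls  # allows use as a class decorator

    def get(self, name):
        return self._classes[name]


OPTIMIZERS = Registry()


@OPTIMIZERS.register
class Adam:
    # Hypothetical stand-in for a real optimizer class; only stores its argument.
    def __init__(self, lr=0.001):
        self.lr = lr


def parse_shorthand(argv, group, registry):
    """Instantiate `group` from args like `--group=Name --group.key=value`."""
    class_name, kwargs = None, {}
    for arg in argv:
        key, _, value = arg.lstrip("-").partition("=")
        if key == group:
            class_name = value
        elif key.startswith(group + "."):
            # Values arrive as strings; this sketch only handles float values.
            kwargs[key[len(group) + 1:]] = float(value)
    return registry.get(class_name)(**kwargs)


optimizer = parse_shorthand(["--optimizer=Adam", "--optimizer.lr=0.01"], "optimizer", OPTIMIZERS)
print(type(optimizer).__name__, optimizer.lr)
```

Registering by `__name__` is what lets the command line refer to a class by its short name alone. A real implementation also needs type-aware value parsing and must handle name clashes.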
We've also added support for a manual mode where the CLI takes care of the instantiation but you have control over the `Trainer` calls:
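```python
from pytorch_lightning.utilities.cli import LightningCLI

# MyModel is your own LightningModule
cli = LightningCLI(MyModel, run=False)
cli.trainer.fit(cli.model)
```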
[Try out LightningCLI!](https://pytorch-lightning.readthedocs.io/en/1.5.0/common/lightning_cli.html)
CheckpointIO Plugins
As part of our commitment to extensibility, we have abstracted the checkpointing logic into a [CheckpointIO](https://pytorch-lightning.readthedocs.io/en/1.5.0/api/pytorch_lightning.plugins.io.CheckpointIO.html?highlight=checkpointio) plugin. This enables users to adapt Lightning to their own infrastructure.
```python
from pytorch_lightning.plugins import CheckpointIO


class CustomCheckpointIO(CheckpointIO):
    def save_checkpoint(self, checkpoint, path):
        # put all logic related to saving a checkpoint here
        ...

    def load_checkpoint(self, path):
        # put all logic related to loading a checkpoint here
        ...

    def remove_checkpoint(self, path):
        # put all logic related to deleting a checkpoint here
        ...
```
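As an illustration of what the three methods might contain for a local-disk backend, here is a standalone sketch. It deliberately does not import Lightning and uses `pickle` in place of `torch.save`/`torch.load`, so the class name `LocalPickleCheckpointIO` and its storage format are hypothetical. In a real plugin you would subclass `CheckpointIO` as shown above and pass an instance to the Trainer, for example through its `plugins` argument.

```python
import os
import pickle


class LocalPickleCheckpointIO:
    """Hypothetical local-disk checkpoint backend (pickle instead of torch.save)."""

    def save_checkpoint(self, checkpoint, path):
        # Create the parent directory if needed, then write via a temporary
        # file and os.replace so a crash never leaves a half-written checkpoint.
        directory = os.path.dirname(path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(checkpoint, f)
        os.replace(tmp_path, path)

    def load_checkpoint(self, path):
        with open(path, "rb") as f:
            return pickle.load(f)

    def remove_checkpoint(self, path):
        # Ignore missing files so repeated removals are harmless.
        if os.path.exists(path):
            os.remove(path)
```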
BFloat16 Support
PyTorch 1.10 introduces native Automatic Mixed Precision (AMP) support for `torch.bfloat16` on CPU (it was already supported on TPUs), enabling higher performance compared with `torch.float16`. Switch to bfloat16 training by setting the `precision` argument:
```python
from pytorch_lightning import Trainer

trainer = Trainer(precision="bf16")
```
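The practical difference between the two 16-bit formats is easiest to see numerically. bfloat16 keeps float32's 8-bit exponent, and therefore its dynamic range, but has only 7 explicit mantissa bits. float16 has a 5-bit exponent and 10 mantissa bits. The sketch below emulates bfloat16 by truncating the low 16 bits of a float32 encoding, which is an approximation because hardware usually rounds to nearest. It uses the `struct` module's `'e'` format for float16. The helper names are hypothetical.

```python
import struct


def to_bfloat16(x):
    """Emulate bfloat16 by keeping the top 16 bits of the float32 encoding.

    This truncates (rounds toward zero); hardware typically rounds to nearest.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    (value,) = struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))
    return value


def to_float16(x):
    """Round-trip through IEEE half precision; raises OverflowError if out of range."""
    (value,) = struct.unpack(">e", struct.pack(">e", x))
    return value


print(to_bfloat16(3.14159265))  # 3.140625: only ~2-3 significant digits survive
print(to_bfloat16(1e38))        # still finite: bfloat16 shares float32's range
try:
    to_float16(70000.0)         # float16 tops out at 65504
except OverflowError:
    print("float16 overflow")
```

The wide exponent range is why bfloat16 training usually does not need the loss scaling that float16 AMP relies on. The trade-off is coarser precision.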
Enable Auto Parameters Tying
It is common to share parameters within a model. However, TPUs don't retain shared parameters once the model is moved to the device. Lightning now automatically detects shared parameters and re-assigns them to alleviate this problem on TPUs.
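The failure mode and the detect-then-re-assign fix can be sketched without a TPU. In the example below, a "model" is a dict of named parameter objects, and `move_to_device` copies each parameter independently to mimic how a device transfer can break sharing. `find_tied` and `retie` record the sharing and then restore it. All names are hypothetical, and Lightning's implementation works on `nn.Module` parameters instead.

```python
import copy


class Param:
    """Minimal stand-in for a tensor parameter."""

    def __init__(self, value):
        self.value = value


def find_tied(params):
    """Group parameter names that refer to the same object (tied weights)."""
    groups = {}
    for name, p in params.items():
        groups.setdefault(id(p), []).append(name)
    return [names for names in groups.values() if len(names) > 1]


def move_to_device(params):
    # Copy each entry independently, which silently un-ties shared weights.
    return {name: copy.deepcopy(p) for name, p in params.items()}


def retie(params, tied_groups):
    """Point every name in a group back at the first name's parameter."""
    for names in tied_groups:
        source = params[names[0]]
        for name in names[1:]:
            params[name] = source
    return params


shared = Param(1.0)
model = {"embed.weight": shared, "decoder.weight": shared, "bias": Param(0.0)}
tied = find_tied(model)              # detect before moving
moved = retie(move_to_device(model), tied)  # re-assign after moving
print(moved["embed.weight"] is moved["decoder.weight"])
```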
Infinite Training
Infinite training is now supported by setting `Trainer(max_epochs=-1)` for an unlimited number of epochs, or `Trainer(max_steps=-1)` for an endless epoch.
> Note: you will want to avoid logging with `on_epoch=True` in case of `max_steps=-1`.
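
One way to read the `-1` convention is that each limit is simply ignored when it is set to `-1`. The hypothetical `should_stop` helper below sketches that rule. Lightning's real stopping logic also considers other conditions, such as `max_time` and early stopping, and these are omitted here.

```python
def should_stop(epoch, global_step, max_epochs=-1, max_steps=-1):
    """Return True once any finite limit is reached; -1 disables a limit."""
    epochs_done = max_epochs != -1 and epoch >= max_epochs
    steps_done = max_steps != -1 and global_step >= max_steps
    return epochs_done or steps_done


# With both limits at -1, the loop only ends through some other mechanism.
print(should_stop(epoch=10**6, global_step=10**9))  # False
```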
DeepSpeed Stage 1
DeepSpeed is a deep learning training optimization library that provides the means to train massive billion-parameter models at scale. Lightning now also supports the DeepSpeed ZeRO Stage 1 protocol, which partitions your optimizer states across your GPUs to reduce memory.
```python
from pytorch_lightning import Trainer

# `model` is your LightningModule instance
trainer = Trainer(gpus=4, strategy="deepspeed_stage_1", precision=16)
trainer.fit(model)
```
For even more memory savings and model sharding advice, check out stage 2 & 3 as well in our [multi-GPU docs](https://pytorch-lightning.readthedocs.io/en/1.5.0/advanced/advanced_gpu.html#deepspeed).
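For a rough sense of the savings, here is a back-of-the-envelope estimate based on the per-parameter accounting from the ZeRO paper for mixed-precision Adam. Each parameter needs 2 bytes of fp16 weights, 2 bytes of fp16 gradients and 12 bytes of fp32 optimizer state (master weights, momentum and variance). Stage 1 shards only those 12 bytes across GPUs. These figures ignore activations, buffers and fragmentation, so treat them as lower bounds on model-state memory. The function names are hypothetical.

```python
def zero_stage1_bytes_per_gpu(num_params, num_gpus):
    """Model-state memory per GPU with ZeRO stage 1 (mixed-precision Adam).

    fp16 weights (2 B) and fp16 gradients (2 B) are replicated on every GPU;
    the 12 B of fp32 optimizer state per parameter are partitioned.
    """
    replicated = (2 + 2) * num_params
    sharded = 12 * num_params // num_gpus
    return replicated + sharded


def ddp_bytes_per_gpu(num_params):
    """Same accounting without partitioning: everything is replicated."""
    return (2 + 2 + 12) * num_params


params = 1_000_000_000  # a 1B-parameter model
print(ddp_bytes_per_gpu(params) / 1e9)              # 16.0 (GB)
print(zero_stage1_bytes_per_gpu(params, 4) / 1e9)   # 7.0 (GB)
```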
Gradient Clipping Customization
By overriding the `LightningModule.configure_gradient_clipping` hook, you can customize gradient clipping to your needs:
```python
from pytorch_lightning import LightningModule


class LitGAN(LightningModule):  # illustrative GAN module name
    # Clip only the gradients associated with the discriminator (optimizer_idx=1)
    def configure_gradient_clipping(
        self,
        optimizer,
        optimizer_idx,
        gradient_clip_val,
        gradient_clip_algorithm,
    ):
        if optimizer_idx == 1:
            # Lightning will handle the gradient clipping
            self.clip_gradients(
                optimizer,
                gradient_clip_val=gradient_clip_val,
                gradient_clip_algorithm=gradient_clip_algorithm,
            )
```
This means you can now implement state-of-the-art clipping algorithms with Lightning!
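The built-in `gradient_clip_algorithm` choices are clipping by global norm (`"norm"`) and clipping by value (`"value"`). The plain-Python sketch below shows what each one does to a flat list of gradient values. PyTorch's `torch.nn.utils.clip_grad_norm_` applies the same rule to tensors but also adds a small epsilon to the norm, which is omitted here. The function names are hypothetical.

```python
import math


def clip_by_norm(grads, max_norm):
    """Scale all gradients together so their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]


def clip_by_value(grads, clip_value):
    """Clamp each gradient independently into [-clip_value, clip_value]."""
    return [max(-clip_value, min(clip_value, g)) for g in grads]


print(clip_by_norm([3.0, 4.0], 1.0))         # approximately [0.6, 0.8]
print(clip_by_value([3.0, -4.0, 0.5], 1.0))  # [1.0, -1.0, 0.5]
```

Norm clipping preserves the direction of the update and only shrinks its length. Value clipping can change the direction, because it clamps each component independently.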
Determinism
Added support for `torch.use_deterministic_algorithms`. Read more about how it works [here](https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html). You can enable it by setting:
```python
from pytorch_lightning import Trainer

trainer = Trainer(deterministic=True)
```
Anomaly Detection
To make debugging easier, we've added support for `torch.autograd.set_detect_anomaly`. With it enabled, PyTorch raises an error when a backward computation produces NaN values and shows the traceback of the forward operation responsible. Read more about anomaly detection [here](https://pytorch.org/docs/stable/autograd.html).
```python
from pytorch_lightning import Trainer

trainer = Trainer(detect_anomaly=True)
```
DDP Debugging Improvements
Are you having a hard time debugging DDP on your remote machine? Now you can debug DDP locally on the CPU:
```python
from pytorch_lightning import Trainer

trainer = Trainer(accelerator="cpu", strategy="ddp", devices=2)
```
When everything works, switch back to GPU by changing only the `accelerator`. Check our documentation for more [useful debugging tricks](https://pytorch-lightning.readthedocs.io/en/1.5.0/common/debugging.html).
Note that this will not provide any speed benefits.
ModelSummary Callback
The new `ModelSummary` callback generates a summary of all layers in a LightningModule. This currently works with the new `RichProgressBar` callback.
```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelSummary

trainer = Trainer(callbacks=[ModelSummary(max_depth=1)])
```
New Hooks
An `on_exception` Callback hook has been added which allows the user to perform custom exception handling.
```python
from pytorch_lightning.callbacks import Callback


class MyCallback(Callback):
    def on_exception(self, trainer, pl_module, exception):
        # whatever you want!
        ...
```
Experimental Features
Inter Batch Parallelism
The inter-batch parallelism feature aims to hide the latency of host-to-device copies of input batches behind computationally intensive operations. In some use cases, it can speed up training. This feature is experimental and subject to change, so it is opt-in through an environment variable:
```bash
PL_INTER_BATCH_PARALLELISM=1 python train.py
```
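The underlying idea is prefetching. While the current batch is being processed, the next one is already being fetched (and, on a GPU, copied to the device). The thread-based sketch below illustrates that overlap in plain Python. `prefetch` is a hypothetical helper, and Lightning's actual GPU-specific fetcher is not reproduced here.

```python
import queue
import threading


def prefetch(iterable, buffer_size=1):
    """Yield items from `iterable` while a background thread loads ahead.

    Up to `buffer_size` items are fetched ahead of the consumer. Errors raised
    by the source iterable are re-raised in the consumer.
    """
    buffer = queue.Queue(maxsize=buffer_size)

    def producer():
        try:
            for item in iterable:
                buffer.put((False, item))  # blocks while the buffer is full
        except Exception as exc:
            buffer.put((True, exc))        # forward the error and stop
        else:
            buffer.put((True, None))       # normal end of data

    threading.Thread(target=producer, daemon=True).start()
    while True:
        done, payload = buffer.get()
        if done:
            if payload is not None:
                raise payload
            return
        yield payload


for batch in prefetch(range(3)):
    print(batch)  # "compute" on this batch overlaps with fetching the next one
```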
Training Step With DataLoader Iterator
If your `training_step` signature takes a `dataloader_iter` argument, Lightning passes the dataloader iterator to it directly instead of a single batch. This can be useful for recommendation engine optimization.
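Receiving the iterator rather than a single batch lets the step decide how many batches to pull, for example to look ahead or to group several batches together. The plain-Python sketch below illustrates that pattern with a hypothetical `training_step` function that consumes up to two batches per call. It shows only the idea and does not reproduce Lightning's `LightningModule.training_step` method or its surrounding loop.

```python
def training_step(dataloader_iter):
    """Hypothetical step that pulls up to two batches from the iterator.

    Returns None once the iterator is exhausted.
    """
    try:
        first = next(dataloader_iter)
    except StopIteration:
        return None
    # Look ahead to a second batch if one is left; a real step could use this
    # to group lookups (e.g. embeddings) across batches.
    second = next(dataloader_iter, None)
    batches = [first] if second is None else [first, second]
    return sum(sum(b) for b in batches)  # stand-in for a loss value


def run_epoch(batches):
    """Call training_step until the shared iterator runs out."""
    it = iter(batches)
    losses = []
    while True:
        loss = training_step(it)
        if loss is None:
            break
        losses.append(loss)
    return losses


print(run_epoch([[1, 2], [3], [4, 5], [6]]))
```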
Meta Module
PyTorch 1.10 introduces meta tensors, which are tensors without data. Building on this, PyTorch Lightning provides an `init_meta_context` context manager and a `materialize_module` function to handle large sharded models.
<a name="bc-changes"></a>
Backward Incompatible Changes
Here is a selection of important changes that are not backward compatible with versions < 1.5. The full list of changes and removals is in the changelog at the bottom.
Parsing of GPU Argument
The interpretation of the `gpus` Trainer argument when provided as a string has changed: `Trainer(gpus="n")` (string) no longer selects the GPU index n and instead selects the first n devices. To preserve the old behavior, change your code to `Trainer(gpus=[n])` (list of indices) or `Trainer(gpus="n,")` (string with comma-separated indices).
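The new rules can be summarised as a small parsing function. This is a hypothetical sketch of the behavior described above: a bare integer string means "the first n GPUs", while a comma-separated string or a list gives explicit indices. Lightning's own parser also handles further cases, such as `-1` for all available GPUs, and these are left out here.

```python
def parse_gpus(gpus):
    """Return the list of GPU indices selected by a `gpus` value."""
    if isinstance(gpus, int):
        return list(range(gpus))          # gpus=2 -> first two GPUs
    if isinstance(gpus, (list, tuple)):
        return [int(i) for i in gpus]     # explicit indices
    if isinstance(gpus, str):
        gpus = gpus.strip()
        if "," in gpus:
            # "1," or "0,2": comma-separated indices; empty parts are ignored
            return [int(part) for part in gpus.split(",") if part.strip()]
        return list(range(int(gpus)))     # "n" now means the first n GPUs
    raise TypeError(f"unsupported gpus value: {gpus!r}")


print(parse_gpus("3"), parse_gpus("3,"), parse_gpus([3]))
```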
Distributed Backend
The argument `distributed_backend` has been removed from the `Trainer` in favor of the new `accelerator` and `strategy` arguments ([10017](https://github.com/PyTorchLightning/pytorch-lightning/pull/10017)).
```python
# BEFORE
trainer = Trainer(distributed_backend="ddp_spawn", gpus=2)

# NOW
trainer = Trainer(strategy="ddp_spawn", accelerator="gpu", devices=2)
```
Trainer Argument Defaults
- The default value of the `max_steps` Trainer argument has changed from `None` to -1 ([9460](https://github.com/PyTorchLightning/pytorch-lightning/pull/9460)). You can no longer specify `Trainer(max_steps=None)` and if you did, you need to change the code to `Trainer(max_steps=-1)`.
- The default value of `accumulate_grad_batches` has changed from 1 to `None` ([9652](https://github.com/PyTorchLightning/pytorch-lightning/pull/9652)).
Loading Model Weights
The model weights now get loaded in all cases when the checkpoint path is provided in `Trainer.{validate,test,predict}`, regardless of whether the model instance is provided or not.
```python
# model reference provided:
trainer.test(model, ckpt_path=None)  # use provided model
trainer.test(model, ckpt_path="best")  # load best model
trainer.test(model, ckpt_path="my_path")  # load path

# model reference not provided
trainer.fit(model)
trainer.test(ckpt_path=None)  # load best model (NEW BEHAVIOR!)
trainer.test(ckpt_path="my_path")  # load path (NEW BEHAVIOR!)
```
Users who relied on `trainer.test(ckpt_path=None)` to load the latest model need to change their code to `trainer.test(model)` and pass the model reference directly.
Lightning CLI
All CLI commands now need to include the Trainer method to run as the first command, i.e., one of `fit`, `validate`, `test`, `predict`.
```bash
# BEFORE
python script.py --trainer.max_epochs=123

# NOW
python script.py fit --trainer.max_epochs=123
```
For questions and help regarding CLI, join our [Lightning-CLI Slack channel](https://pytorch-lightning.slack.com/archives/C01URG3M74L).
Optimizer Hooks
- Executing the `optimizer_closure` is now required when overriding the `optimizer_step` hook ([9360](https://github.com/PyTorchLightning/pytorch-lightning/pull/9360)). If you relied on the previous behavior, we recommend switching to manual optimization altogether.
- The `on_before_optimizer_step` hook previously ran before the entire optimization closure, including backward. This was unintended behavior; if you rely on it, move your code to the new `on_before_backward` hook.
Changes in Accelerators and Plugins
Changes in Accelerators and Plugins were made without deprecation due to their experimental state. The API is expected to become stable in 1.6.
Removed attributes and methods:
- `Accelerator.{call_configure_sharded_model_hook, connect_training_type_plugin, connect_precision_plugin, on_reset_*_dataloader, on_train_epoch_end, on_save, post_optimizer_step, update_global_step}`
- `TrainingTypePlugin.{call_configure_sharded_model_hook, on_reset_*_dataloader, on_save, post_optimizer_step, update_global_step}`
- `PrecisionPlugin.{post_optimizer_step}`
- `ParallelPlugin.teardown`
Changed signatures:
- The accelerator and training type plugin `setup` hooks no longer have a `model` argument.
Other changes:
- The base `Plugin` class has been removed.
- `HorovodPlugin.all_gather` now returns a `torch.Tensor` instead of a list.
- The LightningModule no longer gets wrapped with data-parallel modules when not fitting in `DDPPlugin`, `DDPSpawnPlugin`, `DDPShardedPlugin`, `DDPSpawnShardedPlugin`.
<a name="changelog"></a>
Full Changelog
Added
- Added support for monitoring the learning rate without schedulers in `LearningRateMonitor` ([9786](https://github.com/PyTorchLightning/pytorch-lightning/issues/9786))
- Added registration of `ShardedTensor` state dict hooks in `LightningModule.__init__` if the PyTorch version supports `ShardedTensor` ([8944](https://github.com/PyTorchLightning/pytorch-lightning/pull/8944))
- Added error handling including calling of `on_keyboard_interrupt()` and `on_exception()` for all entrypoints (fit, validate, test, predict) ([8819](https://github.com/PyTorchLightning/pytorch-lightning/pull/8819))
- Added a flavor of `training_step` that takes `dataloader_iter` as an argument ([8807](https://github.com/PyTorchLightning/pytorch-lightning/pull/8807))
- Added a `state_key` property to the `Callback` base class ([6886](https://github.com/PyTorchLightning/pytorch-lightning/pull/6886))
- Added progress tracking to loops:
* Integrated `TrainingEpochLoop.total_batch_idx` ([8598](https://github.com/PyTorchLightning/pytorch-lightning/pull/8598))
* Added `BatchProgress` and integrated `TrainingEpochLoop.is_last_batch` ([9657](https://github.com/PyTorchLightning/pytorch-lightning/pull/9657))
* Avoid optional `Tracker` attributes ([9320](https://github.com/PyTorchLightning/pytorch-lightning/pull/9320))
* Reset `current` progress counters when restarting an epoch loop that had already finished ([9371](https://github.com/PyTorchLightning/pytorch-lightning/pull/9371))
* Call `reset_on_restart` in the loop's `reset` hook instead of when loading a checkpoint ([9561](https://github.com/PyTorchLightning/pytorch-lightning/pull/9561))
* Use `completed` over `processed` in `reset_on_restart` ([9656](https://github.com/PyTorchLightning/pytorch-lightning/pull/9656))
* Renamed `reset_on_epoch` to `reset_on_run` ([9658](https://github.com/PyTorchLightning/pytorch-lightning/pull/9658))
- Added `batch_size` and `rank_zero_only` arguments for `log_dict` to match `log` ([8628](https://github.com/PyTorchLightning/pytorch-lightning/pull/8628))
- Added a check for unique GPU ids ([8666](https://github.com/PyTorchLightning/pytorch-lightning/pull/8666))
- Added `ResultCollection` state_dict to the Loop `state_dict` and added support for distributed reload ([8641](https://github.com/PyTorchLightning/pytorch-lightning/pull/8641))
- Added DeepSpeed collate checkpoint utility function ([8701](https://github.com/PyTorchLightning/pytorch-lightning/pull/8701))
- Added a `handles_accumulate_grad_batches` property to the training type plugins ([8856](https://github.com/PyTorchLightning/pytorch-lightning/pull/8856))
- Added a warning to `WandbLogger` when reusing a wandb run ([8714](https://github.com/PyTorchLightning/pytorch-lightning/pull/8714))
- Added `log_graph` argument for `watch` method of `WandbLogger` ([8662](https://github.com/PyTorchLightning/pytorch-lightning/pull/8662))
- `LightningCLI` additions:
* Added `LightningCLI(run=False|True)` to choose whether to run a `Trainer` subcommand ([8751](https://github.com/PyTorchLightning/pytorch-lightning/pull/8751))
* Added support to call any trainer function from the `LightningCLI` via subcommands ([7508](https://github.com/PyTorchLightning/pytorch-lightning/pull/7508))
  * Allow easy trainer re-instantiation ([9241](https://github.com/PyTorchLightning/pytorch-lightning/pull/9241))
* Automatically register all optimizers and learning rate schedulers ([9565](https://github.com/PyTorchLightning/pytorch-lightning/pull/9565))
* Allow registering custom optimizers and learning rate schedulers without subclassing the CLI ([9565](https://github.com/PyTorchLightning/pytorch-lightning/pull/9565))
* Support shorthand notation to instantiate optimizers and learning rate schedulers ([9565](https://github.com/PyTorchLightning/pytorch-lightning/pull/9565))
* Support passing lists of callbacks via command line ([8815](https://github.com/PyTorchLightning/pytorch-lightning/pull/8815))
* Support shorthand notation to instantiate models ([9588](https://github.com/PyTorchLightning/pytorch-lightning/pull/9588))
* Support shorthand notation to instantiate datamodules ([10011](https://github.com/PyTorchLightning/pytorch-lightning/pull/10011))
* Added `multifile` option to `LightningCLI` to enable/disable config saving to preserve multiple files structure ([9073](https://github.com/PyTorchLightning/pytorch-lightning/pull/9073))
- Fault-tolerant training:
* Added `FastForwardSampler` and `CaptureIterableDataset` injection to data loading utilities ([8366](https://github.com/PyTorchLightning/pytorch-lightning/pull/8366))
* Added `DataFetcher` to control fetching flow ([8890](https://github.com/PyTorchLightning/pytorch-lightning/pull/8890))
* Added `SharedCycleIteratorState` to prevent infinite loop ([8889](https://github.com/PyTorchLightning/pytorch-lightning/pull/8889))
* Added `CaptureMapDataset` for state management in map-style datasets ([8891](https://github.com/PyTorchLightning/pytorch-lightning/pull/8891))
* Added Fault Tolerant Training to `DataFetcher` ([8891](https://github.com/PyTorchLightning/pytorch-lightning/pull/8891))
* Replaced old prefetch iterator with new `DataFetcher` in training loop ([8953](https://github.com/PyTorchLightning/pytorch-lightning/pull/8953))
* Added partial support for global random state fault-tolerance in map-style datasets ([8950](https://github.com/PyTorchLightning/pytorch-lightning/pull/8950))
* Converted state to tuple explicitly when setting Python random state ([9401](https://github.com/PyTorchLightning/pytorch-lightning/pull/9401))
* Added support for restarting an optimizer loop (multiple optimizers) ([9537](https://github.com/PyTorchLightning/pytorch-lightning/pull/9537))
* Added support for restarting within Evaluation Loop ([9563](https://github.com/PyTorchLightning/pytorch-lightning/pull/9563))
* Added mechanism to detect that a signal has been sent so the Trainer can gracefully exit ([9566](https://github.com/PyTorchLightning/pytorch-lightning/pull/9566))
* Added support for skipping ahead to validation during the auto-restart of fitting ([9681](https://github.com/PyTorchLightning/pytorch-lightning/pull/9681))
* Added support for auto-restart if a fault-tolerant checkpoint is available ([9722](https://github.com/PyTorchLightning/pytorch-lightning/pull/9722))
- Checkpoint saving and loading extensibility:
* Added `CheckpointIO` plugin to expose checkpoint IO from training type plugin ([8743](https://github.com/PyTorchLightning/pytorch-lightning/pull/8743))
* Refactored `CheckpointConnector` to offload validation logic to the `CheckpointIO` plugin ([9045](https://github.com/PyTorchLightning/pytorch-lightning/pull/9045))
* Added `remove_checkpoint` to `CheckpointIO` plugin by moving the responsibility out of the `ModelCheckpoint` callback ([9373](https://github.com/PyTorchLightning/pytorch-lightning/pull/9373))
* Added `XLACheckpointIO` plugin ([9972](https://github.com/PyTorchLightning/pytorch-lightning/pull/9972))
- Loop customization:
* Added `Closure` and `AbstractClosure` classes ([8642](https://github.com/PyTorchLightning/pytorch-lightning/pull/8642))
* Refactored `TrainingBatchLoop` and extracted `OptimizerLoop`, splitting off automatic optimization into its own loop ([9191](https://github.com/PyTorchLightning/pytorch-lightning/pull/9191))
* Removed `TrainingBatchLoop.backward()`; manual optimization now calls directly into `Accelerator.backward()` and automatic optimization handles backward in new `OptimizerLoop` ([9265](https://github.com/PyTorchLightning/pytorch-lightning/pull/9265))
* Extracted `ManualOptimization` logic from `TrainingBatchLoop` into its own separate loop class ([9266](https://github.com/PyTorchLightning/pytorch-lightning/pull/9266))
* Added `OutputResult` and `ManualResult` classes ([9437](https://github.com/PyTorchLightning/pytorch-lightning/pull/9437), [#9424](https://github.com/PyTorchLightning/pytorch-lightning/pull/9424))
* Marked `OptimizerLoop.backward` as protected ([9514](https://github.com/PyTorchLightning/pytorch-lightning/pull/9514))
* Marked `FitLoop.should_accumulate` as protected ([9515](https://github.com/PyTorchLightning/pytorch-lightning/pull/9515))
* Marked several methods in `PredictionLoop` as protected: `on_predict_start`, `on_predict_epoch_end`, `on_predict_end`, `on_predict_model_eval` ([9516](https://github.com/PyTorchLightning/pytorch-lightning/pull/9516))
* Marked several methods in `EvaluationLoop` as protected: `get_max_batches`, `on_evaluation_model_eval`, `on_evaluation_model_train`, `on_evaluation_start`, `on_evaluation_epoch_start`, `on_evaluation_epoch_end`, `on_evaluation_end`, `reload_evaluation_dataloaders` ([9516](https://github.com/PyTorchLightning/pytorch-lightning/pull/9516))
* Marked several methods in `EvaluationEpochLoop` as protected: `on_evaluation_batch_start`, `evaluation_step`, `evaluation_step_end` ([9516](https://github.com/PyTorchLightning/pytorch-lightning/pull/9516))
* Added `yielding_training_step` example ([9983](https://github.com/PyTorchLightning/pytorch-lightning/pull/9983))
- Added support for saving and loading state of multiple callbacks of the same type ([7187](https://github.com/PyTorchLightning/pytorch-lightning/pull/7187))
- Added DeepSpeed Stage 1 support ([8974](https://github.com/PyTorchLightning/pytorch-lightning/pull/8974))
- Added `Python dataclass` support for `LightningDataModule` ([8272](https://github.com/PyTorchLightning/pytorch-lightning/issues/8272))
- Added sanitization of tensors when they get logged as hyperparameters in `TensorBoardLogger` ([9031](https://github.com/PyTorchLightning/pytorch-lightning/pull/9031))
- Added `InterBatchParallelDataFetcher` ([9020](https://github.com/PyTorchLightning/pytorch-lightning/pull/9020))
- Added `DataLoaderIterDataFetcher` ([9020](https://github.com/PyTorchLightning/pytorch-lightning/pull/9020))
- Added `DataFetcher` within `Fit / Evaluation` Loop ([9047](https://github.com/PyTorchLightning/pytorch-lightning/pull/9047))
- Added a friendly error message when DDP attempts to spawn new distributed processes with rank > 0 ([9005](https://github.com/PyTorchLightning/pytorch-lightning/pull/9005))
- Added Rich integration:
* Added Rich progress bar ([8929](https://github.com/PyTorchLightning/pytorch-lightning/pull/8929), [#9559](https://github.com/PyTorchLightning/pytorch-lightning/pull/9559))
  * Added support for iterable datasets ([9734](https://github.com/PyTorchLightning/pytorch-lightning/pull/9734))
* Added `RichModelSummary` callback ([9546](https://github.com/PyTorchLightning/pytorch-lightning/pull/9546))
* Added `configure_columns` method to `RichProgressBar` ([10288](https://github.com/PyTorchLightning/pytorch-lightning/pull/10288))
* Added `leave` argument to `RichProgressBar` ([10301](https://github.com/PyTorchLightning/pytorch-lightning/pull/10301))
- Added input validation logic for precision ([9080](https://github.com/PyTorchLightning/pytorch-lightning/pull/9080))
- Added support for CPU AMP autocast ([9084](https://github.com/PyTorchLightning/pytorch-lightning/pull/9084))
- Added `on_exception` callback hook ([9183](https://github.com/PyTorchLightning/pytorch-lightning/pull/9183))
- Added a warning to DeepSpeed when inferring batch size ([9221](https://github.com/PyTorchLightning/pytorch-lightning/pull/9221))
- Added `ModelSummary` callback ([9344](https://github.com/PyTorchLightning/pytorch-lightning/pull/9344))
- Added `log_images`, `log_text` and `log_table` to `WandbLogger` ([9545](https://github.com/PyTorchLightning/pytorch-lightning/pull/9545))
- Added `PL_RECONCILE_PROCESS` environment variable to enable process reconciliation regardless of cluster environment settings ([9389](https://github.com/PyTorchLightning/pytorch-lightning/pull/9389))
- Added `get_device_stats` to the Accelerator interface and added its implementation for GPU and TPU ([9586](https://github.com/PyTorchLightning/pytorch-lightning/pull/9586))
- Added a warning when an unknown key is encountered in the optimizer configuration, and when `OneCycleLR` is used with `"interval": "epoch"` ([9666](https://github.com/PyTorchLightning/pytorch-lightning/pull/9666))
- Added `DeviceStatsMonitor` callback ([9712](https://github.com/PyTorchLightning/pytorch-lightning/pull/9712))
- Added `enable_progress_bar` to the Trainer constructor ([9664](https://github.com/PyTorchLightning/pytorch-lightning/pull/9664))
- Added `pl_legacy_patch` load utility for loading old checkpoints that have pickled legacy Lightning attributes ([9166](https://github.com/PyTorchLightning/pytorch-lightning/pull/9166))
- Added support for `torch.use_deterministic_algorithms` ([9121](https://github.com/PyTorchLightning/pytorch-lightning/pull/9121))
- Added automatic parameters tying for TPUs ([9525](https://github.com/PyTorchLightning/pytorch-lightning/pull/9525))
- Added support for `torch.autograd.set_detect_anomaly` through `Trainer` constructor argument `detect_anomaly` ([9848](https://github.com/PyTorchLightning/pytorch-lightning/pull/9848))
- Added `enable_model_summary` flag to Trainer ([9699](https://github.com/PyTorchLightning/pytorch-lightning/pull/9699))
- Added `strategy` argument to Trainer ([8597](https://github.com/PyTorchLightning/pytorch-lightning/pull/8597))
- Added `init_meta_context`, `materialize_module` utilities ([9920](https://github.com/PyTorchLightning/pytorch-lightning/pull/9920))
- Added `TPUPrecisionPlugin` ([10020](https://github.com/PyTorchLightning/pytorch-lightning/pull/10020))
- Added `torch.bfloat16` support:
* Added bfloat16 support for Lightning Trainer ([9049](https://github.com/PyTorchLightning/pytorch-lightning/pull/9049))
* Renamed `TPUHalfPrecisionPlugin` to `TPUBf16PrecisionPlugin` ([10026](https://github.com/PyTorchLightning/pytorch-lightning/pull/10026))
* Default to `precision=bf16` on CPU when `precision=16` is passed ([10033](https://github.com/PyTorchLightning/pytorch-lightning/pull/10033))
* Added support for `torch.autocast` ([10053](https://github.com/PyTorchLightning/pytorch-lightning/pull/10053))
- Added `kfold` example for loop customization ([9965](https://github.com/PyTorchLightning/pytorch-lightning/pull/9965))
- LightningLite:
* Added `PrecisionPlugin.forward_context`, making it the default implementation for all `{train,val,test,predict}_step_context()` methods ([9988](https://github.com/PyTorchLightning/pytorch-lightning/pull/9988))
* Added `DDPSpawnPlugin.spawn()` for spawning new processes of a given function ([10018](https://github.com/PyTorchLightning/pytorch-lightning/pull/10018), [#10022](https://github.com/PyTorchLightning/pytorch-lightning/pull/10022))
* Added `TrainingTypePlugin.{_setup_model, _setup_optimizer}` methods ([9994](https://github.com/PyTorchLightning/pytorch-lightning/pull/9994), [#10064](https://github.com/PyTorchLightning/pytorch-lightning/pull/10064))
* Implemented `DataParallelPlugin._setup_model` ([10010](https://github.com/PyTorchLightning/pytorch-lightning/pull/10010))
* Implemented `DeepSpeedPlugin._setup_model_and_optimizers` ([10009](https://github.com/PyTorchLightning/pytorch-lightning/pull/10009), [#10064](https://github.com/PyTorchLightning/pytorch-lightning/pull/10064))
* Implemented `{DDPShardedPlugin,DDPShardedSpawnPlugin}._setup_model_and_optimizers` ([10028](https://github.com/PyTorchLightning/pytorch-lightning/pull/10028), [#10064](https://github.com/PyTorchLightning/pytorch-lightning/pull/10064))
* Added optional `model` argument to the `optimizer_step` methods in accelerators and plugins ([10023](https://github.com/PyTorchLightning/pytorch-lightning/pull/10023))
* Updated precision attributes in `DeepSpeedPlugin` ([10164](https://github.com/PyTorchLightning/pytorch-lightning/pull/10164))
* Added the ability to return a result from rank 0 in `DDPSpawnPlugin.spawn` ([10162](https://github.com/PyTorchLightning/pytorch-lightning/pull/10162))
* Added `pytorch_lightning.lite` package ([10175](https://github.com/PyTorchLightning/pytorch-lightning/pull/10175))
* Added `LightningLite` documentation ([10043](https://github.com/PyTorchLightning/pytorch-lightning/pull/10043))
* Added `LightningLite` examples ([9987](https://github.com/PyTorchLightning/pytorch-lightning/pull/9987))
  * Made the `_LiteDataLoader` an iterator and added support for custom dataloaders ([10279](https://github.com/PyTorchLightning/pytorch-lightning/pull/10279))
- Added `use_omegaconf` argument to `save_hparams_to_yaml` plugin ([9170](https://github.com/PyTorchLightning/pytorch-lightning/pull/9170))
- Added `ckpt_path` argument for `Trainer.fit()` ([10061](https://github.com/PyTorchLightning/pytorch-lightning/pull/10061))
- Added `auto_device_count` method to `Accelerators` ([10222](https://github.com/PyTorchLightning/pytorch-lightning/pull/10222))
- Added support for `devices="auto"` ([10264](https://github.com/PyTorchLightning/pytorch-lightning/pull/10264))
- Added a `filename` argument in `ModelCheckpoint.format_checkpoint_name` ([9818](https://github.com/PyTorchLightning/pytorch-lightning/pull/9818))
- Added support for empty `gpus` list to run on CPU ([10246](https://github.com/PyTorchLightning/pytorch-lightning/pull/10246))
- Added a warning if multiple batch sizes are found in an ambiguous batch ([10247](https://github.com/PyTorchLightning/pytorch-lightning/pull/10247))
Changed
- Trainer now raises a `MisconfigurationException` when its methods are called with `ckpt_path="best"` but a checkpoint callback isn't configured ([9841](https://github.com/PyTorchLightning/pytorch-lightning/pull/9841))
- Setting `Trainer(accelerator="ddp_cpu")` no longer spawns a subprocess if `num_processes` is kept at `1` along with `num_nodes > 1` ([9603](https://github.com/PyTorchLightning/pytorch-lightning/pull/9603))
- Module imports now catch `ModuleNotFoundError` instead of `ImportError` ([9867](https://github.com/PyTorchLightning/pytorch-lightning/pull/9867))
- `pytorch_lightning.loggers.neptune.NeptuneLogger` is now consistent with the new [neptune-client](https://github.com/neptune-ai/neptune-client) API; the old [neptune-client](https://github.com/neptune-ai/neptune-client) API is supported by `NeptuneClient` from the [neptune-contrib](https://github.com/neptune-ai/neptune-contrib) repo ([#6867](https://github.com/PyTorchLightning/pytorch-lightning/pull/6867))
- Fixed the parsing of `enum`-type hyperparameters saved to the `hparams.yaml` file by the TensorBoard and CSV loggers, bringing it in line with how OmegaConf parses them ([9170](https://github.com/PyTorchLightning/pytorch-lightning/pull/9170))
- Parsing of the `gpus` Trainer argument has changed: `gpus="n"` (str) no longer selects the GPU index n and instead selects the first n devices ([8770](https://github.com/PyTorchLightning/pytorch-lightning/pull/8770))
- `iteration_count` and other index attributes in the loops have been replaced with progress dataclasses ([8477](https://github.com/PyTorchLightning/pytorch-lightning/pull/8477))
- The `trainer.lightning_module` reference is now properly set at the very beginning of a run ([8536](https://github.com/PyTorchLightning/pytorch-lightning/pull/8536))
- The model weights are now loaded in all cases when a checkpoint path is provided to validate/test/predict, regardless of whether a model instance is provided ([8352](https://github.com/PyTorchLightning/pytorch-lightning/pull/8352))
- The `model` argument of the `Trainer` functions `reset_{train,val,test,predict}_dataloader`, `reset_train_val_dataloaders`, and `request_dataloader` is now optional ([8536](https://github.com/PyTorchLightning/pytorch-lightning/pull/8536))
- Saved checkpoints will no longer use the type of a `Callback` as the key to avoid issues with unpickling ([6886](https://github.com/PyTorchLightning/pytorch-lightning/pull/6886))
- Improved string conversion for `ResultCollection` ([8622](https://github.com/PyTorchLightning/pytorch-lightning/pull/8622))
- `LightningCLI` changes:
* `LightningCLI.init_parser` now returns the parser instance ([8721](https://github.com/PyTorchLightning/pytorch-lightning/pull/8721))
* `LightningCLI.add_core_arguments_to_parser`, `LightningCLI.parse_arguments` now take a `parser` argument ([8721](https://github.com/PyTorchLightning/pytorch-lightning/pull/8721))
* `LightningCLI.instantiate_trainer` now takes a config and a list of callbacks ([8721](https://github.com/PyTorchLightning/pytorch-lightning/pull/8721))
* Split `LightningCLI.add_core_arguments_to_parser` into `LightningCLI.add_default_arguments_to_parser` + `LightningCLI.add_core_arguments_to_parser` ([8721](https://github.com/PyTorchLightning/pytorch-lightning/pull/8721))
- The accelerator and training type plugin `setup` hooks no longer have a `model` argument ([8536](https://github.com/PyTorchLightning/pytorch-lightning/pull/8536))
- The accelerator and training type plugin `update_global_step` hook has been removed ([8856](https://github.com/PyTorchLightning/pytorch-lightning/pull/8856))
- The coverage of `self.log`-ing in any `LightningModule` or `Callback` hook has been improved ([8498](https://github.com/PyTorchLightning/pytorch-lightning/pull/8498))
- `self.log`-ing without a `Trainer` reference now raises a warning instead of an exception ([9733](https://github.com/PyTorchLightning/pytorch-lightning/pull/9733))
- Removed restrictions in the Trainer that loggers can only log from rank 0; the existing logger behavior has not changed ([8608](https://github.com/PyTorchLightning/pytorch-lightning/pull/8608))
- `Trainer.request_dataloader` now takes a `RunningStage` enum instance ([8858](https://github.com/PyTorchLightning/pytorch-lightning/pull/8858))
- Changed `rank_zero_warn` to `NotImplementedError` in the `{train, val, test, predict}_dataloader` hooks that `Lightning(Data)Module` uses ([9161](https://github.com/PyTorchLightning/pytorch-lightning/pull/9161))
- Moved `block_ddp_sync_behaviour` out of `TrainingBatchLoop` to loop utilities ([9192](https://github.com/PyTorchLightning/pytorch-lightning/pull/9192))
- Executing the `optimizer_closure` is now required when overriding the `optimizer_step` hook ([9360](https://github.com/PyTorchLightning/pytorch-lightning/pull/9360))
- Changed logging of `LightningModule` and `LightningDataModule` hyperparameters to raise an exception only if there are colliding keys with different values ([9496](https://github.com/PyTorchLightning/pytorch-lightning/pull/9496))
- `seed_everything` now fails when an invalid seed value is passed instead of selecting a random seed ([8787](https://github.com/PyTorchLightning/pytorch-lightning/pull/8787))
- The Trainer now calls `TrainingTypePlugin` collective APIs directly instead of going through the Accelerator reference ([9677](https://github.com/PyTorchLightning/pytorch-lightning/pull/9677), [#9901](https://github.com/PyTorchLightning/pytorch-lightning/pull/9901))
- The tuner now uses a unique filename to save a temporary checkpoint ([9682](https://github.com/PyTorchLightning/pytorch-lightning/pull/9682))
- Changed `HorovodPlugin.all_gather` to return a `torch.Tensor` instead of a list ([9696](https://github.com/PyTorchLightning/pytorch-lightning/pull/9696))
- Changed Trainer connectors to be protected attributes:
* Configuration Validator ([9779](https://github.com/PyTorchLightning/pytorch-lightning/pull/9779))
- The `current_epoch` and `global_step` attributes now get restored irrespective of the Trainer task ([9413](https://github.com/PyTorchLightning/pytorch-lightning/pull/9413))
- Trainer now raises an exception when requesting `amp_level` with native `amp_backend` ([9755](https://github.com/PyTorchLightning/pytorch-lightning/pull/9755))
- Updated the logic to check for accumulation steps with DeepSpeed ([9826](https://github.com/PyTorchLightning/pytorch-lightning/pull/9826))
- `pytorch_lightning.utilities.grads.grad_norm` now raises an exception if parameter `norm_type <= 0` ([9765](https://github.com/PyTorchLightning/pytorch-lightning/pull/9765))
- Updated the error message for plugins that are incompatible with interactive environments ([9896](https://github.com/PyTorchLightning/pytorch-lightning/pull/9896))
- Moved the `optimizer_step` and `clip_gradients` hooks from the `Accelerator` and `TrainingTypePlugin` into the `PrecisionPlugin` ([10143](https://github.com/PyTorchLightning/pytorch-lightning/pull/10143), [#10029](https://github.com/PyTorchLightning/pytorch-lightning/pull/10029))
- `NativeMixedPrecisionPlugin` and its subclasses now take an optional `GradScaler` instance ([10055](https://github.com/PyTorchLightning/pytorch-lightning/pull/10055))
- Trainer now raises a `MisconfigurationException` instead of a warning if the methods required by `Trainer.{validate/test}` are missing ([10016](https://github.com/PyTorchLightning/pytorch-lightning/pull/10016))
- Changed the default value of the `max_steps` Trainer argument from `None` to `-1` ([9460](https://github.com/PyTorchLightning/pytorch-lightning/pull/9460))
- LightningModule now raises an error when calling `log(on_step=False, on_epoch=False)` ([10227](https://github.com/PyTorchLightning/pytorch-lightning/pull/10227))
- Quantization aware training observers are now disabled by default during validating/testing/predicting stages ([8540](https://github.com/PyTorchLightning/pytorch-lightning/pull/8540))
- Raised a `MisconfigurationException` when the total length of the `dataloader` across ranks is zero, and a warning when the total length is non-zero but the local rank's length is zero ([9827](https://github.com/PyTorchLightning/pytorch-lightning/pull/9827))
- Changed the model size calculation to use `ByteCounter` ([10123](https://github.com/PyTorchLightning/pytorch-lightning/pull/10123))
- Enabled `on_load_checkpoint` for `LightningDataModule` for all `trainer_fn` ([10238](https://github.com/PyTorchLightning/pytorch-lightning/pull/10238))
- Allowed separate config files for parameters with class type when LightningCLI is in `subclass_mode=False` ([10286](https://github.com/PyTorchLightning/pytorch-lightning/pull/10286))
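The `gpus` parsing rules in this release (a plain string `"n"` now selects the first n devices rather than device index n, and an empty list runs on CPU) can be summarized in a small sketch. `parse_gpus_sketch` and its `num_available` parameter are hypothetical, and validation of the requested indices against the visible devices is omitted.

```python
from typing import List, Optional, Union


def parse_gpus_sketch(
    gpus: Union[None, int, str, List[int]], num_available: int
) -> Optional[List[int]]:
    """Hypothetical helper: return device indices, or None to run on CPU."""
    # None, 0, "0" and an empty list all mean "no GPUs", i.e. run on CPU.
    if gpus is None or gpus == 0 or gpus == [] or (isinstance(gpus, str) and gpus.strip() == "0"):
        return None
    if isinstance(gpus, str):
        text = gpus.strip()
        if "," in text:
            # "1,3" or "2," -> explicit device indices
            gpus = [int(part) for part in text.split(",") if part.strip()]
        else:
            # "n" -> the first n devices ("-1" -> all devices, handled below)
            gpus = int(text)
    if isinstance(gpus, int):
        # -1 selects every available device; n selects devices 0..n-1.
        count = num_available if gpus == -1 else gpus
        return list(range(count))
    # An explicit list of indices is used as-is.
    return list(gpus)
```

Under these rules, `gpus="2"` on a four-GPU machine selects `[0, 1]`, whereas selecting only device 2 requires the list form `"2,"`.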
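The requirement to execute `optimizer_closure` when overriding `optimizer_step` follows the PyTorch convention of `optimizer.step(closure=...)`: the closure runs the forward and backward pass, so skipping it leaves no gradients to apply. The pure-Python sketch below uses hypothetical `ToyOptimizer` and `TrackedClosure` classes in place of a real optimizer and Lightning's internal closure, and a simplified two-argument `optimizer_step` instead of the real hook signature. The check in `run_step` mimics, but does not reproduce, the error Lightning raises when the closure never ran.

```python
from typing import Callable, Optional


class ToyOptimizer:
    """Hypothetical one-parameter optimizer mimicking `step(closure=...)`."""

    def __init__(self, weight: float, lr: float) -> None:
        self.weight = weight
        self.lr = lr
        self.grad: Optional[float] = None

    def step(self, closure: Optional[Callable[[], float]] = None) -> Optional[float]:
        # Evaluate the closure first (if given), then apply the gradient it produced.
        loss = closure() if closure is not None else None
        if self.grad is not None:
            self.weight -= self.lr * self.grad
        return loss


class TrackedClosure:
    """Hypothetical stand-in for Lightning's closure: forward + backward."""

    def __init__(self, optimizer: ToyOptimizer) -> None:
        self.optimizer = optimizer
        self.executed = False

    def __call__(self) -> float:
        self.executed = True
        w = self.optimizer.weight
        self.optimizer.grad = 2 * w  # "backward" for loss = w ** 2
        return w ** 2  # "forward"


def optimizer_step(optimizer: ToyOptimizer, optimizer_closure: TrackedClosure) -> None:
    # A correct override hands the closure to the optimizer so it gets executed.
    optimizer.step(closure=optimizer_closure)


def run_step(optimizer: ToyOptimizer, hook: Callable[[ToyOptimizer, TrackedClosure], None]) -> None:
    closure = TrackedClosure(optimizer)
    hook(optimizer, closure)
    # Simplified version of the "closure was never executed" check.
    if not closure.executed:
        raise RuntimeError("The closure was not executed inside `optimizer_step`.")
```

An override such as `lambda opt, closure: opt.step()` trips the check, because no forward or backward pass ever happened and the weight is left unchanged.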
Deprecated
- Deprecated Trainer argument `terminate_on_nan` in favor of `detect_anomaly` ([9175](https://github.com/PyTorchLightning/pytorch-lightning/pull/9175))
- Deprecated `Trainer.terminate_on_nan` public attribute access ([9849](https://github.com/PyTorchLightning/pytorch-lightning/pull/9849))
- Deprecated `LightningModule.summarize()` in favor of `pytorch_lightning.utilities.model_summary.summarize()` ([8513](https://github.com/PyTorchLightning/pytorch-lightning/pull/8513))
- Deprecated `LightningModule.model_size` ([8343](https://github.com/PyTorchLightning/pytorch-lightning/pull/8343))
- Deprecated `DataModule` properties: `train_transforms`, `val_transforms`, `test_transforms`, `size`, `dims` ([8851](https://github.com/PyTorchLightning/pytorch-lightning/pull/8851))
- Deprecated `add_to_queue`, `get_from_queue` from `LightningModule` in favor of corresponding methods in the `DDPSpawnPlugin` ([9118](https://github.com/PyTorchLightning/pytorch-lightning/pull/9118))
- Deprecated `LightningModule.get_progress_bar_dict` and `Trainer.progress_bar_dict` in favor of `pytorch_lightning.callbacks.progress.base.get_standard_metrics` and `ProgressBarBase.get_metrics` ([8985](https://github.com/PyTorchLightning/pytorch-lightning/pull/8985))
- Deprecated `prepare_data_per_node` flag on Trainer and set it as a property of `DataHooks`, accessible in the `LightningModule` and `LightningDataModule` ([8958](https://github.com/PyTorchLightning/pytorch-lightning/pull/8958))
- Deprecated the `TestTubeLogger` ([9065](https://github.com/PyTorchLightning/pytorch-lightning/pull/9065))
- Deprecated `on_{train/val/test/predict}_dataloader()` from `LightningModule` and `LightningDataModule` ([9098](https://github.com/PyTorchLightning/pytorch-lightning/pull/9098))
- Deprecated `on_keyboard_interrupt` callback hook in favor of new `on_exception` hook ([9260](https://github.com/PyTorchLightning/pytorch-lightning/pull/9260))
- Deprecated passing `process_position` to the `Trainer` constructor in favor of adding the `ProgressBar` callback with `process_position` directly to the list of callbacks ([9222](https://github.com/PyTorchLightning/pytorch-lightning/pull/9222))
- Deprecated passing `flush_logs_every_n_steps` as a Trainer argument; instead, pass it to the logger's init if supported ([9366](https://github.com/PyTorchLightning/pytorch-lightning/pull/9366))
- Deprecated `LightningLoggerBase.close`, `LoggerCollection.close` in favor of `LightningLoggerBase.finalize`, `LoggerCollection.finalize` ([9422](https://github.com/PyTorchLightning/pytorch-lightning/pull/9422))
- Deprecated passing `progress_bar_refresh_rate` to the `Trainer` constructor in favor of adding the `ProgressBar` callback with `refresh_rate` directly to the list of callbacks, or passing `enable_progress_bar=False` to disable the progress bar ([9616](https://github.com/PyTorchLightning/pytorch-lightning/pull/9616))
- Deprecated `LightningDistributed` and moved the broadcast logic to `DDPPlugin` and `DDPSpawnPlugin` directly ([9691](https://github.com/PyTorchLightning/pytorch-lightning/pull/9691))
- Deprecated passing `stochastic_weight_avg` to the `Trainer` constructor in favor of adding the `StochasticWeightAveraging` callback directly to the list of callbacks ([8989](https://github.com/PyTorchLightning/pytorch-lightning/pull/8989))
- Deprecated Accelerator collective API `barrier`, `broadcast`, and `all_gather` in favor of calling the `TrainingTypePlugin` collective API directly ([9677](https://github.com/PyTorchLightning/pytorch-lightning/pull/9677))
- Deprecated `checkpoint_callback` from the `Trainer` constructor in favor of `enable_checkpointing` ([9754](https://github.com/PyTorchLightning/pytorch-lightning/pull/9754))
- Deprecated the `LightningModule.on_post_move_to_device` method ([9525](https://github.com/PyTorchLightning/pytorch-lightning/pull/9525))
- Deprecated `pytorch_lightning.core.decorators.parameter_validation` in favor of `pytorch_lightning.utilities.parameter_tying.set_shared_parameters` ([9525](https://github.com/PyTorchLightning/pytorch-lightning/pull/9525))
- Deprecated passing `weights_summary` to the `Trainer` constructor in favor of adding the `ModelSummary` callback with `max_depth` directly to the list of callbacks ([9699](https://github.com/PyTorchLightning/pytorch-lightning/pull/9699))
- Deprecated `log_gpu_memory`, `gpu_metrics`, and related utility functions in favor of the `DeviceStatsMonitor` callback ([9921](https://github.com/PyTorchLightning/pytorch-lightning/pull/9921))
- Deprecated `GPUStatsMonitor` and `XLAStatsMonitor` in favor of `DeviceStatsMonitor` callback ([9924](https://github.com/PyTorchLightning/pytorch-lightning/pull/9924))
- Deprecated setting `Trainer(max_steps=None)`; to turn off the limit, set `Trainer(max_steps=-1)` (default) ([9460](https://github.com/PyTorchLightning/pytorch-lightning/pull/9460))
- Deprecated access to the `AcceleratorConnector.is_slurm_managing_tasks` attribute and marked it as protected ([10101](https://github.com/PyTorchLightning/pytorch-lightning/pull/10101))
- Deprecated access to the `AcceleratorConnector.configure_slurm_ddp` method and marked it as protected ([10101](https://github.com/PyTorchLightning/pytorch-lightning/pull/10101))
- Deprecated passing `resume_from_checkpoint` to the `Trainer` constructor in favor of `trainer.fit(ckpt_path=)` ([10061](https://github.com/PyTorchLightning/pytorch-lightning/pull/10061))
- Deprecated `ClusterEnvironment.creates_children()` in favor of `ClusterEnvironment.creates_processes_externally` (property) ([10106](https://github.com/PyTorchLightning/pytorch-lightning/pull/10106))
- Deprecated `PrecisionPlugin.master_params()` in favor of `PrecisionPlugin.main_params()` ([10105](https://github.com/PyTorchLightning/pytorch-lightning/pull/10105))
- Deprecated `lr_sch_names` from `LearningRateMonitor` ([10066](https://github.com/PyTorchLightning/pytorch-lightning/pull/10066))
- Deprecated `ProgressBar` callback in favor of `TQDMProgressBar` ([10134](https://github.com/PyTorchLightning/pytorch-lightning/pull/10134))
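The move from `max_steps=None` to `max_steps=-1` replaces an optional value with an integer sentinel meaning "no step limit". A minimal sketch of how a training loop might treat it, using the hypothetical helpers `normalize_max_steps` and `reached_max_steps` (Lightning's own deprecation warning class and message may differ):

```python
import warnings
from typing import Optional

# Sentinel meaning "no step limit", matching the new Trainer default.
NO_STEP_LIMIT = -1


def normalize_max_steps(max_steps: Optional[int]) -> int:
    """Hypothetical helper: map the deprecated `None` onto the `-1` sentinel."""
    if max_steps is None:
        warnings.warn(
            "`max_steps=None` is deprecated; use `max_steps=-1` to disable the limit.",
            DeprecationWarning,
        )
        return NO_STEP_LIMIT
    return max_steps


def reached_max_steps(global_step: int, max_steps: int) -> bool:
    # With the sentinel the limit never triggers; otherwise stop at the limit.
    return max_steps != NO_STEP_LIMIT and global_step >= max_steps
```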
Removed
- Removed deprecated `metrics` ([8586](https://github.com/PyTorchLightning/pytorch-lightning/pull/8586/))
- Removed the deprecated `outputs` argument in both the `LightningModule.on_train_epoch_end` and `Callback.on_train_epoch_end` hooks ([8587](https://github.com/PyTorchLightning/pytorch-lightning/pull/8587))
- Removed the deprecated `TrainerLoggingMixin` class ([8609](https://github.com/PyTorchLightning/pytorch-lightning/pull/8609))
- Removed the deprecated `TrainerTrainingTricksMixin` class ([8679](https://github.com/PyTorchLightning/pytorch-lightning/pull/8679))
- Removed the deprecated `optimizer_idx` from `training_step` as an accepted argument in manual optimization ([8576](https://github.com/PyTorchLightning/pytorch-lightning/pull/8576))
- Removed support for the deprecated `on_save_checkpoint` signature. The hook now takes a `checkpoint` positional parameter ([8697](https://github.com/PyTorchLightning/pytorch-lightning/pull/8697))
- Removed support for the deprecated `on_load_checkpoint` signature. The hook now takes a `pl_module` positional parameter ([8697](https://github.com/PyTorchLightning/pytorch-lightning/pull/8697))
- Removed the deprecated `save_function` property in `ModelCheckpoint` ([8680](https://github.com/PyTorchLightning/pytorch-lightning/pull/8680))
- Removed the deprecated `model` argument from `ModelCheckpoint.save_checkpoint` ([8688](https://github.com/PyTorchLightning/pytorch-lightning/pull/8688))
- Removed the deprecated `sync_step` argument from `WandbLogger` ([8763](https://github.com/PyTorchLightning/pytorch-lightning/pull/8763))
- Removed the deprecated `Trainer.truncated_bptt_steps` in favor of `LightningModule.truncated_bptt_steps` ([8826](https://github.com/PyTorchLightning/pytorch-lightning/pull/8826))
- Removed `LightningModule.write_predictions` and `LightningModule.write_predictions_dict` ([8850](https://github.com/PyTorchLightning/pytorch-lightning/pull/8850))
- Removed `on_reset_*_dataloader` hooks in TrainingType Plugins and Accelerators ([8858](https://github.com/PyTorchLightning/pytorch-lightning/pull/8858))
- Removed deprecated `GradInformation` module in favor of `pytorch_lightning.utilities.grads` ([8831](https://github.com/PyTorchLightning/pytorch-lightning/pull/8831/))
- Removed `TrainingTypePlugin.on_save` and `Accelerator.on_save` ([9023](https://github.com/PyTorchLightning/pytorch-lightning/pull/9023))
- Removed `{Accelerator,TrainingTypePlugin,PrecisionPlugin}.post_optimizer_step` ([9746](https://github.com/PyTorchLightning/pytorch-lightning/pull/9746))
- Removed deprecated `connect_precision_plugin` and `connect_training_type_plugin` from `Accelerator` ([9019](https://github.com/PyTorchLightning/pytorch-lightning/pull/9019))
- Removed `on_train_epoch_end` from `Accelerator` ([9035](https://github.com/PyTorchLightning/pytorch-lightning/pull/9035))
- Removed `InterBatchProcessor` in favor of `DataLoaderIterDataFetcher` ([9052](https://github.com/PyTorchLightning/pytorch-lightning/pull/9052))
- Removed `Plugin` in `base_plugin.py` in favor of accessing `TrainingTypePlugin` and `PrecisionPlugin` directly instead ([9066](https://github.com/PyTorchLightning/pytorch-lightning/pull/9066))
- Removed `teardown` from `ParallelPlugin` ([8943](https://github.com/PyTorchLightning/pytorch-lightning/pull/8943))
- Removed deprecated `profiled_functions` argument from `PyTorchProfiler` ([9178](https://github.com/PyTorchLightning/pytorch-lightning/pull/9178))
- Removed deprecated `pytorch_lightning.utilities.argparse_utils` module ([9166](https://github.com/PyTorchLightning/pytorch-lightning/pull/9166))
- Removed deprecated property `Trainer.running_sanity_check` in favor of `Trainer.sanity_checking` ([9209](https://github.com/PyTorchLightning/pytorch-lightning/pull/9209))
- Removed the deprecated `output_filename` argument from `BaseProfiler` and its descendants in favor of `dirpath` and `filename` ([9214](https://github.com/PyTorchLightning/pytorch-lightning/pull/9214))
- Removed deprecated property `ModelCheckpoint.period` in favor of `ModelCheckpoint.every_n_epochs` ([9213](https://github.com/PyTorchLightning/pytorch-lightning/pull/9213))
- Removed deprecated `auto_move_data` decorator ([9231](https://github.com/PyTorchLightning/pytorch-lightning/pull/9231))
- Removed deprecated property `LightningModule.datamodule` in favor of `Trainer.datamodule` ([9233](https://github.com/PyTorchLightning/pytorch-lightning/pull/9233))
- Removed deprecated properties `DeepSpeedPlugin.cpu_offload*` in favor of `offload_optimizer`, `offload_parameters` and `pin_memory` ([9244](https://github.com/PyTorchLightning/pytorch-lightning/pull/9244))
- Removed deprecated property `AcceleratorConnector.is_using_torchelastic` in favor of `TorchElasticEnvironment.is_using_torchelastic()` ([9729](https://github.com/PyTorchLightning/pytorch-lightning/pull/9729))
- Removed `pytorch_lightning.utilities.debugging.InternalDebugger` ([9680](https://github.com/PyTorchLightning/pytorch-lightning/pull/9680))
- Removed `call_configure_sharded_model_hook` property from `Accelerator` and `TrainingTypePlugin` ([9612](https://github.com/PyTorchLightning/pytorch-lightning/pull/9612))
- Removed `TrainerProperties` mixin and moved property definitions directly into `Trainer` ([9495](https://github.com/PyTorchLightning/pytorch-lightning/pull/9495))
- Removed a redundant warning with `ModelCheckpoint(monitor=None)` callback ([9875](https://github.com/PyTorchLightning/pytorch-lightning/pull/9875))
- Removed `epoch` from `trainer.logged_metrics` ([9904](https://github.com/PyTorchLightning/pytorch-lightning/pull/9904))
- Removed `should_rank_save_checkpoint` property from Trainer ([9433](https://github.com/PyTorchLightning/pytorch-lightning/pull/9433))
- Removed the deprecated `distributed_backend` from `Trainer` ([10017](https://github.com/PyTorchLightning/pytorch-lightning/pull/10017))
- Removed `process_idx` from the `{DDPSpawnPlugin,TPUSpawnPlugin}.new_process` methods ([10022](https://github.com/PyTorchLightning/pytorch-lightning/pull/10022))
- Removed automatic patching of `{train,val,test,predict}_dataloader()` on the `LightningModule` ([9764](https://github.com/PyTorchLightning/pytorch-lightning/pull/9764))
- Removed `pytorch_lightning.trainer.connectors.OptimizerConnector` ([10120](https://github.com/PyTorchLightning/pytorch-lightning/pull/10120))
Fixed
- Fixed ImageNet evaluation in example ([10179](https://github.com/PyTorchLightning/pytorch-lightning/pull/10179))
- Fixed an issue with logger outputs not being finalized correctly after prediction runs ([8685](https://github.com/PyTorchLightning/pytorch-lightning/pull/8685))
- Fixed `move_metrics_to_cpu` moving the loss to CPU while training on device ([9308](https://github.com/PyTorchLightning/pytorch-lightning/pull/9308))
- Fixed incorrect main progress bar indicator when resuming training mid-epoch ([9310](https://github.com/PyTorchLightning/pytorch-lightning/pull/9310))
- Fixed an issue with freeing memory of datafetchers during teardown ([9387](https://github.com/PyTorchLightning/pytorch-lightning/pull/9387))
- Fixed a bug where the training step output needed to be `deepcopy`-ed ([9349](https://github.com/PyTorchLightning/pytorch-lightning/pull/9349))
- Fixed an issue with freeing memory allocated by the data iterators in `Loop.on_run_end` ([9386](https://github.com/PyTorchLightning/pytorch-lightning/pull/9386), [#9915](https://github.com/PyTorchLightning/pytorch-lightning/pull/9915))
- Fixed `BasePredictionWriter` not returning the batch indices in a non-distributed setting ([9432](https://github.com/PyTorchLightning/pytorch-lightning/pull/9432))
- Fixed an error when running in XLA environments with no TPU attached ([9572](https://github.com/PyTorchLightning/pytorch-lightning/pull/9572))
- Fixed the check on logged torchmetrics whose `compute()` output is a multi-element tensor ([9582](https://github.com/PyTorchLightning/pytorch-lightning/pull/9582))
- Fixed gradient accumulation for `DDPShardedPlugin` ([9122](https://github.com/PyTorchLightning/pytorch-lightning/pull/9122))
- Fixed missing DeepSpeed distributed call ([9540](https://github.com/PyTorchLightning/pytorch-lightning/pull/9540))
- Fixed an issue with the wrapped LightningModule during evaluation; the LightningModule no longer gets wrapped with data-parallel modules when not fitting in `DDPPlugin`, `DDPSpawnPlugin`, `DDPShardedPlugin`, `DDPSpawnShardedPlugin` ([9096](https://github.com/PyTorchLightning/pytorch-lightning/pull/9096))
- Fixed `trainer.accumulate_grad_batches` to be an int on init. The default value for it is now `None` inside Trainer ([9652](https://github.com/PyTorchLightning/pytorch-lightning/pull/9652))
- Fixed `broadcast` in `DDPPlugin` and `DDPSpawnPlugin` to respect the `src` input ([9691](https://github.com/PyTorchLightning/pytorch-lightning/pull/9691))
- Fixed `self.log(on_epoch=True, reduce_fx=sum)` for the `on_batch_start` and `on_train_batch_start` hooks ([9791](https://github.com/PyTorchLightning/pytorch-lightning/pull/9791))
- Fixed `self.log(on_epoch=True)` for the `on_batch_start` and `on_train_batch_start` hooks ([9780](https://github.com/PyTorchLightning/pytorch-lightning/pull/9780))
- Fixed restoring training state during `Trainer.fit` only ([9413](https://github.com/PyTorchLightning/pytorch-lightning/pull/9413))
- Fixed DeepSpeed and Lightning both calling the scheduler ([9788](https://github.com/PyTorchLightning/pytorch-lightning/pull/9788))
- Fixed missing arguments when saving hyperparameters from the parent class but not from the child class ([9800](https://github.com/PyTorchLightning/pytorch-lightning/pull/9800))
- Fixed DeepSpeed GPU device IDs ([9847](https://github.com/PyTorchLightning/pytorch-lightning/pull/9847))
- Reset `val_dataloader` in `tuner/batch_size_scaling` ([9857](https://github.com/PyTorchLightning/pytorch-lightning/pull/9857))
- Fixed use of `LightningCLI` in computer_vision_fine_tuning.py example ([9934](https://github.com/PyTorchLightning/pytorch-lightning/pull/9934))
- Fixed issue with non-init dataclass fields in `apply_to_collection` ([9963](https://github.com/PyTorchLightning/pytorch-lightning/issues/9963))
- Reset `val_dataloader` in `tuner/batch_size_scaling` for binsearch ([9975](https://github.com/PyTorchLightning/pytorch-lightning/pull/9975))
- Fixed logic to check for spawn in dataloader `TrainerDataLoadingMixin._worker_check` ([9902](https://github.com/PyTorchLightning/pytorch-lightning/pull/9902))
- Fixed `train_dataloader` getting loaded twice when resuming from a checkpoint during `Trainer.fit()` ([9671](https://github.com/PyTorchLightning/pytorch-lightning/pull/9671))
- Fixed `LearningRateMonitor` logging for an optimizer with multiple param groups and no scheduler ([10044](https://github.com/PyTorchLightning/pytorch-lightning/pull/10044))
- Fixed undesired side effects being caused by `Trainer` patching dataloader methods on the `LightningModule` ([9764](https://github.com/PyTorchLightning/pytorch-lightning/pull/9764))
- Fixed gradients not being unscaled when clipping or logging the gradient norm ([9287](https://github.com/PyTorchLightning/pytorch-lightning/pull/9287))
- Fixed `on_before_optimizer_step` getting called before the optimizer closure (including backward) has run ([10167](https://github.com/PyTorchLightning/pytorch-lightning/pull/10167))
- Fixed monitor value in `ModelCheckpoint` getting moved to the wrong device in a special case where it becomes NaN ([10118](https://github.com/PyTorchLightning/pytorch-lightning/pull/10118))
- Fixed creation of `dirpath` in `BaseProfiler` if it doesn't exist ([10073](https://github.com/PyTorchLightning/pytorch-lightning/pull/10073))
- Fixed incorrect handling of SIGTERM ([10189](https://github.com/PyTorchLightning/pytorch-lightning/pull/10189))
- Fixed bug where `log(on_step=True, on_epoch=True, sync_dist=True)` wouldn't reduce the value on step ([10227](https://github.com/PyTorchLightning/pytorch-lightning/pull/10227))
- Fixed an issue with `pl.utilities.seed.reset_seed` converting the `PL_SEED_WORKERS` environment variable to `bool` ([10099](https://github.com/PyTorchLightning/pytorch-lightning/pull/10099))
- Fixed iterating over a logger collection when `fast_dev_run > 0` ([10232](https://github.com/PyTorchLightning/pytorch-lightning/pull/10232))
- Fixed `batch_size` in `ResultCollection` not being reset to 1 on epoch end ([10242](https://github.com/PyTorchLightning/pytorch-lightning/pull/10242))
- Fixed `distrib_type` not being set when training plugin instances are being passed to the Trainer ([10251](https://github.com/PyTorchLightning/pytorch-lightning/pull/10251))
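The `PL_SEED_WORKERS` fix guards against a common pitfall: environment variables are strings, and `bool("0")` is `True` because every non-empty string is truthy. A minimal sketch of a safe conversion, assuming the flag is stored as `"0"` or `"1"` (the helper name `read_seed_workers_flag` is hypothetical):

```python
import os
from typing import Mapping


def read_seed_workers_flag(environ: Mapping[str, str] = os.environ) -> bool:
    """Parse PL_SEED_WORKERS, assumed to hold "0" or "1"; missing means off."""
    raw = environ.get("PL_SEED_WORKERS", "0")
    # bool(raw) would be wrong here: "0" is a non-empty string, hence truthy.
    # Converting to int first maps "0" -> False and "1" -> True.
    return bool(int(raw))
```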
Contributors
adamjstewart akihironitta alessiobonfiglio ananthsub aphedges awaelchli bamblebam Benjamin-Etheredge borchero Borda borisdayma bryant1410 carmocca cowwoc daniellepintz danielykim edward-io eladsegal EricWiener ethanwharris four4fish gau-nernst hankyul2 HansolEom himanshu-dutta I-iBot jjenniferdai jstjohn justusschock kainoj kaushikb11 kingyiusuen Knarik1 low5545 lsqshr mauvilsa michele-arrival nasnoisaac ninginthecloud popfido pre-commit-ci PuneetDabral qmpzzpmq rohitgr7 ronif roshikouhai s-rog samlurye SeanNaren shnela sidml stancld stfwn tangbinh tchaton thepurpleowl Tshimanga twsl victorjoos VirajBagal wayi1 weiji14 yifuwang yopknopixx
_If we forgot someone, let us know :]_