- Fixed `ShardedTensor` state dict hook registration to check if torch distributed is available ([10621](
- Fixed an issue with `self.log` not respecting a tensor's `dtype` when applying computations ([10076](
- Fixed LightningLite `_wrap_init` popping nonexistent keys from DataLoader signature parameters ([10613](
- Fixed signals being registered within threads ([10610](
- Fixed an issue that caused Lightning to extract the batch size even though it was set by the user in `LightningModule.log` ([10408](
- Fixed `Trainer(move_metrics_to_cpu=True)` not moving the evaluation logged results to CPU ([10631](
- Fixed the `{validation,test}_step` outputs getting moved to CPU with `Trainer(move_metrics_to_cpu=True)` ([10631](
- Fixed an issue with collecting logged test results with multiple dataloaders ([10522](


ananthsub awaelchli carmocca jiwidi kaushikb11 qqueing rohitgr7 shabie tchaton

- Fixed an issue where `CombinedLoader` and `max_size_cycle` didn't receive a `DistributedSampler` ([10374](
- Fixed an issue where class or init-only variables of dataclasses were passed to the dataclass constructor in `utilities.apply_to_collection` ([9702](
- Fixed `isinstance` not working with `init_meta_context` and the materialized model not being moved to the device ([10493](
- Fixed an issue that prevented the Trainer from shutting down workers when execution is interrupted due to failure ([10463](
- Squeeze the early stopping monitor to remove empty tensor dimensions ([10461](
- Fixed sampler replacement logic with `overfit_batches` to only replace the sampler when `SequentialSampler` is not used ([10486](
- Fixed scripting causing false positive deprecation warnings ([10470](, [#10555](
- Do not fail if batch size could not be inferred for logging when using DeepSpeed ([10438](
- Fixed propagation of device and dtype information to submodules of LightningLite when they inherit from `DeviceDtypeModuleMixin` ([10559](


a-gardner1 awaelchli carmocca justusschock Raahul-Singh rohitgr7 SeanNaren tchaton

- Fixed `apply_to_collection(defaultdict)` ([10316](
- Fixed failure when `DataLoader(batch_size=None)` is passed ([10345](
- Fixed interception of `__init__` arguments for sub-classed DataLoader re-instantiation in Lite ([10334](
- Fixed issue with pickling `CSVLogger` after a call to `` ([10388](
- Fixed an import error being caused by `PostLocalSGD` when `torch.distributed` is not available ([10359](
- Fixed the logging with `on_step=True` in epoch-level hooks causing unintended side-effects. Logging with `on_step=True` in epoch-level hooks will now correctly raise an error ([10409](
- Fixed deadlocks for distributed training with `RichProgressBar` ([10428](
- Fixed an issue where the model wrapper in Lite converted non-floating point tensors to float ([10429](
- Fixed an issue with inferring the dataset type in fault-tolerant training ([10432](
- Fixed dataloader workers with `persistent_workers` being deleted on every iteration ([10434](


EspenHa four4fish peterdudfield rohitgr7 tchaton kaushikb11 awaelchli Borda carmocca

Fault-tolerant Training

[Fault-tolerant Training]( is a new internal mechanism that enables PyTorch Lightning to recover from a hardware or software failure. This is particularly useful when training in the cloud on preemptible instances, which can shut down at any time. When a Lightning experiment exits unexpectedly, a temporary checkpoint is saved that contains the exact state of all loops and the model. With this new experimental feature, you can resume training mid-epoch on the exact batch and continue as if it had never been interrupted.
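To make "resume on the exact batch" concrete, here is a minimal, library-free sketch. It is not Lightning's implementation: the function `train`, the `progress` dict (standing in for the temporary checkpoint), and the `fail_at` argument are all hypothetical names chosen for illustration.

```python
def train(num_epochs, batches, progress=None, fail_at=None):
    """Run a toy loop that records enough state to resume mid-epoch.

    `progress` is a hypothetical stand-in for the temporary checkpoint.
    `fail_at` is an (epoch, batch_idx) pair at which a failure is simulated.
    """
    progress = progress or {"epoch": 0, "batch": 0, "seen": []}
    resume_epoch, resume_batch = progress["epoch"], progress["batch"]
    for epoch in range(resume_epoch, num_epochs):
        # Only the interrupted epoch starts mid-way; later epochs start at 0.
        start = resume_batch if epoch == resume_epoch else 0
        for batch_idx in range(start, len(batches)):
            if fail_at == (epoch, batch_idx):
                # Simulated failure: remember exactly where we stopped.
                progress["epoch"], progress["batch"] = epoch, batch_idx
                return progress, False
            progress["seen"].append((epoch, batches[batch_idx]))
    return progress, True


batches = ["a", "b", "c"]
state, finished = train(2, batches, fail_at=(1, 1))    # interrupted run
resumed, resumed_finished = train(2, batches, progress=state)
reference, _ = train(2, batches)                       # uninterrupted run
```

In this sketch the resumed run processes the same sequence of batches as the uninterrupted one, which is the property the feature aims for.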



LightningLite

LightningLite enables pure PyTorch users to scale their existing code to any kind of hardware while retaining full control over their own loops and optimization logic.

With just a few lines of code and no large refactoring, you get support for multi-device, multi-node, running on different accelerators (CPU, GPU, TPU), native automatic mixed precision (`half` and `bfloat16`), and double precision, in just a few seconds. And no special launcher required! Check out our [documentation]( to find out how you can get one step closer to boilerplate-free research!

class Lite(LightningLite):
    def run(self):
        # Let Lite set up your dataloader(s)
        train_loader = self.setup_dataloaders(

        model = Net()  # .to() not needed
        optimizer = optim.Adam(model.parameters())
        # Let Lite set up your model and optimizer
        model, optimizer = self.setup(model, optimizer)

        for epoch in range(5):
            for data, target in train_loader:
                optimizer.zero_grad()
                output = model(data)  # data is already on the device
                loss = F.nll_loss(output, target)
                self.backward(loss)  # instead of loss.backward()
                optimizer.step()

Lite(accelerator="gpu", devices="auto").run()

Loop Customization

The new Loop API lets advanced users swap out the default gradient descent optimization loop at the core of Lightning with a different optimization paradigm. This is part of our effort to make Lightning the simplest, most flexible framework to take any kind of deep learning research to production.

[Read our comprehensive introduction to loops](

New Rich Progress Bar

We integrated with [Rich]( and created a new and improved progress bar for Lightning.
Try it out:

pip install rich

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import RichProgressBar

trainer = Trainer(callbacks=[RichProgressBar()])

<a name="strategy-and-devices"></a>
New Trainer Arguments: Strategy and Devices

With the new `strategy` and `devices` arguments in the Trainer, it is now easier to switch from one type of hardware to another.

<div align="center">

| Before | After |
| --- | --- |
| `Trainer(accelerator="ddp", gpus=2)` | `Trainer(accelerator="gpu", devices=2, strategy="ddp")` |
| `Trainer(accelerator="ddp_cpu", num_processes=2)` | `Trainer(accelerator="cpu", devices=2, strategy="ddp")` |
| `Trainer(accelerator="tpu_spawn", tpu_cores=8)` | `Trainer(accelerator="tpu", devices=8)` |

</div>


The new `devices` argument is now agnostic to all accelerators, but the previous arguments `gpus`, `tpu_cores`, `ipus` are still available and work the same as before. In addition, it is now also possible to set `devices="auto"` or `accelerator="auto"` to select the best accelerator available on the hardware.

from pytorch_lightning import Trainer

trainer = Trainer(accelerator="auto", devices="auto")
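The before/after rows in this section can be read as a mechanical rewrite of keyword arguments. The sketch below expresses that rewrite as a plain function. `migrate_trainer_kwargs` is a hypothetical helper, not part of Lightning, and it only covers the three combinations shown in the table.

```python
def migrate_trainer_kwargs(kwargs):
    """Rewrite pre-1.5 Trainer keyword arguments into the new style.

    Hypothetical helper: handles only the three rows of the table.
    """
    new = dict(kwargs)
    old_accelerator = new.pop("accelerator", None)
    if old_accelerator == "ddp" and "gpus" in new:
        new.update(accelerator="gpu", devices=new.pop("gpus"), strategy="ddp")
    elif old_accelerator == "ddp_cpu" and "num_processes" in new:
        new.update(accelerator="cpu", devices=new.pop("num_processes"), strategy="ddp")
    elif old_accelerator == "tpu_spawn" and "tpu_cores" in new:
        new.update(accelerator="tpu", devices=new.pop("tpu_cores"))
    elif old_accelerator is not None:
        # Anything else is passed through unchanged.
        new["accelerator"] = old_accelerator
    return new


migrated = migrate_trainer_kwargs({"accelerator": "ddp", "gpus": 2, "max_epochs": 3})
```

Unrelated arguments such as `max_epochs` pass through untouched, so the result can be splatted into `Trainer(**migrated)`.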

LightningCLI V2

This release adds support for running not just `` but any of the `Trainer` entry points!

python fit
python test

LightningCLI now supports registries for callbacks, optimizers, learning rate schedulers, LightningModules and LightningDataModules. This greatly improves the command line experience as only the class names and arguments are required as follows:

python \
--trainer.callbacks=EarlyStopping \
--trainer.callbacks.patience=5 \
--trainer.callbacks=LearningRateMonitor \
--trainer.callbacks.logging_interval=epoch \
--optimizer=Adam \ \
--lr_scheduler=OneCycleLR \

We've also added support for a manual mode where the CLI takes care of the instantiation but you have control over the `Trainer` calls:

cli = LightningCLI(MyModel, run=False)

[Try out LightningCLI!](

CheckpointIO Plugins

As part of our commitment to extensibility, we have abstracted the checkpointing logic into a [CheckpointIO]( plugin. This enables users to adapt Lightning to their own infrastructure.

from pytorch_lightning.plugins import CheckpointIO

class CustomCheckpointIO(CheckpointIO):
    def save_checkpoint(self, checkpoint, path):
        # put all logic related to saving a checkpoint here
        ...

    def load_checkpoint(self, path):
        # put all logic related to loading a checkpoint here
        ...

    def remove_checkpoint(self, path):
        # put all logic related to deleting a checkpoint here
        ...

BFloat16 Support

PyTorch 1.10 introduces native Automatic Mixed Precision (AMP) support for `torch.bfloat16` on CPU (was already supported for TPUs), enabling higher performance compared with `torch.float16`. Switch to bfloat16 training by setting the argument:

from pytorch_lightning import Trainer

trainer = Trainer(precision="bf16")

Enable Auto Parameters Tying

It is common to share parameters within a model. However, TPUs don't retain shared parameters once the model is moved to the device. Lightning now automatically detects shared parameters and re-assigns them to work around this problem on TPUs.

Infinite Training

Infinite training is now supported by setting `Trainer(max_epochs=-1)` for an unlimited number of epochs, or `Trainer(max_steps=-1)` for an endless epoch.
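As a rough illustration of how a `-1` sentinel can switch a limit off, consider the sketch below. The helpers `limit_reached` and `should_stop` are hypothetical and are not Lightning's actual stopping logic.

```python
def limit_reached(count, limit):
    """Return True once `count` reaches `limit`; a limit of -1 never triggers."""
    return limit != -1 and count >= limit


def should_stop(epoch, step, max_epochs, max_steps):
    # Training stops as soon as either enabled limit is reached.
    return limit_reached(epoch, max_epochs) or limit_reached(step, max_steps)


stops_at_step_limit = should_stop(epoch=0, step=100, max_epochs=-1, max_steps=100)
never_stops = should_stop(epoch=10**6, step=10**9, max_epochs=-1, max_steps=-1)
```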

> Note: you will want to avoid logging with `on_epoch=True` in case of `max_steps=-1`.

DeepSpeed Stage 1

DeepSpeed is a deep learning training optimization library, providing the means to train massive billion parameter models at scale. Lightning now also supports the DeepSpeed ZeRO Stage 1 protocol that partitions your optimizer states across your GPUs to reduce memory.


from pytorch_lightning import Trainer

trainer = Trainer(gpus=4, strategy="deepspeed_stage_1", precision=16)

For even more memory savings and model sharding advice, check out stage 2 & 3 as well in our [multi-GPU docs](

Gradient Clipping Customization

By overriding the `LightningModule.configure_gradient_clipping` hook, you can customize gradient clipping to your needs:

# Perform gradient clipping on gradients associated with discriminator (optimizer_idx=1) in GAN
def configure_gradient_clipping(
    if optimizer_idx == 1:
        # Lightning will handle the gradient clipping

This means you can now implement state-of-the-art clipping algorithms with Lightning!


Added support for `torch.use_deterministic_algorithms`. Read more about how it works [here]( You can enable it by setting:

from pytorch_lightning import Trainer

trainer = Trainer(deterministic=True)

Anomaly Detection

Lightning makes it easier to debug your code, so we've added support for `torch.set_detect_anomaly`. With this, PyTorch detects numerical anomalies like NaN or inf during forward and backward. Read more about anomaly detection [here](

from pytorch_lightning import Trainer

trainer = Trainer(detect_anomaly=True)

DDP Debugging Improvements

Are you having a hard time debugging DDP on your remote machine? Now you can debug DDP locally on the CPU:

trainer = Trainer(accelerator="cpu", strategy="ddp", devices=2)

When everything works, switch back to GPU by changing only the `accelerator`. Check our documentation for more [useful debugging tricks](
Note that running DDP on the CPU will not provide any speed benefits.

ModelSummary Callback

Generates a summary of all layers in a LightningModule. This currently works with the new `RichProgressBar` callback.

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelSummary

trainer = Trainer(callbacks=[ModelSummary(max_depth=1)])

New Hooks

An `on_exception` Callback hook has been added which allows the user to perform custom exception handling.

from pytorch_lightning.callbacks import Callback

class MyCallback(Callback):
    def on_exception(self, trainer, pl_module, exception):
        # whatever you want!
        ...

Experimental Features

Inter Batch Parallelism

The inter-batch parallelism feature aims to hide the latency of host-to-device copies of input batches behind computationally intensive operations. In some use cases, it can speed up training. This feature is experimental and subject to change, so it is opt-in through an environment variable.
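The general idea of overlapping the next batch's transfer with the current batch's compute can be sketched with a background thread and a bounded queue. This is only an analogy and not how Lightning implements the feature; `prefetch` and `copy_to_device` are hypothetical names, and the "copy" here is an ordinary Python function.

```python
import queue
import threading


def prefetch(batches, copy_to_device, depth=1):
    """Yield copied batches while the next copy runs in a background thread."""
    buffer = queue.Queue(maxsize=depth)
    sentinel = object()

    def worker():
        for batch in batches:
            # Blocks when the buffer is full, so at most `depth` batches wait.
            buffer.put(copy_to_device(batch))
        buffer.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is sentinel:
            return
        yield item


# While the consumer works on one batch, the worker prepares the next one.
prefetched = list(prefetch([1, 2, 3], copy_to_device=lambda batch: batch * 10))
```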


Training Step With DataLoader Iterator

If your `training_step` signature takes a `dataloader_iter` argument, Lightning passes the dataloader iterator to it directly. This can be useful for recommendation engine optimization.

Meta Module

PyTorch 1.10 introduces meta tensors, which are tensors without data. Building on this, PyTorch Lightning provides an `init_meta_context` context manager and a `materialize_module` function to handle large sharded models.

<a name="bc-changes"></a>

Backward Incompatible Changes

Here is a selection of important changes that are not backward compatible with versions < 1.5. The full list of changes and removals is in the changelog at the bottom.

Parsing of GPU Argument

The interpretation of the `gpus` Trainer argument when provided as a string has changed: `Trainer(gpus="n")` (string) no longer selects the GPU index n and instead selects the first n devices. In order to preserve the old behavior, you will have to change your code to `Trainer(gpus=[n])` (list of indices) or `Trainer(gpus="n,")` (string with comma separated indices).
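The change can be summarized with a small sketch. `select_gpu_indices` is a hypothetical helper, not Lightning's parser: it ignores special values and does not check which devices actually exist.

```python
def select_gpu_indices(gpus):
    """Illustrate the 1.5 reading of the `gpus` argument described above."""
    if isinstance(gpus, (list, tuple)):
        return list(gpus)  # explicit list of indices
    if isinstance(gpus, str):
        if "," in gpus:
            # Comma-separated string: explicit indices, e.g. "3," or "0,2".
            return [int(part) for part in gpus.split(",") if part.strip()]
        gpus = int(gpus)  # a bare "n" is now a count, not an index
    return list(range(gpus))  # first n devices


first_three = select_gpu_indices("3")   # first three devices
only_index_3 = select_gpu_indices("3,")  # the single device with index 3
```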

Distributed Backend

The argument `distributed_backend` has been removed from the `Trainer` in favor of the new `accelerator` and `strategy` arguments ([10017](

Before:

trainer = Trainer(distributed_backend="ddp_spawn", gpus=2)

After:

trainer = Trainer(strategy="ddp_spawn", accelerator="gpu", devices=2)

Trainer Argument Defaults

- The default value of the `max_steps` Trainer argument has changed from `None` to -1 ([9460]( You can no longer specify `Trainer(max_steps=None)` and if you did, you need to change the code to `Trainer(max_steps=-1)`.
- The default value of `accumulate_grad_batches` has changed from 1 to `None` ([9652](

Loading Model Weights
The model weights now get loaded in all cases when the checkpoint path is provided in `Trainer.{validate,test,predict}`, regardless of whether the model instance is provided or not.


# model reference provided:
trainer.test(model, ckpt_path=None)  # use provided model
trainer.test(model, ckpt_path="best")  # load best model
trainer.test(model, ckpt_path="my_path")  # load path

# model reference not provided:
trainer.test(ckpt_path=None)  # load best model (NEW BEHAVIOR!)
trainer.test(ckpt_path="my_path")  # load path (NEW BEHAVIOR!)

Users who relied on `trainer.test(ckpt_path=None)` to load the latest model need to change their code to `trainer.test(model)` and pass the model reference directly.
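The cases listed above amount to a small decision table. The sketch below encodes it as a plain function. `resolve_weights` is a hypothetical name used only to illustrate the rules, and the returned strings are descriptive labels rather than anything Lightning produces.

```python
def resolve_weights(model_provided, ckpt_path):
    """Return a label for the weights used by `Trainer.{validate,test,predict}`."""
    if ckpt_path is None:
        # Without a path, a provided model is used as-is; otherwise fall back to best.
        return "provided model" if model_provided else "best checkpoint"
    if ckpt_path == "best":
        return "best checkpoint"
    # Any other path is loaded, whether or not a model was passed.
    return f"checkpoint at {ckpt_path}"


with_model = resolve_weights(model_provided=True, ckpt_path=None)
without_model = resolve_weights(model_provided=False, ckpt_path=None)
```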

Lightning CLI

All CLI commands now need to include the Trainer method to run as the first command, i.e., one of `fit`, `validate`, `test`, `predict`.

Before:

python --trainer.max_epochs=123

After:

python fit --trainer.max_epochs=123

For questions and help regarding CLI, join our [Lightning-CLI Slack channel](

Optimizer Hooks

- Executing the `optimizer_closure` is now required when overriding the `optimizer_step` hook ([9360]( If you relied on the previous behavior, we recommend switching to manual optimization altogether.
- The `on_before_optimizer_step` hook previously ran before the entire optimization closure, including backward. This was unintended behavior; if you rely on it, move your code to the new `on_before_backward` hook.
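To show what "executing the closure" means in practice, here is a library-free sketch. `ToyOptimizer`, `make_closure`, and this standalone `optimizer_step` are hypothetical stand-ins; the real hook has a longer signature and receives the closure from Lightning.

```python
class ToyOptimizer:
    """Hypothetical stand-in for an optimizer whose `step` accepts a closure."""

    def __init__(self):
        self.steps = 0

    def step(self, closure=None):
        # Closure-aware optimizers run the closure before updating parameters.
        loss = closure() if closure is not None else None
        self.steps += 1
        return loss


def make_closure(log):
    def optimizer_closure():
        # Stands in for the forward and backward work the closure wraps.
        log.append("closure ran")
        return 0.5

    return optimizer_closure


def optimizer_step(optimizer, optimizer_closure):
    # Forward the closure so it is executed exactly once per step.
    return optimizer.step(closure=optimizer_closure)


log = []
loss = optimizer_step(ToyOptimizer(), make_closure(log))
```

If an override skipped the closure, `log` would stay empty in this sketch, which mirrors the situation the first bullet above rules out.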

Changes in Accelerators and Plugins

Changes in Accelerators and Plugins were made without deprecation due to their experimental state. The API is expected to become stable in 1.6.

Removed attributes and methods:
- `Accelerator.{call_configure_sharded_model_hook, connect_training_type_plugin, connect_precision_plugin, on_reset_*_dataloader, on_train_epoch_end, on_save, post_optimizer_step, update_global_step}`
- `TrainingTypePlugin.{call_configure_sharded_model_hook, on_reset_*_dataloader, on_save, post_optimizer_step, update_global_step}`
- `PrecisionPlugin.{post_optimizer_step}`
- `ParallelPlugin.teardown`

Changed signatures:
- The accelerator and training type plugin `setup` hooks no longer have a `model` argument.

Other changes:
- The base `Plugin` class has been removed.
- `HorovodPlugin.all_gather` now returns a `torch.Tensor` instead of a list.
- The LightningModule no longer gets wrapped with data-parallel modules when not fitting in `DDPPlugin`, `DDPSpawnPlugin`, `DDPShardedPlugin`, `DDPSpawnShardedPlugin`.

<a name="changelog"></a>

Full Changelog


- Added support for monitoring the learning rate without schedulers in `LearningRateMonitor` ([9786](
- Added registration of `ShardedTensor` state dict hooks in `LightningModule.__init__` if the PyTorch version supports `ShardedTensor` ([8944](
- Added error handling including calling of `on_keyboard_interrupt()` and `on_exception()` for all entrypoints (fit, validate, test, predict) ([8819](
- Added a flavor of `training_step` that takes `dataloader_iter` as an argument ([8807](
- Added a `state_key` property to the `Callback` base class ([6886](
- Added progress tracking to loops:
* Integrated `TrainingEpochLoop.total_batch_idx` ([8598](
* Added `BatchProgress` and integrated `TrainingEpochLoop.is_last_batch` ([9657](
* Avoid optional `Tracker` attributes ([9320](
* Reset `current` progress counters when restarting an epoch loop that had already finished ([9371](
* Call `reset_on_restart` in the loop's `reset` hook instead of when loading a checkpoint ([9561](
* Use `completed` over `processed` in `reset_on_restart` ([9656](
* Renamed `reset_on_epoch` to `reset_on_run` ([9658](
- Added `batch_size` and `rank_zero_only` arguments for `log_dict` to match `log` ([8628](
- Added a check for unique GPU ids ([8666](
- Added `ResultCollection` state_dict to the Loop `state_dict` and added support for distributed reload ([8641](
- Added DeepSpeed collate checkpoint utility function ([8701](
- Added a `handles_accumulate_grad_batches` property to the training type plugins ([8856](
- Added a warning to `WandbLogger` when reusing a wandb run ([8714](
- Added `log_graph` argument for `watch` method of `WandbLogger` ([8662](
- `LightningCLI` additions:
* Added `LightningCLI(run=False|True)` to choose whether to run a `Trainer` subcommand ([8751](
* Added support to call any trainer function from the `LightningCLI` via subcommands ([7508](
* Allow easy trainer re-instantiation ([7508](
* Automatically register all optimizers and learning rate schedulers ([9565](
* Allow registering custom optimizers and learning rate schedulers without subclassing the CLI ([9565](
* Support shorthand notation to instantiate optimizers and learning rate schedulers ([9565](
* Support passing lists of callbacks via command line ([8815](
* Support shorthand notation to instantiate models ([9588](
* Support shorthand notation to instantiate datamodules ([10011](
* Added `multifile` option to `LightningCLI` to enable/disable config saving to preserve multiple files structure ([9073](
- Fault-tolerant training:
* Added `FastForwardSampler` and `CaptureIterableDataset` injection to data loading utilities ([8366](
* Added `DataFetcher` to control fetching flow ([8890](
* Added `SharedCycleIteratorState` to prevent infinite loop ([8889](
* Added `CaptureMapDataset` for state management in map-style datasets ([8891](
* Added Fault Tolerant Training to `DataFetcher` ([8891](
* Replaced old prefetch iterator with new `DataFetcher` in training loop ([8953](
* Added partial support for global random state fault-tolerance in map-style datasets ([8950](
* Converted state to tuple explicitly when setting Python random state ([9401](
* Added support for restarting an optimizer loop (multiple optimizers) ([9537](
* Added support for restarting within Evaluation Loop ([9563](
* Added mechanism to detect that a signal has been sent so the Trainer can gracefully exit ([9566](
* Added support for skipping ahead to validation during the auto-restart of fitting ([9681](
* Added support for auto-restart if a fault-tolerant checkpoint is available ([9722](
- Checkpoint saving and loading extensibility:
* Added `CheckpointIO` plugin to expose checkpoint IO from training type plugin ([8743](
* Refactored `CheckpointConnector` to offload validation logic to the `CheckpointIO` plugin ([9045](
* Added `remove_checkpoint` to `CheckpointIO` plugin by moving the responsibility out of the `ModelCheckpoint` callback ([9373](
* Added `XLACheckpointIO` plugin ([9972](
- Loop customization:
* Added `Closure` and `AbstractClosure` classes ([8642](
* Refactored `TrainingBatchLoop` and extracted `OptimizerLoop`, splitting off automatic optimization into its own loop ([9191](
* Removed `TrainingBatchLoop.backward()`; manual optimization now calls directly into `Accelerator.backward()` and automatic optimization handles backward in new `OptimizerLoop` ([9265](
* Extracted `ManualOptimization` logic from `TrainingBatchLoop` into its own separate loop class ([9266](
* Added `OutputResult` and `ManualResult` classes ([9437](, [#9424](
* Marked `OptimizerLoop.backward` as protected ([9514](
* Marked `FitLoop.should_accumulate` as protected ([9515](
* Marked several methods in `PredictionLoop` as protected: `on_predict_start`, `on_predict_epoch_end`, `on_predict_end`, `on_predict_model_eval` ([9516](
* Marked several methods in `EvaluationLoop` as protected: `get_max_batches`, `on_evaluation_model_eval`, `on_evaluation_model_train`, `on_evaluation_start`, `on_evaluation_epoch_start`, `on_evaluation_epoch_end`, `on_evaluation_end`, `reload_evaluation_dataloaders` ([9516](
* Marked several methods in `EvaluationEpochLoop` as protected: `on_evaluation_batch_start`, `evaluation_step`, `evaluation_step_end` ([9516](
* Added `yielding_training_step` example ([9983](
- Added support for saving and loading state of multiple callbacks of the same type ([7187](
- Added DeepSpeed Stage 1 support ([8974](
- Added `Python dataclass` support for `LightningDataModule` ([8272](
- Added sanitization of tensors when they get logged as hyperparameters in `TensorBoardLogger` ([9031](
- Added `InterBatchParallelDataFetcher` ([9020](
- Added `DataLoaderIterDataFetcher` ([9020](
- Added `DataFetcher` within `Fit / Evaluation` Loop ([9047](
- Added a friendly error message when DDP attempts to spawn new distributed processes with rank > 0 ([9005](
- Added Rich integration:
* Added Rich progress bar ([8929](, [#9559](
* Added support for iterable datasets ([9734](
* Added `RichModelSummary` callback ([9546](
* Added `configure_columns` method to `RichProgressBar` ([10288](
* Added `leave` argument to `RichProgressBar` ([10301](
- Added input validation logic for precision ([9080](
- Added support for CPU AMP autocast ([9084](
- Added `on_exception` callback hook ([9183](
- Added a warning to DeepSpeed when inferring batch size ([9221](
- Added `ModelSummary` callback ([9344](
- Added `log_images`, `log_text` and `log_table` to `WandbLogger` ([9545](
- Added `PL_RECONCILE_PROCESS` environment variable to enable process reconciliation regardless of cluster environment settings ([9389](
- Added `get_device_stats` to the Accelerator interface and added its implementation for GPU and TPU ([9586](
- Added a warning when an unknown key is encountered in the optimizer configuration, and when `OneCycleLR` is used with `"interval": "epoch"` ([9666](
- Added `DeviceStatsMonitor` callback ([9712](
- Added `enable_progress_bar` to the Trainer constructor ([9664](
- Added `pl_legacy_patch` load utility for loading old checkpoints that have pickled legacy Lightning attributes ([9166](
- Added support for `torch.use_deterministic_algorithms` ([9121](
- Added automatic parameters tying for TPUs ([9525](
- Added support for `torch.autograd.set_detect_anomaly` through `Trainer` constructor argument `detect_anomaly` ([9848](
- Added `enable_model_summary` flag to Trainer ([9699](
- Added `strategy` argument to Trainer ([8597](
- Added `init_meta_context`, `materialize_module` utilities ([9920](
- Added `TPUPrecisionPlugin` ([10020](
- Added `torch.bfloat16` support:
* Added bfloat16 support for Lightning Trainer ([9049](
* Renamed `TPUHalfPrecisionPlugin` to `TPUBf16PrecisionPlugin` ([10026](
* Default to `precision=bf16` on CPU when `precision=16` is passed ([10033](
* Added support for `torch.autocast` ([10053](
- Added `kfold` example for loop customization ([9965](
- LightningLite:
* Added `PrecisionPlugin.forward_context`, making it the default implementation for all `{train,val,test,predict}_step_context()` methods ([9988](
* Added `DDPSpawnPlugin.spawn()` for spawning new processes of a given function ([10018](, [#10022](
* Added `TrainingTypePlugin.{_setup_model, _setup_optimizer}` methods ([9994](, [#10064](
* Implemented `DataParallelPlugin._setup_model` ([10010](
* Implemented `DeepSpeedPlugin._setup_model_and_optimizers` ([10009](, [#10064](
* Implemented `{DDPShardedPlugin,DDPShardedSpawnPlugin}._setup_model_and_optimizers` ([10028](, [#10064](
* Added optional `model` argument to the `optimizer_step` methods in accelerators and plugins ([10023](
* Updated precision attributes in `DeepSpeedPlugin` ([10164](
* Added the ability to return a result from rank 0 in `DDPSpawnPlugin.spawn` ([10162](
* Added `pytorch_lightning.lite` package ([10175](
* Added `LightningLite` documentation ([10043](
* Added `LightningLite` examples ([9987](
* Made the `_LiteDataLoader` an iterator and added support for custom dataloaders ([10279](
- Added `use_omegaconf` argument to `save_hparams_to_yaml` plugin ([9170](
- Added `ckpt_path` argument for `` ([10061](
- Added `auto_device_count` method to `Accelerators` ([10222](
- Added support for `devices="auto"` ([10264](
- Added a `filename` argument in `ModelCheckpoint.format_checkpoint_name` ([9818](
- Added support for empty `gpus` list to run on CPU ([10246](
- Added a warning if multiple batch sizes are found from ambiguous batch ([10247](


- Trainer now raises a `MisconfigurationException` when its methods are called with `ckpt_path="best"` but a checkpoint callback isn't configured ([9841](
- Setting `Trainer(accelerator="ddp_cpu")` now does not spawn a subprocess if `num_processes` is kept `1` along with `num_nodes > 1` ([9603](
- Module imports are now catching `ModuleNotFoundError` instead of `ImportError` ([9867](
- `pytorch_lightning.loggers.neptune.NeptuneLogger` is now consistent with the new [neptune-client]( API; the old [neptune-client]( API is supported by `NeptuneClient` from the [neptune-contrib]( repo ([#6867](
- Parsing of `enums` type hyperparameters to be saved in the `hparams.yaml` file by TensorBoard and CSV loggers has been fixed and made in line with how OmegaConf parses it ([9170](
- Parsing of the `gpus` Trainer argument has changed: `gpus="n"` (str) no longer selects the GPU index n and instead selects the first n devices ([8770](
- `iteration_count` and other index attributes in the loops have been replaced with progress dataclasses ([8477](
- The `trainer.lightning_module` reference is now properly set at the very beginning of a run ([8536](
- The model weights now get loaded in all cases when the checkpoint path gets provided in validate/test/predict, regardless of whether the model instance is provided or not ([8352](
- The `model` argument of the `Trainer` functions `reset_{train,val,test,predict}_dataloader`, `reset_train_val_dataloaders`, and `request_dataloader` is now optional ([8536](
- Saved checkpoints will no longer use the type of a `Callback` as the key to avoid issues with unpickling ([6886](
- Improved string conversion for `ResultCollection` ([8622](
- `LightningCLI` changes:
* `LightningCLI.init_parser` now returns the parser instance ([8721](
* `LightningCLI.add_core_arguments_to_parser`, `LightningCLI.parse_arguments` now take a `parser` argument ([8721](
* `LightningCLI.instantiate_trainer` now takes a config and a list of callbacks ([8721](
* Split `LightningCLI.add_core_arguments_to_parser` into `LightningCLI.add_default_arguments_to_parser` + `LightningCLI.add_core_arguments_to_parser` ([8721](
- The accelerator and training type plugin `setup` hooks no longer have a `model` argument ([8536](
- The accelerator and training type plugin `update_global_step` hook has been removed ([8856](
- The coverage of `self.log`-ing in any `LightningModule` or `Callback` hook has been improved ([8498](
- `self.log`-ing without a `Trainer` reference now raises a warning instead of an exception ([9733](
- Removed restrictions in the Trainer that loggers can only log from rank 0; the existing logger behavior has not changed ([8608](
- `Trainer.request_dataloader` now takes a `RunningStage` enum instance ([8858](
- Changed `rank_zero_warn` to `NotImplementedError` in the `{train, val, test, predict}_dataloader` hooks that `Lightning(Data)Module` uses ([9161](
- Moved `block_ddp_sync_behaviour` out of `TrainingBatchLoop` to loop utilities ([9192](
- Executing the `optimizer_closure` is now required when overriding the `optimizer_step` hook ([9360](
- Changed logging of `LightningModule` and `LightningDataModule` hyperparameters to raise an exception only if there are colliding keys with different values ([9496](
- `seed_everything` now fails when an invalid seed value is passed instead of selecting a random seed ([8787](
- The Trainer now calls `TrainingTypePlugin` collective APIs directly instead of going through the Accelerator reference ([9677](, [#9901](
- The tuner now uses a unique filename to save a temporary checkpoint ([9682](
- Changed `HorovodPlugin.all_gather` to return a `torch.Tensor` instead of a list ([9696](
- Changed Trainer connectors to be protected attributes:
* Configuration Validator ([9779](
- The `current_epoch` and `global_step` attributes now get restored irrespective of the Trainer task ([9413](
- Trainer now raises an exception when requesting `amp_level` with native `amp_backend` ([9755](
- Updated the logic to check for accumulation steps with DeepSpeed ([9826](
- `pytorch_lightning.utilities.grads.grad_norm` now raises an exception if parameter `norm_type <= 0` ([9765](
- Updated error message for interactive incompatible plugins ([9896](
- Moved the `optimizer_step` and `clip_gradients` hook from the `Accelerator` and `TrainingTypePlugin` into the `PrecisionPlugin` ([10143](, [#10029](
- `NativeMixedPrecisionPlugin` and its subclasses now take an optional `GradScaler` instance ([10055](
- Trainer is now raising a `MisconfigurationException` instead of a warning if `Trainer.{validate/test}` is missing required methods ([10016](
- Changed default value of the `max_steps` Trainer argument from `None` to -1 ([9460](
- LightningModule now raises an error when calling `log(on_step=False, on_epoch=False)` ([10227](
- Quantization aware training observers are now disabled by default during validating/testing/predicting stages ([8540](
- Trainer now raises a `MisconfigurationException` when the total length of `dataloader` across ranks is zero, and gives a warning when the total length is non-zero but the local rank length is zero ([9827](
- Changed the model size calculation using `ByteCounter` ([10123](
- Enabled `on_load_checkpoint` for `LightningDataModule` for all `trainer_fn` ([10238](
- Allowed separate config files for parameters with class type when LightningCLI is in `subclass_mode=False` ([10286](


- Deprecated Trainer argument `terminate_on_nan` in favor of `detect_anomaly` ([9175](
- Deprecated `Trainer.terminate_on_nan` public attribute access ([9849](
- Deprecated `LightningModule.summarize()` in favor of `pytorch_lightning.utilities.model_summary.summarize()` ([8513](
- Deprecated `LightningModule.model_size` ([8343](
- Deprecated `DataModule` properties: `train_transforms`, `val_transforms`, `test_transforms`, `size`, `dims` ([8851](
- Deprecated `add_to_queue`, `get_from_queue` from `LightningModule` in favor of corresponding methods in the `DDPSpawnPlugin` ([9118](
- Deprecated `LightningModule.get_progress_bar_dict` and `Trainer.progress_bar_dict` in favor of `pytorch_lightning.callbacks.progress.base.get_standard_metrics` and `ProgressBarBase.get_metrics` ([8985](
- Deprecated `prepare_data_per_node` flag on Trainer and set it as a property of `DataHooks`, accessible in the `LightningModule` and `LightningDataModule` ([8958](
- Deprecated the `TestTubeLogger` ([9065](
- Deprecated `on_{train/val/test/predict}_dataloader()` from `LightningModule` and `LightningDataModule` ([9098](
- Deprecated `on_keyboard_interrupt` callback hook in favor of new `on_exception` hook ([9260](
- Deprecated passing `process_position` to the `Trainer` constructor in favor of adding the `ProgressBar` callback with `process_position` directly to the list of callbacks ([9222](
- Deprecated passing `flush_logs_every_n_steps` as a Trainer argument, instead pass it to the logger init if supported ([9366](
- Deprecated `LightningLoggerBase.close`, `LoggerCollection.close` in favor of `LightningLoggerBase.finalize`, `LoggerCollection.finalize` ([9422](
- Deprecated passing `progress_bar_refresh_rate` to the `Trainer` constructor in favor of adding the `ProgressBar` callback with `refresh_rate` directly to the list of callbacks, or passing `enable_progress_bar=False` to disable the progress bar ([9616](
- Deprecated `LightningDistributed` and moved the broadcast logic to `DDPPlugin` and `DDPSpawnPlugin` directly ([9691](
- Deprecated passing `stochastic_weight_avg` to the `Trainer` constructor in favor of adding the `StochasticWeightAveraging` callback directly to the list of callbacks ([8989](
- Deprecated Accelerator collective API `barrier`, `broadcast`, and `all_gather` in favor of calling the `TrainingTypePlugin` collective API directly ([9677](
- Deprecated `checkpoint_callback` from the `Trainer` constructor in favor of `enable_checkpointing` ([9754](
- Deprecated the `LightningModule.on_post_move_to_device` method ([9525](
- Deprecated `pytorch_lightning.core.decorators.parameter_validation` in favor of `pytorch_lightning.utilities.parameter_tying.set_shared_parameters` ([9525](
- Deprecated passing `weights_summary` to the `Trainer` constructor in favor of adding the `ModelSummary` callback with `max_depth` directly to the list of callbacks ([9699](
- Deprecated `log_gpu_memory`, `gpu_metrics`, and related utility functions in favor of the `DeviceStatsMonitor` callback ([9921](
- Deprecated `GPUStatsMonitor` and `XLAStatsMonitor` in favor of `DeviceStatsMonitor` callback ([9924](
- Deprecated setting `Trainer(max_steps=None)`; to turn off the limit, set `Trainer(max_steps=-1)` (default) ([9460](
- Deprecated access to the `AcceleratorConnector.is_slurm_managing_tasks` attribute and marked it as protected ([10101](
- Deprecated access to the `AcceleratorConnector.configure_slurm_ddp` method and marked it as protected ([10101](
- Deprecated passing `resume_from_checkpoint` to the `Trainer` constructor in favor of `` ([10061](
- Deprecated `ClusterEnvironment.creates_children()` in favor of `ClusterEnvironment.creates_processes_externally` (property) ([10106](
- Deprecated `PrecisionPlugin.master_params()` in favor of `PrecisionPlugin.main_params()` ([10105](
- Deprecated `lr_sch_names` from `LearningRateMonitor` ([10066](
- Deprecated `ProgressBar` callback in favor of `TQDMProgressBar` ([10134](
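
As a rough illustration of how the simpler deprecations above could be handled when updating existing code, here is a minimal sketch that rewrites a dictionary of Trainer keyword arguments. The helper `migrate_trainer_kwargs` and both lookup tables are hypothetical and are built only from the entries in this list; nothing here calls into Lightning itself.

```python
# Direct replacements named in the list above (hypothetical lookup table).
RENAMED_TRAINER_ARGS = {
    "checkpoint_callback": "enable_checkpointing",
    "terminate_on_nan": "detect_anomaly",
}

# Arguments that the list above moves into callbacks; these need a manual
# change, so the sketch only reports them.
MOVED_TO_CALLBACKS = {
    "process_position",
    "progress_bar_refresh_rate",
    "stochastic_weight_avg",
    "weights_summary",
}


def migrate_trainer_kwargs(kwargs):
    """Return (migrated_kwargs, names_needing_manual_migration)."""
    migrated = {}
    manual = []
    for name, value in kwargs.items():
        if name in MOVED_TO_CALLBACKS:
            # Leave the argument untouched and flag it for the caller.
            manual.append(name)
        if name == "max_steps" and value is None:
            # `max_steps=None` is deprecated; -1 turns the limit off.
            value = -1
        migrated[RENAMED_TRAINER_ARGS.get(name, name)] = value
    return migrated, sorted(manual)
```

For example, `migrate_trainer_kwargs({"checkpoint_callback": False, "max_steps": None})` yields `enable_checkpointing=False` and `max_steps=-1`, with no arguments flagged for manual migration.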


- Removed deprecated `metrics` ([8586](
- Removed the deprecated `outputs` argument in both the `LightningModule.on_train_epoch_end` and `Callback.on_train_epoch_end` hooks ([8587](
- Removed the deprecated `TrainerLoggingMixin` class ([8609](
- Removed the deprecated `TrainerTrainingTricksMixin` class ([8679](
- Removed the deprecated `optimizer_idx` from `training_step` as an accepted argument in manual optimization ([8576](
- Removed support for the deprecated `on_save_checkpoint` signature. The hook now takes a `checkpoint` positional parameter ([8697](
- Removed support for the deprecated `on_load_checkpoint` signature. The hook now takes a `pl_module` positional parameter ([8697](
- Removed the deprecated `save_function` property in `ModelCheckpoint` ([8680](
- Removed the deprecated `model` argument from `ModelCheckpoint.save_checkpoint` ([8688](
- Removed the deprecated `sync_step` argument from `WandbLogger` ([8763](
- Removed the deprecated `Trainer.truncated_bptt_steps` in favor of `LightningModule.truncated_bptt_steps` ([8826](
- Removed `LightningModule.write_predictions` and `LightningModule.write_predictions_dict` ([8850](
- Removed `on_reset_*_dataloader` hooks in TrainingType Plugins and Accelerators ([8858](
- Removed deprecated `GradInformation` module in favor of `pytorch_lightning.utilities.grads` ([8831](
- Removed `TrainingTypePlugin.on_save` and `Accelerator.on_save` ([9023](
- Removed `{Accelerator,TrainingTypePlugin,PrecisionPlugin}.post_optimizer_step` ([9746](
- Removed deprecated `connect_precision_plugin` and `connect_training_type_plugin` from `Accelerator` ([9019](
- Removed `on_train_epoch_end` from `Accelerator` ([9035](
- Removed `InterBatchProcessor` in favor of `DataLoaderIterDataFetcher` ([9052](
- Removed `Plugin` in `` in favor of accessing `TrainingTypePlugin` and `PrecisionPlugin` directly instead ([9066](
- Removed `teardown` from `ParallelPlugin` ([8943](
- Removed deprecated `profiled_functions` argument from `PyTorchProfiler` ([9178](
- Removed deprecated `pytorch_lightning.utilities.argparse_utils` module ([9166](
- Removed deprecated property `Trainer.running_sanity_check` in favor of `Trainer.sanity_checking` ([9209](
- Removed the deprecated `output_filename` argument from `BaseProfiler` and its descendants in favor of `dirpath` and `filename` ([9214](
- Removed deprecated property `ModelCheckpoint.period` in favor of `ModelCheckpoint.every_n_epochs` ([9213](
- Removed deprecated `auto_move_data` decorator ([9231](
- Removed deprecated property `LightningModule.datamodule` in favor of `Trainer.datamodule` ([9233](
- Removed deprecated properties `DeepSpeedPlugin.cpu_offload*` in favor of `offload_optimizer`, `offload_parameters` and `pin_memory` ([9244](
- Removed deprecated property `AcceleratorConnector.is_using_torchelastic` in favor of `TorchElasticEnvironment.is_using_torchelastic()` ([9729](
- Removed `pytorch_lightning.utilities.debugging.InternalDebugger` ([9680](
- Removed `call_configure_sharded_model_hook` property from `Accelerator` and `TrainingTypePlugin` ([9612](
- Removed `TrainerProperties` mixin and moved property definitions directly into `Trainer` ([9495](
- Removed a redundant warning with `ModelCheckpoint(monitor=None)` callback ([9875](
- Removed `epoch` from `trainer.logged_metrics` ([9904](
- Removed `should_rank_save_checkpoint` property from Trainer ([9433](
- Removed deprecated `distributed_backend` from `Trainer` ([10017](
- Removed `process_idx` from the `{DDPSpawnPlugin,TPUSpawnPlugin}.new_process` methods ([10022](
- Removed automatic patching of `{train,val,test,predict}_dataloader()` on the `LightningModule` ([9764](
- Removed `pytorch_lightning.trainer.connectors.OptimizerConnector` ([10120](


- Fixed ImageNet evaluation in example ([10179](
- Fixed an issue with logger outputs not being finalized correctly after prediction runs ([8685](
- Fixed `move_metrics_to_cpu` moving the loss to CPU while training on device ([9308](
- Fixed incorrect main progress bar indicator when resuming training mid-epoch ([9310](
- Fixed an issue with freeing memory of datafetchers during teardown ([9387](
- Fixed a bug where the training step output needed to be `deepcopy`-ed ([9349](
- Fixed an issue with freeing memory allocated by the data iterators in `Loop.on_run_end` ([9386](, [#9915](
- Fixed `BasePredictionWriter` not returning the batch indices in a non-distributed setting ([9432](
- Fixed an error when running in XLA environments with no TPU attached ([9572](
- Fixed the check on logged torchmetrics whose `compute()` output is a multi-element tensor ([9582](
- Fixed gradient accumulation for `DDPShardedPlugin` ([9122](
- Fixed missing DeepSpeed distributed call ([9540](
- Fixed an issue with wrapped LightningModule during evaluation; the LightningModule no longer gets wrapped with data-parallel modules when not fitting in `DDPPlugin`, `DDPSpawnPlugin`, `DDPShardedPlugin`, `DDPSpawnShardedPlugin` ([9096](
- Fixed `trainer.accumulate_grad_batches` to be an int on init. The default value for it is now `None` inside Trainer ([9652](
- Fixed `broadcast` in `DDPPlugin` and `DDPSpawnPlugin` to respect the `src` input ([9691](
- Fixed `self.log(on_epoch=True, reduce_fx=sum)` for the `on_batch_start` and `on_train_batch_start` hooks ([9791](
- Fixed `self.log(on_epoch=True)` for the `on_batch_start` and `on_train_batch_start` hooks ([9780](
- Fixed restoring training state during `` only ([9413](
- Fixed DeepSpeed and Lightning both calling the scheduler ([9788](
- Fixed missing arguments when saving hyperparameters from the parent class but not from the child class ([9800](
- Fixed DeepSpeed GPU device IDs ([9847](
- Reset `val_dataloader` in `tuner/batch_size_scaling` ([9857](
- Fixed use of `LightningCLI` in example ([9934](
- Fixed issue with non-init dataclass fields in `apply_to_collection` ([9963](
- Reset `val_dataloader` in `tuner/batch_size_scaling` for binsearch ([9975](
- Fixed logic to check for spawn in dataloader `TrainerDataLoadingMixin._worker_check` ([9902](
- Fixed `train_dataloader` getting loaded twice when resuming from a checkpoint during `` ([9671](
- Fixed `LearningRateMonitor` logging for an optimizer with multiple param groups and no scheduler ([10044](
- Fixed undesired side effects being caused by `Trainer` patching dataloader methods on the `LightningModule` ([9764](
- Fixed gradients not being unscaled when clipping or logging the gradient norm ([9287](
- Fixed `on_before_optimizer_step` getting called before the optimizer closure (including backward) has run ([10167](
- Fixed monitor value in `ModelCheckpoint` getting moved to the wrong device in a special case where it becomes NaN ([10118](
- Fixed creation of `dirpath` in `BaseProfiler` if it doesn't exist ([10073](
- Fixed incorrect handling of `SIGTERM` ([10189](
- Fixed bug where `log(on_step=True, on_epoch=True, sync_dist=True)` wouldn't reduce the value on step ([10227](
- Fixed an issue with `pl.utilities.seed.reset_seed` converting the `PL_SEED_WORKERS` environment variable to `bool` ([10099](
- Fixed iterating over a logger collection when `fast_dev_run > 0` ([10232](
- Fixed `batch_size` in `ResultCollection` not being reset to 1 on epoch end ([10242](
- Fixed `distrib_type` not being set when training plugin instances are being passed to the Trainer ([10251](


adamjstewart akihironitta alessiobonfiglio ananthsub aphedges awaelchli bamblebam Benjamin-Etheredge borchero Borda borisdayma bryant1410 carmocca cowwoc daniellepintz danielykim edward-io eladsegal EricWiener ethanwharris four4fish gau-nernst hankyul2 HansolEom himanshu-dutta I-iBot jjenniferdai jstjohn justusschock kainoj kaushikb11 kingyiusuen Knarik1 low5545 lsqshr mauvilsa michele-arrival nasnoisaac ninginthecloud popfido pre-commit-ci PuneetDabral qmpzzpmq rohitgr7 ronif roshikouhai s-rog samlurye SeanNaren shnela sidml stancld stfwn tangbinh tchaton thepurpleowl Tshimanga twsl victorjoos VirajBagal wayi1 weiji14 yifuwang yopknopixx

The PyTorch Lightning team and its community are excited to announce Lightning 1.5, introducing support for LightningLite, Fault-tolerant Training, Loop Customization, Lightning Tutorials, LightningCLI V2, RichProgressBar, CheckpointIO Plugin, Trainer Strategy flag, and more!

- [Highlights](#highlights)
- [Backward Incompatible Changes](#bc-changes)
- [Full Changelog](#changelog)

<a name="highlights"></a>


- Moved the gradient unscaling in `NativeMixedPrecisionPlugin` from `pre_optimizer_step` to `post_backward` ([9606](
- Fixed gradient unscaling being called too late, causing gradient clipping and gradient norm tracking to be applied incorrectly ([9606](
- Fixed `lr_find` to generate same results on multiple calls ([9704](
- Fixed `reset` metrics on validation epoch end ([9717](
- Fixed input validation for `gradient_clip_val`, `gradient_clip_algorithm`, `track_grad_norm` and `terminate_on_nan` Trainer arguments ([9595](
- Reset metrics before each task starts ([9410](


rohitgr7 tchaton

