## App

### Added
- Added the ability to set up basic authentication for Lightning apps (16105)

### Changed
- The LoadBalancer now uses the internal IP and port instead of the exposed URL (16119)
- Added support for logging in different trainer stages with `DeviceStatsMonitor` (16002)
- Renamed `lightning_app.components.serve.gradio` to `lightning_app.components.serve.gradio_server` (16201); an import migration is sketched after this list
- Made cluster creation/deletion async by default (16185)
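A minimal sketch of the import migration for the serve module rename; `ServeGradio` is assumed here to be the class exported from that module:

```python
# Old module path (before the rename):
# from lightning_app.components.serve.gradio import ServeGradio

# New module path after the rename:
from lightning_app.components.serve.gradio_server import ServeGradio
```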

### Fixed
- Fixed not being able to run multiple lightning apps locally due to port collision (15819)
- Avoid `relpath` bug on Windows (16164)
- Avoid using the deprecated `LooseVersion` (16162)
- Ported fixes to the autoscaler component (16249)
- Fixed a bug where `lightning login` with env variables would not correctly save the credentials (16339)
---
## Fabric

### Added
- Added `Fabric.launch()` to programmatically launch processes (e.g. in a Jupyter notebook) (14992); a combined usage sketch of the new Fabric APIs follows this list
- Added the option to launch Fabric scripts from the CLI, without the need to wrap the code into the `run` method (14992)
- Added `Fabric.setup_module()` and `Fabric.setup_optimizers()` to support strategies that need to set up the model before an optimizer can be created (15185)
- Added support for Fully Sharded Data Parallel (FSDP) training in Lightning Lite (14967)
- Added `lightning_fabric.accelerators.find_usable_cuda_devices` utility function (16147)
- Added basic support for LightningModules (16048)
- Added support for managing callbacks via `Fabric(callbacks=...)` and emitting events through `Fabric.call()` (16074)
- Added Logger support (16121)
* Added `Fabric(loggers=...)` to support different Logger frameworks in Fabric
* Added `Fabric.log` for logging scalars using multiple loggers
* Added `Fabric.log_dict` for logging a dictionary of multiple metrics at once
* Added `Fabric.loggers` and `Fabric.logger` attributes to access the individual logger instances
* Added support for calling `self.log` and `self.log_dict` in a LightningModule when using Fabric
* Added access to `self.logger` and `self.loggers` in a LightningModule when using Fabric
- Added `lightning_fabric.loggers.TensorBoardLogger` (16121)
- Added `lightning_fabric.loggers.CSVLogger` (16346)
- Added support for a consistent `.zero_grad(set_to_none=...)` on the wrapped optimizer regardless of which strategy is used (16275)
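Taken together, a minimal sketch of the additions above (programmatic launch, two-step setup, callbacks, loggers, and the consistent `zero_grad`); the callback class, model, and directory names are placeholders:

```python
import torch
from lightning_fabric import Fabric
from lightning_fabric.loggers import CSVLogger, TensorBoardLogger


class PrintCallback:
    # Any method name works; Fabric.call dispatches by name to every callback that defines it
    def on_train_start(self):
        print("training started")


fabric = Fabric(
    accelerator="auto",
    devices=1,
    loggers=[TensorBoardLogger("logs"), CSVLogger("logs")],
    callbacks=[PrintCallback()],
)
fabric.launch()  # programmatic launch, usable e.g. from a Jupyter notebook

model = torch.nn.Linear(32, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Two-step setup, for strategies that must wrap the model before the optimizer is created
model = fabric.setup_module(model)
optimizer = fabric.setup_optimizers(optimizer)

fabric.call("on_train_start")  # emit an event to all registered callbacks

for step in range(10):
    optimizer.zero_grad(set_to_none=True)  # now consistent regardless of strategy
    loss = model(torch.randn(4, 32)).sum()
    fabric.backward(loss)
    optimizer.step()
    fabric.log("train/loss", loss.item())  # scalar, sent to every configured logger
    fabric.log_dict({"step": step, "loss": loss.item()})  # several metrics at once
```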

### Changed
- Renamed the class `LightningLite` to `Fabric` (15932, 15938)
- The `Fabric.run()` method is no longer abstract (14992)
- The `XLAStrategy` now inherits from `ParallelStrategy` instead of `DDPSpawnStrategy` (15838)
- Merged the implementation of `DDPSpawnStrategy` into `DDPStrategy` and removed `DDPSpawnStrategy` (14952)
- The dataloader wrapper returned from `.setup_dataloaders()` now calls `.set_epoch()` on the distributed sampler if one is used (16101); see the sketch after this list
- Renamed `Strategy.reduce` to `Strategy.all_reduce` in all strategies (16370)
- When using multiple devices, the strategy now defaults to "ddp" instead of "ddp_spawn" when none is set (16388)
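A sketch of the sampler epoch handling and the new multi-device default, assuming it is run as a script so the non-spawn "ddp" launcher can re-run it:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from lightning_fabric import Fabric

fabric = Fabric(accelerator="cpu", devices=2)  # multi-device now resolves to "ddp", not "ddp_spawn"
fabric.launch()

dataset = TensorDataset(torch.arange(16).float())
# setup_dataloaders injects a DistributedSampler and returns a wrapper
dataloader = fabric.setup_dataloaders(DataLoader(dataset, batch_size=2, shuffle=True))

for epoch in range(3):
    # The wrapper now calls sampler.set_epoch(...) for us, so the shuffling
    # order differs across epochs without manual bookkeeping
    for (batch,) in dataloader:
        pass
```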

### Removed
- Removed support for FairScale's sharded training (`strategy='ddp_sharded'|'ddp_sharded_spawn'`). Use Fully-Sharded Data Parallel instead (`strategy='fsdp'`) (16329)
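A minimal migration for the removal above, assuming a machine with at least two CUDA devices:

```python
from lightning_fabric import Fabric

# Before (removed): Fabric(strategy="ddp_sharded", accelerator="cuda", devices=2)
fabric = Fabric(strategy="fsdp", accelerator="cuda", devices=2)
```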

### Fixed
- Restored sampling parity between PyTorch and Fabric dataloaders when using the `DistributedSampler` (16101)
- Fixed an issue where the error message would not tell the user the real value that was passed through the CLI (16334)
---
## PyTorch

### Added
- Added support for native logging of `MetricCollection` with enabled compute groups (15580)
- Added support for custom artifact names in `pl.loggers.WandbLogger` (16173)
- Added support for DDP with `LRFinder` (15304)
- Added utilities to migrate checkpoints from one Lightning version to another (15237)
- Added support to upgrade all checkpoints in a folder using the `pl.utilities.upgrade_checkpoint` script (15333)
- Added an axes argument `ax` to `.lr_find().plot()` to enable plotting on user-defined axes in a matplotlib figure (15652)
- Added `log_model` parameter to `MLFlowLogger` (9187)
- Added a check to validate that wrapped FSDP models are used while initializing optimizers (15301)
- Added a warning when `self.log(..., logger=True)` is called without a configured logger (15814)
- Added support for colossalai 0.1.11 (15888)
- Added `LightningCLI` support for optimizers and learning rate schedulers via callable type dependency injection (15869)
- Added support for activation checkpointing for the `DDPFullyShardedNativeStrategy` strategy (15826)
- Added the option to set `DDPFullyShardedNativeStrategy(cpu_offload=True|False)` via bool instead of needing to pass a configuration object (15832)
- Added an info message for Ampere CUDA GPU users to enable tf32 matmul precision (16037); illustrated in the sketch after this list
- Added support for returning optimizer-like classes in `LightningModule.configure_optimizers` (16189)
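To illustrate two of the additions above, a short sketch combining the TF32 hint for Ampere GPUs with the new `log_model` flag on `MLFlowLogger`; the experiment name is a placeholder:

```python
import torch
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import MLFlowLogger

# The setting the new Ampere info message points to (PyTorch 1.12+):
# "high" enables TF32 matmuls
torch.set_float32_matmul_precision("high")

# With log_model=True, checkpoints are uploaded as MLflow artifacts
logger = MLFlowLogger(experiment_name="demo", log_model=True)
trainer = Trainer(accelerator="auto", devices=1, max_epochs=1, logger=logger)
```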

### Changed
- Switched from `tensorboard` to `tensorboardX` in `TensorBoardLogger` (15728)
- From now on, Lightning Trainer and `LightningModule.load_from_checkpoint` automatically upgrade the loaded checkpoint if it was produced in an old version of Lightning (15237)
- `Trainer.{validate,test,predict}(ckpt_path=...)` no longer restores the `Trainer.global_step` and `Trainer.current_epoch` values from the checkpoint; from now on, only `Trainer.fit` restores these values (15532)
- The `ModelCheckpoint.save_on_train_epoch_end` attribute is now computed dynamically every epoch, accounting for changes to the validation dataloaders (15300)
- The Trainer now raises an error if it is given multiple stateful callbacks of the same type with colliding state keys (15634)
- `MLFlowLogger` now logs hyperparameters and metrics in batched API calls (15915)
- Overriding the `on_train_batch_{start,end}` hooks in conjunction with taking a `dataloader_iter` in the `training_step` no longer errors out and instead shows a warning (16062)
- Moved `tensorboardX` to the extra dependencies and made `CSVLogger` the default logger (16349); a sketch of restoring TensorBoard logging follows this list
- Dropped PyTorch 1.9 support (15347)
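With `tensorboardX` now optional, `Trainer` falls back to `CSVLogger` by default; a minimal sketch of keeping TensorBoard logging by requesting it explicitly:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger

# Requires the now-optional tensorboardX package to be installed
trainer = Trainer(logger=TensorBoardLogger(save_dir="lightning_logs"))
```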

### Deprecated
- Deprecated the `description`, `env_prefix` and `env_parse` parameters in `LightningCLI.__init__` in favor of passing them through `parser_kwargs` (15651)
- Deprecated `pytorch_lightning.profiler` in favor of `pytorch_lightning.profilers` (16059)
- Deprecated `Trainer(auto_select_gpus=...)` in favor of `pytorch_lightning.accelerators.find_usable_cuda_devices` (16147); a migration sketch follows this list
- Deprecated `pytorch_lightning.tuner.auto_gpu_select.{pick_single_gpu,pick_multiple_gpus}` in favor of `pytorch_lightning.accelerators.find_usable_cuda_devices` (16147)
- `nvidia/apex` deprecation (16039)
* Deprecated `pytorch_lightning.plugins.NativeMixedPrecisionPlugin` in favor of `pytorch_lightning.plugins.MixedPrecisionPlugin`
* Deprecated the `LightningModule.optimizer_step(using_native_amp=...)` argument
* Deprecated the `Trainer(amp_backend=...)` argument
* Deprecated the `Trainer.amp_backend` property
* Deprecated the `Trainer(amp_level=...)` argument
* Deprecated the `pytorch_lightning.plugins.ApexMixedPrecisionPlugin` class
* Deprecated the `pytorch_lightning.utilities.enums.AMPType` enum
* Deprecated the `DeepSpeedPrecisionPlugin(amp_type=..., amp_level=...)` arguments
- `horovod` deprecation (16141)
* Deprecated `Trainer(strategy="horovod")`
* Deprecated the `HorovodStrategy` class
- Deprecated `pytorch_lightning.lite.LightningLite` in favor of `lightning.fabric.Fabric` (16314)
- `FairScale` deprecation (in favor of PyTorch's FSDP implementation) (16353)
* Deprecated the `pytorch_lightning.overrides.fairscale.LightningShardedDataParallel` class
* Deprecated the `pytorch_lightning.plugins.precision.fully_sharded_native_amp.FullyShardedNativeMixedPrecisionPlugin` class
* Deprecated the `pytorch_lightning.plugins.precision.sharded_native_amp.ShardedNativeMixedPrecisionPlugin` class
* Deprecated the `pytorch_lightning.strategies.fully_sharded.DDPFullyShardedStrategy` class
* Deprecated the `pytorch_lightning.strategies.sharded.DDPShardedStrategy` class
* Deprecated the `pytorch_lightning.strategies.sharded_spawn.DDPSpawnShardedStrategy` class
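A migration sketch for two of the deprecations above: `find_usable_cuda_devices` replaces GPU auto-selection, and native AMP no longer needs an `amp_backend` argument. This assumes a machine with at least two free CUDA devices:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.accelerators import find_usable_cuda_devices

# Before (deprecated):
# trainer = Trainer(auto_select_gpus=True, gpus=2, amp_backend="native", precision=16)
trainer = Trainer(
    accelerator="cuda",
    devices=find_usable_cuda_devices(2),  # picks 2 free, working GPUs
    precision=16,  # native AMP is the only built-in mixed-precision backend going forward
)
```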

### Removed
- Removed deprecated `pytorch_lightning.utilities.memory.get_gpu_memory_map` in favor of `pytorch_lightning.accelerators.cuda.get_nvidia_gpu_stats` (15617)
- Temporarily removed support for Hydra multi-run (15737)
- Removed deprecated `pytorch_lightning.profiler.base.AbstractProfiler` in favor of `pytorch_lightning.profilers.profiler.Profiler` (15637)
- Removed deprecated `pytorch_lightning.profiler.base.BaseProfiler` in favor of `pytorch_lightning.profilers.profiler.Profiler` (15637)
- Removed deprecated code in `pytorch_lightning.utilities.meta` (16038)
- Removed the deprecated `LightningDeepSpeedModule` (16041)
- Removed the deprecated `pytorch_lightning.accelerators.GPUAccelerator` in favor of `pytorch_lightning.accelerators.CUDAAccelerator` (16050)
- Removed the deprecated `pytorch_lightning.profiler.*` classes in favor of `pytorch_lightning.profilers` (16059)
- Removed the deprecated `pytorch_lightning.utilities.cli` module in favor of `pytorch_lightning.cli` (16116); import migrations are sketched after this list
- Removed the deprecated `pytorch_lightning.loggers.base` module in favor of `pytorch_lightning.loggers.logger` (16120)
- Removed the deprecated `pytorch_lightning.loops.base` module in favor of `pytorch_lightning.loops.loop` (16142)
- Removed the deprecated `pytorch_lightning.core.lightning` module in favor of `pytorch_lightning.core.module` (16318)
- Removed the deprecated `pytorch_lightning.callbacks.base` module in favor of `pytorch_lightning.callbacks.callback` (16319)
- Removed the deprecated `Trainer.reset_train_val_dataloaders()` in favor of `Trainer.reset_{train,val}_dataloader` (16131)
- Removed support for `LightningCLI(seed_everything_default=None)` (16131)
- Removed support in LightningLite for FairScale's sharded training (`strategy='ddp_sharded'|'ddp_sharded_spawn'`). Use Fully-Sharded Data Parallel instead (`strategy='fsdp'`) (16329)
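Import migrations for some of the removed modules above:

```python
# Old (removed): from pytorch_lightning.utilities.cli import LightningCLI
from pytorch_lightning.cli import LightningCLI

# Old (removed): from pytorch_lightning.profiler import SimpleProfiler
from pytorch_lightning.profilers import SimpleProfiler

# Old (removed): from pytorch_lightning.loops.base import Loop
from pytorch_lightning.loops.loop import Loop
```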

### Fixed
- Enhanced `reduce_boolean_decision` to accommodate `any`-analogous semantics expected by the `EarlyStopping` callback (15253)
- Fixed the incorrect optimizer step synchronization when running across multiple TPU devices (16020)
- Fixed a type error when dividing the chunk size in the ColossalAI strategy (16212)
- Fixed bug where the `interval` key of the scheduler would be ignored during manual optimization, making the `LearningRateMonitor` callback fail to log the learning rate (16308)
- Fixed an issue with `MLFlowLogger` not finalizing correctly when status code 'finished' was passed (16340)
---
## Contributors
1SAA, akihironitta, AlessioQuercia, awaelchli, bipinKrishnan, Borda, carmocca, dmitsf, erhoo82, ethanwharris, Forbu, hhsecond, justusschock, lantiga, lightningforever, Liyang90, manangoel99, mauvilsa, nicolai86, nohalon, rohitgr7, schmidt-jake, speediedan, yMayanand
_If we forgot someone due to not matching commit email with GitHub account, let us know :]_