[Lightning AI](https://lightning.ai) :zap: is excited to announce the release of Lightning 2.5.
Lightning 2.5 comes with improvements on several fronts, with **zero** API changes. Our users love it stable, so we keep it stable :smile:.
Speaking of love :heart:, the `lightning`, `pytorch-lightning` and `lightning-fabric` packages are collectively getting more than **10M downloads per month** :open_mouth:, for a total of over **180M downloads** :exploding_head: since the early days. It's incredible to see PyTorch Lightning enjoying such strong adoption across industry and the sciences.
Release 2.5 embraces PyTorch 2.5 and marks some of PyTorch's more recent directions as officially supported, namely tensor subclass-based APIs like [Distributed Tensors](https://pytorch.org/docs/stable/distributed.tensor.html) and [TorchAO](https://pytorch.org/blog/pytorch-native-architecture-optimization/), in combination with `torch.compile`.
Here are a couple of examples:
<details><summary>Distributed FP8 transformer with PyTorch Lightning</summary>
Full example [here](https://github.com/Lightning-AI/pytorch-lightning/tree/master/examples/pytorch/fp8_distributed_transformer)
```python
import lightning as L
import torch
import torch.nn as nn
import torch.nn.functional as F
from lightning.pytorch.demos import Transformer, WikiText2
from lightning.pytorch.strategies import ModelParallelStrategy
from torch.distributed._composable.fsdp.fully_shard import fully_shard
from torch.utils.data import DataLoader
from torchao.float8 import Float8LinearConfig, convert_to_float8_training


class LanguageModel(L.LightningModule):
    def __init__(self, vocab_size):
        super().__init__()
        self.vocab_size = vocab_size
        self.model = None

    def configure_model(self):
        if self.model is not None:
            return

        with torch.device("meta"):
            model = Transformer(
                vocab_size=self.vocab_size,
                nlayers=16,
                nhid=4096,
                ninp=1024,
                nhead=32,
            )

        float8_config = Float8LinearConfig(
            # pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly  # noqa
            pad_inner_dim=True,
        )

        def module_filter_fn(mod: torch.nn.Module, fqn: str):
            # we skip the decoder because its vocabulary size
            # is typically not divisible by 16, as required by float8
            return fqn != "decoder"

        convert_to_float8_training(model, config=float8_config, module_filter_fn=module_filter_fn)

        for module in model.modules():
            if isinstance(module, (nn.TransformerEncoderLayer, nn.TransformerDecoderLayer)):
                fully_shard(module, mesh=self.device_mesh)

        fully_shard(model, mesh=self.device_mesh)

        self.model = torch.compile(model)

    def training_step(self, batch):
        input, target = batch
        output = self.model(input, target)
        loss = F.nll_loss(output, target.view(-1))
        self.log("train_loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-4)


def train():
    L.seed_everything(42)

    dataset = WikiText2()
    train_dataloader = DataLoader(dataset, num_workers=8, batch_size=1)

    model = LanguageModel(vocab_size=dataset.vocab_size)

    mp_strategy = ModelParallelStrategy(
        data_parallel_size=4,
        tensor_parallel_size=1,
    )

    trainer = L.Trainer(strategy=mp_strategy, max_steps=100, precision="bf16-true", accumulate_grad_batches=8)

    trainer.fit(model, train_dataloader)

    trainer.print(torch.cuda.memory_summary())


if __name__ == "__main__":
    torch.set_float32_matmul_precision("high")

    train()
```
</details>
<details><summary>Distributed FP8 transformer with Fabric</summary>
Full example [here](https://github.com/Lightning-AI/pytorch-lightning/tree/master/examples/fabric/fp8_distributed_transformer)
```python
import lightning as L
import torch
import torch.nn as nn
import torch.nn.functional as F
from lightning.fabric.strategies import ModelParallelStrategy
from lightning.pytorch.demos import Transformer, WikiText2
from torch.distributed._composable.fsdp.fully_shard import fully_shard
from torch.distributed.device_mesh import DeviceMesh
from torch.utils.data import DataLoader
from torchao.float8 import Float8LinearConfig, convert_to_float8_training
from tqdm import tqdm


def configure_model(model: nn.Module, device_mesh: DeviceMesh) -> nn.Module:
    float8_config = Float8LinearConfig(
        # pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly  # noqa
        pad_inner_dim=True,
    )

    def module_filter_fn(mod: torch.nn.Module, fqn: str):
        # we skip the decoder because its vocabulary size
        # is typically not divisible by 16, as required by float8
        return fqn != "decoder"

    convert_to_float8_training(model, config=float8_config, module_filter_fn=module_filter_fn)

    for module in model.modules():
        if isinstance(module, (torch.nn.TransformerEncoderLayer, torch.nn.TransformerDecoderLayer)):
            fully_shard(module, mesh=device_mesh)

    fully_shard(model, mesh=device_mesh)

    return torch.compile(model)


def train():
    L.seed_everything(42)

    batch_size = 8
    micro_batch_size = 1
    max_steps = 100

    dataset = WikiText2()
    dataloader = DataLoader(dataset, num_workers=8, batch_size=micro_batch_size)

    with torch.device("meta"):
        model = Transformer(
            vocab_size=dataset.vocab_size,
            nlayers=16,
            nhid=4096,
            ninp=1024,
            nhead=32,
        )

    strategy = ModelParallelStrategy(data_parallel_size=4, tensor_parallel_size=1, parallelize_fn=configure_model)

    fabric = L.Fabric(precision="bf16-true", strategy=strategy)
    fabric.launch()

    model = fabric.setup(model)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    optimizer = fabric.setup_optimizers(optimizer)

    dataloader = fabric.setup_dataloaders(dataloader)

    iterable = tqdm(enumerate(dataloader), total=len(dataloader)) if fabric.is_global_zero else enumerate(dataloader)

    steps = 0

    for i, batch in iterable:
        input, target = batch

        is_accumulating = i % (batch_size // micro_batch_size) != 0

        with fabric.no_backward_sync(model, enabled=is_accumulating):
            output = model(input, target)
            loss = F.nll_loss(output, target.view(-1))
            fabric.backward(loss)

        if not is_accumulating:
            fabric.clip_gradients(model, optimizer, max_norm=1.0)
            optimizer.step()
            optimizer.zero_grad()
            steps += 1

        if fabric.is_global_zero:
            iterable.set_postfix_str(f"train_loss={loss.item():.2f}")

        if steps == max_steps:
            break

    fabric.print(torch.cuda.memory_summary())


if __name__ == "__main__":
    torch.set_float32_matmul_precision("high")

    train()
```
</details>
As these examples show, it's now easier than ever to take your PyTorch Lightning module and run it with **FSDP2 and/or tensor parallelism in FP8 precision**, using the `ModelParallelStrategy` we introduced in 2.4.
Also note the use of distributed tensor APIs, TorchAO APIs, and `torch.compile` directly in the `configure_model` hook (or in the parallelize function in Fabric's `ModelParallelStrategy`), as opposed to the `LightningModule` as a whole. The advantage of this approach is that you can just **copy-paste the parallelize functions** that come with native PyTorch models directly into `configure_model` and get the same effect, no head-scratching involved :nerd_face:.
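To make that pattern concrete, here is a minimal sketch (not one of the shipped examples above): `parallelize` is a hypothetical stand-in for whatever PyTorch-native parallelization function comes with your model, and the `nn.Linear` is just a placeholder for the real network.
```python
import lightning as L
import torch
import torch.nn as nn


def parallelize(model: nn.Module, device_mesh) -> nn.Module:
    # Hypothetical stand-in for a PyTorch-native parallelization function:
    # DTensor sharding, TorchAO conversions, fully_shard calls, etc. would go here.
    return model


class MyLitModule(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = None  # built lazily in configure_model

    def configure_model(self):
        if self.model is not None:
            return  # the hook may run more than once; only materialize the model once
        model = nn.Linear(32, 32)  # placeholder for your real model
        # self.device_mesh is populated by ModelParallelStrategy before this hook runs
        model = parallelize(model, self.device_mesh)
        self.model = torch.compile(model)
```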
Speaking of head-scratching, we also made a pass over the PyTorch Lightning internals and **hardened** the parts that track **progress counters** during training, validation, and testing, as well as learning rate scheduling, in relation to **resuming from checkpoints**. To the best of our knowledge, there are no longer edge cases where stopping and resuming from a checkpoint can change the sequence of loops or other internal state. **Fault tolerance for the win** :partying_face:!
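As a reminder, picking up an interrupted run is just a matter of handing `Trainer.fit` a checkpoint path; a minimal sketch, reusing the hypothetical `MyLitModule` from the snippet above with a placeholder dataloader and path:
```python
import lightning as L

# Placeholders: any LightningModule and dataloader will do; "checkpoints/last.ckpt"
# stands in for a checkpoint written by a previous (possibly interrupted) run.
model = MyLitModule()
trainer = L.Trainer(max_steps=1_000)

# Loop counters, LR schedulers, and other internal state are restored from the
# checkpoint, so training continues as if it had never been interrupted.
trainer.fit(model, train_dataloader, ckpt_path="checkpoints/last.ckpt")
```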
Alright! Feel free to take a look at the **full changelog** below.
And of course: the best way to use PyTorch Lightning and Fabric is through [Lightning Studio](https://lightning.ai/) :zap:. Access GPUs, train models, deploy and more with **zero setup**. Focus on data and models - not infrastructure.
<a name="changelog"></a>
# Changes
<a name="changelog-pytorch"></a>
## PyTorch Lightning
<details open><summary>Added</summary>
- Added `step` parameter to `TensorBoardLogger.log_hyperparams` to visualize changes during training ([20176](https://github.com/Lightning-AI/pytorch-lightning/pull/20176))
- Added `str` method to datamodule ([20301](https://github.com/Lightning-AI/pytorch-lightning/pull/20301))
- Added timeout to DeepSpeedStrategy ([20474](https://github.com/Lightning-AI/pytorch-lightning/pull/20474))
- Added doc for Truncated Back-Propagation Through Time ([20422](https://github.com/Lightning-AI/pytorch-lightning/pull/20422))
- Added FP8 + FSDP2 + torch.compile examples for PyTorch Lightning ([20440](https://github.com/Lightning-AI/pytorch-lightning/pull/20440))
- Added profiling to `Trainer.save_checkpoint` ([20405](https://github.com/Lightning-AI/pytorch-lightning/pull/20405))
- Added `after_instantiate_classes` hook to CLI ([20401](https://github.com/Lightning-AI/pytorch-lightning/pull/20401))
</details>
<details open><summary>Changed</summary>
- Updated checkpointing documentation to mark `resume_from_checkpoint` as deprecated ([20477](https://github.com/Lightning-AI/pytorch-lightning/pull/20477))
- Made plugin type checks more flexible ([20186](https://github.com/Lightning-AI/pytorch-lightning/pull/20186))
- Changed seeding NumPy using `np.random.SeedSequence()` in `pl_worker_init_function()` to robustly seed NumPy-dependent dataloader workers ([20369](https://github.com/Lightning-AI/pytorch-lightning/pull/20369))
- Allowed callbacks to be restored not just during training ([20403](https://github.com/Lightning-AI/pytorch-lightning/pull/20403))
- Changed LightningCLI tests to account for future fix in jsonargparse ([20372](https://github.com/Lightning-AI/pytorch-lightning/pull/20372))
- Bumped PyTorch to version `2.5` ([20351](https://github.com/Lightning-AI/pytorch-lightning/pull/20351))
- Decoupled checkpoint artifact path from model artifact path ([20325](https://github.com/Lightning-AI/pytorch-lightning/pull/20325))
- Updated BitsAndBytes version ([20313](https://github.com/Lightning-AI/pytorch-lightning/pull/20313))
- Changed merging of hparams when logging to ignore parameter names that start with an underscore `_` ([20221](https://github.com/Lightning-AI/pytorch-lightning/pull/20221))
- Re-enabled passing `BytesIO` as path in `.to_onnx()` ([20172](https://github.com/Lightning-AI/pytorch-lightning/pull/20172))
</details>
<details open><summary>Removed</summary>
- Removed `List[int]` as input type for Trainer when `accelerator="cpu"` ([20399](https://github.com/Lightning-AI/pytorch-lightning/pull/20399))
</details>
<details open><summary>Fixed</summary>
- Fixed `UnboundLocalError` when using the predict method with `return_predictions=False` ([20484](https://github.com/Lightning-AI/pytorch-lightning/pull/20484))
- Fixed use of `convert_module` in FSDP to avoid using more memory than necessary during initialization ([20323](https://github.com/Lightning-AI/pytorch-lightning/pull/20323))
- Fixed TypeError in `configure_optimizers` when running with `ReduceLROnPlateau` ([20471](https://github.com/Lightning-AI/pytorch-lightning/pull/20471))
- Fixed return type in `configure_optimizers` example ([20420](https://github.com/Lightning-AI/pytorch-lightning/pull/20420))
- Fixed incorrect URI prefix stripping in MLFlowLogger ([20365](https://github.com/Lightning-AI/pytorch-lightning/pull/20365))
- Fixed shuffling behavior when using a custom sampler in data module ([20327](https://github.com/Lightning-AI/pytorch-lightning/pull/20327))
- Ensured restarting from checkpoints leads to consistent internal counters compared to uninterrupted training ([20379](https://github.com/Lightning-AI/pytorch-lightning/pull/20379))
- Fixed LightningCLI failing when both module and data module save hyperparameters due to conflicting internal `_class_path` parameter ([20221](https://github.com/Lightning-AI/pytorch-lightning/pull/20221))
</details>
<a name="changelog-fabric"></a>
## Lightning Fabric
<details open><summary>Added</summary>
- Added `step` parameter to `TensorBoardLogger.log_hyperparams` to visualize changes during training ([20176](https://github.com/Lightning-AI/pytorch-lightning/pull/20176))
- Added timeout to DeepSpeedStrategy ([20474](https://github.com/Lightning-AI/pytorch-lightning/pull/20474))
- Added FP8 + FSDP2 + torch.compile examples for Fabric ([20440](https://github.com/Lightning-AI/pytorch-lightning/pull/20440))
- Added RTX 4080 super to chips dictionary ([20285](https://github.com/Lightning-AI/pytorch-lightning/pull/20285))
- Added device property to lazy load functionality ([20183](https://github.com/Lightning-AI/pytorch-lightning/pull/20183))
- Added `ddp_find_unused_parameters_true` alias in Fabric's DDPStrategy ([20125](https://github.com/Lightning-AI/pytorch-lightning/pull/20125))
</details>
<details open><summary>Changed</summary>
- Changed seeding NumPy using `np.random.SeedSequence()` in `pl_worker_init_function()` to robustly seed NumPy-dependent dataloader workers ([20369](https://github.com/Lightning-AI/pytorch-lightning/pull/20369))
- Bumped PyTorch to version `2.5` ([20351](https://github.com/Lightning-AI/pytorch-lightning/pull/20351))
- Updated BitsAndBytes version ([20313](https://github.com/Lightning-AI/pytorch-lightning/pull/20313))
</details>
<details open><summary>Removed</summary>
- Nothing to see here :smile:
</details>
<details open><summary>Fixed</summary>
- Fixed use of `convert_module` in FSDP to avoid using more memory than necessary during initialization ([20323](https://github.com/Lightning-AI/pytorch-lightning/pull/20323))
</details>
<br/>
**Full commit list**: [2.4.0 -> 2.5.0](https://github.com/Lightning-AI/pytorch-lightning/compare/2.4.0...2.5.0)
<a name="contributors"></a>
# Contributors
We thank **all folks** who submitted issues, features, fixes and doc changes. It's the only way we can **collectively** make Lightning :zap: better for everyone. Nice job!
In particular, we would like to thank the authors of the pull requests above, in no particular order:
ringohoffman MrWhatZitToYaa jedyang97 chualanagit lantiga AlessandroW kazuar t-vi 01AbhiSingh WangYue0000 amorehead EricCousineau-TRI mauvilsa Borda pete-mcelroy ali-alshaar7 GdoongMathew farhadrgh tshu-w LukasSalchow awindmann dadwadw233 qingquansong
Thank you :heart: and we hope you'll keep them coming!