Lightning

Latest version: v2.5.1

Safety actively analyzes 723144 Python packages for vulnerabilities to keep your Python projects secure.

Page 2 of 32

2.4.0

[Lightning AI](https://lightning.ai) :zap: is excited to announce the release of Lightning 2.4. This is mainly a compatibility upgrade for PyTorch 2.4 and Python 3.12, with a sprinkle of a few features and bug fixes.

**Did you know?** The Lightning philosophy extends beyond a boilerplate-free deep learning framework: We've been hard at work bringing you [Lightning Studio](https://lightning.ai/). Code together, prototype, train, deploy, host AI web apps. All from your browser, with zero setup.

<a name="changelog"></a>
Changes

<a name="changelog-pytorch"></a>
PyTorch Lightning

<details open><summary>Added</summary>

- Made saving non-distributed checkpoints fully atomic ([20011](https://github.com/Lightning-AI/pytorch-lightning/pull/20011))
- Added `dump_stats` flag to `AdvancedProfiler` ([19703](https://github.com/Lightning-AI/pytorch-lightning/issues/19703))
- Added a flag `verbose` to the `seed_everything()` function ([20108](https://github.com/Lightning-AI/pytorch-lightning/pull/20108))
- Added support for PyTorch 2.4 ([20010](https://github.com/Lightning-AI/pytorch-lightning/pull/20010))
- Added support for Python 3.12 ([20078](https://github.com/Lightning-AI/pytorch-lightning/pull/20078))
- The `TQDMProgressBar` now provides an option to retain prior training epoch bars ([19578](https://github.com/Lightning-AI/pytorch-lightning/pull/19578))
- Added the count of modules in train and eval mode to the printed `ModelSummary` table ([20159](https://github.com/Lightning-AI/pytorch-lightning/pull/20159))

</details>

<details open><summary>Changed</summary>

- Triggering KeyboardInterrupt (Ctrl+C) during `.fit()`, `.evaluate()`, `.test()` or `.predict()` now terminates all processes launched by the Trainer and exits the program ([19976](https://github.com/Lightning-AI/pytorch-lightning/pull/19976))
- Changed the implementation of how seeds are chosen for dataloader workers when using `seed_everything(..., workers=True)` ([20055](https://github.com/Lightning-AI/pytorch-lightning/pull/20055))
- NumPy is no longer a required dependency ([20090](https://github.com/Lightning-AI/pytorch-lightning/issues/20090))

</details>

<details open><summary>Removed</summary>

- Removed support for PyTorch 2.1 ([20009](https://github.com/Lightning-AI/lightning/pull/20009))
- Removed support for Python 3.8 ([20071](https://github.com/Lightning-AI/lightning/pull/20071))

</details>

<details open><summary>Fixed</summary>

- Avoid LightningCLI saving hyperparameters with `class_path` and `init_args` since this would be a breaking change ([20068](https://github.com/Lightning-AI/pytorch-lightning/pull/20068))
- Fixed an issue that would cause too many printouts of the seed info when using `seed_everything()` ([20108](https://github.com/Lightning-AI/pytorch-lightning/pull/20108))
- Fixed `_LoggerConnector`'s `_ResultMetric` to move all registered keys to the device of the logged value if needed ([19814](https://github.com/Lightning-AI/pytorch-lightning/issues/19814))
- Fixed `_optimizer_to_device` logic for special 'step' key in optimizer state causing performance regression ([20019](https://github.com/Lightning-AI/lightning/pull/20019))
- Fixed parameter counts in `ModelSummary` when model has distributed parameters (DTensor) ([20163](https://github.com/Lightning-AI/pytorch-lightning/pull/20163))

</details>

<a name="changelog-fabric"></a>
Lightning Fabric

<details open><summary>Added</summary>

- Made saving non-distributed checkpoints fully atomic ([20011](https://github.com/Lightning-AI/pytorch-lightning/pull/20011))
- Added a flag `verbose` to the `seed_everything()` function ([20108](https://github.com/Lightning-AI/pytorch-lightning/pull/20108))
- Added support for PyTorch 2.4 ([20028](https://github.com/Lightning-AI/pytorch-lightning/pull/20028))
- Added support for Python 3.12 ([20078](https://github.com/Lightning-AI/pytorch-lightning/pull/20078))

</details>

<details open><summary>Changed</summary>

- Changed the implementation of how seeds are chosen for dataloader workers when using `seed_everything(..., workers=True)` ([20055](https://github.com/Lightning-AI/pytorch-lightning/pull/20055))
- NumPy is no longer a required dependency ([20090](https://github.com/Lightning-AI/pytorch-lightning/issues/20090))

</details>

<details open><summary>Removed</summary>

- Removed support for PyTorch 2.1 ([20009](https://github.com/Lightning-AI/lightning/pull/20009))
- Removed support for Python 3.8 ([20071](https://github.com/Lightning-AI/lightning/pull/20071))

</details>

<details open><summary>Fixed</summary>

- Fixed an attribute error when loading a checkpoint into a quantized model using the `_lazy_load()` function ([20121](https://github.com/Lightning-AI/lightning/pull/20121))
- Fixed `_optimizer_to_device` logic for special 'step' key in optimizer state causing performance regression ([20019](https://github.com/Lightning-AI/lightning/pull/20019))

</details>

</br>

**Full commit list**: [2.3.0 -> 2.4.0](https://github.com/Lightning-AI/pytorch-lightning/compare/2.3.0...2.4.0)

<a name="contributors"></a>
Contributors

We thank all our contributors who submitted pull requests for features, bug fixes and documentation updates.

New Contributors
* SamuelLarkin made their first contribution in https://github.com/Lightning-AI/pytorch-lightning/pull/19969
* liambsmith made their first contribution in https://github.com/Lightning-AI/pytorch-lightning/pull/19986
* EtayLivne made their first contribution in https://github.com/Lightning-AI/pytorch-lightning/pull/19915
* elmuz made their first contribution in https://github.com/Lightning-AI/pytorch-lightning/pull/19998
* swyo made their first contribution in https://github.com/Lightning-AI/pytorch-lightning/pull/19982
* corwinjoy made their first contribution in https://github.com/Lightning-AI/pytorch-lightning/pull/20011
* omahs made their first contribution in https://github.com/Lightning-AI/pytorch-lightning/pull/19979
* linbo0518 made their first contribution in https://github.com/Lightning-AI/pytorch-lightning/pull/20040
* 01AbhiSingh made their first contribution in https://github.com/Lightning-AI/pytorch-lightning/pull/20055
* K-H-Ismail made their first contribution in https://github.com/Lightning-AI/pytorch-lightning/pull/20099
* adosar made their first contribution in https://github.com/Lightning-AI/pytorch-lightning/pull/20146
* jojje made their first contribution in https://github.com/Lightning-AI/pytorch-lightning/pull/19578

Did you know?

Chuck Norris can solve NP-hard problems in polynomial time. In fact, any problem is easy when Chuck Norris solves it.

2.3.3

This release removes the code from the main `lightning` package that was reported in [CVE-2024-5980](https://github.com/advisories/GHSA-mr7h-w2qc-ffc2).

2.3.2

Not secure

Includes a minor bugfix that avoids a conflict with the entrypoint command with another package [20041](https://github.com/Lightning-AI/pytorch-lightning/pull/20041).

2.3.1

Not secure

Includes minor bugfixes and stability improvements.

**Full Changelog**: https://github.com/Lightning-AI/pytorch-lightning/compare/2.3.0...2.3.1

2.3.0

Not secure

[Lightning AI](https://lightning.ai) is excited to announce the release of Lightning 2.3 :zap:

**Did you know?** The Lightning philosophy extends beyond a boilerplate-free deep learning framework: We've been hard at work bringing you [Lightning Studio](https://lightning.ai/). Code together, prototype, train, deploy, host AI web apps. All from your browser, with zero setup.

This release introduces experimental support for Tensor Parallelism and 2D Parallelism, [PyTorch 2.3](https://pytorch.org/blog/pytorch2-3/) support, and several bugfixes and stability improvements.

- [Highlights](highlights)
- [Tensor Parallelism (beta)](https://github.com/Lightning-AI/lightning/releases/tag/2.3.0#highlights-tensor-parallel)
- [2D Parallelism (beta)](https://github.com/Lightning-AI/lightning/releases/tag/2.3.0#highlights-2d-parallel)
- [Training Mode in Model Summary](https://github.com/Lightning-AI/lightning/releases/tag/2.3.0#highlights-model-summary)
- [Special Forward Methods in Fabric](https://github.com/Lightning-AI/lightning/releases/tag/2.3.0#highlights-forward-methods)
- [Notable Changes](https://github.com/Lightning-AI/lightning/releases/tag/2.3.0#bc-changes)
- [Full Changelog](https://github.com/Lightning-AI/lightning/releases/tag/2.3.0#changelog)
- [PyTorch Lightning](https://github.com/Lightning-AI/lightning/releases/tag/2.3.0#changelog-pytorch)
- [Lightning Fabric](https://github.com/Lightning-AI/lightning/releases/tag/2.3.0#changelog-fabric)
- [Contributors](https://github.com/Lightning-AI/lightning/releases/tag/2.3.0#contributors)

<a name="highlights"></a>
Highlights

<a name="highlights-tensor-parallel"></a>
Tensor Parallelism (beta)

Tensor parallelism (TP) is a technique that splits up the computation of selected layers across GPUs to save memory and speed up distributed models. To enable TP as well as other forms of parallelism, we introduce a `ModelParallelStrategy` for both Lightning Trainer and Fabric. Under the hood, TP is enabled through new experimental PyTorch APIs like [DTensor](https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/README.md) and [`torch.distributed.tensor.parallel`](https://pytorch.org/docs/stable/distributed.tensor.parallel.html).

PyTorch Lightning

Enabling TP in a model with PyTorch Lightning requires you to implement the `LightningModule.configure_model()` method where you convert selected layers of a model to paralellized layers. This is an advanced feature, because it requires a deep understanding of the model architecture. Open the [tutorial Studio](https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-pytorch-lightning) to learn the basics of Tensor Parallelism.

<a target="_blank" href="https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-pytorch-lightning">
<img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio"/>
</a>

 

python
import lightning as L
from lightning.pytorch.strategies import ModelParallelStrategy
from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel
from torch.distributed.tensor.parallel import parallelize_module

1. Implement the `configure_model()` method in LightningModule
class LitModel(L.LightningModule):
def __init__(self):
super().__init__()
self.model = FeedForward(8192, 8192)

def configure_model(self):
Lightning will set up a `self.device_mesh` for you
tp_mesh = self.device_mesh["tensor_parallel"]
Use PyTorch's distributed tensor APIs to parallelize the model
plan = {
"w1": ColwiseParallel(),
"w2": RowwiseParallel(),
"w3": ColwiseParallel(),
}
parallelize_module(self.model, tp_mesh, plan)

def training_step(self, batch):
...

2. Create the strategy
strategy = ModelParallelStrategy()

3. Configure devices and set the strategy in Trainer
trainer = L.Trainer(accelerator="cuda", devices=2, strategy=strategy)
trainer.fit(...)

<details><summary>Full training example (requires at least 2 GPUs).</summary>

python
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel
from torch.distributed.tensor.parallel import parallelize_module

import lightning as L
from lightning.pytorch.demos.boring_classes import RandomDataset
from lightning.pytorch.strategies import ModelParallelStrategy

class FeedForward(nn.Module):
def __init__(self, dim, hidden_dim):
super().__init__()
self.w1 = nn.Linear(dim, hidden_dim, bias=False)
self.w2 = nn.Linear(hidden_dim, dim, bias=False)
self.w3 = nn.Linear(dim, hidden_dim, bias=False)

def forward(self, x):
return self.w2(F.silu(self.w1(x)) * self.w3(x))

class LitModel(L.LightningModule):
def __init__(self):
super().__init__()
self.model = FeedForward(8192, 8192)

def configure_model(self):
if self.device_mesh is None:
return

Lightning will set up a `self.device_mesh` for you
tp_mesh = self.device_mesh["tensor_parallel"]
Use PyTorch's distributed tensor APIs to parallelize the model
plan = {
"w1": ColwiseParallel(),
"w2": RowwiseParallel(),
"w3": ColwiseParallel(),
}
parallelize_module(self.model, tp_mesh, plan)

def training_step(self, batch):
output = self.model(batch)
loss = output.sum()
return loss

def configure_optimizers(self):
return torch.optim.AdamW(self.model.parameters(), lr=3e-3)

def train_dataloader(self):
Trainer configures the sampler automatically for you such that
all batches in a tensor-parallel group are identical
dataset = RandomDataset(8192, 64)
return torch.utils.data.DataLoader(dataset, batch_size=8, num_workers=2)

strategy = ModelParallelStrategy()
trainer = L.Trainer(
accelerator="cuda",
devices=2,
strategy=strategy,
max_epochs=1,
)

model = LitModel()
trainer.fit(model)

trainer.print(f"Peak memory usage: {torch.cuda.max_memory_allocated() / 1e9:.02f} GB")

</details>

</br>

Lightning Fabric

Applying TP in a model with Fabric requires you to implement a special function where you convert selected layers of a model to paralellized layers. This is an advanced feature, because it requires a deep understanding of the model architecture. Open the [tutorial Studio](https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-lightning-fabric) to learn the basics of Tensor Parallelism.

<a target="_blank" href="https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-lightning-fabric">
<img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio"/>
</a>

 

python
import lightning as L
from lightning.fabric.strategies import ModelParallelStrategy
from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel
from torch.distributed.tensor.parallel import parallelize_module

1. Implement the parallelization function for your model
def parallelize_feedforward(model, device_mesh):
Lightning will set up a device mesh for you
tp_mesh = device_mesh["tensor_parallel"]
Use PyTorch's distributed tensor APIs to parallelize the model
plan = {
"w1": ColwiseParallel(),
"w2": RowwiseParallel(),
"w3": ColwiseParallel(),
}
parallelize_module(model, tp_mesh, plan)
return model

2. Pass the parallelization function to the strategy
strategy = ModelParallelStrategy(parallelize_fn=parallelize_feedforward)

3. Configure devices and set the strategy in Fabric
fabric = L.Fabric(accelerator="cuda", devices=2, strategy=strategy)
fabric.launch()

<details><summary>Full training example (requires at least 2 GPUs).</summary>

python
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel
from torch.distributed.tensor.parallel import parallelize_module

import lightning as L
from lightning.pytorch.demos.boring_classes import RandomDataset
from lightning.fabric.strategies import ModelParallelStrategy

class FeedForward(nn.Module):
def __init__(self, dim, hidden_dim):
super().__init__()
self.w1 = nn.Linear(dim, hidden_dim, bias=False)
self.w2 = nn.Linear(hidden_dim, dim, bias=False)
self.w3 = nn.Linear(dim, hidden_dim, bias=False)

def forward(self, x):
return self.w2(F.silu(self.w1(x)) * self.w3(x))

def parallelize_feedforward(model, device_mesh):
Lightning will set up a device mesh for you
tp_mesh = device_mesh["tensor_parallel"]
Use PyTorch's distributed tensor APIs to parallelize the model
plan = {
"w1": ColwiseParallel(),
"w2": RowwiseParallel(),
"w3": ColwiseParallel(),
}
parallelize_module(model, tp_mesh, plan)
return model

strategy = ModelParallelStrategy(parallelize_fn=parallelize_feedforward)
fabric = L.Fabric(accelerator="cuda", devices=2, strategy=strategy)
fabric.launch()

Initialize the model
model = FeedForward(8192, 8192)
model = fabric.setup(model)

Define the optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-3)
optimizer = fabric.setup_optimizers(optimizer)

Define dataset/dataloader
dataset = RandomDataset(8192, 64)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)
dataloader = fabric.setup_dataloaders(dataloader)

Simplified training loop
for i, batch in enumerate(dataloader):
output = model(batch)
loss = output.sum()
fabric.backward(loss)
optimizer.step()
optimizer.zero_grad()
fabric.print(f"Iteration {i} complete")

fabric.print(f"Peak memory usage: {torch.cuda.max_memory_allocated() / 1e9:.02f} GB")

</details>

</br>

<a name="highlights-2d-parallel"></a>
2D Parallelism (beta)

Tensor Parallelism by itself can be very effective for efficient inference of very large models. For training, TP is typically combined with other forms of parallelism, such as FSDP, to increase throughput and scalability on large clusters with 100s of GPUs. The new `ModelParallelStrategy` in this release supports the combination of TP + FSDP, which is referred to as 2D parallelism.

For an introduction to this feature, please also refer to the tutorial Studios ([PyTorch Lightning](https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-pytorch-lightning), [Lightning Fabric](https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-lightning-fabric)). At the moment, the PyTorch team is reimplementing FSDP under the name [FSDP2](https://github.com/pytorch/pytorch/issues/114299) with the aim to make it compose well with other parallelisms such as TP. Therefore, for the experimental 2D parallelism support, you'll need to switch to using FSDP2 with the new `ModelParallelStrategy`. Please refer to our docs ([PyTorch Lightning](https://lightning.ai/docs/pytorch/latest/advanced/model_parallel/tp_fsdp.html), [Lightning Fabric](https://lightning.ai/docs/fabric/latest/advanced/model_parallel/tp_fsdp.html)) and stay tuned for future releases as these APIs mature.

<a name="highlights-model-summary"></a>
Training Mode in Model Summary

The model summary table that gets displayed when you run `Trainer.fit()` now contains a new column "Mode" that shows the training mode each layer is in ([19468](https://github.com/Lightning-AI/lightning/pull/19468)).

| Name | Type | Params | Mode
-----------------------------------------------------------------
0 | model | Sam | 93.7 M | train
1 | model.image_encoder | ImageEncoderViT | 89.7 M | eval
2 | model.prompt_encoder | PromptEncoder | 6.2 K | train
3 | model.mask_decoder | MaskDecoder | 4.1 M | train
-----------------------------------------------------------------

2.2.5

Not secure

PyTorch Lightning + Fabric

Fixed

- Fixed a matrix shape mismatch issue when running a model loaded from a quantized checkpoint (bitsandbytes) ([19886](https://github.com/Lightning-AI/lightning/pull/19886))

----

**Full Changelog**: https://github.com/Lightning-AI/pytorch-lightning/compare/2.2.4...2.2.5

Page 2 of 32

Releases

Has known vulnerabilities

Previous Next

Lightning

Page 2 of 32

2.4.0

2.3.3

2.3.2

2.3.1

2.3.0

2.2.5

Page 2 of 32

Links

Releases