Lightning


1.8.0.post1

What's Changed
* Implement freeze batchnorm with freezing track running stats by PososikTeam in https://github.com/Lightning-AI/lightning/pull/15063
* Pkg: fix parsing versions by Borda in https://github.com/Lightning-AI/lightning/pull/15401
* Remove pytest as a requirement to run app by manskx in https://github.com/Lightning-AI/lightning/pull/15449

New Contributors
* PososikTeam made their first contribution in https://github.com/Lightning-AI/lightning/pull/15063

**Full Changelog**: https://github.com/Lightning-AI/lightning/compare/1.8.0...1.8.0.post1

1.8.0

The core team is excited to announce the release of Lightning 1.8 :zap:

- [Highlights](#highlights)
- [Backward Incompatible Changes](#bc-changes)
- [Deprecations](#deprecations)
- [Full Changelog](#changelog)
- [Contributors](#contributors)

Lightning v1.8 is the culmination of work from 52 contributors who have worked on features, bug-fixes, and documentation for a total of over 550 commits since v1.7.

<a name="highlights"></a>
Highlights

Colossal-AI

[Colossal-AI](https://github.com/hpcaitech/ColossalAI) focuses on improving efficiency when training large-scale AI models with billions of parameters. With the new Colossal-AI strategy in Lightning 1.8, you can train existing models like GPT-3 with up to half as many GPUs as usually needed. You can also train models up to twice as big with the same number of GPUs, saving you significant cost. Here is how you use it:


```python
# Select the strategy with good defaults
trainer = Trainer(strategy="colossalai")

# or tune parameters to your liking
from lightning.pytorch.strategies import ColossalAIStrategy

trainer = Trainer(strategy=ColossalAIStrategy(placement_policy="cpu", ...))
```


You can find Colossal-AI's benchmarks with Lightning on GPT-2 [here](https://github.com/hpcaitech/ColossalAI-Pytorch-lightning/tree/main/benchmark).

Under the hood, Colossal-AI implements different parallelism algorithms that are especially interesting for the development of SOTA transformer models:

- Data Parallelism
- Pipeline Parallelism
- 1D, 2D, 2.5D, 3D Tensor Parallelism
- Sequence Parallelism
- Zero Redundancy Optimization


Learn how to install and use Colossal-AI effectively with Lightning [here](https://pytorch-lightning.readthedocs.io/en/latest/advanced/model_parallel.html?highlight=colossal-ai#colossal-ai).


**NOTE:** This strategy is marked as **experimental**. Stay tuned for more updates in the future.


Secrets for Lightning Apps

Introducing encrypted secrets ([14612](https://github.com/Lightning-AI/lightning/pull/14612)), a feature requested by Lightning App users :tada:!

Encrypted secrets allow you to securely pass private data to your apps, like API keys, access tokens, database passwords, or other credentials, without exposing them in your code.

1. Add a secret to your Lightning account in lightning.ai (read more [here](https://lightning.ai/lightning-docs/glossary/secrets.html))
2. Add an environment variable to your app to read the secret:

```python
# somewhere in your Flow or Work:
GitHubComponent(api_token=os.environ["API_TOKEN"])
```


3. Pass the secret to your app run with the following command:

```bash
lightning run app app.py --cloud --secret API_TOKEN=github_api_token
```


These secrets are encrypted and stored in the Lightning database. Nothing except your app can access the value.

**NOTE:** This is an **experimental** feature.


CLI Commands for Lightning Apps

Introducing CLI commands for apps ([13602](https://github.com/Lightning-AI/lightning/pull/13602))!
As a Lightning App builder, if you want to easily create a CLI interface for users to interact with your app, then this is for you.

Here is an example where users can dynamically create notebooks from the CLI.
All you need to do is implement the `configure_commands` hook on the `LightningFlow`:

```python
import lightning as L
from commands.notebook.run import RunNotebook


class Flow(L.LightningFlow):
    ...

    def configure_commands(self):
        # Return a list of dictionaries with commands:
        return [{"run notebook": RunNotebook(method=self.run_notebook)}]


app = L.LightningApp(Flow())
```


Once the app is running with `lightning run app app.py`, you can connect to the app with the following command:

```bash
lightning connect {app name} -y
```


and run the command that was configured:

```bash
lightning run notebook --name=my_notebook_name
```

**NOTE:** This is an **experimental** feature.


Auto-wrapping for FSDP Strategy

In Lightning v1.7, we introduced an integration for PyTorch FSDP in the form of our FSDP strategy, which allows you to train huge models with billions of parameters sharded across hundreds of GPUs and machines.

```python
# Native FSDP implementation
trainer = Trainer(strategy="fsdp_native")
```


We are continuing to improve the support for this feature by adding automatic wrapping of layers for use cases where the model fits into CPU memory, but not into GPU memory ([14383](https://github.com/Lightning-AI/lightning/issues/14383)).

Here are some examples:


**Case 1:** Model is so large that it does not fit into CPU memory.
Construct your layers in the `configure_sharded_model` hook and wrap the large ones you want to shard across GPUs:

```python
class MassiveModel(LightningModule):
    ...

    # Create model here and wrap the large layers for sharding
    def configure_sharded_model(self):
        for i, layer in enumerate(self.block):
            self.block[i] = wrap(layer)
        ...
```


**Case 2:** Model fits into CPU memory, but not into GPU memory. In Lightning v1.8, you no longer need to do anything special here, as we can automatically wrap the layers for you using FSDP's policy:

```python
model = MassiveModel()
trainer = Trainer(
    accelerator="gpu",
    devices=8,
    strategy="fsdp_native",  # or strategy="fsdp" for fairscale
    precision=16,
)

# Automatically wraps the layers here:
trainer.fit(model)
```



**Case 3:** Model fits into GPU memory. No action required, use any strategy you want.


**Note:** if you want to manually wrap layers for more control, you can still do that!

Read more about FSDP and how layer wrapping works in our [docs](https://pytorch-lightning.readthedocs.io/en/latest/advanced/model_parallel.html#pytorch-fully-sharded-training).


New Tuner Callbacks

In this release, we focused on Tuner improvements and introduced two new callbacks that can help you customize the batch size finder and learning rate finder as per your use case.

Batch Size Finder ([11089](https://github.com/PyTorchLightning/pytorch-lightning/pull/11089))

1. You can customize the `BatchSizeFinder` callback to run at different epochs. This feature is useful while fine-tuning models since you can't always use the same batch size after unfreezing the backbone.

```python
from lightning.pytorch.callbacks import BatchSizeFinder


class FineTuneBatchSizeFinder(BatchSizeFinder):
    def __init__(self, milestones, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.milestones = milestones

    def on_fit_start(self, *args, **kwargs):
        return

    def on_train_epoch_start(self, trainer, pl_module):
        if trainer.current_epoch in self.milestones or trainer.current_epoch == 0:
            self.scale_batch_size(trainer, pl_module)


trainer = Trainer(callbacks=[FineTuneBatchSizeFinder(milestones=(5, 10))])
trainer.fit(...)
```


2. Run batch size finder for `validate`/`test`/`predict`.

```python
from lightning.pytorch.callbacks import BatchSizeFinder


class EvalBatchSizeFinder(BatchSizeFinder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def on_fit_start(self, *args, **kwargs):
        return

    def on_test_start(self, trainer, pl_module):
        self.scale_batch_size(trainer, pl_module)


trainer = Trainer(callbacks=[EvalBatchSizeFinder()])
trainer.test(...)
```


Learning Rate Finder ([13802](https://github.com/PyTorchLightning/pytorch-lightning/pull/13802))

You can now customize the `LearningRateFinder` callback to run at different intervals. This feature is useful when fine-tuning models, for example.

```python
from lightning.pytorch.callbacks import LearningRateFinder


class FineTuneLearningRateFinder(LearningRateFinder):
    def __init__(self, milestones, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.milestones = milestones

    def on_fit_start(self, *args, **kwargs):
        return

    def on_train_epoch_start(self, trainer, pl_module):
        if trainer.current_epoch in self.milestones or trainer.current_epoch == 0:
            self.lr_find(trainer, pl_module)


trainer = Trainer(callbacks=[FineTuneLearningRateFinder(milestones=(5, 10))])
trainer.fit(...)
```



LightningCLI Improvements


Even though the `LightningCLI` class is designed to help in the implementation of command line tools, there are instances when it might be more desirable to run directly from Python. In Lightning 1.8, you can now do this ([14596](https://github.com/Lightning-AI/lightning/pull/14596)):

```python
from lightning.pytorch.cli import LightningCLI


def cli_main(args):
    cli = LightningCLI(MyModel, ..., args=args)
    ...
```


Anywhere in your program, you can now call the CLI directly:

```python
cli_main(["--trainer.max_epochs=100", "--model.encoder_layers=24"])
```


[Learn about all features of the LightningCLI!](https://pytorch-lightning.readthedocs.io/en/stable/cli/lightning_cli.html)


Improvements to the SLURM Support

Multi-node training on a SLURM cluster has been supported since the inception of the Lightning Trainer and has seen several improvements over time thanks to many community contributions. And we just keep going! In this release, we've added two quality-of-life improvements:

- The preemption/termination signal is now configurable ([14626](https://github.com/Lightning-AI/lightning/pull/14626)):


```python
# the default signal is SIGUSR1
trainer = Trainer(plugins=[SLURMEnvironment(requeue_signal=signal.SIGUSR1)])

# customize it for your cluster
trainer = Trainer(plugins=[SLURMEnvironment(requeue_signal=signal.SIGHUP)])
```


- Automatic requeuing of jobs now also works for array jobs ([15040](https://github.com/Lightning-AI/lightning/pull/15040))! Array jobs are a convenient way to group/launch several scripts at once. When the SLURM scheduler interrupts your jobs, Lightning will save a checkpoint, resubmit a new job, and, once the scheduler allocates resources, the Trainer will resume from where it left off. See the sketch below.
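
Putting the two improvements together, a SLURM-ready Trainer might be configured as in the following minimal sketch. It assumes the `auto_requeue` and `requeue_signal` constructor arguments of `SLURMEnvironment`; the device and node counts are illustrative:

```python
import signal

from lightning.pytorch import Trainer
from lightning.pytorch.plugins.environments import SLURMEnvironment

# Enable automatic requeuing (the default) and pick the signal
# your cluster sends before preempting a job.
trainer = Trainer(
    accelerator="gpu",
    devices=8,
    num_nodes=4,
    plugins=[SLURMEnvironment(auto_requeue=True, requeue_signal=signal.SIGHUP)],
)
```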


Read more about our SLURM integration [here](https://pytorch-lightning.readthedocs.io/en/stable/clouds/cluster_advanced.html).


<a name="bc-changes"></a>
Backward Incompatible Changes

This section outlines notable changes that are not backward compatible with previous versions. The full list of changes and removals can be found in the CHANGELOG below.

Callback hooks for loading and saving checkpoints

The signature and behavior of the `on_load_checkpoint` and `on_save_checkpoint` callback hooks have changed ([14835](https://github.com/Lightning-AI/lightning/pull/14835)):

Before:
```python
def on_save_checkpoint(self, trainer, pl_module, checkpoint):
    ...
    # previously, we were able to return state here
    return state


def on_load_checkpoint(self, trainer, pl_module, callback_state):
    # previously, only the state for this callback was passed in as argument
    ...
```



Now:
```python
def on_save_checkpoint(self, trainer, pl_module, checkpoint):
    ...
    # returning a value here is no longer supported;
    # you can modify the checkpoint dict directly
    return None


def state_dict(self):
    ...
    # now, return the callback state from this new method
    return state


def on_load_checkpoint(self, trainer, pl_module, checkpoint):
    # now, the full checkpoint dictionary gets passed in as argument
    ...


def load_state_dict(self, state):
    # now, the state for this callback gets passed to this new method
    ...
```



DataModule hooks for loading and saving checkpoints

The `on_save_checkpoint` and `on_load_checkpoint` hooks on the `LightningDataModule` have been removed in favor of the `state_dict` and `load_state_dict` methods:

```diff
-def on_save_checkpoint(self, checkpoint):
-    checkpoint["banana"] = self.banana
+def state_dict(self):
+    return dict(banana=self.banana)

-def on_load_checkpoint(self, checkpoint):
-    self.banana = checkpoint["banana"]
+def load_state_dict(self, state):
+    self.banana = state["banana"]
```
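
For context, a complete (hypothetical) DataModule using the new hooks could look like this; the class name and the `banana` attribute are purely illustrative:

```python
import lightning.pytorch as pl


class FruitDataModule(pl.LightningDataModule):
    def __init__(self):
        super().__init__()
        self.banana = 0  # arbitrary state we want persisted in checkpoints

    def state_dict(self):
        # the returned dict gets stored in the checkpoint by the Trainer
        return {"banana": self.banana}

    def load_state_dict(self, state):
        # called with the dict returned by state_dict() when restoring
        self.banana = state["banana"]
```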



Callback hooks

We removed some `Callback` hooks that were ambiguous to use ([14834](https://github.com/Lightning-AI/lightning/pull/14834)):


| Old name | New name |
|------------------------------|--------------------------------|
| `on_batch_start` | `on_train_batch_start` |
| `on_batch_end` | `on_train_batch_end` |
| `on_epoch_start` | `on_train_epoch_start` |
| `on_epoch_start` | `on_validation_epoch_start` |
| `on_epoch_start` | `on_test_epoch_start` |
| `on_pretrain_routine_start` | `on_fit_start` |



Trainer Device Attributes

We cleaned up the properties related to device indices ([14829](https://github.com/Lightning-AI/lightning/pull/14829)).

The attributes `Trainer.{devices,gpus,num_gpus,ipus,tpu_cores,num_processes,root_gpu,data_parallel_device_ids}` have been removed in favor of accelerator-agnostic attributes:

```python
trainer = Trainer(...)

# access the number of devices the trainer uses on this machine ...
print(trainer.num_devices)

# ... or the device IDs
print(trainer.device_ids)
```




Setting the torch-distributed backend

In previous versions of Lightning, switching between the "gloo" and "nccl" backends for multi-GPU, multi-node training was possible through setting an environment variable like so:

```bash
PL_TORCH_DISTRIBUTED_BACKEND="gloo" python train.py
```


But not all strategies support changing the backend in this way.
From now on, the backend has to be set in the code ([14693](https://github.com/Lightning-AI/lightning/pull/14693)):

```python
trainer = Trainer(strategy=DDPStrategy(process_group_backend="gloo"))
```


The default remains "nccl", and you should choose "gloo" only for debugging purposes.




Logging with multiple loggers

Logging with multiple loggers can be super useful (and super easy with Lightning). For example, you could be using one logger to record sensitive image logs to a hosted MLFlow server within your organization, and at the same time log loss curves online to WandB.

```python
trainer = Trainer(
    logger=[WandbLogger(...), MLFlowLogger(...)]
)
```


Here are two major changes that apply when using multiple loggers in 1.8:

- Checkpoints and profiler reports no longer go to a strange folder with a long, hard-to-remember name ([14325](https://github.com/Lightning-AI/lightning/pull/14325)). From now on, these artifacts will land in the version folder of the **first** logger in the list.

- The loggers used to be wrapped by a `LoggerCollection` object, so that when you accessed `trainer.logger` you could log to all of them simultaneously. However, this "magic" caused confusion and errors among users and we decided to simplify this ([14283](https://github.com/Lightning-AI/lightning/pull/14283)):

```python
# now returns the first logger in the list
print(trainer.logger)

# access all loggers via the plural attribute
loggers = trainer.loggers

for logger in loggers:
    logger.do_something()
```



<a name="deprecations"></a>
Deprecations

**Why is Lightning deprecating APIs in every release?**

Many users have this question, and it is a fair one! Deprecations are a normal part of API evolution in all software. We continually improve Lightning, which means we make APIs like class names, methods, hooks and arguments clear, easy to remember, and general enough to adopt more functionality in the future. Sometimes we have to let old things go to build new and better products.

Learn more about our deprecation window [here](https://pytorch-lightning.readthedocs.io/en/stable/governance.html#api-evolution).

So far, we have followed the pattern of removing deprecated functionality and APIs after two minor versions of deprecation. From Lightning 1.8 onward, we will additionally convert warnings to error messages after the deprecation phase ends. This way, we can greatly improve the upgrade experience with helpful messages for users who skip more than two minor Lightning versions. The exception to this rule are experimental features, which are marked as such in our documentation.

Here is a summary of major deprecations introduced in 1.8; a migration example follows the table:

| API | Removal version | Alternative |
|--------------------------------------------------------------------------------------------------------------------------|-----------------|-------------------------------------------------|
| Argument `Trainer(amp_level=...)` | 1.10 | `Trainer(plugins=[ApexMixedPrecisionPlugin(amp_level=...)])` |
| Function `unwrap_lightning_module` | 1.10 | `Strategy.lightning_module` |
| Function `unwrap_lightning_module_sharded` | 1.10 | `Strategy.lightning_module` |
| Import `pl.core.mixins.DeviceDtypeModuleMixin` | 1.10 | No longer supported |
| Argument `LightningCLI(save_config_filename=...)` | 1.10 | `LightningCLI(save_config_kwargs=dict(config_filename=...))` |
| Argument `LightningCLI(save_config_overwrite=...)` | 1.10 | `LightningCLI(save_config_kwargs=dict(overwrite=...))` |
| Argument `LightningCLI(save_config_multifile=...)` | 1.10 | `LightningCLI(save_config_kwargs=dict(multifile=...))` |
| Enum `TrainerFn.TUNING` | 1.10 | No longer supported |
| Enum `RunningStage.TUNING` | 1.10 | No longer supported |
| Attribute `Trainer.tuning` | 1.10 | No longer supported |


<a name="changelog"></a>
CHANGELOG

Lightning App

<details><summary>Added</summary>

- Added `load_state_dict` and `state_dict` hooks for `LightningFlow` components ([14100](https://github.com/Lightning-AI/lightning/pull/14100))
- Added a `--secret` option to CLI to allow binding secrets to app environment variables when running in the cloud ([14612](https://github.com/Lightning-AI/lightning/pull/14612))
- Added support for running Works without cloud compute in the default container ([14819](https://github.com/Lightning-AI/lightning/pull/14819))
- Added an HTTPQueue as an optional replacement for the default redis queue ([14978](https://github.com/Lightning-AI/lightning/pull/14978))
- Added support for configuring flow cloud compute ([14831](https://github.com/Lightning-AI/lightning/pull/14831))
- Added support for adding descriptions to commands either through a docstring or the `DESCRIPTION` attribute ([15193](https://github.com/Lightning-AI/lightning/pull/15193))
- Added a try / catch mechanism around request processing to avoid killing the flow ([15187](https://github.com/Lightning-AI/lightning/pull/15187))
- Added a Database Component ([14995](https://github.com/Lightning-AI/lightning/pull/14995))
- Added authentication to the HTTP queue ([15202](https://github.com/Lightning-AI/lightning/pull/15202))
- Added support for passing a `LightningWork` to the `LightningApp` ([15215](https://github.com/Lightning-AI/lightning/pull/15215))
- Added support for getting CLI help for connected apps even if the app isn't running ([15196](https://github.com/Lightning-AI/lightning/pull/15196))
- Added support for adding requirements to commands and installing them when missing when running an app command ([15198](https://github.com/Lightning-AI/lightning/pull/15198))
- Made the Lightning CLI connection terminal-session scoped instead of global ([15241](https://github.com/Lightning-AI/lightning/pull/15241))
- Added support for managing SSH keys via the CLI ([15291](https://github.com/Lightning-AI/lightning/pull/15291))
- Added a `JustPyFrontend` to ease UI creation with [justpy](https://github.com/justpy-org/justpy) ([#15002](https://github.com/Lightning-AI/lightning/pull/15002))
- Added a layout endpoint to the Rest API and the ability to disable pulling or pushing to the state ([15367](https://github.com/Lightning-AI/lightning/pull/15367))
- Added support for functions for `configure_api` and `configure_commands` to be executed in the Rest API process ([15098](https://github.com/Lightning-AI/lightning/pull/15098))
- Added support for starting a Lightning App on the cloud without needing to install dependencies locally ([15019](https://github.com/Lightning-AI/lightning/pull/15019))

</details>


<details><summary>Changed</summary>

- Improved the show logs command to be standalone and re-usable ([15343](https://github.com/Lightning-AI/lightning/pull/15343))
- Removed the `--instance-types` option when creating clusters ([15314](https://github.com/Lightning-AI/lightning/pull/15314))

</details>

<details><summary>Fixed</summary>


- Fixed an issue when using the CLI without arguments ([14877](https://github.com/Lightning-AI/lightning/pull/14877))
- Fixed a bug where the upload files endpoint would raise an error when running locally ([14924](https://github.com/Lightning-AI/lightning/pull/14924))
- Fixed the BYOC cluster region selector, hiding it from help since only us-east-1 has been tested and is recommended ([15277](https://github.com/Lightning-AI/lightning/pull/15277))
- Fixed a bug when launching an app on multiple clusters ([15226](https://github.com/Lightning-AI/lightning/pull/15226))
- Fixed a bug with a default CloudCompute for Lightning flows ([15371](https://github.com/Lightning-AI/lightning/pull/15371))


</details>


Lightning Trainer

<details><summary>Added</summary>

- Added support for requeueing slurm array jobs ([15040](https://github.com/Lightning-AI/lightning/pull/15040))
- Added native AMP support for `ddp_fork` (and associated alias strategies) with CUDA GPUs ([14983](https://github.com/Lightning-AI/lightning/pull/14983))
- Added `BatchSizeFinder` callback ([11089](https://github.com/Lightning-AI/lightning/pull/11089))
- Added `LearningRateFinder` callback ([13802](https://github.com/Lightning-AI/lightning/pull/13802))
- Tuner now supports a new `method` argument which will determine when to run the `BatchSizeFinder`: one of `fit`, `validate`, `test` or `predict` ([11089](https://github.com/Lightning-AI/lightning/pull/11089))
- Added prefix to log message in `seed_everything` with rank info ([14031](https://github.com/Lightning-AI/lightning/pull/14031))
- Added support for auto wrapping for `DDPFullyShardedNativeStrategy` ([14252](https://github.com/Lightning-AI/lightning/pull/14252))
- Added support for passing extra init-parameters to the `LightningDataModule.from_datasets` ([14185](https://github.com/Lightning-AI/lightning/pull/14185))
- Added support for saving sharded optimizer state dict outside of `DDPShardedStrategy` ([14208](https://github.com/Lightning-AI/lightning/pull/14208))
- Added support for auto wrapping for `DDPFullyShardedStrategy` ([14383](https://github.com/Lightning-AI/lightning/pull/14383))
- Integrate the `lightning_utilities` package (
[14475](https://github.com/Lightning-AI/lightning/pull/14475),
[14537](https://github.com/Lightning-AI/lightning/pull/14537),
[14556](https://github.com/Lightning-AI/lightning/pull/14556),
[14558](https://github.com/Lightning-AI/lightning/pull/14558),
[14575](https://github.com/Lightning-AI/lightning/pull/14575),
[14620](https://github.com/Lightning-AI/lightning/pull/14620))
- Added `args` parameter to `LightningCLI` to ease running from within Python ([14596](https://github.com/Lightning-AI/lightning/pull/14596))
- Added `WandbLogger.download_artifact` and `WandbLogger.use_artifact` for managing artifacts with Weights and Biases ([14551](https://github.com/Lightning-AI/lightning/pull/14551))
- Added an option to configure the signal SLURM sends when a job is preempted or requeued ([14626](https://github.com/Lightning-AI/lightning/pull/14626))
- Added a warning when the model passed to `LightningLite.setup()` does not have all parameters on the same device ([14822](https://github.com/Lightning-AI/lightning/pull/14822))
- The `CometLogger` now flags the Comet Experiments as being created from Lightning for analytics purposes ([14906](https://github.com/Lightning-AI/lightning/pull/14906))
- Introduced the `ckpt_path="hpc"` keyword for checkpoint loading ([14911](https://github.com/Lightning-AI/lightning/pull/14911))
- Added a more descriptive error message when attempting to fork processes with pre-initialized CUDA context ([14709](https://github.com/Lightning-AI/lightning/pull/14709))
- Added support for custom parameters in subclasses of `SaveConfigCallback` ([14998](https://github.com/Lightning-AI/lightning/pull/14998))
- Added `inference_mode` flag to Trainer to let users enable/disable inference mode during evaluation ([15034](https://github.com/Lightning-AI/lightning/pull/15034))
- Added `LightningLite.no_backward_sync` for control over efficient gradient accumulation with distributed strategies ([14966](https://github.com/Lightning-AI/lightning/pull/14966))
- Added a sanity check that scripts are executed with the `srun` command in SLURM and that environment variables are not conflicting ([15011](https://github.com/Lightning-AI/lightning/pull/15011))
- Added an error message when attempting to launch processes with `python -i` and an interactive-incompatible strategy ([15293](https://github.com/Lightning-AI/lightning/pull/15293))


</details>

<details><summary>Changed</summary>

- The `Trainer.{fit,validate,test,predict,tune}` methods now raise a useful error message if the input is not a `LightningModule` ([13892](https://github.com/Lightning-AI/lightning/pull/13892))
- Raised a `MisconfigurationException` if batch transfer hooks are overridden with `IPUAccelerator` ([13961](https://github.com/Lightning-AI/lightning/pull/13961))
- Replaced the unwrapping logic in strategies with direct access to unwrapped `LightningModule` ([13738](https://github.com/Lightning-AI/lightning/pull/13738))
- Enabled `on_before_batch_transfer` for `DPStrategy` and `IPUAccelerator` ([14023](https://github.com/Lightning-AI/lightning/pull/14023))
- When resuming training with Apex enabled, the `Trainer` will now raise an error ([14341](https://github.com/Lightning-AI/lightning/pull/14341))
- Included `torch.cuda` rng state to the aggregate `_collect_rng_states()` and `_set_rng_states()` ([14384](https://github.com/Lightning-AI/lightning/pull/14384))
- Changed `trainer.should_stop` to not stop mid-epoch and instead run until `min_steps`/`min_epochs` are satisfied ([13890](https://github.com/Lightning-AI/lightning/pull/13890))
- The `pyDeprecate` dependency is no longer installed ([14472](https://github.com/Lightning-AI/lightning/pull/14472))
- When using multiple loggers, by default checkpoints and profiler output now get saved to the log dir of the first logger in the list ([14325](https://github.com/Lightning-AI/lightning/pull/14325))
- In Lightning Lite, state-dict access to the module wrapper now gets passed through to the original module reference ([14629](https://github.com/Lightning-AI/lightning/pull/14629))
- Removed fall-back to `LightningEnvironment` when number of SLURM tasks does not correspond to number of processes in Trainer ([14300](https://github.com/Lightning-AI/lightning/pull/14300))
- Aligned DDP and DDPSpawn strategies in setting up the environment ([11073](https://github.com/Lightning-AI/lightning/pull/11073))
- Integrated the Lite Precision plugins into the PL Precision plugins - the base class in PL now extends the `lightning_lite.precision.Precision` base class ([14798](https://github.com/Lightning-AI/lightning/pull/14798))
* The `PrecisionPlugin.backward` signature changed: The `closure_loss` argument was renamed to `tensor`
* The `PrecisionPlugin.{pre_,post_}backward` signature changed: The `closure_loss` argument was renamed to `tensor` and moved as the first argument
* The `PrecisionPlugin.optimizer_step` signature changed: The `model`, `optimizer_idx` and `closure` arguments need to be passed as keyword arguments now
- Trainer queries the CUDA devices through NVML if available to avoid initializing CUDA before forking, which eliminates the need for the `PL_DISABLE_FORK` environment variable introduced in v1.7.4 ([14631](https://github.com/Lightning-AI/lightning/pull/14631))
- The `MLFlowLogger.finalize()` now sets the status to `FAILED` when an exception occurred in `Trainer`, and sets the status to `FINISHED` on successful completion ([12292](https://github.com/Lightning-AI/lightning/pull/12292))
- It is no longer needed to call `model.double()` when using `precision=64` in Lightning Lite ([14827](https://github.com/Lightning-AI/lightning/pull/14827))
- HPC checkpoints are now loaded automatically only in a SLURM environment when no specific value for `ckpt_path` has been set ([14911](https://github.com/Lightning-AI/lightning/pull/14911))
- The `Callback.on_load_checkpoint` hook now gets the full checkpoint dictionary, and the `callback_state` argument was renamed to `checkpoint` ([14835](https://github.com/Lightning-AI/lightning/pull/14835))
- Moved the warning about saving nn.Module in `save_hyperparameters()` to before the deepcopy ([15132](https://github.com/Lightning-AI/lightning/pull/15132))
- To avoid issues with forking processes, from PyTorch 1.13 and higher, Lightning will directly use the PyTorch NVML-based check for `torch.cuda.device_count` and from PyTorch 1.14 and higher, Lightning will configure PyTorch to use a NVML-based check for `torch.cuda.is_available`. ([15110](https://github.com/Lightning-AI/lightning/pull/15110), [#15133](https://github.com/Lightning-AI/lightning/pull/15133))
- The `NeptuneLogger` now uses `neptune.init_run` instead of the deprecated `neptune.init` to initialize a run ([15393](https://github.com/Lightning-AI/lightning/pull/15393))


</details>

<details><summary>Deprecated</summary>


- Deprecated `LightningDeepSpeedModule` ([14000](https://github.com/Lightning-AI/lightning/pull/14000))
- Deprecated `amp_level` from `Trainer` in favour of passing it explicitly via the precision plugin ([13898](https://github.com/Lightning-AI/lightning/pull/13898))
- Deprecated the calls to `pytorch_lightning.utilities.meta` functions in favor of built-in https://github.com/pytorch/torchdistx support ([#13868](https://github.com/Lightning-AI/lightning/pull/13868))
- Deprecated the `unwrap_lightning_module` and `unwrap_lightning_module_sharded` utility functions in favor of accessing the unwrapped `LightningModule` on the strategy directly ([13738](https://github.com/Lightning-AI/lightning/pull/13738))
- Deprecated the `pl_module` argument in `LightningParallelModule`, `LightningDistributedModule`, `LightningShardedDataParallel`, `LightningBaguaModule` and `LightningDeepSpeedModule` wrapper classes ([13738](https://github.com/Lightning-AI/lightning/pull/13738))
- Deprecated the `on_colab_kaggle` function ([14247](https://github.com/Lightning-AI/lightning/pull/14247))
- Deprecated the internal `pl.core.mixins.DeviceDtypeModuleMixin` class ([14511](https://github.com/Lightning-AI/lightning/pull/14511), [#14548](https://github.com/Lightning-AI/lightning/pull/14548))
- Deprecated all functions in `pytorch_lightning.utilities.xla_device` ([14514](https://github.com/Lightning-AI/lightning/pull/14514), [#14550](https://github.com/Lightning-AI/lightning/pull/14550))
* Deprecated the internal `inner_f` function
* Deprecated the internal `pl_multi_process` function
* Deprecated the internal `XLADeviceUtils.xla_available` staticmethod
* Deprecated the `XLADeviceUtils.tpu_device_exists` staticmethod in favor of `pytorch_lightning.accelerators.TPUAccelerator.is_available()`
- Deprecated `pytorch_lightning.utilities.distributed.tpu_distributed` in favor of `lightning_lite.accelerators.tpu.tpu_distributed` ([14550](https://github.com/Lightning-AI/lightning/pull/14550))
- Deprecated all functions in `pytorch_lightning.utilities.cloud_io` in favor of `lightning_lite.utilities.cloud_io` ([14515](https://github.com/Lightning-AI/lightning/pull/14515))
- Deprecated the functions in `pytorch_lightning.utilities.apply_func` in favor of `lightning_utilities.core.apply_func` ([14516](https://github.com/Lightning-AI/lightning/pull/14516), [#14537](https://github.com/Lightning-AI/lightning/pull/14537))
- Deprecated all functions in `pytorch_lightning.utilities.device_parser` ([14492](https://github.com/Lightning-AI/lightning/pull/14492), [#14753](https://github.com/Lightning-AI/lightning/pull/14753))
* Deprecated the `pytorch_lightning.utilities.device_parser.determine_root_gpu_device` in favor of `lightning_lite.utilities.device_parser.determine_root_gpu_device`
* Deprecated the `pytorch_lightning.utilities.device_parser.parse_gpu_ids` in favor of `lightning_lite.utilities.device_parser.parse_gpu_ids`
* Deprecated the `pytorch_lightning.utilities.device_parser.is_cuda_available` in favor of `lightning_lite.accelerators.cuda.is_cuda_available`
* Deprecated the `pytorch_lightning.utilities.device_parser.num_cuda_devices` in favor of `lightning_lite.accelerators.cuda.num_cuda_devices`
* Deprecated the `pytorch_lightning.utilities.device_parser.parse_cpu_cores` in favor of `lightning_lite.accelerators.cpu.parse_cpu_cores`
* Deprecated the `pytorch_lightning.utilities.device_parser.parse_tpu_cores` in favor of `lightning_lite.accelerators.tpu.parse_tpu_cores`
* Deprecated the `pytorch_lightning.utilities.device_parser.parse_hpus` in favor of `pytorch_lightning.accelerators.hpu.parse_hpus`
- Deprecated duplicate `SaveConfigCallback` parameters in `LightningCLI.__init__`: `save_config_kwargs`, `save_config_overwrite` and `save_config_multifile`. New `save_config_kwargs` parameter should be used instead ([14998](https://github.com/Lightning-AI/lightning/pull/14998))
- Deprecated `TrainerFn.TUNING`, `RunningStage.TUNING` and `trainer.tuning` property ([15100](https://github.com/Lightning-AI/lightning/pull/15100))
- Deprecated the custom `pl.utilities.distributed.AllGatherGrad` implementation in favor of PyTorch's ([15364](https://github.com/Lightning-AI/lightning/pull/15364))


</details>

<details><summary>Removed</summary>

- Removed the deprecated `Trainer.training_type_plugin` property in favor of `Trainer.strategy` ([14011](https://github.com/Lightning-AI/lightning/pull/14011))
- Removed all deprecated training type plugins ([14011](https://github.com/Lightning-AI/lightning/pull/14011))
- Removed the deprecated `DDP2Strategy` ([14026](https://github.com/Lightning-AI/lightning/pull/14026))
- Removed the deprecated `DistributedType` and `DeviceType` enum classes ([14045](https://github.com/Lightning-AI/lightning/pull/14045))
- Removed deprecated support for passing the `rank_zero_warn` warning category positionally ([14470](https://github.com/Lightning-AI/lightning/pull/14470))
- Removed the legacy and unused `Trainer.get_deprecated_arg_names()` ([14415](https://github.com/Lightning-AI/lightning/pull/14415))
- Removed the deprecated `on_train_batch_end(outputs)` format when multiple optimizers are used and TBPTT is enabled ([14373](https://github.com/Lightning-AI/lightning/pull/14373))
- Removed the deprecated `training_epoch_end(outputs)` format when multiple optimizers are used and TBPTT is enabled ([14373](https://github.com/Lightning-AI/lightning/pull/14373))
- Removed the experimental `pytorch_lightning.utilities.meta` functions in favor of built-in https://github.com/pytorch/torchdistx support ([#13868](https://github.com/Lightning-AI/lightning/pull/13868))
- Removed the deprecated `LoggerCollection`; `Trainer.logger` and `LightningModule.logger` now returns the first logger when more than one gets passed to the Trainer ([14283](https://github.com/Lightning-AI/lightning/pull/14283))
- Removed the deprecated `trainer.lr_schedulers` attribute ([14408](https://github.com/Lightning-AI/lightning/pull/14408))
- Removed the deprecated `LightningModule.{on_hpc_load,on_hpc_save}` hooks in favor of the general purpose hooks `LightningModule.{on_load_checkpoint,on_save_checkpoint}` ([14315](https://github.com/Lightning-AI/lightning/pull/14315))
- Removed deprecated support for old torchtext versions ([14375](https://github.com/Lightning-AI/lightning/pull/14375))
- Removed deprecated support for the old `neptune-client` API in the `NeptuneLogger` ([14727](https://github.com/Lightning-AI/lightning/pull/14727))
- Removed the deprecated `weights_save_path` Trainer argument and `Trainer.weights_save_path` property ([14424](https://github.com/Lightning-AI/lightning/pull/14424))
- Removed the following deprecated utilities ([14471](https://github.com/Lightning-AI/lightning/pull/14471)):
* `pytorch_lightning.utilities.distributed.rank_zero_only` in favor of `pytorch_lightning.utilities.rank_zero.rank_zero_only`
* `pytorch_lightning.utilities.distributed.rank_zero_debug` in favor of `pytorch_lightning.utilities.rank_zero.rank_zero_debug`
* `pytorch_lightning.utilities.distributed.rank_zero_info` in favor of `pytorch_lightning.utilities.rank_zero.rank_zero_info`
* `pytorch_lightning.utilities.warnings.rank_zero_warn` in favor of `pytorch_lightning.utilities.rank_zero.rank_zero_warn`
* `pytorch_lightning.utilities.warnings.rank_zero_deprecation` in favor of `pytorch_lightning.utilities.rank_zero.rank_zero_deprecation`
* `pytorch_lightning.utilities.warnings.LightningDeprecationWarning` in favor of `pytorch_lightning.utilities.rank_zero.LightningDeprecationWarning`
- Removed deprecated `Trainer.num_processes` attribute in favour of `Trainer.num_devices` ([14423](https://github.com/Lightning-AI/lightning/pull/14423))
- Removed the deprecated `Trainer.data_parallel_device_ids` hook in favour of `Trainer.device_ids` ([14422](https://github.com/Lightning-AI/lightning/pull/14422))
- Removed the deprecated class `TrainerCallbackHookMixin` ([14401](https://github.com/Lightning-AI/lightning/pull/14401))
- Removed the deprecated `BaseProfiler` and `AbstractProfiler` classes ([14404](https://github.com/Lightning-AI/lightning/pull/14404))
- Removed the deprecated way to set the distributed backend via the environment variable `PL_TORCH_DISTRIBUTED_BACKEND`, in favor of setting the `process_group_backend` in the strategy constructor ([14693](https://github.com/Lightning-AI/lightning/pull/14693))
- Removed deprecated callback hooks ([14834](https://github.com/Lightning-AI/lightning/pull/14834))
* `Callback.on_configure_sharded_model` in favor of `Callback.setup`
* `Callback.on_before_accelerator_backend_setup` in favor of `Callback.setup`
* `Callback.on_batch_start` in favor of `Callback.on_train_batch_start`
* `Callback.on_batch_end` in favor of `Callback.on_train_batch_end`
* `Callback.on_epoch_start` in favor of `Callback.on_{train,validation,test}_epoch_start`
* `Callback.on_epoch_end` in favor of `Callback.on_{train,validation,test}_epoch_end`
* `Callback.on_pretrain_routine_{start,end}` in favor of `Callback.on_fit_start`
- Removed the deprecated device attributes `Trainer.{devices,gpus,num_gpus,ipus,tpu_cores}` in favor of the accelerator-agnostic `Trainer.num_devices` ([14829](https://github.com/Lightning-AI/lightning/pull/14829))
- Removed the deprecated `LightningIPUModule` ([14830](https://github.com/Lightning-AI/lightning/pull/14830))
- Removed the deprecated `Logger.agg_and_log_metrics` hook in favour of `Logger.log_metrics` and the `agg_key_funcs` and `agg_default_func` arguments. ([14840](https://github.com/Lightning-AI/lightning/pull/14840))
- Removed the deprecated precision plugin checkpoint hooks `PrecisionPlugin.on_load_checkpoint` and `PrecisionPlugin.on_save_checkpoint` ([14833](https://github.com/Lightning-AI/lightning/pull/14833))
- Removed the deprecated `Trainer.root_gpu` attribute in favor of `Trainer.strategy.root_device` ([14829](https://github.com/Lightning-AI/lightning/pull/14829))
- Removed the deprecated `Trainer.use_amp` and `LightningModule.use_amp` attributes ([14832](https://github.com/Lightning-AI/lightning/pull/14832))
- Removed the deprecated callback hooks `Callback.on_init_start` and `Callback.on_init_end` ([14867](https://github.com/Lightning-AI/lightning/pull/14867))
- Removed the deprecated `Trainer.run_stage` in favor of `Trainer.{fit,validate,test,predict}` ([14870](https://github.com/Lightning-AI/lightning/pull/14870))
- Removed the deprecated `SimpleProfiler.profile_iterable` and `AdvancedProfiler.profile_iterable` attributes ([14864](https://github.com/Lightning-AI/lightning/pull/14864))
- Removed the deprecated `Trainer.verbose_evaluate` ([14884](https://github.com/Lightning-AI/lightning/pull/14884))
- Removed the deprecated `Trainer.should_rank_save_checkpoint` ([14885](https://github.com/Lightning-AI/lightning/pull/14885))
- Removed the deprecated `TrainerOptimizersMixin` ([14887](https://github.com/Lightning-AI/lightning/pull/14887))
- Removed the deprecated `Trainer.lightning_optimizers` ([14889](https://github.com/Lightning-AI/lightning/pull/14889))
- Removed the deprecated `TrainerDataLoadingMixin` ([14888](https://github.com/Lightning-AI/lightning/pull/14888))
- Removed the deprecated `Trainer.call_hook` in favor of `Trainer._call_callback_hooks`, `Trainer._call_lightning_module_hook`, `Trainer._call_ttp_hook`, and `Trainer._call_accelerator_hook` ([14869](https://github.com/Lightning-AI/lightning/pull/14869))
- Removed the deprecated `Trainer.{validated,tested,predicted}_ckpt_path` ([14897](https://github.com/Lightning-AI/lightning/pull/14897))
- Removed the deprecated `device_stats_monitor_prefix_metric_keys` ([14890](https://github.com/Lightning-AI/lightning/pull/14890))
- Removed the deprecated `LightningDataModule.on_save/load_checkpoint` hooks ([14909](https://github.com/Lightning-AI/lightning/pull/14909))
- Removed support for returning a value in `Callback.on_save_checkpoint` in favor of implementing `Callback.state_dict` ([14835](https://github.com/Lightning-AI/lightning/pull/14835))


</details>

<details><summary>Fixed</summary>


- Fixed an issue with `LightningLite.setup()` not setting the `.device` attribute correctly on the returned wrapper ([14822](https://github.com/Lightning-AI/lightning/pull/14822))
- Fixed an attribute error when running the tuner together with the `StochasticWeightAveraging` callback ([14836](https://github.com/Lightning-AI/lightning/pull/14836))
- Fixed MissingFieldException in offline mode for the `NeptuneLogger()` ([14919](https://github.com/Lightning-AI/lightning/pull/14919))
- Fixed wandb `save_dir` is overridden by `None` `dir` when using CLI ([14878](https://github.com/Lightning-AI/lightning/pull/14878))
- Fixed a missing call to `LightningDataModule.load_state_dict` hook while restoring checkpoint using `LightningDataModule.load_from_checkpoint` ([14883](https://github.com/Lightning-AI/lightning/pull/14883))
- Fixed torchscript error with containers of LightningModules ([14904](https://github.com/Lightning-AI/lightning/pull/14904))
- Fixed reloading of the last checkpoint on run restart ([14907](https://github.com/Lightning-AI/lightning/pull/14907))
- `SaveConfigCallback` instances should only save the config once to allow having the `overwrite=False` safeguard when using `LightningCLI(..., run=False)` ([14927](https://github.com/Lightning-AI/lightning/pull/14927))
- Fixed an issue with terminating the trainer profiler when a `StopIteration` exception is raised while using an `IterableDataset` ([14940](https://github.com/Lightning-AI/lightning/pull/14945))
- Do not update on-plateau schedulers when reloading from an end-of-epoch checkpoint ([14702](https://github.com/Lightning-AI/lightning/pull/14702))
- Fixed `Trainer` support for PyTorch built without distributed support ([14971](https://github.com/Lightning-AI/lightning/pull/14971))
- Fixed batch normalization statistics calculation in `StochasticWeightAveraging` callback ([14866](https://github.com/Lightning-AI/lightning/pull/14866))
- Avoided initializing optimizers during deepspeed inference ([14944](https://github.com/Lightning-AI/lightning/pull/14944))
- Fixed `LightningCLI` parse_env and description in subcommands ([15138](https://github.com/Lightning-AI/lightning/pull/15138))
- Fixed an exception that would occur when creating a `multiprocessing.Pool` after importing Lightning ([15292](https://github.com/Lightning-AI/lightning/pull/15292))
- Fixed a pickling error when using `RichProgressBar` together with checkpointing ([15319](https://github.com/Lightning-AI/lightning/pull/15319))
- Fixed the `RichProgressBar` crashing when used with distributed strategies ([15376](https://github.com/Lightning-AI/lightning/pull/15376))
- Fixed an issue with `RichProgressBar` not resetting the internal state for the sanity check progress ([15377](https://github.com/Lightning-AI/lightning/pull/15377))
- Fixed an issue with DataLoader re-instantiation when the attribute is an array and the default value of the corresponding argument changed ([15409](https://github.com/Lightning-AI/lightning/pull/15409))


</details>

**Full commit list**: https://github.com/PyTorchLightning/pytorch-lightning/compare/1.7.0...1.8.0

<a name="contributors"></a>
Contributors

Veteran

akihironitta ananthsub AndresAlgaba ar90n Atharva-Phatak awaelchli BongYang Borda carmocca dependabot donlapark ethanwharris Felonious-Spellfire hhsecond jerome-habana JustinGoheen justusschock kaushikb11 krishnakalyan3 krshrimali luca-medeiros manangoel99 manskx mauvilsa MrShevan nicolai86 nmiculinic otaj Queuecumber rlizzo rohitgr7 rschireman SeanNaren speediedan tchaton tshu-w


New

Birch-san clementpoiret HalestormAI thongonary alecmerdler adam-lightning yurijmikhalevich lijm1358 robert-s-lee panos-is kacperlukawski alro923 dmitsf Anner-deJong cschell nishantb06 Callidior j0rd1smit MarcSkovMadsen KralaBenjamin robertomest daniel347x pierocor datumbox nohalon pritamsoni-hsr nandwalritik gilfree ritsuki1227 christopher-nguyen-re JulesGM jgbos dconathan jsr-p NeoKish Blaizzy suyash-811 alexkuzmik ziyadsheeba geoffrey-g-delhomme amrutha1098 AlessioQuercia ver217 Helias zxvix 1SAA fabiofumarola luca3rd kimpty PaulLerner rbracco wouterzwerink


_If we forgot somebody or you have a suggestion, find [support here](https://www.pytorchlightning.ai/support) :zap:_


1.7.7

Fixed

- Fixed the availability check for the neptune-client package ([14714](https://github.com/Lightning-AI/lightning/pull/14714))
- Break HPU Graphs into two parts (forward + backward as one and optimizer as another) for better performance ([14656](https://github.com/Lightning-AI/lightning/pull/14656))
- Fixed torchscript error with ensembles of LightningModules ([14657](https://github.com/Lightning-AI/lightning/pull/14657), [#14724](https://github.com/Lightning-AI/lightning/pull/14724))
- Fixed an issue with `TensorBoardLogger.finalize` creating a new experiment when none was created during the Trainer's execution ([14762](https://github.com/Lightning-AI/lightning/pull/14762))
- Fixed `TypeError` on import when `torch.distributed` is not available ([14809](https://github.com/Lightning-AI/lightning/pull/14809))

Contributors

awaelchli Borda carmocca dependabot otaj raoakarsha

_If we forgot someone due to not matching commit email with GitHub account, let us know_ :)

1.7.6

Changed

- Improved the error messaging when passing `Trainer.method(model, x_dataloader=None)` with no module-method implementations available ([14614](https://github.com/Lightning-AI/lightning/pull/14614))

Fixed

- Reset the dataloaders on OOM failure in batch size finder to use the last successful batch size ([14372](https://github.com/Lightning-AI/lightning/pull/14372))
- Fixed an issue to keep downscaling the batch size in case there hasn't been even a single successful optimal batch size with `mode="power"` ([14372](https://github.com/Lightning-AI/lightning/pull/14372))
- Fixed an issue where `self.log`-ing a tensor would create a user warning from PyTorch about cloning tensors ([14599](https://github.com/Lightning-AI/lightning/pull/14599))
- Fixed compatibility when `torch.distributed` is not available ([14454](https://github.com/Lightning-AI/lightning/pull/14454))

Contributors

akihironitta awaelchli Borda carmocca dependabot krshrimali mauvilsa pierocor rohitgr7 wangraying

_If we forgot someone due to not matching commit email with GitHub account, let us know_ :)

1.7.5

Fixed

- Squeezed tensor values when logging with `LightningModule.log` ([14489](https://github.com/Lightning-AI/lightning/pull/14489))
- Fixed `WandbLogger` `save_dir` is not set after creation ([14326](https://github.com/Lightning-AI/lightning/pull/14326))
- Fixed `Trainer.estimated_stepping_batches` when maximum number of epochs is not set ([14317](https://github.com/Lightning-AI/lightning/pull/14317))

Contributors

carmocca dependabot robertomest rohitgr7 tshu-w

_If we forgot someone due to not matching commit email with GitHub account, let us know_ :)

1.7.4

Added

- Added an environment variable `PL_DISABLE_FORK` that can be used to disable all forking in the Trainer ([14319](https://github.com/Lightning-AI/lightning/issues/14319))

Fixed

- Fixed `LightningDataModule` hparams parsing ([12806](https://github.com/PyTorchLightning/pytorch-lightning/pull/12806))
- Reset epoch progress with batch size scaler ([13846](https://github.com/Lightning-AI/lightning/pull/13846))
- Fixed restoring the trainer after using `lr_find()` so that the correct LR schedule is used for the actual training ([14113](https://github.com/Lightning-AI/lightning/pull/14113))
- Fixed incorrect values after transferring data to an MPS device ([14368](https://github.com/Lightning-AI/lightning/pull/14368))

Contributors

rohitgr7 tanmoyio justusschock cschell carmocca Callidior awaelchli j0rd1smit dependabot Borda otaj
