## App

### Added
- Added the ability to set up basic authentication for Lightning apps (16105)

### Changed
- The LoadBalancer now uses the internal IP and port instead of the exposed URL (16119)
- Added support for logging in different trainer stages with `DeviceStatsMonitor` (16002)
- Renamed `lightning_app.components.serve.gradio` to `lightning_app.components.serve.gradio_server` (16201); an import migration is sketched after this list
- Made cluster creation/deletion async by default (16185)
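A minimal sketch of the import migration for the serve module rename; `ServeGradio` is assumed here to be the class exported from that module:

```python
# Old module path (before the rename):
# from lightning_app.components.serve.gradio import ServeGradio

# New module path after the rename:
from lightning_app.components.serve.gradio_server import ServeGradio
```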

### Fixed
- Fixed not being able to run multiple lightning apps locally due to port collision (15819)
- Avoid `relpath` bug on Windows (16164)
- Avoid using the deprecated `LooseVersion` (16162)
- Ported fixes to the autoscaler component (16249)
- Fixed a bug where `lightning login` with env variables would not correctly save the credentials (16339)
---
## Fabric

### Added
- Added `Fabric.launch()` to programmatically launch processes (e.g. in a Jupyter notebook) (14992); a combined usage sketch of the new Fabric APIs follows this list
- Added the option to launch Fabric scripts from the CLI, without the need to wrap the code into the `run` method (14992)
- Added `Fabric.setup_module()` and `Fabric.setup_optimizers()` to support strategies that need to set up the model before an optimizer can be created (15185)
- Added support for Fully Sharded Data Parallel (FSDP) training in Lightning Lite (14967)
- Added `lightning_fabric.accelerators.find_usable_cuda_devices` utility function (16147)
- Added basic support for LightningModules (16048)
- Added support for managing callbacks via `Fabric(callbacks=...)` and emitting events through `Fabric.call()` (16074)
- Added Logger support (16121)
* Added `Fabric(loggers=...)` to support different Logger frameworks in Fabric
* Added `Fabric.log` for logging scalars using multiple loggers
* Added `Fabric.log_dict` for logging a dictionary of multiple metrics at once
* Added `Fabric.loggers` and `Fabric.logger` attributes to access the individual logger instances
* Added support for calling `self.log` and `self.log_dict` in a LightningModule when using Fabric
* Added access to `self.logger` and `self.loggers` in a LightningModule when using Fabric
- Added `lightning_fabric.loggers.TensorBoardLogger` (16121)
- Added `lightning_fabric.loggers.CSVLogger` (16346)
- Added support for a consistent `.zero_grad(set_to_none=...)` on the wrapped optimizer regardless of which strategy is used (16275)
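Taken together, a minimal sketch of the additions above (programmatic launch, two-step setup, callbacks, loggers, and the consistent `zero_grad`); the callback class, model, and directory names are placeholders:

```python
import torch
from lightning_fabric import Fabric
from lightning_fabric.loggers import CSVLogger, TensorBoardLogger


class PrintCallback:
    # Any method name works; Fabric.call dispatches by name to every callback that defines it
    def on_train_start(self):
        print("training started")


fabric = Fabric(
    accelerator="auto",
    devices=1,
    loggers=[TensorBoardLogger("logs"), CSVLogger("logs")],
    callbacks=[PrintCallback()],
)
fabric.launch()  # programmatic launch, usable e.g. from a Jupyter notebook

model = torch.nn.Linear(32, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Two-step setup, for strategies that must wrap the model before the optimizer is created
model = fabric.setup_module(model)
optimizer = fabric.setup_optimizers(optimizer)

fabric.call("on_train_start")  # emit an event to all registered callbacks

for step in range(10):
    optimizer.zero_grad(set_to_none=True)  # now consistent regardless of strategy
    loss = model(torch.randn(4, 32)).sum()
    fabric.backward(loss)
    optimizer.step()
    fabric.log("train/loss", loss.item())  # scalar, sent to every configured logger
    fabric.log_dict({"step": step, "loss": loss.item()})  # several metrics at once
```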

### Changed
- Renamed the class `LightningLite` to `Fabric` (15932, 15938)
- The `Fabric.run()` method is no longer abstract (14992)
- The `XLAStrategy` now inherits from `ParallelStrategy` instead of `DDPSpawnStrategy` (15838)
- Merged the implementation of `DDPSpawnStrategy` into `DDPStrategy` and removed `DDPSpawnStrategy` (14952)
- The dataloader wrapper returned from `.setup_dataloaders()` now calls `.set_epoch()` on the distributed sampler if one is used (16101); see the sketch after this list
- Renamed `Strategy.reduce` to `Strategy.all_reduce` in all strategies (16370)
- When using multiple devices, the strategy now defaults to "ddp" instead of "ddp_spawn" when none is set (16388)
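A sketch of the sampler epoch handling and the new multi-device default, assuming it is run as a script so the non-spawn "ddp" launcher can re-run it:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from lightning_fabric import Fabric

fabric = Fabric(accelerator="cpu", devices=2)  # multi-device now resolves to "ddp", not "ddp_spawn"
fabric.launch()

dataset = TensorDataset(torch.arange(16).float())
# setup_dataloaders injects a DistributedSampler and returns a wrapper
dataloader = fabric.setup_dataloaders(DataLoader(dataset, batch_size=2, shuffle=True))

for epoch in range(3):
    # The wrapper now calls sampler.set_epoch(...) for us, so the shuffling
    # order differs across epochs without manual bookkeeping
    for (batch,) in dataloader:
        pass
```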

### Removed
- Removed support for FairScale's sharded training (`strategy='ddp_sharded'|'ddp_sharded_spawn'`). Use Fully-Sharded Data Parallel instead (`strategy='fsdp'`) (16329)
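A minimal migration for the removal above, assuming a machine with at least two CUDA devices:

```python
from lightning_fabric import Fabric

# Before (removed): Fabric(strategy="ddp_sharded", accelerator="cuda", devices=2)
fabric = Fabric(strategy="fsdp", accelerator="cuda", devices=2)
```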

### Fixed
- Restored sampling parity between PyTorch and Fabric dataloaders when using the `DistributedSampler` (16101)
- Fixed an issue where the error message would not tell the user the real value that was passed through the CLI (16334)
---
## PyTorch

### Added
- Added support for native logging of `MetricCollection` with enabled compute groups (15580)
- Added support for custom artifact names in `pl.loggers.WandbLogger` (16173)
- Added support for DDP with `LRFinder` (15304)
- Added utilities to migrate checkpoints from one Lightning version to another (15237)
- Added support to upgrade all checkpoints in a folder using the `pl.utilities.upgrade_checkpoint` script (15333)
- Added an axes argument `ax` to `.lr_find().plot()` to enable plotting on user-defined axes in a matplotlib figure (15652)
- Added `log_model` parameter to `MLFlowLogger` (9187)
- Added a check to validate that wrapped FSDP models are used while initializing optimizers (15301)
- Added a warning when `self.log(..., logger=True)` is called without a configured logger (15814)
- Added support for colossalai 0.1.11 (15888)
- Added `LightningCLI` support for optimizers and learning rate schedulers via callable type dependency injection (15869)
- Added support for activation checkpointing for the `DDPFullyShardedNativeStrategy` strategy (15826)
- Added the option to set `DDPFullyShardedNativeStrategy(cpu_offload=True|False)` via bool instead of needing to pass a configuration object (15832)
- Added an info message for Ampere CUDA GPU users to enable tf32 matmul precision (16037); illustrated in the sketch after this list
- Added support for returning optimizer-like classes in `LightningModule.configure_optimizers` (16189)
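To illustrate two of the additions above, a short sketch combining the TF32 hint for Ampere GPUs with the new `log_model` flag on `MLFlowLogger`; the experiment name is a placeholder:

```python
import torch
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import MLFlowLogger

# The setting the new Ampere info message points to (PyTorch 1.12+):
# "high" enables TF32 matmuls
torch.set_float32_matmul_precision("high")

# With log_model=True, checkpoints are uploaded as MLflow artifacts
logger = MLFlowLogger(experiment_name="demo", log_model=True)
trainer = Trainer(accelerator="auto", devices=1, max_epochs=1, logger=logger)
```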

### Changed
- Switched from `tensorboard` to `tensorboardX` in `TensorBoardLogger` (15728)
- From now on, Lightning Trainer and `LightningModule.load_from_checkpoint` automatically upgrade the loaded checkpoint if it was produced in an old version of Lightning (15237)
- `Trainer.{validate,test,predict}(ckpt_path=...)` no longer restores the `Trainer.global_step` and `Trainer.current_epoch` values from the checkpoint; from now on, only `Trainer.fit` restores these values (15532)
- The `ModelCheckpoint.save_on_train_epoch_end` attribute is now computed dynamically every epoch, accounting for changes to the validation dataloaders (15300)
- The Trainer now raises an error if it is given multiple stateful callbacks of the same type with colliding state keys (15634)
- `MLFlowLogger` now logs hyperparameters and metrics in batched API calls (15915)
- Overriding the `on_train_batch_{start,end}` hooks in conjunction with taking a `dataloader_iter` in the `training_step` no longer errors out and instead shows a warning (16062)
- Moved `tensorboardX` to the extra dependencies and made `CSVLogger` the default logger (16349); a sketch of restoring TensorBoard logging follows this list
- Dropped PyTorch 1.9 support (15347)
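With `tensorboardX` now optional, `Trainer` falls back to `CSVLogger` by default; a minimal sketch of keeping TensorBoard logging by requesting it explicitly:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger

# Requires the now-optional tensorboardX package to be installed
trainer = Trainer(logger=TensorBoardLogger(save_dir="lightning_logs"))
```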

### Deprecated
- Deprecated the `description`, `env_prefix` and `env_parse` parameters in `LightningCLI.__init__` in favor of passing them through `parser_kwargs` (15651)
- Deprecated `pytorch_lightning.profiler` in favor of `pytorch_lightning.profilers` (16059)
- Deprecated `Trainer(auto_select_gpus=...)` in favor of `pytorch_lightning.accelerators.find_usable_cuda_devices` (16147); a migration sketch follows this list
- Deprecated `pytorch_lightning.tuner.auto_gpu_select.{pick_single_gpu,pick_multiple_gpus}` in favor of `pytorch_lightning.accelerators.find_usable_cuda_devices` (16147)
- `nvidia/apex` deprecation (16039)
* Deprecated `pytorch_lightning.plugins.NativeMixedPrecisionPlugin` in favor of `pytorch_lightning.plugins.MixedPrecisionPlugin`
* Deprecated the `LightningModule.optimizer_step(using_native_amp=...)` argument
* Deprecated the `Trainer(amp_backend=...)` argument
* Deprecated the `Trainer.amp_backend` property
* Deprecated the `Trainer(amp_level=...)` argument
* Deprecated the `pytorch_lightning.plugins.ApexMixedPrecisionPlugin` class
* Deprecated the `pytorch_lightning.utilities.enums.AMPType` enum
* Deprecated the `DeepSpeedPrecisionPlugin(amp_type=..., amp_level=...)` arguments
- `horovod` deprecation (16141)
* Deprecated `Trainer(strategy="horovod")`
* Deprecated the `HorovodStrategy` class
- Deprecated `pytorch_lightning.lite.LightningLite` in favor of `lightning.fabric.Fabric` (16314)
- `FairScale` deprecation (in favor of PyTorch's FSDP implementation) (16353)
* Deprecated the `pytorch_lightning.overrides.fairscale.LightningShardedDataParallel` class
* Deprecated the `pytorch_lightning.plugins.precision.fully_sharded_native_amp.FullyShardedNativeMixedPrecisionPlugin` class
* Deprecated the `pytorch_lightning.plugins.precision.sharded_native_amp.ShardedNativeMixedPrecisionPlugin` class
* Deprecated the `pytorch_lightning.strategies.fully_sharded.DDPFullyShardedStrategy` class
* Deprecated the `pytorch_lightning.strategies.sharded.DDPShardedStrategy` class
* Deprecated the `pytorch_lightning.strategies.sharded_spawn.DDPSpawnShardedStrategy` class
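A migration sketch for two of the deprecations above: `find_usable_cuda_devices` replaces GPU auto-selection, and native AMP no longer needs an `amp_backend` argument. This assumes a machine with at least two free CUDA devices:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.accelerators import find_usable_cuda_devices

# Before (deprecated):
# trainer = Trainer(auto_select_gpus=True, gpus=2, amp_backend="native", precision=16)
trainer = Trainer(
    accelerator="cuda",
    devices=find_usable_cuda_devices(2),  # picks 2 free, working GPUs
    precision=16,  # native AMP is the only built-in mixed-precision backend going forward
)
```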

### Removed
- Removed deprecated `pytorch_lightning.utilities.memory.get_gpu_memory_map` in favor of `pytorch_lightning.accelerators.cuda.get_nvidia_gpu_stats` (15617)
- Temporarily removed support for Hydra multi-run (15737)
- Removed deprecated `pytorch_lightning.profiler.base.AbstractProfiler` in favor of `pytorch_lightning.profilers.profiler.Profiler` (15637)
- Removed deprecated `pytorch_lightning.profiler.base.BaseProfiler` in favor of `pytorch_lightning.profilers.profiler.Profiler` (15637)
- Removed deprecated code in `pytorch_lightning.utilities.meta` (16038)
- Removed the deprecated `LightningDeepSpeedModule` (16041)
- Removed the deprecated `pytorch_lightning.accelerators.GPUAccelerator` in favor of `pytorch_lightning.accelerators.CUDAAccelerator` (16050)
- Removed the deprecated `pytorch_lightning.profiler.*` classes in favor of `pytorch_lightning.profilers` (16059)
- Removed the deprecated `pytorch_lightning.utilities.cli` module in favor of `pytorch_lightning.cli` (16116); import migrations are sketched after this list
- Removed the deprecated `pytorch_lightning.loggers.base` module in favor of `pytorch_lightning.loggers.logger` (16120)
- Removed the deprecated `pytorch_lightning.loops.base` module in favor of `pytorch_lightning.loops.loop` (16142)
- Removed the deprecated `pytorch_lightning.core.lightning` module in favor of `pytorch_lightning.core.module` (16318)
- Removed the deprecated `pytorch_lightning.callbacks.base` module in favor of `pytorch_lightning.callbacks.callback` (16319)
- Removed the deprecated `Trainer.reset_train_val_dataloaders()` in favor of `Trainer.reset_{train,val}_dataloader` (16131)
- Removed support for `LightningCLI(seed_everything_default=None)` (16131)
- Removed support in LightningLite for FairScale's sharded training (`strategy='ddp_sharded'|'ddp_sharded_spawn'`). Use Fully-Sharded Data Parallel instead (`strategy='fsdp'`) (16329)
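Import migrations for some of the removed modules above:

```python
# Old (removed): from pytorch_lightning.utilities.cli import LightningCLI
from pytorch_lightning.cli import LightningCLI

# Old (removed): from pytorch_lightning.profiler import SimpleProfiler
from pytorch_lightning.profilers import SimpleProfiler

# Old (removed): from pytorch_lightning.loops.base import Loop
from pytorch_lightning.loops.loop import Loop
```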

### Fixed
- Enhanced `reduce_boolean_decision` to accommodate `any`-analogous semantics expected by the `EarlyStopping` callback (15253)
- Fixed the incorrect optimizer step synchronization when running across multiple TPU devices (16020)
- Fixed a type error when dividing the chunk size in the ColossalAI strategy (16212)
- Fixed bug where the `interval` key of the scheduler would be ignored during manual optimization, making the `LearningRateMonitor` callback fail to log the learning rate (16308)
- Fixed an issue with `MLFlowLogger` not finalizing correctly when status code 'finished' was passed (16340)
---
## Contributors
1SAA, akihironitta, AlessioQuercia, awaelchli, bipinKrishnan, Borda, carmocca, dmitsf, erhoo82, ethanwharris, Forbu, hhsecond, justusschock, lantiga, lightningforever, Liyang90, manangoel99, mauvilsa, nicolai86, nohalon, rohitgr7, schmidt-jake, speediedan, yMayanand
_If we forgot someone due to not matching commit email with GitHub account, let us know :]_