Lightning

0.7.5

We made a few changes to callbacks so that ops are tested on detached GPU tensors, avoiding CPU transfers. However, this made the callbacks unpicklable, which crashes DDP.

This release fixes that core issue.
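
As a quick illustration of the underlying constraint (not part of the release itself): DDP spawns worker processes, so every callback attached to the `Trainer` must survive a pickle round trip. A minimal sketch of that check, using a hypothetical callback:

```python
import pickle

from pytorch_lightning.callbacks import Callback


class MyMetricCallback(Callback):
    """Hypothetical callback; holding live GPU tensors here is what broke pickling."""

    def __init__(self):
        super().__init__()
        self.best_score = None  # keep plain Python/CPU values so the callback stays picklable


# DDP pickles the Trainer (including its callbacks) when spawning processes,
# so a simple round trip like this must not raise
callback = MyMetricCallback()
pickle.loads(pickle.dumps(callback))
```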

Changed

- Allow logging of metrics together with hparams (1630)

Removed

- Removed Warning from trainer loop (1634)

Fixed

- Fixed `ModelCheckpoint` not being picklable (1632)
- Fixed CPU DDP breaking change and DDP change (1635)
- Tested pickling (1636)

Contributors

justusschock, quinor, williamFalcon

0.7.4

Key updates

- PyTorch 1.5 support
- Added Horovod `distributed_backend` option (see the sketch after this list)
- Enabled forward compatibility with native AMP (PyTorch 1.6)
- Support 8-core TPU on Kaggle
- Added ability to customize progress_bar via Callbacks
- Speed/memory optimizations.
- Improved Argparse usability with Trainer
- Docs improvements
- Tons of bug fixes
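
A minimal sketch of how a few of these options look on the `Trainer`. The flag names (`distributed_backend`, `auto_select_gpus`, `terminate_on_nan`) are taken from the detail changes below, while `MyModel` is a hypothetical `LightningModule`:

```python
from pytorch_lightning import Trainer

from my_project import MyModel  # hypothetical LightningModule

model = MyModel()

# Horovod backend, typically launched via `horovodrun -np 4 python train.py`
trainer = Trainer(distributed_backend="horovod")

# or, on a single multi-GPU node: automatic GPU selection plus per-step NaN checks
trainer = Trainer(gpus=1, auto_select_gpus=True, terminate_on_nan=True)

trainer.fit(model)
```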

Detail changes

Added

- Added flag `replace_sampler_ddp` to manually disable sampler replacement in ddp (1513)
- Added speed parity tests (max 1 sec difference per epoch) (1482)
- Added `auto_select_gpus` flag to trainer that enables automatic selection of available GPUs on exclusive mode systems.
- Added learning rate finder (1347)
- Added support for ddp mode in clusters without SLURM (1387)
- Added `test_dataloaders` parameter to `Trainer.test()` (1434) (see the sketch after this list)
- Added `terminate_on_nan` flag to trainer that performs a NaN check with each training iteration when set to `True` (1475)
- Added `ddp_cpu` backend for testing ddp without GPUs (1158)
- Added [Horovod](http://horovod.ai) support as a distributed backend `Trainer(distributed_backend='horovod')` (1529)
- Added support for 8-core distributed training on Kaggle TPUs (1568)
- Added support for native AMP (1561, 1580)
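
Because `test_dataloaders` moved from `Trainer.fit()` to `Trainer.test()` in this release (see Removed below), here is a minimal sketch of the new call site, using a hypothetical model and dataset:

```python
from torch.utils.data import DataLoader

from pytorch_lightning import Trainer
from my_project import MyModel, MyTestDataset  # hypothetical model and dataset

model = MyModel()
test_loader = DataLoader(MyTestDataset(), batch_size=32)

trainer = Trainer()
# test dataloaders are now passed to .test() rather than .fit()
trainer.test(model, test_dataloaders=test_loader)
```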

Changed

- Changed the default behaviour to no longer include a NaN check with each training iteration. (1475)
- Decoupled the progress bar from the trainer. It is a callback now and can be customized or even replaced entirely (1450) (see the sketch after this list).
- Changed lr schedule step interval behavior to update every backwards pass instead of every forwards pass (1477)
- Defined a shared process rank and removed rank from instances (e.g. loggers) (1408)
- Updated semantic segmentation example with custom u-net and logging (1371)
- Disabled val and test shuffling (1600)
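
A sketch of customizing the now-decoupled progress bar. Both the class name `pytorch_lightning.callbacks.ProgressBar` and the `init_validation_tqdm` hook are assumptions here; check the callbacks docs for your installed version:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ProgressBar  # class name assumed for this sketch


class SlimProgressBar(ProgressBar):
    """Example customization: hide the validation progress bar."""

    def init_validation_tqdm(self):  # hook name assumed for this sketch
        bar = super().init_validation_tqdm()
        bar.disable = True
        return bar


# passing a progress bar callback replaces the default one
trainer = Trainer(callbacks=[SlimProgressBar()])
```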

Deprecated

- Deprecated `training_tqdm_dict` in favor of `progress_bar_dict` (1450).

Removed

- Removed `test_dataloaders` parameter from `Trainer.fit()` (1434)

Fixed

- Added the possibility to pass nested metrics dictionaries to loggers (1582)
- Fixed memory leak from opt return (1528)
- Fixed saving checkpoint before deleting old ones (1453)
- Fixed loggers flushing the last logged metrics before continuing, e.g. `trainer.test()` results (1459)
- Fixed optimizer configuration when `configure_optimizers` returns a dict without `lr_scheduler` (1443) (see the sketch after this list)
- Fixed `LightningModule` - mixing hparams and arguments in `LightningModule.__init__()` crashing `load_from_checkpoint()` (1505)
- Added a missing call to the `on_before_zero_grad` model hook (1493).
- Allow use of sweeps with WandbLogger (1512)
- Fixed a bug that caused the `callbacks` Trainer argument to reference a global variable (1534).
- Fixed a bug that set all boolean CLI arguments from `Trainer.add_argparse_args` to `True` (1571)
- Fixed the batch being copied when training on a single GPU (1576, 1579)
- Fixed soft checkpoint removing on DDP (1408)
- Fixed automatic parser bug (1585)
- Fixed bool conversion from string (1606)
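
For context on the `configure_optimizers` fix above, a sketch of the dict return form with and without the optional `lr_scheduler` key (hypothetical model, other hooks omitted):

```python
import torch

from pytorch_lightning import LightningModule


class MyModel(LightningModule):  # hypothetical model, other hooks omitted
    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.02)
        # the fixed case: a dict without an 'lr_scheduler' key is now accepted
        return {"optimizer": optimizer}
        # with a scheduler it would look like:
        # scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
        # return {"optimizer": optimizer, "lr_scheduler": scheduler}
```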

Contributors

alexeykarnachev, areshytko, awaelchli, Borda, borisdayma, ethanwharris, fschlatt, HenryJia, Ir1d, justusschock, karlinjf, lezwon, neggert, rmrao, rohitgr7, SkafteNicki, tgaddair, williamFalcon

_If we forgot someone due to not matching commit email with GitHub account, let us know :]_

0.7.3

0.7.2 had a few (subtle) bugs that affected DDP and a few other key things, so we released 0.7.3 to fix them because they are critical for DDP. Sorry about that! There are still no API changes; please skip 0.7.2 and upgrade straight to 0.7.3 for those fixes.

Detail changes

Added

- Added `rank_zero_warn` for warning only in rank 0 (1428)

Fixed

- Fixed default `DistributedSampler` for DDP training (1425)
- Fixed the workers warning so it is not shown on Windows (1430)
- Fixed returning tuple from `run_training_batch` (1431)
- Fixed gradient clipping (1438)
- Fixed pretty print (1441)

Contributors

alsrgv, Borda, williamFalcon

0.7.2

Overview

This release focuses on fixing particular issues and improving the developer experience by extending the docs, adding type hints, and supporting Python 3.8. In particular, some of the release highlights are:
- Added a benchmark for comparing Lightning with vanilla implementations
- Extended optimizer support with per-optimizer update frequencies (a sketch of the API follows this list)
- Several improvements for loggers, such as representing non-primitive types and supporting hierarchical dictionaries for hyperparameter searchers
- Added model configuration checking before training runs
- Simplified the PL examples structure (shallower and more readable)
- Improved Trainer CLI argument handling (generalization)
- Two Trainer arguments became deprecated: `print_nan_grads` and `show_progress_bar`
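
A rough sketch of the optimizer-frequency feature mentioned above, assuming the list-of-dicts form with a `frequency` key returned from `configure_optimizers()` (see the corresponding entry in the Added section below; the GAN-style module is hypothetical):

```python
import torch

from pytorch_lightning import LightningModule


class GAN(LightningModule):  # hypothetical module, other hooks omitted
    def configure_optimizers(self):
        opt_gen = torch.optim.Adam(self.generator.parameters(), lr=2e-4)
        opt_disc = torch.optim.Adam(self.discriminator.parameters(), lr=2e-4)
        # 'frequency' controls for how many consecutive batches each optimizer is used
        return [
            {"optimizer": opt_gen, "frequency": 1},
            {"optimizer": opt_disc, "frequency": 5},
        ]
```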



Detail changes

Added

- Added same step loggers' metrics aggregation (1278)
- Added parity test between a vanilla MNIST model and lightning model (1284)
- Added parity test between a vanilla RNN model and lightning model (1351)
- Added Reinforcement Learning - Deep Q-network (DQN) lightning example (1232)
- Added support for hierarchical `dict` (1152)
- Added `TrainsLogger` class (1122)
- Added type hints to `pytorch_lightning.core` (946)
- Added support for `IterableDataset` in validation and testing (1104)
- Added support for non-primitive types in `hparams` for `TensorboardLogger` (1130)
- Added a check that stops the training when loss or weights contain `NaN` or `inf` values. (1097)
- Added support for `IterableDataset` when `val_check_interval=1.0` (default), this will trigger validation at the end of each epoch. (1283)
- Added `summary` method to Profilers. (1259)
- Added informative errors if user defined dataloader has zero length (1280)
- Added testing for python 3.8 (915)
- Added a `training_epoch_end` method which is the mirror of `validation_epoch_end` (1357) (see the sketch after this list)
- Added model configuration checking (1199)
- Added support for optimizer frequencies through `LightningModule.configure_optimizers()` (1269)
- Added option to run without an optimizer by returning `None` from `configure_optimizers`. (1279)
- Added a warning when the number of data loader workers is small. (1378)
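
A sketch of the new `training_epoch_end` hook mirroring `validation_epoch_end`, assuming the 0.7.x convention of aggregating the per-step outputs and returning a `log` dict (hypothetical model, other hooks omitted):

```python
import torch

from pytorch_lightning import LightningModule


class MyModel(LightningModule):  # hypothetical model, other hooks omitted
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper
        return {"loss": loss}

    def training_epoch_end(self, outputs):
        # mirrors validation_epoch_end: aggregate the collected per-step outputs
        avg_loss = torch.stack([out["loss"] for out in outputs]).mean()
        return {"log": {"train_loss_epoch": avg_loss}}
```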

Changed

- Changed (renamed and refactored) `TensorRunningMean` -> `TensorRunningAccum`: running accumulations were generalized. (1278)
- Changed `progress_bar_refresh_rate` trainer flag to disable progress bar when setting to 0. (1108)
- Enhanced `load_from_checkpoint` to also forward params to the model (1307)
- Updated references to self.forward() to instead use the `__call__` interface. (1211)
- Changed default behaviour of `configure_optimizers` to use no optimizer rather than Adam. (1279)
- Allowed uploading models to W&B (1339)
- On DP and DDP2, unsqueeze is now automated (1319)
- No longer always creates a `DataLoader` during re-instantiation, but keeps the same type as before (if a subclass of `DataLoader`) (1346)
- No longer interferes with a default sampler (1318)
- Removed default Adam optimizer (1317)
- Added warnings for unimplemented required Lightning methods (1317)
- Made `evaluate` method private >> `Trainer._evaluate(...)`. (1260)
- Simplify the PL examples structure (shallower and more readable) (1247)
- Changed min-max GPU memory to be on their own plots (1358)
- Removed `.item` which causes sync issues (1254)
- Changed smoothing in TQDM to decrease variability of time remaining between training/eval (1194)
- Changed the default logger to a dedicated one (1064)

Deprecated

- Deprecated Trainer argument `print_nan_grads` (1097)
- Deprecated Trainer argument `show_progress_bar` (1108)

Removed

- Removed duplicated module `pytorch_lightning.utilities.arg_parse` for loading CLI arguments (1167)
- Removed wandb logger's `finalize` method (1193)
- Dropped `torchvision` dependency in tests and added own MNIST dataset class instead (986)

Fixed

- Fixed `model_checkpoint` when saving all models (1359)
- Fixed the `Trainer.add_argparse_args` classmethod so it now adds a type for the arguments (1147)
- Fixed a bug related to type checking of `ReduceLROnPlateau` lr schedulers (1114)
- Fixed a bug to ensure Lightning checkpoints are backward compatible (1132)
- Fixed a bug that created an extra dataloader with active `reload_dataloaders_every_epoch` (1181)
- Fixed all warnings and errors in the docs build process (1191)
- Fixed an issue where `val_percent_check=0` would not disable validation (1251)
- Fixed average of incomplete `TensorRunningMean` (1309)
- Fixed `WandbLogger.watch` with `wandb.init()` (1311)
- Fixed an issue with early stopping that would prevent it from monitoring training metrics when validation is disabled / not implemented (1235)
- Fixed a bug that would cause `trainer.test()` to run on the validation set when overloading `validation_epoch_end` and `test_end` (1353)
- Fixed `WandbLogger.watch` - use of the watch method without importing `wandb` (1311)
- Fixed `WandbLogger` to be used with 'ddp' - allow reinits in sub-processes (1149, 1360)
- Made `training_epoch_end` behave like `validation_epoch_end` (1357)
- Fixed `fast_dev_run` running validation twice (1365)
- Fixed pickle error from quick patch `__code__` (1352)
- Fixed memory leak on GPU0 (1094, 1349)
- Fixed checkpointing interval (1272)
- Fixed validation and training loops running on a partial dataset (1192)
- Fixed running `on_validation_end` only on main process in DDP (1125)
- Fixed `load_spawn_weights` to run only in proc rank 0 (1385)
- Fixed using the deprecated `use_amp` attribute (1145)
- Fixed TensorBoard logger error where the lightning_logs directory does not exist in multi-node DDP on nodes with rank != 0 (1375)
- Fixed `Unimplemented backend XLA` error on TPU (1387)

Contributors

alexeykarnachev, amoudgl, areshytko, asafmanor, awaelchli, bkkaggle, bmartinn, Borda, borisdayma, cmpute, djbyrne, ethanwharris, gerardrbentley, jbschiratti, jeremyjordan, justusschock, monney, mpariente, pertschuk, rmrao, S-aiueo32, shubhamagarwal92, SkafteNicki, sneiman, tullie, vanpelt, williamFalcon, xingzhaolee

_If we forgot someone due to not matching commit email with GitHub account, let us know :]_

0.7.1

Minor bug fixes for `print` issues and `data_loader` (1080).

0.7.0

Overview

This is the first joint release between [pytorch-bearer](http://www.pytorchbearer.org) and Lightning. Here we come ...

This release adds support for training models on Tensor Processing Units (TPU). We can now train models on GPUs and TPUs by changing a single parameter in `Trainer` (see docs). We are also bringing the flexibility of Bearer into Lightning by allowing for arbitrary user-defined callbacks, see [docs](https://pytorch-lightning.readthedocs.io/en/0.7.0/callbacks.html).
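
A minimal sketch of both headline features together: switching to TPUs via a single `Trainer` argument and attaching a user-defined callback. The `num_tpu_cores` flag name is an assumption based on the 0.7.0 docs; the callback itself is hypothetical:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import Callback


class PrintOnEpochEnd(Callback):
    """Hypothetical user-defined callback."""

    def on_epoch_end(self, trainer, pl_module):
        print(f"finished epoch {trainer.current_epoch}")


# num_tpu_cores is assumed to be the 0.7.0 flag; on GPUs you would pass gpus=... instead
trainer = Trainer(num_tpu_cores=8, callbacks=[PrintOnEpochEnd()])
```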

We are also including a profiler that allows Lightning users to identify training bottlenecks (see [docs](https://pytorch-lightning.readthedocs.io/en/0.7.0/profiler.html)).
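
A sketch of turning the profiler on; `profiler=True` is assumed to select the simple built-in profiler, per the linked docs:

```python
from pytorch_lightning import Trainer

# True is assumed to enable the basic built-in profiler; a profiler instance
# can be passed instead for more detailed reports (see the linked docs)
trainer = Trainer(profiler=True)
# a summary of where time was spent is printed when fitting completes
```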

This release also includes automatic sampler setup: depending on the selected backend, Lightning configures the sampler correctly (no user input needed).

The loggers have also been extended to support multiple concurrent loggers passed to `Trainer` as an iterable ([docs](https://pytorch-lightning.readthedocs.io/en/0.7.0/loggers.html)), and we added support for step-based [learning rate scheduling](https://pytorch-lightning.readthedocs.io/en/0.7.0/optimizers.html#learning-rate-scheduling).
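
A sketch combining both: an iterable of loggers handed to `Trainer`, and a scheduler dict whose `'interval': 'step'` key requests per-step updates (treat the exact dict keys as an assumption to verify against the linked docs; the logger arguments and the model are hypothetical, and `CometLogger` additionally requires `comet_ml` to be installed):

```python
import torch

from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.loggers import CometLogger, TensorBoardLogger


class MyModel(LightningModule):  # hypothetical model, other hooks omitted
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000)
        # 'interval': 'step' steps the scheduler every optimizer step instead of every epoch
        return [optimizer], [{"scheduler": scheduler, "interval": "step"}]


loggers = [
    TensorBoardLogger("tb_logs"),
    CometLogger(project_name="demo"),  # hypothetical arguments; requires comet_ml
]
trainer = Trainer(logger=loggers)
```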

Lastly, lots of bug fixes (see below).

Detail changes

Added

- Added automatic sampler setup. Depending on DDP or TPU, lightning configures the sampler correctly (user needs to do nothing) (926)
- Added `reload_dataloaders_every_epoch=False` flag for trainer. Some users require reloading data every epoch (926)
- Added `progress_bar_refresh_rate=50` flag for trainer. The refresh rate on notebooks (926)
- Updated governance docs
- Added a check to ensure that the metric used for early stopping exists before training commences (542)
- Added `optimizer_idx` argument to `backward` hook (733)
- Added `entity` argument to `WandbLogger` to be passed to `wandb.init` (783)
- Added a tool for profiling training runs (782)
- Improved flexibility for naming TensorBoard logs: `version` can now be set to a `str` to save to that directory, and `name=''` prevents the experiment-name directory (804)
- Added option to specify `step` key when logging metrics (808)
- Added `train_dataloader`, `val_dataloader` and `test_dataloader` arguments to `Trainer.fit()`, for alternative data parsing (759)
- Added Tensor Processing Unit (TPU) support (868)
- Added semantic segmentation example (751, 876, 881)
- Split callbacks into multiple files (849)
- Added support for user-defined callbacks (889, 950)
- Added support for multiple loggers to be passed to `Trainer` as an iterable (e.g. list, tuple, etc.) (903)
- Added support for step-based learning rate scheduling (941)
- Added support for logging hparams as `dict` (1029)
- Checkpoint and early stopping now work without a validation step (1041)
- Added support for graceful training cleanup after a keyboard interrupt (856, 1019)
- Added type hints for function arguments (912)
- Added default `argparser` for `Trainer` (952, 1023)
- Added TPU gradient clipping (963)
- Added max/min number of steps in Trainer (728)


Changed

- Changed default TQDM to use `tqdm.auto` for prettier outputs in IPython notebooks (752)
- Changed `pytorch_lightning.logging` to `pytorch_lightning.loggers` (767)
- Moved the default `tqdm_dict` definition from Trainer to `LightningModule`, so it can be overridden by the user (749)
- Moved functionality of `LightningModule.load_from_metrics` into `LightningModule.load_from_checkpoint` (995)
- Changed Checkpoint path parameter from `filepath` to `dirpath` (1016)
- Froze models' `hparams` as a `Namespace` property (1029)
- Dropped `logging` config in package init (1015)
- Renamed model steps (1051):
* `training_end` >> `training_epoch_end`
* `validation_end` >> `validation_epoch_end`
* `test_end` >> `test_epoch_end`
- Refactored data loading to support infinite dataloaders (955)
- Changed `TensorBoardLogger` to create a single file (777)

Deprecated

- Deprecated `pytorch_lightning.logging` (767)
- Deprecated `LightningModule.load_from_metrics` in favour of `LightningModule.load_from_checkpoint` (995, 1079)
- Deprecated `data_loader` decorator (926)
- Deprecated model steps `training_end`, `validation_end` and `test_end` (1051, 1056)

Removed

- Removed dependency on `pandas` (736)
- Removed dependency on `torchvision` (797)
- Removed dependency on `scikit-learn` (801)

Fixed

- Fixed a bug where early stopping `on_end_epoch` would be called inconsistently when `check_val_every_n_epoch == 0` (743)
- Fixed a bug where the model checkpoint didn't write to the same directory as the logger (771)
- Fixed a bug where the `TensorBoardLogger` class would create an additional empty log file during fitting (777)
- Fixed a bug where `global_step` was advanced incorrectly when using `accumulate_grad_batches > 1` (832)
- Fixed a bug when calling `self.logger.experiment` with multiple loggers (1009)
- Fixed a bug when calling `logger.append_tags` on a `NeptuneLogger` with a single tag (1009)
- Fixed sending back data from `.spawn` by saving and loading the trained model in/out of the process (1017)
- Fixed port collision on DDP (1010)
- Fixed/tested pass overrides (918)
- Fixed comet logger to log after train (892)
- Removed deprecated args to the learning rate step function (890)

Contributors

airglow, akshaykvnit, AljoSt, AntixK, awaelchli, baeseongsu, bobkemp, Borda, calclavia, Calysto, djbyrne, ethanwharris, fdelrio89, hadim, hanbyul-kim, jeremyjordan, kuynzereb, luiscape, MattPainter01, neggert, onkyo14taro, peteriz, shoarora, SkafteNicki, smallzzy, srush, theevann, tullie, williamFalcon, xeTaiz, xssChauhan, yukw777

_If we forgot someone due to not matching commit email with GitHub account, let us know :]_
