Lightning

0.8.2

Overview

As we continue to strengthen the codebase with more tests, we are finally getting rid of annoying bugs that have been around for a while, mostly around the inconsistent checkpoint and early stopping behaviour (amazing work, awaelchli and jeremyjordan).

Noteworthy changes:

- Fixed TPU flag parsing
- Fixed the `average_precision` metric
- All checkpoint issues should be gone now (including backward support for old checkpoints)
- DDP + loggers should be fixed

Detail changes

Added

- Added TorchText support for moving data to GPU (2379)

Changed

- Changed epoch indexing to start from 0 instead of 1 (2289)
- Refactored model `backward` (2276)
- Refactored `training_batch` + tests to verify correctness (2327, 2328)
- Refactored training loop (2336)
- Made optimization steps for hooks (2363)
- Changed default apex level to 'O2' (2362)

Removed

- Moved `TrainsLogger` to Bolts (2384)

Fixed

- Fixed parsing TPU arguments and TPU tests (2094)
- Fixed number batches in case of multiple dataloaders and `limit_{*}_batches` (1920, 2226)
- Fixed an issue with forward hooks not being removed after model summary (2298)
- Fixed `load_from_checkpoint()` not working with an absolute path on Windows (2294)
- Fixed an issue with how `_has_len` handles `NotImplementedError`, e.g. raised by `torchtext.data.Iterator` (2293, 2307)
- Fixed `average_precision` metric (2319)
- Fixed ROC metric for CUDA tensors (2304)
- Fixed lost compatibility with custom datatypes implementing `.to` (2335)
- Fixed loading model with kwargs (2387)
- Fixed sum(0) for `trainer.num_val_batches` (2268)
- Fixed checking if the parameters are a `DictConfig` Object (2216)
- Fixed SLURM weights saving (2341)
- Fixed swapped LR scheduler order (2356)
- Fixed adding tensorboard `hparams` logging test (2342)
- Fixed using the model reference for teardown (2360)
- Fixed logger crash on DDP (2388)
- Fixed several issues with early stopping and checkpoint callbacks (1504, 2391)
- Fixed loading past checkpoints from v0.7.x (2405)
- Fixed loading model without arguments (2403)

Contributors

airium, awaelchli, Borda, elias-ramzi, jeremyjordan, lezwon, mateuszpieniak, mmiakashs, pwl, rohitgr7, ssakhavi, thschaaf, tridao, williamFalcon

_If we forgot someone due to not matching commit email with GitHub account, let us know :]_

0.8.1

Overview

Fixing critical bugs in newly added hooks and `hparams` assignment.
The recommended data flow is the following (see the sketch after the list):

1. use `prepare_data` to download and process the dataset.
2. use `setup` to do splits, and build your model internals
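
A minimal sketch of this flow (the model, data, and file name are illustrative; it assumes the 0.8.x `prepare_data`/`setup` hook signatures):

```python
import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader, TensorDataset, random_split

import pytorch_lightning as pl


class SketchModel(pl.LightningModule):
    """Illustrative model showing where the data hooks fit in."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def prepare_data(self):
        # 1. Runs once (single process): download / preprocess and persist
        #    the dataset. A random tensor dataset stands in for a download.
        data = TensorDataset(torch.randn(200, 32), torch.randint(0, 2, (200,)))
        torch.save(data, "toy_data.pt")

    def setup(self, stage):
        # 2. Runs on every process: load the prepared data, make the splits,
        #    and build anything else the model needs internally.
        data = torch.load("toy_data.pt")
        self.train_set, self.val_set = random_split(data, [160, 40])

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return {"loss": F.cross_entropy(self(x), y)}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=32)


trainer = pl.Trainer(max_epochs=1)
trainer.fit(SketchModel())
```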

Detail changes

- Fixed the `load_from_checkpoint` path detected as URL bug (2244)
- Fixed hooks - added barrier (2245, 2257, 2260)
- Fixed `hparams` - remove frame inspection on `self.hparams` (2253)
- Fixed setup and on fit calls (2252)
- Fixed GPU template (2255)

0.8.0

Overview

Highlights of this release are the new Metrics package and new hooks and flags to customize your workflow.

Major features:

- brand new Metrics package with built-in DDP support (by justusschock and SkafteNicki)
- `hparams` can now be anything! Call `self.save_hyperparameters()` to register anything passed to `__init__` (see the sketch after this list)
- many speed improvements (in how we move data and in adjusted flags; PL now adds only ~300 ms of overhead per epoch!)
- much faster `ddp` implementation; the old one was renamed to `ddp_spawn`
- better support for Hydra
- added the `overfit_batches` flag and corrected some bugs with the `limit_[train,val,test]_batches` flags
- added conda support
- tons of bug fixes :wink:
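
A minimal sketch of the new `hparams` flow (the argument names and layer are illustrative):

```python
import torch
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self, hidden_dim=128, learning_rate=1e-3, backbone="resnet18"):
        super().__init__()
        # Registers every __init__ argument under self.hparams and stores
        # them in checkpoints automatically.
        self.save_hyperparameters()
        self.layer = torch.nn.Linear(hidden_dim, 2)

    def forward(self, x):
        return self.layer(x)


model = LitModel(hidden_dim=64)
print(model.hparams.hidden_dim)  # 64
print(model.hparams.backbone)    # "resnet18"
```

Because the arguments are stored with the checkpoint, `LitModel.load_from_checkpoint(path)` can later rebuild the model without passing them again.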

Detail changes

Added

- Added `overfit_batches`, `limit_{val|test}_batches` flags (overfit now uses training set for all three) (2213)
- Added metrics
* Base classes (1326, 1877)
* Sklearn metrics classes (1327)
* Native torch metrics (1488, 2062)
* docs for all Metrics (2184, 2209)
* Regression metrics (2221)
- Added type hints in `Trainer.fit()` and `Trainer.test()` to reflect that also a list of dataloaders can be passed in (1723)
- Allow dataloaders without sampler field present (1907)
- Added option `save_last` to save the model at the end of every epoch in `ModelCheckpoint` (1908)
- Early stopping checks `on_validation_end` (1458)
- Attribute `best_model_path` to `ModelCheckpoint` for storing and later retrieving the path to the best saved model file (1799)
- Speed up single-core TPU training by loading data using `ParallelLoader` (2033)
- Added a model hook `transfer_batch_to_device` that enables moving custom data structures to the target device (1756); see the sketch after this list
- Added [black](https://black.readthedocs.io/en/stable/) formatter for the code with code-checker on pull (1610)
- Added back the slow spawn ddp implementation as `ddp_spawn` (2115)
- Added loading checkpoints from URLs (1667)
- Added a callback method `on_keyboard_interrupt` for handling KeyboardInterrupt events during training (2134)
- Added a decorator `auto_move_data` that moves data to the correct device when using the LightningModule for inference (1905)
- Added `ckpt_path` option to `Trainer.test(...)` to load a particular checkpoint (2190)
- Added `setup` and `teardown` hooks for model (2229)
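
A sketch of overriding the `transfer_batch_to_device` hook for a custom batch type (the `CustomBatch` container is illustrative; it assumes the hook's default implementation handles standard collections, so unknown types fall back to it):

```python
import torch
import pytorch_lightning as pl


class CustomBatch:
    """Illustrative container that Lightning does not know how to move."""

    def __init__(self, inputs, targets):
        self.inputs = inputs
        self.targets = targets


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def transfer_batch_to_device(self, batch, device):
        if isinstance(batch, CustomBatch):
            # Move the tensors inside the custom structure ourselves ...
            batch.inputs = batch.inputs.to(device)
            batch.targets = batch.targets.to(device)
            return batch
        # ... and fall back to Lightning's default handling otherwise.
        return super().transfer_batch_to_device(batch, device)
```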

Changed

- Allow user to select individual TPU core to train on (1729)
- Removed non-finite values from loss in `LRFinder` (1862)
- Allow passing model hyperparameters as complete kwarg list (1896)
- Renamed `ModelCheckpoint`'s attributes `best` to `best_model_score` and `kth_best_model` to `kth_best_model_path` (1799)
- Re-enabled logger `ImportError`s (1938)
- Changed the default value of the Trainer argument `weights_summary` from `full` to `top` (2029)
- Raise an error when lightning replaces an existing sampler (2020)
- Enabled `prepare_data` from the correct processes - clarified local vs. global rank (2166)
- Removed explicit flush from the TensorBoard logger (2126)
- Changed epoch indexing to start from 1 instead of 0 (2206)

Deprecated

- Deprecated flags (2213); see the sketch after this list:
* `overfit_pct` in favour of `overfit_batches`
* `val_percent_check` in favour of `limit_val_batches`
* `test_percent_check` in favour of `limit_test_batches`
- Deprecated `ModelCheckpoint`'s attributes `best` and `kth_best_model` (1799)
- Dropped official support/testing for older PyTorch versions <1.3 (1917)
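
A short sketch of the flag replacements mentioned above (values are illustrative):

```python
import pytorch_lightning as pl

# Deprecated                       ->  replacement
# Trainer(overfit_pct=0.01)        ->  Trainer(overfit_batches=0.01)
# Trainer(val_percent_check=0.1)   ->  Trainer(limit_val_batches=0.1)
# Trainer(test_percent_check=0.1)  ->  Trainer(limit_test_batches=0.1)

# Debug by overfitting on a small piece of the training set, which is now
# also used for val and test:
debug_trainer = pl.Trainer(overfit_batches=0.01)

# Or cap how much of the val/test dataloaders each run uses:
trainer = pl.Trainer(limit_val_batches=0.25, limit_test_batches=0.25)
```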

Removed

- Removed unintended Trainer argument `progress_bar_callback`, the callback should be passed in by `Trainer(callbacks=[...])` instead (1855)
- Removed obsolete `self._device` in Trainer (1849)
- Removed deprecated API (2073)
* Packages: `pytorch_lightning.pt_overrides`, `pytorch_lightning.root_module`
* Modules: `pytorch_lightning.logging.comet_logger`, `pytorch_lightning.logging.mlflow_logger`, `pytorch_lightning.logging.test_tube_logger`, `pytorch_lightning.overrides.override_data_parallel`, `pytorch_lightning.core.model_saving`, `pytorch_lightning.core.root_module`
* Trainer arguments: `add_row_log_interval`, `default_save_path`, `gradient_clip`, `nb_gpu_nodes`, `max_nb_epochs`, `min_nb_epochs`, `nb_sanity_val_steps`
* Trainer attributes: `nb_gpu_nodes`, `num_gpu_nodes`, `gradient_clip`, `max_nb_epochs`, `min_nb_epochs`, `nb_sanity_val_steps`, `default_save_path`, `tng_tqdm_dic`

Fixed

- Run graceful training teardown on interpreter exit (1631)
- Fixed user warning when apex was used together with learning rate schedulers (1873)
- Fixed multiple calls of `EarlyStopping` callback (1863)
- Fixed an issue with `Trainer.from_argparse_args` when passing in unknown Trainer args (1932)
- Fixed bug related to logger not being reset correctly for model after tuner algorithms (1933)
- Fixed root node resolution for SLURM cluster with dash in hostname (1954)
- Fixed `LearningRateLogger` in multi-scheduler setting (1944)
- Fixed test configuration check and testing (1804)
- Fixed an issue with Trainer constructor silently ignoring unknown/misspelt arguments (1820)
- Fixed `save_weights_only` in ModelCheckpoint (1780)
- Allow use of same `WandbLogger` instance for multiple training loops (2055)
- Fixed an issue with `_auto_collect_arguments` collecting local variables that are not constructor arguments and not working for signatures that have the instance not named `self` (2048)
- Fixed mistake in parameters' grad norm tracking (2012)
- Fixed CPU and hanging GPU crash (2118)
- Fixed an issue with the model summary and `example_input_array` depending on a specific ordering of the submodules in a LightningModule (1773)
- Fixed TPU logging (2230)
- Fixed PID port + duplicate `rank_zero` logging (2140, 2231)

Contributors

awaelchli, baldassarreFe, Borda, borisdayma, cuent, devashishshankar, ivannz, j-dsouza, justusschock, kepler, kumuji, lezwon, lgvaz, LoicGrobol, mateuszpieniak, maximsch2, moi90, rohitgr7, SkafteNicki, tullie, williamFalcon, yukw777, ZhaofengWu

_If we forgot someone due to not matching commit email with GitHub account, let us know :]_

0.7.6

Overview

Highlights of this release: support for TorchElastic, which enables distributed PyTorch training jobs to be executed in a fault-tolerant and elastic manner; auto-scaling of batch size; a new transfer learning example; and an option to provide a seed to random generators to ensure reproducibility.

Detail changes

Added

- Added callback for logging learning rates (1498)
- Added transfer learning example (for a binary classification task in computer vision) (1564)
- Added type hints in `Trainer.fit()` and `Trainer.test()` to reflect that also a list of dataloaders can be passed in (1723).
- Added auto scaling of batch size (1638)
- The progress bar metrics now also get updated in `training_epoch_end` (1724)
- Enable `NeptuneLogger` to work with `distributed_backend=ddp` (1753)
- Added option to provide seed to random generators to ensure reproducibility (1572)
- Added override for hparams in `load_from_checkpoint` (1797)
- Added support multi-node distributed execution under `torchelastic` (1811, 1818)
- Added using `store_true` for bool args (1822, 1842)
- Added dummy logger for internally disabling logging for some features (1836)

Changed

- Enable `non_blocking` for device transfers to GPU (1843)
- Replace `meta_tags.csv` with `hparams.yaml` (1271)
- Reduction when `batch_size < num_gpus` (1609)
- Updated LightningTemplateModel to look more like Colab example (1577)
- Don't convert `namedtuple` to `tuple` when transferring the batch to target device (1589)
- Allow passing `hparams` as a keyword argument to LightningModule when loading from checkpoint (1639)
- Args should come after the last positional argument (1807)
- Made DDP the default if no backend specified with multiple GPUs (1789)

Deprecated

- Deprecated `tags_csv` in favor of `hparams_file` (1271)

Fixed

- Fixed broken link in PR template (1675)
- Fixed `ModelCheckpoint` not checking the file path for `None` (1654)
- Trainer now calls `on_load_checkpoint()` when resuming from a checkpoint (1666)
- Fixed sampler logic for DDP with the iterable dataset (1734)
- Fixed `_reset_eval_dataloader()` for IterableDataset (1560)
- Fixed Horovod distributed backend to set the `root_gpu` property (1669)
- Fixed wandb logger `global_step` affecting other loggers (1492)
- Fixed disabling progress bar on non-zero ranks using Horovod backend (1709)
- Fixed bugs that prevented the LR finder from being used together with early stopping and validation dataloaders (1676)
- Fixed a bug in Trainer that prepended the checkpoint path with `version_` when it shouldn't (1748)
- Fixed LR key name in case of param groups in LearningRateLogger (1719)
- Fixed accumulation parameter and suggestion method for learning rate finder (1801)
- Fixed `num_processes` not being set properly and the auto sampler failing in DDP (1819)
- Fixed bugs in semantic segmentation example (1824)
- Fixed saving native AMP scaler state (1561, 1777)
- Fixed native AMP + DDP (1788)
- Fixed `hparam` logging with metrics (1647)

Contributors

ashwinb, awaelchli, Borda, cmpute, festeh, jbschiratti, justusschock, kepler, kumuji, nanddalal, nathanbreitsch, olineumann, pitercl, rohitgr7, S-aiueo32, SkafteNicki, tgaddair, tullie, tw991, williamFalcon, ybrovman, yukw777

_If we forgot someone due to not matching commit email with GitHub account, let us know :]_

0.7.5

We made a few changes to callbacks to run ops on detached GPU tensors and avoid CPU transfers. However, this made callbacks unpicklable, which crashes DDP.

This release fixes that core issue.

Changed

- Allow logging of metrics together with `hparams` (1630)

Removed

- Removed Warning from trainer loop (1634)

Fixed

- Fixed `ModelCheckpoint` not being picklable (1632)
- Fixed CPU DDP breaking change and DDP change (1635)
- Tested pickling (1636)

Contributors

justusschock, quinor, williamFalcon

0.7.4

Key updates

- PyTorch 1.5 support
- Added Horovod distributed_backend option
- Enable forward compatibility with the native AMP (PyTorch 1.6).
- Support 8-core TPU on Kaggle
- Added ability to customize progress_bar via Callbacks
- Speed/memory optimizations.
- Improved Argparse usability with Trainer
- Docs improvements
- Tons of bug fixes

Detail changes

Added

- Added flag `replace_sampler_ddp` to manually disable sampler replacement in ddp (1513)
- Added speed parity tests (max 1 sec difference per epoch) (1482)
- Added `auto_select_gpus` flag to trainer that enables automatic selection of available GPUs on exclusive mode systems.
- Added learning rate finder (1347)
- Added support for ddp mode in clusters without SLURM (1387)
- Added `test_dataloaders` parameter to `Trainer.test()` (1434)
- Added `terminate_on_nan` flag to trainer that performs a NaN check with each training iteration when set to `True` (1475)
- Added `ddp_cpu` backend for testing ddp without GPUs (1158)
- Added [Horovod](http://horovod.ai) support as a distributed backend `Trainer(distributed_backend='horovod')` (1529); see the sketch after this list
- Added support for 8 core distributed training on Kaggle TPU's (1568)
- Added support for native AMP (1561, 1580)
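
A hedged sketch of how a few of the new Trainer flags above fit together (illustrative values, not a tested configuration):

```python
import pytorch_lightning as pl

# Illustrative values; a sketch of the new flags, not a tested configuration.
trainer = pl.Trainer(
    terminate_on_nan=True,      # NaN check on every training iteration (1475)
    replace_sampler_ddp=False,  # keep your own sampler instead of Lightning's (1513)
)

# On a machine with GPUs, the other additions combine similarly, e.g.:
# pl.Trainer(gpus=1, auto_select_gpus=True)      # pick a free GPU on exclusive-mode systems
# pl.Trainer(distributed_backend="horovod")      # new Horovod backend (1529)
```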

Changed

- Changed the default behaviour to no longer include a NaN check with each training iteration. (1475)
- Decoupled the progress bar from trainer. It is a callback now and can be customized or even be replaced entirely (1450).
- Changed lr schedule step interval behavior to update every backwards pass instead of every forwards pass (1477)
- Defined shared process rank, removed rank from instances (e.g. loggers) (1408)
- Updated semantic segmentation example with custom u-net and logging (1371)
- Disabled val and test shuffling (1600)

Deprecated

- Deprecated `training_tqdm_dict` in favor of `progress_bar_dict` (1450).

Removed

- Removed `test_dataloaders` parameter from `Trainer.fit()` (1434)

Fixed

- Added the possibility to pass nested metrics dictionaries to loggers (1582)
- Fixed memory leak from opt return (1528)
- Fixed saving checkpoint before deleting old ones (1453)
- Fixed loggers - flushing last logged metrics even before continue, e.g. `trainer.test()` results (1459)
- Fixed optimizer configuration when `configure_optimizers` returns dict without `lr_scheduler` (1443)
- Fixed `LightningModule` - mixing hparams and arguments in `LightningModule.__init__()` crashes load_from_checkpoint() (1505)
- Added a missing call to the `on_before_zero_grad` model hook (1493).
- Allow use of sweeps with WandbLogger (1512)
- Fixed a bug that caused the `callbacks` Trainer argument to reference a global variable (1534).
- Fixed a bug that set all boolean CLI arguments from Trainer.add_argparse_args always to True (1571)
- Fixed unnecessary copying of the batch when training on a single GPU (1576, 1579)
- Fixed soft checkpoint removing on DDP (1408)
- Fixed automatic parser bug (1585)
- Fixed bool conversion from string (1606)

Contributors

alexeykarnachev, areshytko, awaelchli, Borda, borisdayma, ethanwharris, fschlatt, HenryJia, Ir1d, justusschock, karlinjf, lezwon, neggert, rmrao, rohitgr7, SkafteNicki, tgaddair, williamFalcon

_If we forgot someone due to not matching commit email with GitHub account, let us know :]_
