Lightning

0.8.2

Overview

As we continue to strengthen the codebase with more tests, we are finally getting rid of annoying bugs that have been around for a while, mostly around the inconsistent checkpoint and early stopping behaviour (amazing work, awaelchli and jeremyjordan).

Noteworthy changes:

- Fixed TPU flag parsing
- Fixed the `average_precision` metric
- All checkpoint issues should be gone now (including backward support for old checkpoints)
- DDP + loggers should be fixed

Detail changes

Added

- Added TorchText support for moving data to GPU (2379)

Changed

- Changed epoch indexing to start from 0 instead of 1 (2289)
- Refactored model `backward` (2276)
- Refactored `training_batch` + tests to verify correctness (2327, 2328)
- Refactored training loop (2336)
- Made optimization steps for hooks (2363)
- Changed default apex level to 'O2' (2362)

Removed

- Moved `TrainsLogger` to Bolts (2384)

Fixed

- Fixed parsing TPU arguments and TPU tests (2094)
- Fixed number batches in case of multiple dataloaders and `limit_{*}_batches` (1920, 2226)
- Fixed an issue with forward hooks not being removed after model summary (2298)
- Fixed `load_from_checkpoint()` not working with an absolute path on Windows (2294)
- Fixed an issue with how `_has_len` handles `NotImplementedError`, e.g. raised by `torchtext.data.Iterator` (2293, 2307)
- Fixed `average_precision` metric (2319)
- Fixed ROC metric for CUDA tensors (2304)
- Fixed lost compatibility with custom datatypes implementing `.to` (2335)
- Fixed loading model with kwargs (2387)
- Fixed sum(0) for `trainer.num_val_batches` (2268)
- Fixed checking if the parameters are a `DictConfig` Object (2216)
- Fixed SLURM weights saving (2341)
- Fixed swapped LR scheduler order (2356)
- Fixed adding tensorboard `hparams` logging test (2342)
- Fixed using the model reference for teardown (2360)
- Fixed logger crash on DDP (2388)
- Fixed several issues with early stopping and checkpoint callbacks (1504, 2391)
- Fixed loading past checkpoints from v0.7.x (2405)
- Fixed loading model without arguments (2403)

Contributors

airium, awaelchli, Borda, elias-ramzi, jeremyjordan, lezwon, mateuszpieniak, mmiakashs, pwl, rohitgr7, ssakhavi, thschaaf, tridao, williamFalcon

_If we forgot someone due to not matching commit email with GitHub account, let us know :]_

0.8.1

Overview

Fixing critical bugs in newly added hooks and `hparams` assignment.
The recommended data flow is the following (see the sketch after the list):

1. use `prepare_data` to download and process the dataset.
2. use `setup` to do splits, and build your model internals
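
A minimal sketch of this flow (the model, data, and file name are illustrative; it assumes the 0.8.x `prepare_data`/`setup` hook signatures):

```python
import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader, TensorDataset, random_split

import pytorch_lightning as pl


class SketchModel(pl.LightningModule):
    """Illustrative model showing where the data hooks fit in."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def prepare_data(self):
        # 1. Runs once (single process): download / preprocess and persist
        #    the dataset. A random tensor dataset stands in for a download.
        data = TensorDataset(torch.randn(200, 32), torch.randint(0, 2, (200,)))
        torch.save(data, "toy_data.pt")

    def setup(self, stage):
        # 2. Runs on every process: load the prepared data, make the splits,
        #    and build anything else the model needs internally.
        data = torch.load("toy_data.pt")
        self.train_set, self.val_set = random_split(data, [160, 40])

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return {"loss": F.cross_entropy(self(x), y)}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=32)


trainer = pl.Trainer(max_epochs=1)
trainer.fit(SketchModel())
```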

Detail changes

- Fixed the `load_from_checkpoint` path detected as URL bug (2244)
- Fixed hooks - added barrier (2245, 2257, 2260)
- Fixed `hparams` - remove frame inspection on `self.hparams` (2253)
- Fixed setup and on fit calls (2252)
- Fixed GPU template (2255)

0.8.0

Overview

Highlights of this release are the new Metrics package and new hooks and flags to customize your workflow.

Major features:

- brand new Metrics package with built-in DDP support (by justusschock and SkafteNicki)
- `hparams` can now be anything! Call `self.save_hyperparameters()` to register anything passed to `__init__` (see the sketch after this list)
- many speed improvements (in how we move data and in adjusted flags; PL now adds only ~300 ms of overhead per epoch!)
- much faster `ddp` implementation; the old one was renamed to `ddp_spawn`
- better support for Hydra
- added the `overfit_batches` flag and corrected some bugs with the `limit_[train,val,test]_batches` flags
- added conda support
- tons of bug fixes :wink:
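
A minimal sketch of the new `hparams` flow (the argument names and layer are illustrative):

```python
import torch
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self, hidden_dim=128, learning_rate=1e-3, backbone="resnet18"):
        super().__init__()
        # Registers every __init__ argument under self.hparams and stores
        # them in checkpoints automatically.
        self.save_hyperparameters()
        self.layer = torch.nn.Linear(hidden_dim, 2)

    def forward(self, x):
        return self.layer(x)


model = LitModel(hidden_dim=64)
print(model.hparams.hidden_dim)  # 64
print(model.hparams.backbone)    # "resnet18"
```

Because the arguments are stored with the checkpoint, `LitModel.load_from_checkpoint(path)` can later rebuild the model without passing them again.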

Detail changes

Added

- Added `overfit_batches`, `limit_{val|test}_batches` flags (overfit now uses training set for all three) (2213)
- Added metrics
* Base classes (1326, 1877)
* Sklearn metrics classes (1327)
* Native torch metrics (1488, 2062)
* docs for all Metrics (2184, 2209)
* Regression metrics (2221)
- Added type hints in `Trainer.fit()` and `Trainer.test()` to reflect that also a list of dataloaders can be passed in (1723)
- Allow dataloaders without sampler field present (1907)
- Added option `save_last` to save the model at the end of every epoch in `ModelCheckpoint` (1908)
- Early stopping checks `on_validation_end` (1458)
- Attribute `best_model_path` to `ModelCheckpoint` for storing and later retrieving the path to the best saved model file (1799)
- Speed up single-core TPU training by loading data using `ParallelLoader` (2033)
- Added a model hook `transfer_batch_to_device` that enables moving custom data structures to the target device (1756); see the sketch after this list
- Added [black](https://black.readthedocs.io/en/stable/) formatter for the code with code-checker on pull (1610)
- Added back the slow spawn ddp implementation as `ddp_spawn` (2115)
- Added loading checkpoints from URLs (1667)
- Added a callback method `on_keyboard_interrupt` for handling KeyboardInterrupt events during training (2134)
- Added a decorator `auto_move_data` that moves data to the correct device when using the LightningModule for inference (1905)
- Added `ckpt_path` option to `Trainer.test(...)` to load a particular checkpoint (2190)
- Added `setup` and `teardown` hooks for model (2229)
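
A sketch of overriding the `transfer_batch_to_device` hook for a custom batch type (the `CustomBatch` container is illustrative; it assumes the hook's default implementation handles standard collections, so unknown types fall back to it):

```python
import torch
import pytorch_lightning as pl


class CustomBatch:
    """Illustrative container that Lightning does not know how to move."""

    def __init__(self, inputs, targets):
        self.inputs = inputs
        self.targets = targets


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def transfer_batch_to_device(self, batch, device):
        if isinstance(batch, CustomBatch):
            # Move the tensors inside the custom structure ourselves ...
            batch.inputs = batch.inputs.to(device)
            batch.targets = batch.targets.to(device)
            return batch
        # ... and fall back to Lightning's default handling otherwise.
        return super().transfer_batch_to_device(batch, device)
```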

Changed

- Allow user to select individual TPU core to train on (1729)
- Removed non-finite values from loss in `LRFinder` (1862)
- Allow passing model hyperparameters as complete kwarg list (1896)
- Renamed `ModelCheckpoint`'s attributes `best` to `best_model_score` and `kth_best_model` to `kth_best_model_path` (1799)
- Re-enabled logger `ImportError`s (1938)
- Changed the default value of the Trainer argument `weights_summary` from `full` to `top` (2029)
- Raise an error when lightning replaces an existing sampler (2020)
- Enabled `prepare_data` from the correct processes - clarified local vs. global rank (2166)
- Removed explicit flush from the TensorBoard logger (2126)
- Changed epoch indexing to start from 1 instead of 0 (2206)

Deprecated

- Deprecated flags (2213); see the sketch after this list:
* `overfit_pct` in favour of `overfit_batches`
* `val_percent_check` in favour of `limit_val_batches`
* `test_percent_check` in favour of `limit_test_batches`
- Deprecated `ModelCheckpoint`'s attributes `best` and `kth_best_model` (1799)
- Dropped official support/testing for older PyTorch versions <1.3 (1917)
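
A short sketch of the flag replacements mentioned above (values are illustrative):

```python
import pytorch_lightning as pl

# Deprecated                       ->  replacement
# Trainer(overfit_pct=0.01)        ->  Trainer(overfit_batches=0.01)
# Trainer(val_percent_check=0.1)   ->  Trainer(limit_val_batches=0.1)
# Trainer(test_percent_check=0.1)  ->  Trainer(limit_test_batches=0.1)

# Debug by overfitting on a small piece of the training set, which is now
# also used for val and test:
debug_trainer = pl.Trainer(overfit_batches=0.01)

# Or cap how much of the val/test dataloaders each run uses:
trainer = pl.Trainer(limit_val_batches=0.25, limit_test_batches=0.25)
```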

Removed

- Removed unintended Trainer argument `progress_bar_callback`, the callback should be passed in by `Trainer(callbacks=[...])` instead (1855)
- Removed obsolete `self._device` in Trainer (1849)
- Removed deprecated API (2073)
* Packages: `pytorch_lightning.pt_overrides`, `pytorch_lightning.root_module`
* Modules: `pytorch_lightning.logging.comet_logger`, `pytorch_lightning.logging.mlflow_logger`, `pytorch_lightning.logging.test_tube_logger`, `pytorch_lightning.overrides.override_data_parallel`, `pytorch_lightning.core.model_saving`, `pytorch_lightning.core.root_module`
* Trainer arguments: `add_row_log_interval`, `default_save_path`, `gradient_clip`, `nb_gpu_nodes`, `max_nb_epochs`, `min_nb_epochs`, `nb_sanity_val_steps`
* Trainer attributes: `nb_gpu_nodes`, `num_gpu_nodes`, `gradient_clip`, `max_nb_epochs`, `min_nb_epochs`, `nb_sanity_val_steps`, `default_save_path`, `tng_tqdm_dic`

Fixed

- Run graceful training teardown on interpreter exit (1631)
- Fixed user warning when apex was used together with learning rate schedulers (1873)
- Fixed multiple calls of `EarlyStopping` callback (1863)
- Fixed an issue with `Trainer.from_argparse_args` when passing in unknown Trainer args (1932)
- Fixed bug related to logger not being reset correctly for model after tuner algorithms (1933)
- Fixed root node resolution for SLURM cluster with dash in hostname (1954)
- Fixed `LearningRateLogger` in multi-scheduler setting (1944)
- Fixed test configuration check and testing (1804)
- Fixed an issue with Trainer constructor silently ignoring unknown/misspelt arguments (1820)
- Fixed `save_weights_only` in ModelCheckpoint (1780)
- Allow use of same `WandbLogger` instance for multiple training loops (2055)
- Fixed an issue with `_auto_collect_arguments` collecting local variables that are not constructor arguments and not working for signatures that have the instance not named `self` (2048)
- Fixed mistake in parameters' grad norm tracking (2012)
- Fixed CPU and hanging GPU crash (2118)
- Fixed an issue with the model summary and `example_input_array` depending on a specific ordering of the submodules in a LightningModule (1773)
- Fixed TPU logging (2230)
- Fixed PID port + duplicate `rank_zero` logging (2140, 2231)

Contributors

awaelchli, baldassarreFe, Borda, borisdayma, cuent, devashishshankar, ivannz, j-dsouza, justusschock, kepler, kumuji, lezwon, lgvaz, LoicGrobol, mateuszpieniak, maximsch2, moi90, rohitgr7, SkafteNicki, tullie, williamFalcon, yukw777, ZhaofengWu

_If we forgot someone due to not matching commit email with GitHub account, let us know :]_

0.7.6

Overview

Highlights of this release: support for TorchElastic, which enables distributed PyTorch training jobs to be executed in a fault-tolerant and elastic manner; auto-scaling of batch size; a new transfer learning example; and an option to provide a seed to random generators to ensure reproducibility.

Detail changes

Added

- Added callback for logging learning rates (1498)
- Added transfer learning example (for a binary classification task in computer vision) (1564)
- Added type hints in `Trainer.fit()` and `Trainer.test()` to reflect that also a list of dataloaders can be passed in (1723).
- Added auto scaling of batch size (1638)
- The progress bar metrics now also get updated in `training_epoch_end` (1724)
- Enable `NeptuneLogger` to work with `distributed_backend=ddp` (1753)
- Added option to provide seed to random generators to ensure reproducibility (1572)
- Added override for hparams in `load_from_checkpoint` (1797)
- Added support multi-node distributed execution under `torchelastic` (1811, 1818)
- Added using `store_true` for bool args (1822, 1842)
- Added dummy logger for internally disabling logging for some features (1836)

Changed

- Enable `non_blocking` for device transfers to GPU (1843)
- Replace `meta_tags.csv` with `hparams.yaml` (1271)
- Reduction when `batch_size < num_gpus` (1609)
- Updated LightningTemplateModel to look more like Colab example (1577)
- Don't convert `namedtuple` to `tuple` when transferring the batch to target device (1589)
- Allow passing `hparams` as a keyword argument to LightningModule when loading from checkpoint (1639)
- Args should come after the last positional argument (1807)
- Made DDP the default if no backend specified with multiple GPUs (1789)

Deprecated

- Deprecated `tags_csv` in favor of `hparams_file` (1271)

Fixed

- Fixed broken link in PR template (1675)
- Fixed `ModelCheckpoint` not checking the file path for `None` (1654)
- Trainer now calls `on_load_checkpoint()` when resuming from a checkpoint (1666)
- Fixed sampler logic for DDP with the iterable dataset (1734)
- Fixed `_reset_eval_dataloader()` for IterableDataset (1560)
- Fixed Horovod distributed backend to set the `root_gpu` property (1669)
- Fixed wandb logger `global_step` affecting other loggers (1492)
- Fixed disabling progress bar on non-zero ranks using Horovod backend (1709)
- Fixed bugs that prevented the LR finder from being used together with early stopping and validation dataloaders (1676)
- Fixed a bug in Trainer that prepended the checkpoint path with `version_` when it shouldn't (1748)
- Fixed LR key name in case of param groups in LearningRateLogger (1719)
- Fixed accumulation parameter and suggestion method for learning rate finder (1801)
- Fixed `num_processes` not being set properly and the auto sampler failing in DDP (1819)
- Fixed bugs in semantic segmentation example (1824)
- Fixed saving native AMP scaler state (1561, 1777)
- Fixed native AMP + DDP (1788)
- Fixed `hparam` logging with metrics (1647)

Contributors

ashwinb, awaelchli, Borda, cmpute, festeh, jbschiratti, justusschock, kepler, kumuji, nanddalal, nathanbreitsch, olineumann, pitercl, rohitgr7, S-aiueo32, SkafteNicki, tgaddair, tullie, tw991, williamFalcon, ybrovman, yukw777

_If we forgot someone due to not matching commit email with GitHub account, let us know :]_

0.7.5

We made a few changes to callbacks to run ops on detached GPU tensors and avoid CPU transfers. However, this made callbacks unpicklable, which crashes DDP.

This release fixes that core issue.

Changed

- Allow logging of metrics together with `hparams` (1630)

Removed

- Removed Warning from trainer loop (1634)

Fixed

- Fixed `ModelCheckpoint` not being picklable (1632)
- Fixed CPU DDP breaking change and DDP change (1635)
- Tested pickling (1636)

Contributors

justusschock, quinor, williamFalcon

0.7.4

Key updates

- PyTorch 1.5 support
- Added Horovod distributed_backend option
- Enable forward compatibility with the native AMP (PyTorch 1.6).
- Support 8-core TPU on Kaggle
- Added ability to customize progress_bar via Callbacks
- Speed/memory optimizations.
- Improved Argparse usability with Trainer
- Docs improvements
- Tons of bug fixes

Detail changes

Added

- Added flag `replace_sampler_ddp` to manually disable sampler replacement in ddp (1513)
- Added speed parity tests (max 1 sec difference per epoch) (1482)
- Added `auto_select_gpus` flag to trainer that enables automatic selection of available GPUs on exclusive mode systems.
- Added learning rate finder (1347)
- Added support for ddp mode in clusters without SLURM (1387)
- Added `test_dataloaders` parameter to `Trainer.test()` (1434)
- Added `terminate_on_nan` flag to trainer that performs a NaN check with each training iteration when set to `True` (1475)
- Added `ddp_cpu` backend for testing ddp without GPUs (1158)
- Added [Horovod](http://horovod.ai) support as a distributed backend `Trainer(distributed_backend='horovod')` (1529); see the sketch after this list
- Added support for 8 core distributed training on Kaggle TPU's (1568)
- Added support for native AMP (1561, 1580)
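
A hedged sketch of how a few of the new Trainer flags above fit together (illustrative values, not a tested configuration):

```python
import pytorch_lightning as pl

# Illustrative values; a sketch of the new flags, not a tested configuration.
trainer = pl.Trainer(
    terminate_on_nan=True,      # NaN check on every training iteration (1475)
    replace_sampler_ddp=False,  # keep your own sampler instead of Lightning's (1513)
)

# On a machine with GPUs, the other additions combine similarly, e.g.:
# pl.Trainer(gpus=1, auto_select_gpus=True)      # pick a free GPU on exclusive-mode systems
# pl.Trainer(distributed_backend="horovod")      # new Horovod backend (1529)
```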

Changed

- Changed the default behaviour to no longer include a NaN check with each training iteration. (1475)
- Decoupled the progress bar from trainer. It is a callback now and can be customized or even be replaced entirely (1450).
- Changed lr schedule step interval behavior to update every backwards pass instead of every forwards pass (1477)
- Defined shared process rank, removed rank from instances (e.g. loggers) (1408)
- Updated semantic segmentation example with custom u-net and logging (1371)
- Disabled val and test shuffling (1600)

Deprecated

- Deprecated `training_tqdm_dict` in favor of `progress_bar_dict` (1450).

Removed

- Removed `test_dataloaders` parameter from `Trainer.fit()` (1434)

Fixed

- Added the possibility to pass nested metrics dictionaries to loggers (1582)
- Fixed memory leak from opt return (1528)
- Fixed saving checkpoint before deleting old ones (1453)
- Fixed loggers - flushing last logged metrics even before continue, e.g. `trainer.test()` results (1459)
- Fixed optimizer configuration when `configure_optimizers` returns dict without `lr_scheduler` (1443)
- Fixed `LightningModule` - mixing hparams and arguments in `LightningModule.__init__()` crashes load_from_checkpoint() (1505)
- Added a missing call to the `on_before_zero_grad` model hook (1493).
- Allow use of sweeps with WandbLogger (1512)
- Fixed a bug that caused the `callbacks` Trainer argument to reference a global variable (1534).
- Fixed a bug that set all boolean CLI arguments from Trainer.add_argparse_args always to True (1571)
- Fixed unnecessary copying of the batch when training on a single GPU (1576, 1579)
- Fixed soft checkpoint removing on DDP (1408)
- Fixed automatic parser bug (1585)
- Fixed bool conversion from string (1606)

Contributors

alexeykarnachev, areshytko, awaelchli, Borda, borisdayma, ethanwharris, fschlatt, HenryJia, Ir1d, justusschock, karlinjf, lezwon, neggert, rmrao, rohitgr7, SkafteNicki, tgaddair, williamFalcon

_If we forgot someone due to not matching commit email with GitHub account, let us know :]_
