Overview
This is the first joint release between [pytorch-bearer](http://www.pytorchbearer.org) and Lightning. Here we come ...
This release adds support for training models on Tensor Processing Units (TPUs). You can now train models on GPUs or TPUs by changing a single parameter in `Trainer` (see docs). We are also bringing the flexibility of Bearer into Lightning by allowing arbitrary user-defined callbacks (see the [docs](https://pytorch-lightning.readthedocs.io/en/0.7.0/callbacks.html)).
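For example, here is a minimal sketch of a user-defined callback combined with the single-parameter hardware switch. The hook signatures and the `num_tpu_cores` argument follow the 0.7.0 docs, but treat them as assumptions and check the linked pages for the exact interface:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import Callback

class PrintingCallback(Callback):
    # hook signatures as in the 0.7.0 callback docs (assumption)
    def on_train_start(self, trainer, pl_module):
        print('Training is starting')

    def on_train_end(self, trainer, pl_module):
        print('Training is done')

# the same model trains on GPUs or TPUs by switching one Trainer argument:
# Trainer(gpus=8) for GPUs, Trainer(num_tpu_cores=8) for TPUs
trainer = Trainer(callbacks=[PrintingCallback()], num_tpu_cores=8)
```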
We are also including a profiler that allows Lightning users to identify training bottlenecks (see [docs](https://pytorch-lightning.readthedocs.io/en/0.7.0/profiler.html)).
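As a sketch, enabling the profiler should be a single flag, per the linked 0.7.0 profiler docs (the `AdvancedProfiler` mentioned in the comment is an assumption taken from those docs):

```python
from pytorch_lightning import Trainer

# profiler=True enables the built-in profiler; a report of the time spent
# in each training hook is printed when fit() completes. The docs also
# describe an AdvancedProfiler for more detailed, cProfile-style output.
trainer = Trainer(profiler=True)
```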
This release also includes automatic sampler setup: depending on the selected backend, Lightning configures the sampler correctly, so no user input is needed.
Logging has also been extended: multiple concurrent loggers can now be passed to `Trainer` as an iterable ([docs](https://pytorch-lightning.readthedocs.io/en/0.7.0/loggers.html)), and we have added support for step-based [learning rate scheduling](https://pytorch-lightning.readthedocs.io/en/0.7.0/optimizers.html#learning-rate-scheduling).
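A minimal sketch of both features, assuming the iterable-of-loggers form and the scheduler-dict format described in the linked docs (the `'interval': 'step'` key is the assumption to verify):

```python
import torch
from pytorch_lightning import Trainer, LightningModule
from pytorch_lightning.loggers import TensorBoardLogger

# multiple concurrent loggers: pass any iterable of loggers to Trainer
loggers = [
    TensorBoardLogger('tb_logs', name='run_a'),
    TensorBoardLogger('tb_logs', name='run_b'),
]
trainer = Trainer(logger=loggers)

class MyModule(LightningModule):
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000)
        # 'interval': 'step' steps the scheduler every batch
        # instead of the default once per epoch
        return [optimizer], [{'scheduler': scheduler, 'interval': 'step'}]
```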
Finally, this release includes lots of bug fixes (see below).
Detailed changes
Added
- Added automatic sampler setup: depending on DDP or TPU, Lightning configures the sampler correctly, with no user action needed (926)
- Added a `reload_dataloaders_every_epoch=False` flag to `Trainer` for users who need to reload data every epoch (926)
- Added a `progress_bar_refresh_rate=50` flag to `Trainer` to control the progress bar refresh rate in notebooks (926)
- Updated governance docs
- Added a check to ensure that the metric used for early stopping exists before training commences (542)
- Added `optimizer_idx` argument to `backward` hook (733)
- Added `entity` argument to `WandbLogger` to be passed to `wandb.init` (783)
- Added a tool for profiling training runs (782)
- Improved flexibility for naming TensorBoard logs: `version` can now be set to a `str` to save directly to that directory, and `name=''` prevents the experiment-name subdirectory (804)
- Added option to specify `step` key when logging metrics (808)
- Added `train_dataloader`, `val_dataloader` and `test_dataloader` arguments to `Trainer.fit()` for alternative data parsing (see the sketch after this list) (759)
- Added Tensor Processing Unit (TPU) support (868)
- Added semantic segmentation example (751, 876, 881)
- Split callbacks into multiple files (849)
- Added support for user-defined callbacks (889, 950)
- Added support for multiple loggers to be passed to `Trainer` as an iterable (e.g. list, tuple, etc.) (903)
- Added support for step-based learning rate scheduling (941)
- Added support for logging hparams as `dict` (1029)
- Checkpointing and early stopping now work without a validation step (1041)
- Added support for graceful training cleanup after a keyboard interrupt (856, 1019)
- Added type hints for function arguments (912)
- Added default `argparser` for `Trainer` (952, 1023)
- Added TPU gradient clipping (963)
- Added max/min number of steps in Trainer (728)
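For the new `Trainer.fit()` data arguments mentioned above, here is a minimal sketch. The argument names are taken from the changelog entry, and `model` stands in for any `LightningModule` (hypothetical here):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning import Trainer

# toy data to keep the sketch self-contained
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
train_loader = DataLoader(dataset, batch_size=8)
val_loader = DataLoader(dataset, batch_size=8)

# dataloaders can now be passed straight to fit() instead of being
# defined as methods on the LightningModule
trainer = Trainer(max_epochs=1)
trainer.fit(model,  # `model`: any LightningModule (hypothetical here)
            train_dataloader=train_loader,
            val_dataloader=val_loader)
```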
Changed
- Changed default TQDM to use `tqdm.auto` for prettier outputs in IPython notebooks (752)
- Changed `pytorch_lightning.logging` to `pytorch_lightning.loggers` (767)
- Moved the default `tqdm_dict` definition from Trainer to `LightningModule`, so it can be overridden by the user (749)
- Moved functionality of `LightningModule.load_from_metrics` into `LightningModule.load_from_checkpoint` (995)
- Changed Checkpoint path parameter from `filepath` to `dirpath` (1016)
- Froze the model's `hparams` as a `Namespace` property (1029)
- Dropped `logging` config in package init (1015)
- Renamed model steps (see the sketch after this list) (1051)
* `training_end` >> `training_epoch_end`
* `validation_end` >> `validation_epoch_end`
* `test_end` >> `test_epoch_end`
- Refactored data loading to support infinite dataloaders (955)
- Changed `TensorBoardLogger` to create a single log file (777)
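For the renamed model steps above, a minimal sketch of the new hook name (the aggregation body is illustrative):

```python
import torch
from pytorch_lightning import LightningModule

class MyModule(LightningModule):
    # formerly `validation_end`; receives the list of dicts returned by
    # `validation_step` over the whole epoch
    def validation_epoch_end(self, outputs):
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        return {'val_loss': avg_loss, 'log': {'val_loss': avg_loss}}
```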
Deprecated
- Deprecated `pytorch_lightning.logging` (767)
- Deprecated `LightningModule.load_from_metrics` in favour of `LightningModule.load_from_checkpoint` (995, 1079)
- Deprecated `data_loader` decorator (926)
- Deprecated model steps `training_end`, `validation_end` and `test_end` (1051, 1056)
Removed
- Removed dependency on `pandas` (736)
- Removed dependency on `torchvision` (797)
- Removed dependency on `scikit-learn` (801)
Fixed
- Fixed a bug where early stopping `on_epoch_end` would be called inconsistently when `check_val_every_n_epoch == 0` (743)
- Fixed a bug where the model checkpoint didn't write to the same directory as the logger (771)
- Fixed a bug where the `TensorBoardLogger` class would create an additional empty log file during fitting (777)
- Fixed a bug where `global_step` was advanced incorrectly when using `accumulate_grad_batches > 1` (832)
- Fixed a bug when calling `self.logger.experiment` with multiple loggers (1009)
- Fixed a bug when calling `logger.append_tags` on a `NeptuneLogger` with a single tag (1009)
- Fixed sending back data from `.spawn` by saving and loading the trained model in/out of the process (1017)
- Fixed port collision on DDP (1010)
- Fixed and tested pass overrides (918)
- Fixed the Comet logger to log after training (892)
- Removed deprecated arguments from the learning rate step function (890)
Contributors
airglow, akshaykvnit, AljoSt, AntixK, awaelchli, baeseongsu, bobkemp, Borda, calclavia, Calysto, djbyrne, ethanwharris, fdelrio89, hadim, hanbyul-kim, jeremyjordan, kuynzereb, luiscape, MattPainter01, neggert, onkyo14taro, peteriz, shoarora, SkafteNicki, smallzzy, srush, theevann, tullie, williamFalcon, xeTaiz, xssChauhan, yukw777
_If we forgot someone due to not matching commit email with GitHub account, let us know :]_