Overview
Highlights of this release are the new Metrics package, plus new hooks and flags to customize your workflow.
Major features:
- brand new Metrics package with built-in DDP support (by justusschock and SkafteNicki)
- `hparams` can now be anything! (call `self.save_hyperparameters()` to register anything passed to `__init__`; see the sketch after this list)
- many speed improvements (optimized how we move data, adjusted some flags; PL now adds only 300ms of overhead per epoch!)
- much faster `ddp` implementation. Old one was renamed `ddp_spawn`
- better support for Hydra
- added the `overfit_batches` flag and corrected some bugs with the `limit_{train|val|test}_batches` flags
- added conda support
- tons of bug fixes :wink:
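
To give a flavor of the first two items, here is a minimal sketch. The model body is illustrative, and it assumes the class-based `Accuracy` metric accepts predicted labels and targets:

```python
import torch
import pytorch_lightning as pl
from pytorch_lightning.metrics import Accuracy  # the new Metrics package

class LitModel(pl.LightningModule):
    def __init__(self, lr=1e-3):
        super().__init__()
        # hparams can now be anything: this registers every __init__
        # argument under self.hparams and stores it in checkpoints
        self.save_hyperparameters()
        self.layer = torch.nn.Linear(28 * 28, 10)
        self.accuracy = Accuracy()  # metric with built-in DDP support

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self.layer(x.view(x.size(0), -1))
        loss = torch.nn.functional.cross_entropy(logits, y)
        acc = self.accuracy(logits.argmax(dim=-1), y)
        return {'loss': loss, 'log': {'train_acc': acc}}

    def configure_optimizers(self):
        # hyperparameters registered above are available via self.hparams
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)
```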
Detailed changes
Added
- Added `overfit_batches`, `limit_{val|test}_batches` flags (overfit now uses the training set for all three) (2213); a usage sketch follows this list
- Added metrics
* Base classes (1326, 1877)
* Sklearn metrics classes (1327)
* Native torch metrics (1488, 2062)
* Docs for all Metrics (2184, 2209)
* Regression metrics (2221)
- Added type hints in `Trainer.fit()` and `Trainer.test()` to reflect that a list of dataloaders can also be passed in (1723)
- Allowed dataloaders without a sampler field present (1907)
- Added option `save_last` to save the model at the end of every epoch in `ModelCheckpoint` (1908)
- Early stopping checks `on_validation_end` (1458)
- Attribute `best_model_path` to `ModelCheckpoint` for storing and later retrieving the path to the best saved model file (1799)
- Sped up single-core TPU training by loading data using `ParallelLoader` (2033)
- Added a model hook `transfer_batch_to_device` that enables moving custom data structures to the target device (1756); sketched after this list together with `auto_move_data`
- Added [black](https://black.readthedocs.io/en/stable/) formatter for the code with code-checker on pull (1610)
- Added back the slow spawn ddp implementation as `ddp_spawn` (2115)
- Added loading checkpoints from URLs (1667)
- Added a callback method `on_keyboard_interrupt` for handling KeyboardInterrupt events during training (2134)
- Added a decorator `auto_move_data` that moves data to the correct device when using the LightningModule for inference (1905)
- Added `ckpt_path` option to `Trainer.test(...)` to load a particular checkpoint (2190)
- Added `setup` and `teardown` hooks for the model (2229); see the sketch after this list
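
A sketch of the new data-limiting flags and checkpoint options above (`overfit_batches`, `limit_{val|test}_batches`, `save_last`, `best_model_path`, `ckpt_path`), with illustrative values:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# save_last writes an extra checkpoint at the end of every epoch
checkpoint_callback = ModelCheckpoint(save_last=True)

trainer = Trainer(
    limit_val_batches=0.25,  # validate on 25% of the validation set
    limit_test_batches=10,   # an int caps the number of batches instead
    checkpoint_callback=checkpoint_callback,
)

# or deliberately overfit on 1% of the training set (val/test use it too)
debug_trainer = Trainer(overfit_batches=0.01)

# after trainer.fit(model):
#   checkpoint_callback.best_model_path  -> path to the best checkpoint
#   trainer.test(ckpt_path='best')       -> test on a particular checkpoint
```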
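The new device-movement additions (`transfer_batch_to_device` and `auto_move_data`), sketched with a hypothetical `CustomBatch` container:

```python
import torch
from pytorch_lightning import LightningModule
from pytorch_lightning.core.decorators import auto_move_data

class CustomBatch:
    # a hypothetical container that Lightning cannot traverse on its own
    def __init__(self, samples, targets):
        self.samples, self.targets = samples, targets

class LitModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def transfer_batch_to_device(self, batch, device):
        if isinstance(batch, CustomBatch):
            # move the fields Lightning would otherwise miss
            batch.samples = batch.samples.to(device)
            batch.targets = batch.targets.to(device)
            return batch
        return super().transfer_batch_to_device(batch, device)

    @auto_move_data  # inputs are moved to the model's device before forward runs
    def forward(self, x):
        return self.layer(x)
```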
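And the new lifecycle and interruption hooks (`setup`/`teardown` and `on_keyboard_interrupt`); the hook bodies here are illustrative:

```python
from pytorch_lightning import LightningModule
from pytorch_lightning.callbacks import Callback

class LitModel(LightningModule):
    def setup(self, stage):
        # called at the beginning of fit and test on every process
        self.stage = stage

    def teardown(self, stage):
        # called at the end of fit and test
        print(f'finished {stage}')

class InterruptHandler(Callback):
    def on_keyboard_interrupt(self, trainer, pl_module):
        # runs when training is stopped with Ctrl-C, before Lightning exits
        print('training interrupted')
```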
Changed
- Allowed the user to select an individual TPU core to train on (1729); see the sketch after this list
- Removed non-finite values from loss in `LRFinder` (1862)
- Allowed passing model hyperparameters as a complete kwarg list (1896)
- Renamed `ModelCheckpoint`'s attributes `best` to `best_model_score` and `kth_best_model` to `kth_best_model_path` (1799)
- Re-enabled logger `ImportError`s (1938)
- Changed the default value of the Trainer argument `weights_summary` from `full` to `top` (2029)
- Raised an error when Lightning replaces an existing sampler (2020)
- Enabled `prepare_data` from the correct processes; clarified local vs. global rank (2166)
- Removed the explicit flush from the TensorBoard logger (2126)
- Changed epoch indexing to start from 1 instead of 0 (2206)
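
For example, selecting a single TPU core and restoring the old summary behavior look roughly like this (assuming the `tpu_cores` Trainer argument):

```python
from pytorch_lightning import Trainer

# train on TPU core 5 only (a one-element list selects an individual core)
trainer = Trainer(tpu_cores=[5])

# weights_summary now defaults to 'top'; ask for the old behavior explicitly
trainer = Trainer(weights_summary='full')
```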
Deprecated
- Deprecated flags (see the migration sketch after this list) (2213):
* `overfit_pct` in favour of `overfit_batches`
* `val_percent_check` in favour of `limit_val_batches`
* `test_percent_check` in favour of `limit_test_batches`
- Deprecated `ModelCheckpoint`'s attributes `best` and `kth_best_model` (1799)
- Dropped official support/testing for older PyTorch versions <1.3 (1917)
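
Migrating off the deprecated flags is a one-line change per flag:

```python
from pytorch_lightning import Trainer

# before (deprecated in this release, still works with a warning)
trainer = Trainer(overfit_pct=0.01, val_percent_check=0.25, test_percent_check=0.25)

# after
trainer = Trainer(overfit_batches=0.01, limit_val_batches=0.25, limit_test_batches=0.25)
```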
Removed
- Removed unintended Trainer argument `progress_bar_callback`; the callback should be passed in via `Trainer(callbacks=[...])` instead (1855); see the sketch after this list
- Removed obsolete `self._device` in Trainer (1849)
- Removed deprecated API (2073)
* Packages: `pytorch_lightning.pt_overrides`, `pytorch_lightning.root_module`
* Modules: `pytorch_lightning.logging.comet_logger`, `pytorch_lightning.logging.mlflow_logger`, `pytorch_lightning.logging.test_tube_logger`, `pytorch_lightning.overrides.override_data_parallel`, `pytorch_lightning.core.model_saving`, `pytorch_lightning.core.root_module`
* Trainer arguments: `add_row_log_interval`, `default_save_path`, `gradient_clip`, `nb_gpu_nodes`, `max_nb_epochs`, `min_nb_epochs`, `nb_sanity_val_steps`
* Trainer attributes: `nb_gpu_nodes`, `num_gpu_nodes`, `gradient_clip`, `max_nb_epochs`, `min_nb_epochs`, `nb_sanity_val_steps`, `default_save_path`, `tng_tqdm_dic`
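
If you previously passed `progress_bar_callback`, the replacement is the callbacks list (a sketch, assuming `ProgressBar` is exported from `pytorch_lightning.callbacks`):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ProgressBar

# before: Trainer(progress_bar_callback=ProgressBar())  # argument removed
trainer = Trainer(callbacks=[ProgressBar()])
```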
Fixed
- Run graceful training teardown on interpreter exit (1631)
- Fixed user warning when apex was used together with learning rate schedulers (1873)
- Fixed multiple calls of `EarlyStopping` callback (1863)
- Fixed an issue with `Trainer.from_argparse_args` when passing in unknown Trainer args (1932); see the sketch after this list
- Fixed a bug where the logger was not reset correctly for the model after running tuner algorithms (1933)
- Fixed root node resolution for SLURM cluster with dash in hostname (1954)
- Fixed `LearningRateLogger` in multi-scheduler setting (1944)
- Fixed test configuration check and testing (1804)
- Fixed an issue with the Trainer constructor silently ignoring unknown/misspelt arguments (1820)
- Fixed `save_weights_only` in ModelCheckpoint (1780)
- Allowed use of the same `WandbLogger` instance for multiple training loops (2055)
- Fixed an issue with `_auto_collect_arguments` collecting local variables that are not constructor arguments and not working for signatures that have the instance not named `self` (2048)
- Fixed mistake in parameters' grad norm tracking (2012)
- Fixed CPU and hanging GPU crash (2118)
- Fixed an issue with the model summary and `example_input_array` depending on a specific ordering of the submodules in a LightningModule (1773)
- Fixed TPU logging (2230)
- Fixed PID port + duplicate `rank_zero` logging (2140, 2231)
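
A sketch of the `Trainer.from_argparse_args` pattern touched by the fix above; the custom argument name is illustrative:

```python
from argparse import ArgumentParser
from pytorch_lightning import Trainer

parser = ArgumentParser()
parser.add_argument('--my_custom_arg', default='something')  # not a Trainer arg
parser = Trainer.add_argparse_args(parser)
args = parser.parse_args()

# extra, non-Trainer entries in `args` no longer break construction
trainer = Trainer.from_argparse_args(args)
```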
Contributors
awaelchli, baldassarreFe, Borda, borisdayma, cuent, devashishshankar, ivannz, j-dsouza, justusschock, kepler, kumuji, lezwon, lgvaz, LoicGrobol, mateuszpieniak, maximsch2, moi90, rohitgr7, SkafteNicki, tullie, williamFalcon, yukw777, ZhaofengWu
_If we forgot someone due to not matching commit email with GitHub account, let us know :]_