Added
- Data splits from "A fair comparison of graph neural networks for graph classification", ICLR 2020.
- Replaced strings with macros to improve maintainability
- Support for reproducibility via seeds. Debug CPU mode and parallel CPU mode can reproduce the same results. With CUDA,
things change due to the DataLoader implementation: even when re-running the whole experiment, some GPU runs may differ slightly.
- Added current_set to Score and MultiScore, to keep track of whether the set under consideration is TRAINING,
VALIDATION or TEST
- Random Search support: specify `num_samples` in the config file with the number of random trials, replace `grid`
with `random`, and specify a sampling method for each hyper-parameter. We provide different sampling methods (see the
first sketch at the end of this section):
- choice --> pick at random from a list of arguments
- uniform --> pick uniformly between min and max arguments
- normal --> sample from a normal distribution with given mean and std
- randint --> pick an integer at random between min and max
- loguniform --> pick following the reciprocal distribution between log_min and log_max, with a specified base
- Implemented a 2-way training scheme for CGMM and variants, which first computes and stores the graph
embeddings, and then loads them from disk to solve classification tasks. Very fast and lightweight, making it easy
to try 1K configurations for each outer fold (see the second sketch at the end of this section).
- Early stopping can now work with loss functions rather than scores (but a score must be provided nonetheless)
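
A minimal sketch of what the sampling methods listed above could do for a single hyper-parameter, using only the
Python standard library; this is illustrative, not the library's actual implementation (argument names mirror the list).

```python
# Illustrative sketch of the random-search sampling methods; not the library's
# actual implementation.
import random

def sample(method, **kwargs):
    if method == "choice":       # pick at random from a list of arguments
        return random.choice(kwargs["args"])
    if method == "uniform":      # pick uniformly between min and max
        return random.uniform(kwargs["min"], kwargs["max"])
    if method == "normal":       # sample from a normal distribution
        return random.gauss(kwargs["mean"], kwargs["std"])
    if method == "randint":      # pick an integer between min and max
        return random.randrange(kwargs["min"], kwargs["max"])
    if method == "loguniform":   # reciprocal distribution via a uniform exponent
        exponent = random.uniform(kwargs["log_min"], kwargs["log_max"])
        return kwargs.get("base", 10) ** exponent
    raise ValueError(f"unknown sampling method: {method}")

# e.g., one random trial could draw a learning rate as
# sample("loguniform", log_min=-5, log_max=-2, base=10)
```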
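
And a self-contained sketch of the idea behind the 2-way training scheme: compute the embeddings once, store them,
then reload them from disk to fit many cheap predictors. The helper and the classifier below are stand-ins, not
PyDGN's actual API.

```python
# Stand-in sketch of the 2-way scheme (hypothetical helpers, not PyDGN's API).
import torch

def compute_graph_embeddings(num_graphs: int, dim: int) -> torch.Tensor:
    # placeholder for the (expensive) unsupervised CGMM inference step
    return torch.randn(num_graphs, dim)

# Phase 1: compute the embeddings once and persist them to disk
embeddings = compute_graph_embeddings(num_graphs=500, dim=32)
torch.save(embeddings, "embeddings.pt")

# Phase 2: reload the frozen embeddings and try many configurations cheaply,
# without ever re-running CGMM
embeddings = torch.load("embeddings.pt")
targets = torch.randint(0, 2, (500,))
for hidden_units in (16, 32, 64):  # stand-in for the ~1K configurations
    classifier = torch.nn.Sequential(
        torch.nn.Linear(32, hidden_units),
        torch.nn.ReLU(),
        torch.nn.Linear(hidden_units, 2),
    )
    # ...train `classifier` on (embeddings, targets) with a standard loop...
```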
Changed
- Minor improvements in result files
- Debug mode now prints output to the console
- Refactored the Link Prediction engine by subclassing the TrainingEngine class
- Added the option to mini-batch edges (but not nodes) in single-graph link prediction, to reduce the computational burden (see the sketch at the end of this section)
- Statistics computation in iocgmm.py: removed the `1 -` term from the bottom computation, because it assigned 1 to nodes with degree 0
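
A hedged sketch of what mini-batching edges (but not nodes) can look like for single-graph link prediction; the
variable names are illustrative, not the engine's actual ones.

```python
# Illustrative sketch: shuffle the columns of edge_index and score one chunk
# of edges per step, while the node set (and node embeddings) stays whole.
import torch

edge_index = torch.randint(0, 100, (2, 10_000))  # all edges of the single graph
batch_size = 1024

perm = torch.randperm(edge_index.size(1))
for start in range(0, edge_index.size(1), batch_size):
    # only this chunk of edges is scored in the current step, which bounds the
    # per-step memory/compute even though the graph itself is never partitioned
    edge_batch = edge_index[:, perm[start:start + batch_size]]
```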
Fixed
- ProgressManager can load the elapsed time of finished experiments, in case you stop and resume the entire process
- Fix for semi-supervised graph regression in `training/util.py` (added an `.unsqueeze(0)` on the `y` variable for graph
prediction tasks)
- Last config of model selection was not checked in debug mode
- The model is now moved to the device inside experiments, before it is passed to the optimizer. This solves optimizer
initialization problems, e.g., with Adagrad (see the first sketch at the end of this section)
- Minor fix in AdditiveLoss
- Minor fix in Progress Manager due to Ray upgrade in PyDGN 0.4.0
- Refactored both CGMM and its incremental training strategy
- Improved evaluator code to define remote Ray functions just once
- Added a Jupyter notebook for backward compatibility with our ICLR 2020 data splits
- Backward compatibility in Splitter to handle missing validation fields in old splits.
- Use the data root provided through the CLI rather than the value stored in dataset_kwargs.pt by data preprocessing
operations, because the data location may have changed.
- Removed load_splitter from utils, which assumed a certain shape of the splits filename. Now we pass the filepath to
the data provider.
- Minor fix in AdditiveLoss and MultiScore
- MultiScore no longer makes any assumptions about the underlying scores, and it is easier to maintain.
- CGMM (and variants) mini-batch computation (with full-batch training) now produces the same loss/scores as full-batch
computation
- Minor fix in evaluator.py
- Memory leak when not releasing output embeddings from the GPU in `engine.py` (see the second sketch at the end of this section)
- Score output is now released from the GPU in `score.py`
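
A minimal sketch of why the device move must happen before the optimizer is built: optimizers such as Adagrad
allocate their state tensors at construction time, so creating them on CPU parameters and moving the model afterwards
leaves state and parameters on different devices.

```python
# Sketch of the ordering fixed above: move the model first, build the optimizer
# afterwards, so Adagrad's state tensors live on the same device as the parameters.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(16, 4)

model.to(device)                                      # move first
optimizer = torch.optim.Adagrad(model.parameters())   # then create the optimizer
```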
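
And a self-contained sketch of the pattern behind the two GPU-memory fixes above: storing per-batch outputs or scores
as CUDA tensors still attached to the autograd graph keeps their memory alive for the whole epoch, while detaching and
moving them to CPU releases it. The loop below uses dummy data for illustration.

```python
# Sketch of the GPU-memory pattern behind the fixes above (dummy data).
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(8, 1).to(device)
loss_fn = torch.nn.MSELoss()

epoch_outputs = []
for _ in range(10):  # stand-in for iterating over a data loader
    x = torch.randn(32, 8, device=device)
    y = torch.randn(32, 1, device=device)
    output = model(x)
    loss_fn(output, y).backward()
    # keep a detached CPU copy instead of the CUDA tensor attached to the
    # graph, so the GPU memory of the batch can be freed
    epoch_outputs.append(output.detach().cpu())
```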