This release touches most of the codebase.
Major Additions
- added `XYObjectShadedData` dataset, which is exactly the same as `XYObjectData` except that the ground truth factors differ. This might be useful for testing how metrics are affected by the ground truth representation of factors. Note that `XYObjectData` differs from previous versions due to this.
- added `DSpritesImagenetData` dataset, which is the same as `DSpritesData` but masks the background or foreground depending on the mode and replaces the content with deterministic data from tiny-imagenet
- added `disent.framework.vae.AdaGVaeMinimal` which is a minimal implementation of `AdaVae` configured to run in `gvae` mode
- added `disent.util.lightning.callbacks.VaeGtDistsLoggingCallback` which logs various distance matrices computed from averaged ground truth factor traversals.
- Updated experiment files to use hydra 1.1
+ can now switch between `train` and `prepare_data` modes with the defaults group `run_action=train`
Other Additions
- added `shallow_copy` to `disent.dataset.DisentDataset` enabling a shallow copy of the dataset but overriding specific properties such as the transform
- added new `disent.dataset.transform` including `ToImgTensorF32` (was `ToStandardisedTensor`) and `ToImgTensorU8`
- additions to `H5Builder`
+ `add_dataset_from_array` that constructs and fills a dataset in the hdf5 file from an array
+ converted into context manager instead of manually opening the hdf5 file
- additions to `StateSpace` (and ground truth dataset child classes)
+ `normalise_factor_idx` converts names of ground truth factors into the numerical value
+ `normalise_factor_idxs` converts a name, an idx, lists of names, or lists of idxs into the numerical values of the ground truth factors.
- `disent.dataset.util.stats` added `compute_data_mean_std(data)` to compute the mean and std of datasets
- added `disent.schedule.SingleSchedule`
- improved `disent.util.deprecate.deprecated`, now prints the stack trace for the call location of the deprecated function by default. This can be disabled.
- added `restart` method to `disent.util.profiling.Timer` for easy use within a loop
- added `disent.util.vizualize.plot` which contains various matplotlib helper code used throughout the library and PyTorch lightning callbacks.
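The factor-index helpers added to `StateSpace` accept either names or indices. The following is a minimal standalone sketch of that behaviour, not the library's implementation: the `FACTOR_NAMES` tuple and the free functions are hypothetical stand-ins, and the real methods may differ in signature and error handling.

```python
from typing import Sequence, Tuple, Union

# Hypothetical factor names, for illustration only.
FACTOR_NAMES: Tuple[str, ...] = ("x", "y", "scale", "color")

FactorRef = Union[str, int]


def normalise_factor_idx(factor: FactorRef, names: Sequence[str] = FACTOR_NAMES) -> int:
    """Convert a single factor name or index into a validated integer index."""
    if isinstance(factor, str):
        if factor not in names:
            raise KeyError(f"unknown factor name: {factor!r}")
        return list(names).index(factor)
    idx = int(factor)
    if not (0 <= idx < len(names)):
        raise IndexError(f"factor index out of range: {idx}")
    return idx


def normalise_factor_idxs(factors, names: Sequence[str] = FACTOR_NAMES) -> Tuple[int, ...]:
    """Accept one name/idx, or a list of names/idxs, and return a tuple of indices."""
    if isinstance(factors, (str, int)):
        factors = [factors]
    return tuple(normalise_factor_idx(f, names) for f in factors)
```

With these assumed names, `normalise_factor_idxs(["x", 3])` and `normalise_factor_idxs("y")` both resolve to tuples of integer indices, so downstream code only ever has to handle one representation.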
Breaking Changes
- removed the confusing `observation_shape` and `obs_shape` properties from `GroundTruthData` and any child classes. Any methods across disent that require these properties had their names updated too. For example, the `ArrayGroundTruthData` class now takes `x_shape`.
+ `observation_shape` `(H, W, C)` should be replaced with `img_shape`, you will need to update your overrides in child classes
+ `obs_shape` `(C, H, W)` should be replaced with `x_shape`
- `XYObjectData` default parameters updated for `XYObjectShadedData`; the dataset and colour palettes differ slightly from previous versions.
- moved module `disent.nn.transform` to `disent.dataset.transform`
+ renamed `ToStandardisedTensor` to `ToImgTensorF32`
- `H5Builder` converted into context manager, similar API to `open` or `h5py.File`
- `ReconLossHandlerMse` changed to not scale or centre the output. This is because we now normalise the data instead, which is more correct
- `AdaVae` and inheriting classes have various functions renamed for clarity
- `disent.metrics` functions have `ground_truth_dataset` parameter renamed to `dataset`
- `disent.model.ae` renamed `DecoderTest` and `EncoderTest` to `DecoderLinear` and `EncoderLinear`
- `disent.registry` updated to use a new, simpler class structure and format. Some variables have been renamed, and registry names have been changed to plurals, e.g. `OPTIMIZER` is now `OPTIMIZERS`
- `disent.schedule` cleaned up
+ renamed various variables and parameters `min_step` -> `start_step`, `max_step` -> `end_step`
+ removed `disent.schedule.lerp.scale()` function, as it is the same as `lerp` just not clipped
- `disent.util.lightning.callbacks.VaeDisentanglementLoggingCallback` renamed to `VaeMetricLoggingCallback`
- `docs.examples` updated to use new `XYObjectData` version and `ToImgTensorF32` transform
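When migrating code that used the removed shape properties, it may help to remember that the two layouts differ only in where the channel axis sits. A small sketch, using hypothetical helper names that are not part of disent:

```python
from typing import Tuple

Shape3 = Tuple[int, int, int]


def img_shape_to_x_shape(img_shape: Shape3) -> Shape3:
    """Convert an image shape (H, W, C) into a tensor shape (C, H, W)."""
    h, w, c = img_shape
    return (c, h, w)


def x_shape_to_img_shape(x_shape: Shape3) -> Shape3:
    """Convert a tensor shape (C, H, W) back into an image shape (H, W, C)."""
    c, h, w = x_shape
    return (h, w, c)
```

For a 64x64 RGB dataset, the old `observation_shape` of `(64, 64, 3)` corresponds to an `img_shape` of `(64, 64, 3)` and an `x_shape` of `(3, 64, 64)`.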
Deprecations
- deprecated `ground_truth_data` property on `DisentDataset`; this should be replaced with the shorter `gt_data` property. References to `ground_truth_data` have been replaced in disent.
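For downstream code that needs to keep an old property name alive during a transition like this, a common pattern is a thin alias that emits a `DeprecationWarning`. The sketch below uses only the standard `warnings` module and a hypothetical `ExampleDataset` class. It is not how `DisentDataset` or `disent.util.deprecate.deprecated` is implemented.

```python
import warnings


class ExampleDataset:
    """Hypothetical stand-in class used only to illustrate the aliasing pattern."""

    def __init__(self, gt_data):
        self._gt_data = gt_data

    @property
    def gt_data(self):
        """The new, preferred property."""
        return self._gt_data

    @property
    def ground_truth_data(self):
        """Deprecated alias that warns and forwards to `gt_data`."""
        warnings.warn(
            "`ground_truth_data` is deprecated, use `gt_data` instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this property
        )
        return self.gt_data
```

Both properties return the same object, so callers can switch to `gt_data` one call site at a time.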
Fixes
- Fixed `Mpi3dData` datasets, and added file hashes
- Updated requirements
- Many minor fixes, usability and error message improvements
Hydra Experiment Changes
Hydra Config has finally been updated from version 1.0 to 1.1, adding support for recursive defaults and recursive instantiation. This allows us to remove all of our custom and hacky hydra helper code that previously enabled these features.
- hydra now supports recursive instantiation
- value based specialisation can now be done with recursive defaults using dummy groups
Updating hydra was a good opportunity to re-structure the configuration format.
- All settings defined in the root config that are referenced elsewhere are now in the `settings` key.
- Default settings defined in various subgroups that are referenced elsewhere are often placed in the `dsettings` key.
- Keys for various objects were renamed for clarity, eg. `augment.transform` was renamed to `augment.augment_cls`
- All datasets now require the `meta.vis_mean` and `meta.vis_std` keys, which are used both to normalise the dataset and to re-scale it to [0, 1] for visualisation during training.
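To illustrate the role of the `meta.vis_mean` and `meta.vis_std` keys, here is a rough sketch of normalising values and then mapping them back for display. It assumes that visualisation simply inverts the normalisation and clips to [0, 1]. The function names are hypothetical, and the actual code in disent may work differently (for example per channel, on tensors).

```python
from typing import List, Sequence


def normalise(values: Sequence[float], mean: float, std: float) -> List[float]:
    """Standardise values in [0, 1] using the dataset mean and std."""
    return [(v - mean) / std for v in values]


def to_visualisation_range(normalised: Sequence[float], mean: float, std: float) -> List[float]:
    """Undo the normalisation, then clip to [0, 1] so the result can be displayed."""
    return [min(1.0, max(0.0, v * std + mean)) for v in normalised]


pixels = [0.0, 0.5, 1.0]
normalised = normalise(pixels, mean=0.5, std=0.25)
restored = to_visualisation_range(normalised, mean=0.5, std=0.25)
```

Clipping matters mainly for model outputs, which are not guaranteed to land inside the original data range after the normalisation is undone.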
Every config file has been touched, so the best approach is probably to look at the new system directly. The general structure remains the same, but the recursive defaults from Hydra 1.1 allow us to implement various things more cleanly.
- new defaults group `run_launcher` to easily swap between `slurm` and `local`
- defaults group `run_location` only specifies machine resources and paths
- new defaults group `sampling` specifies details and the sampling strategy to be used by the frameworks
- new defaults group `run_action` to switch between training (`train`) and downloading and installing datasets (`prepare_data`)