Major Additions
- Added `disent.dataset.DisentIterDataset` to complement `DisentDataset` for datasets without a known size.
- Added `Cars3d64Data` and `SmallNorb64Data` to `disent.dataset.data`. These classes are optimized versions of their respective datasets that have their transforms pre-computed. This is much faster than resizing the observations during training, as most disentanglement benchmarks are based on datasets with 64x64 observations.
- Added `disent.dataset.sampling.GroundTruthRandomWalkSampler`. This ground-truth dataset sampler simulates random walks around the factor space. For example, if there are two ground-truth factors `x` and `y` corresponding to a grid, this sampler simulates an agent randomly moving around the grid (see the sketch below).
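A minimal usage sketch; `XYObjectData` stands in for any ground-truth dataset, and the sampler's constructor defaults are assumed rather than taken from the docs:

```python
from disent.dataset import DisentDataset
from disent.dataset.data import XYObjectData
from disent.dataset.sampling import GroundTruthRandomWalkSampler

data = XYObjectData()  # any ground-truth dataset with factors (x, y, ...)
# consecutive samples step randomly through the factor space, like an
# agent wandering around the (x, y) grid
dataset = DisentDataset(data, sampler=GroundTruthRandomWalkSampler())
obs = dataset[0]
```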
- Improvements to the registry. Augmentation kernels, reconstruction losses and latent distributions can now be registered with disent using `disent.registry.KERNELS`, `disent.registry.RECON_LOSSES` and `disent.registry.LATENT_HANDLERS` (see the sketch after this list). This affects:
+ `disent.frameworks.helper.latent_distributions.make_latent_distribution`
+ `disent.frameworks.helper.reconstructions.make_reconstruction_loss`
+ `disent.dataset.transform._augment.get_kernel`
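For example, registering a custom reconstruction loss might look like the following hypothetical sketch; the dict-style assignment and the key name are assumptions for illustration, not taken from the disent docs:

```python
import torch.nn.functional as F
import disent.registry

def recon_loss_smooth_l1(x_recon, x_targ):
    # a custom reconstruction loss to expose under a new key
    return F.smooth_l1_loss(x_recon, x_targ, reduction="none")

# assumed registration style, for illustration only
disent.registry.RECON_LOSSES["smooth_l1"] = recon_loss_smooth_l1
```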
- Refactored `disent.frameworks.DisentFramework`, which now also supports PyTorch Lightning `training`, `validation` and `test` steps.
- Split the `Ae` and `Vae` hierarchy.
+ This is so that we can directly check if a framework is an instance of one or the other. Previously `Vae` was a subclass of `Ae`, which was unintuitive.
- Rewrote `disent.registry` to make it more intuitive and useful throughout `disent`. Custom regex resolvers can now also be registered, and there are now different types of registries. Registries also provide a constructible example for each item. See `disent.registry._registry` for more information.
Other Improvements
- Improvements to `disent.dataset.DisentDataset`:
+ Added `sampler`, `transform` and `augment` properties.
+ Improved `shallow_copy` and `unwrapped_shallow_copy` logic and available arguments.
+ Can now return the ground-truth factors by specifying `DisentDataset(return_factors=True)` (see the sketch after this list).
+ Improved handling of batches and collating
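A short sketch of `return_factors`; the structure of the returned item is assumed for illustration:

```python
from disent.dataset import DisentDataset
from disent.dataset.data import XYObjectData

dataset = DisentDataset(XYObjectData(), return_factors=True)
item = dataset[0]
# the item is assumed to contain the observation alongside the
# ground-truth factor values used to generate it
```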
- Added `state_space_copy(...)` to `disent.dataset.data.GroundTruthData`, which returns a copy of the underlying state space.
+ `disent.dataset.sampling` samplers now store a copy of the state space instead of the original dataset.
- Added `sample(...)` to `disent.dataset.sampling.BaseDisentSampler`, which is a more explicit alias to the original `__call__(...)` method.
- `to_img_tensor_u8` and `to_img_tensor_f32` now check the size of the observations before resizing; if the size is unchanged the resize is skipped, which greatly improves performance! This affects `ToImgTensorF32` and `ToImgTensorU8` from `disent.dataset.transform`.
- Added the `factor_multipliers` property to `disent.dataset.util.state_space.StateSpace`, which allows custom implementations of `pos_to_idx` and `idx_to_pos`.
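For context, factor multipliers implement a mixed-radix conversion between factor positions and flat indices. An illustrative standalone version (not disent's actual implementation):

```python
import numpy as np

factor_sizes = np.array([4, 2, 3])
# the multiplier of each factor is the product of the sizes of all later factors
multipliers = np.append(np.cumprod(factor_sizes[::-1])[::-1][1:], 1)  # -> [6, 3, 1]

def pos_to_idx(pos):
    return int(np.dot(pos, multipliers))

def idx_to_pos(idx):
    return [int((idx // m) % s) for m, s in zip(multipliers, factor_sizes)]

assert pos_to_idx([3, 1, 2]) == 23
assert idx_to_pos(23) == [3, 1, 2]
```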
- Added torch math helper functions to: `disent.nn.functional`
+ including: `torch_norm`, `torch_dist`, `torch_norm_euclidean`, `torch_norm_manhattan`, and `torch_dist_hamming`.
- Added `triplet_soft_loss` and `dist_triplet_soft_loss` to `disent.nn.loss.triplet`.
- Added more modes to `disent.nn.weights.init_model_weights`.
- Added `FixedValueSchedule` and `MultiplySchedule` to `disent.schedule`. These schedules are useful for holding a value constant throughout a run, overriding the value actually set in the config.
- Added `modify_name_keep_ext` to `disent.util.inout.paths` for adding prefixes or suffixes to file names without affecting the extension.
- Added the `try_njit` decorator to `disent.util.jit`. This decorator tries to wrap the function with `numba.njit`, otherwise a warning is displayed. Numba is an optional dependency and is not specified in the requirements.
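An illustrative sketch of such a decorator (not disent's exact implementation, which may accept extra arguments):

```python
import warnings

def try_njit(fn):
    # JIT-compile with numba when available, otherwise warn and
    # fall back to the pure-Python function
    try:
        import numba
        return numba.njit(fn)
    except ImportError:
        warnings.warn(f"numba is not installed, running {fn.__name__} without JIT")
        return fn

@try_njit
def accumulate(values):
    total = 0.0
    for v in values:
        total += v
    return total
```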
- Split `disent.util.lightning.callbacks` into separate files.
+ Added many new features and fixes to these callbacks for the new versions.
- Added `disent.util.math.integer` for computing the `gcd` and `lcm` with arbitrary precision values.
- Added `disent.util.visualize.vis_img` with various features for visualizing both tensors and numpy images.
+ Tensors are by default assumed to be in `CHW` format, while numpy arrays are assumed to be in `HWC` format. These defaults can be overridden.
+ See `torch_to_images(...)` and `numpy_to_images(...)` for more details.
+ Other duplicated functions throughout the library will be replaced with these in future.
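A hypothetical sketch of the tensor path; the exact signature is assumed, only the CHW-in, HWC-uint8-out behaviour comes from the description above:

```python
import torch
from disent.util.visualize.vis_img import torch_to_images

batch = torch.rand(8, 3, 64, 64)  # B x C x H x W float tensor
images = torch_to_images(batch)   # assumed to return B x H x W x C uint8 images
```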
Breaking Changes
- Temporarily removed `DSpritesImagenetData`. This dataset contains research code for my MSc and was not intended to be in previous releases. This will be re-added soon.
- `disent.dataset.transform._augment.make_kernel` default scale mode changed to `"none"` from `"sum"`.
+ This affects various other locations in the code, including `disent.frameworks.helper.reconstructions.AugmentedReconLossHandler` which uses kernels to augment loss functions.
- Split the `Ae` and `Vae` hierarchy.
+ `Vae` is no longer an instance of `Ae`.
- Metrics are now instances of `disent.metrics.utils.Metric`.
+ This callable class can easily be created using the `disent.metrics.utils.make_metric` decorator over existing metric functions.
+ The purpose of this change is to make metric default arguments self-contained. The `Metric` class provides `compute` and `compute_fast` methods that wrap the underlying decorated function. Arguments can be overridden as usual; however, the two versions use different default arguments when called (see the sketch below).
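A hypothetical sketch of decorating a metric function; the keyword names `default_kwargs` and `fast_kwargs` are assumptions for illustration, not taken from the disent docs:

```python
from disent.metrics.utils import make_metric

@make_metric("my_metric", default_kwargs=dict(num_samples=10000),
             fast_kwargs=dict(num_samples=1000))
def metric_my_metric(dataset, representation_fn, num_samples=10000):
    ...  # compute and return the metric results

# metric_my_metric.compute(...) would use the full defaults, while
# metric_my_metric.compute_fast(...) would swap in the reduced ones
```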
- Renamed and removed functions inside `disent.util.visualize.vis_latents`
Fixes
- Fixed a numerical precision error in `disent.dataset.sampling.GroundTruthDistSampler` when computing scaled factor distances. Without this fix there is up to a 1.5% chance of a sampling error over certain datasets.
- Updated `disent.nn.functional._pca` for newer torch versions
- Renamed `disent.nn.loss.softsort.torch_soft_sort(...)` parameter `dims_at_end` to `leave_dims_at_end`. This now matches `torch_soft_rank(...)`.
- `disent.nn.loss.triplet_mining.configured_idx_mine(...)` now exits early if the mode is set to `"none"`.
Config Changes
- Removed `augment/basic.yaml` and added `augment/example.yaml` instead.
- Added the config group `run_plugins` which can be used to register a callback that is run by the experiment to register custom items with the disent framework such as new reconstruction losses or kernels.
- `dataset/cars3d.yaml` and `dataset/smallnorb.yaml` now point to the optimized 64x64 versions of the datasets by default.
- Renamed `disable_decoder` to `detach_decoder` in `Ae` and `Vae` configs
- Removed `disable_posterior_scale` option from `Ae` and `Vae` configs
- `models/*.yaml` now directly point to a model target instead of a separate encoder and decoder
- `run_callbacks/*.yaml` now directly point to class targets rather than using pre-defined keys
- `run_logging/*.yaml` now directly point to class targets rather than using pre-defined keys
- Rewrote `experiment.run` to be more general. The hydra and experiment functionality can now be invoked from anywhere.
+ Added the ability to register your own config overrides without extending or forking disent. This works by adding to the hydra search path: set the `DISENT_CONFIGS_PREPEND` environment variable to a new config folder, and anything inside that folder will recursively take priority over the existing `experiment/config` folder.
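A minimal sketch; the environment variable name comes from the changelog, while the path is a placeholder and the folder layout is assumed to mirror `experiment/config`:

```python
import os

# matching files under this folder take priority over experiment/config
os.environ["DISENT_CONFIGS_PREPEND"] = "/path/to/my/configs"
```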
- Rewrote `HydraDataModule` to only accept the necessary arguments rather than the raw config. Configs have been updated accordingly to specify these parameters directly.
- Added `experiment.util.hydra_main` which can be used anywhere to launch a hydra experiment using the disent configs.
+ `hydra_main(...)` runs an experiment, passing the resolved config to the given callback (see the sketch after this list).
+ `patch_hydra()` can instead be used just to initialise hydra if you want to set everything up yourself. This registers the search path plugin that looks for `DISENT_CONFIGS_PREPEND`, as well as various OmegaConf resolvers, including:
- `${exit:<msg>}` exits the program if the value is accessed. This can be used to deprecate functionality, or to force variables to be overridden!
- `${run_num:<root_dir>}` returns the current experiment number
- `${run_dir:<root_dir>,<name>}` returns the current experiment folder with the name appended
- `${fmt:"{:04d}",42}` returns `"0042"`, exactly like `str.format`
- `${abspath:<rel_path>}` converts a relative path to an absolute path using the original hydra working directory, not the changed experiment directory.
- `${rsync_dir:<src>/<name>,<dst>/<name>}` syncs a directory between two locations; useful if datasets are already prepared on a shared drive and need to be copied to a temp drive, for example!
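A hypothetical sketch of launching an experiment through `hydra_main`; the import path comes from the changelog, but the callback-as-first-argument signature is assumed:

```python
from experiment.util.hydra_main import hydra_main

def run(cfg):
    # receives the fully-resolved hydra config for the experiment
    print(cfg)

if __name__ == "__main__":
    hydra_main(run)
```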
- Added `experiment.util.path_utils`, which adds support for automatically obtaining an experiment number from a directory of number-prefixed files. The number returned is the existing maximum plus one.
Test Changes
- Updated `tests.test_experiment` to use new `experiment.util.hydra_main` functionality
- Added pickle tests for frameworks
- Added tests for the torch norm functions
- Fixed the registry tests
- Added extensive tests for the new `disent.util.visualize.vis_img` functions and their returned datatypes
- Added a `temp_environ` context manager used by the tests