Disent


0.5.1

Fixes
- Fix Ada-GVAE averaging regressions (#32)

0.5.0

This release marks the end of my MSc, with the research split out into its own [repository](https://github.com/nmichlo/msc-research)!

- The repo was previously set up such that development took place on an `xdev` branch. An automated script then cleaned this branch of research code and committed the changes to the `dev` branch, which was then published.
- This workflow has now been disabled in favour of standard dev practice. I no longer need to maintain the old research code and can incorporate its functionality directly into disent.

MSc. Additions
- `disent.dataset.data` - Various new datasets!
+ `XYObjectData` and `XYObjectShadedData`: equivalent datasets with different representations of their ground-truth factors. Disentanglement performance is affected by the choice of ground-truth factors even if the data is exactly the same!
+ `XYSquaresData`: an adversarial dataset for VAEs that use pixel-wise reconstruction losses. VAEs usually perform terribly on this dataset in terms of disentanglement performance. The dataset contains three squares that move across a non-overlapping grid.
+ `XYSingleSquareData`: like `XYSquaresData`, but with only a single square that moves across the image.
+ `XColumnsData`: a simplified version of `XYSquaresData` that is still adversarial, but moves columns left and right instead of objects across a grid.
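As a rough usage sketch for the datasets above (the sampler and transform choices here are illustrative, following the existing disent quickstart conventions):

```python
from torch.utils.data import DataLoader

from disent.dataset import DisentDataset
from disent.dataset.data import XYSquaresData
from disent.dataset.sampling import SingleSampler
from disent.dataset.transform import ToImgTensorF32

# the adversarial dataset: three squares moving over a non-overlapping grid
data = XYSquaresData()
dataset = DisentDataset(data, sampler=SingleSampler(), transform=ToImgTensorF32())
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)
```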

- `disent.frameworks.vae`
+ `AdaNegTripletVae` aka. "ada_tvae": supervised disentanglement framework that uses our proposed _Adaptive Triplet Loss_ to disentangle representations and introduce axis-alignment. Triplets are constructed using the L1 distance between ground-truth factors.
+ `DataOverlapTripletVae` aka. "ada_tvae_d": unsupervised version of `AdaNegTripletVae` that orders triplets using the distances between datapoints in terms of the reconstruction loss. Distances within disentanglement datasets often correspond to the distances between ground-truth factors, suggesting disentanglement is accidental!
- `disent.frameworks.ae`
+ `AdaNegTripletAe` aka. "ada_tae" - The AE version of `AdaNegTripletVae`
+ `DataOverlapTripletAe` aka. "ada_tae_d" - The AE version of `DataOverlapTripletVae`
+ `AdaAe` - The AE version of the `AdaVae`
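A hedged wiring sketch for the supervised framework above, following the existing disent quickstart pattern. The cfg values are illustrative rather than tuned defaults, and note (per the schedules item later in this section) that the `schedule/adanegtvae_*.yaml` schedules are required for these frameworks to actually learn:

```python
from torch.utils.data import DataLoader

from disent.dataset import DisentDataset
from disent.dataset.data import XYObjectData
from disent.dataset.sampling import GroundTruthTripleSampler
from disent.dataset.transform import ToImgTensorF32
from disent.frameworks.vae import AdaNegTripletVae
from disent.model import AutoEncoder
from disent.model.ae import DecoderConv64, EncoderConv64

# supervised triplet sampling using the ground-truth factors
data = XYObjectData()
dataset = DisentDataset(data, sampler=GroundTruthTripleSampler(), transform=ToImgTensorF32())
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)

module = AdaNegTripletVae(
    model=AutoEncoder(
        encoder=EncoderConv64(x_shape=data.x_shape, z_size=9, z_multiplier=2),
        decoder=DecoderConv64(x_shape=data.x_shape, z_size=9),
    ),
    # illustrative values only -- see the shipped experiment configs for real defaults
    cfg=AdaNegTripletVae.cfg(optimizer='adam', optimizer_kwargs=dict(lr=1e-3)),
)
```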

- `disent.metrics`
+ `flatness_components` consists of three separate metrics:
- `distances`: measures the rank correlation between ground-truth distances and latent distances
- `linearity`: measures how well factor traversal embeddings lie on an arbitrarily rotated n-dimensional line
- `axis-alignment`: measures how well factor traversal embeddings correspond to a single latent variable (i.e. an n-dimensional line that is axis-aligned)
+ `flatness`: an older metric that measures the path length of factor traversal embeddings relative to the max distance between points.
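A hedged usage sketch for the new metrics; it assumes they keep disent's usual `(dataset, get_repr)` calling convention, with `module` and `dataset` as in the framework sketch above:

```python
from disent.metrics import metric_flatness, metric_flatness_components

# maps a batch of observations to their latent representations
get_repr = lambda x: module.encode(x)

# each metric returns a dict of named scores
scores = {
    **metric_flatness(dataset, get_repr),
    **metric_flatness_components(dataset, get_repr),  # distances, linearity, axis-alignment
}
print(scores)
```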

- `experiment/configs` updated to include configs for all the added classes, frameworks, datasets, metrics and features!
+ new schedules `schedule/adanegtvae_*.yaml` that should be used with the Adaptive Triplet frameworks. Without these schedules, the frameworks do not learn.

MSc. Removals
- All the remaining research code contained in `research/*` has been deleted

Add Examples
- Added an example, `docs/examples/extend_experiment`, of how to override or extend the disent experiment configs! This is useful for conducting your own research!
- Added an example of plotting various aspects of disent: `docs/examples/plotting_examples`.

Fixes
- Fixed tests for new locations
- Added appropriate entries to the registry

0.4.0

Major Additions
- Added `disent.dataset.DisentIterDataset` to complement `DisentDataset` for datasets without a known size.
- Added `Cars3d64Data` and `SmallNorb64Data` to `disent.dataset.data`. These classes are optimised versions of their respective datasets with pre-computed transforms. This is much faster than resizing the observations during training, as most of the disentanglement benchmarks are based on datasets with a width and height of 64x64.
- Added `disent.dataset.sampling.GroundTruthRandomWalkSampler`. This ground-truth dataset sampler simulates random walks around the factor space. For example: if there are two ground-truth factors `x` and `y` corresponding to a grid, this sampler would simulate an agent randomly moving around the grid (a sketch follows after this list).
- Improvements to the registry. Augmentation kernels, reconstruction losses and latent distributions can now be registered with disent using `disent.registry.KERNELS`, `disent.registry.RECON_LOSSES` and `disent.registry.LATENT_HANDLERS`. This affects:
+ `disent.frameworks.helper.latent_distributions.make_latent_distribution`
+ `disent.frameworks.helper.reconstructions.make_reconstruction_loss`
+ `disent.dataset.transform._augment.get_kernel`
- Refactored `disent.frameworks.DisentFramework`; it now also supports the PyTorch Lightning `training`, `validation` and `test` steps.
- Split the `Ae` and `Vae` hierarchy
+ This is so that we can directly check if a framework is an instance of one or the other. Previously `Vae` was a subclass of `Ae`, which was unintuitive.
- Rewrote `disent.registry` to make it more intuitive and useful throughout `disent`. Custom regex resolvers can now also be registered, and there are now different types of registries. Registries also provide examples for each item that can be constructed. See `disent.registry._registry` for more information.
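As promised above, a minimal sketch of the new random-walk sampler; the `num_samples` argument is assumed to follow the convention of the other ground-truth samplers, so check it against the source:

```python
from disent.dataset import DisentDataset
from disent.dataset.data import XYObjectData
from disent.dataset.sampling import GroundTruthRandomWalkSampler
from disent.dataset.transform import ToImgTensorF32

# each retrieved element is a tuple of observations along a simulated random walk
dataset = DisentDataset(
    XYObjectData(),
    sampler=GroundTruthRandomWalkSampler(num_samples=3),
    transform=ToImgTensorF32(),
)
```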

Other Improvements
- Improvements to `disent.dataset.DisentDataset`:
+ Added `sampler`, `transform` and `augment` properties.
+ Improved `shallow_copy` and `unwrapped_shallow_copy` logic and available arguments.
+ Can now return the ground-truth factors by specifying `DisentDataset(return_factors=True)`
+ Improved handling of batches and collating
- Added `state_space_copy(...)` to `disent.dataset.data.GroundTruthData`; this function returns a copy of the underlying state space.
+ Samplers in `disent.dataset.sampling` now store a copy of the state space instead of the original dataset
- Added `sample(...)` to `disent.dataset.sampling.BaseDisentSampler`, which is a more explicit alias to the original `__call__(...)` method.
- `to_img_tensor_u8` and `to_img_tensor_f32` now check the size of the observations before resizing; if the size is unchanged, performance is greatly improved! This affects `ToImgTensorF32` and `ToImgTensorU8` from `disent.dataset.transform`.
- Added `factor_multipliers` property to `disent.dataset.util.state_space.StateSpace` which allows custom implementations of `pos_to_idx` and `idx_to_pos`.
- Added torch math helper functions to: `disent.nn.functional`
+ including: `torch_norm`, `torch_dist`, `torch_norm_euclidean`, `torch_norm_manhattan`, and `torch_dist_hamming`.
- Added `triplet_soft_loss` and `dist_triplet_soft_loss` to `disent.nn.loss.triplet`.
- Added more modes to `disent.nn.weights.init_model_weights`.
- Added `FixedValueSchedule` and `MultiplySchedule` to `disent.schedule`. These schedules are useful for keeping a value constant throughout a run, or for overriding values otherwise set in the config.
- Added `modify_name_keep_ext` to `disent.util.inout.paths`, for adding prefixes or suffixes to file names without affecting the extension.
- Added the decorator `try_njit` to `disent.util.jit`. This decorator tries to wrap the function with `numba.njit`, otherwise a warning is displayed. Numba is an optional dependency and is not specified in the requirements (a sketch follows after this list).
- Split `disent.util.lightning.callbacks` into separate files.
+ Added many new features and fixes to these callbacks for the new versions.
- Added `disent.util.math.integer` for computing the `gcd` and `lcm` with arbitrary precision values.
- Added `disent.util.visualise.vis_img` with various features for visualising both torch tensors and numpy images.
+ Tensors are by default assumed to be in `CHW` format, while numpy arrays are assumed to be in `HWC` format. These defaults can be overridden.
+ See `torch_to_images(...)` and `numpy_to_images(...)` for more details.
+ Duplicated functions elsewhere in the library will be replaced with these in future.
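As referenced above, a minimal sketch of the `try_njit` decorator. Whether it is applied bare or with parentheses is an assumption here; verify against `disent.util.jit`:

```python
import numpy as np

from disent.util.jit import try_njit

@try_njit  # compiled with numba.njit if available, otherwise plain Python plus a warning
def manhattan_dist(a, b):
    # simple numeric kernel that numba can compile in nopython mode
    return np.abs(a - b).sum()

print(manhattan_dist(np.array([0, 1, 2]), np.array([2, 1, 0])))  # 4
```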


Breaking Changes
- Temporarily removed `DSpritesImagenetData`. This dataset contains research code for my MSc and was not intended to be in previous releases. This will be re-added soon.
- `disent.dataset.transform._augment.make_kernel` default scale mode changed to `"none"` from `"sum"`.
+ This affects various other locations in the code, including `disent.frameworks.helper.reconstructions.AugmentedReconLossHandler` which uses kernels to augment loss functions.
- Split the `Ae` and `Vae` hierarchy
+ `Vae` is no longer an instance of `Ae`.
- Metrics are now instances of `disent.metrics.utils.Metric`.
+ This callable class can easily be created using the `disent.metrics.utils.make_metric` decorator over existing metric functions.
+ The purpose of this change is to make metric default arguments self-contained. The `Metric` class has the functions `compute` and `compute_fast`, which wrap the underlying decorated function. Arguments can be overridden as usual; however, the two versions use different default arguments when called. A sketch follows after this list.
- Renamed and removed functions inside `disent.util.visualise.vis_latents`
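A hedged sketch of defining a custom metric with the new decorator. The keyword names `default_kwargs` and `fast_kwargs` are assumptions used to illustrate the `compute` / `compute_fast` split; verify them against `disent.metrics.utils`:

```python
from disent.metrics.utils import make_metric

# keyword names below are illustrative assumptions, not the verified API
@make_metric('dummy', default_kwargs=dict(num_samples=10000), fast_kwargs=dict(num_samples=128))
def metric_dummy(dataset, get_repr, num_samples: int = 1000):
    return {'dummy.score': float(num_samples)}  # stand-in for a real computation

# metric_dummy is now a Metric instance:
#   metric_dummy.compute(...)       would use num_samples=10000
#   metric_dummy.compute_fast(...)  would use num_samples=128
```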

Fixes
- Fixed a `disent.dataset.sampling.GroundTruthDistSampler` numerical precision error when computing scaled factor distances. Without this fix there is up to a 1.5% chance of making a sampling error over certain datasets.
- Updated `disent.nn.functional._pca` for newer torch versions
- Renamed `disent.nn.loss.softsort.torch_soft_sort(...)` parameter `dims_at_end` to `leave_dims_at_end`. This now matches `torch_soft_rank(...)`.
- `disent.nn.loss.triplet_mining.configured_idx_mine(...)` now exits early if the mode is set to `"none"`.

Config Changes
- Removed `augment/basic.yaml` and added `augment/example.yaml` instead.
- Added the config group `run_plugins` which can be used to register a callback that is run by the experiment to register custom items with the disent framework such as new reconstruction losses or kernels.
- `dataset/cars3d.yaml` and `dataset/smallnorb.yaml` now point to the optimized 64x64 versions of the datasets by default.
- Renamed `disable_decoder` to `detach_decoder` in `Ae` and `Vae` configs
- Removed `disable_posterior_scale` option from `Ae` and `Vae` configs
- `models/*.yaml` now directly point to a model target instead of a separate encoder and decoder
- `run_callbacks/*.yaml` now directly point to class targets rather than using pre-defined keys
- `run_logging/*.yaml` now directly point to class targets rather than using pre-defined keys
- Rewrite `experiment.run` to be more general. The hydra and experiment functionality can now be invoked from anywhere.
+ You can now register your own config overrides without extending or forking disent. This is enabled by adding to the hydra search path: point the `DISENT_CONFIGS_PREPEND` environment variable at a new config folder, and anything inside it will recursively take priority over the existing `experiment/config` folder (see the sketch at the end of this list).
- Rewrite `HydraDataModule` to only accept necessary arguments rather than the raw config. Configs are updated accordingly to specify these parameters directly.
- Added `experiment.util.hydra_main`, which can be used anywhere to launch a hydra experiment using the disent configs.
+ `hydra_main(...)` runs an experiment, passing the resolved config to the given callback
+ `patch_hydra()` can instead be used just to initialise hydra if you want to set everything up yourself. It registers the search path plugin that looks for `DISENT_CONFIGS_PREPEND`, as well as various OmegaConf resolvers, including:
- `${exit:<msg>}`: a resolver that exits the program if accessed. This can be used to deprecate functionality, or to force variables to be overridden!
- `${run_num:<root_dir>}`: returns the current experiment number
- `${run_dir:<root_dir>,<name>}`: returns the current experiment folder with the name appended
- `${fmt:"{:04d}",42}`: returns "0042", exactly like `str.format`
- `${abspath:<rel_path>}`: converts a relative path to an absolute path using the original hydra working directory, not the changed experiment dir
- `${rsync_dir:<src>/<name>,<dst>/<name>}`: useful if datasets are already prepared on a shared drive and need to be copied to a temp drive, for example!
- Added `experiment.util.path_utils`, which adds support for automatically obtaining an experiment number from a directory of number-prefixed files. The number returned is the existing maximum plus one.
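Putting the search-path override and the new entrypoint together, a minimal sketch. The import path and callback signature are assumptions based on the descriptions above, and the config folder path is hypothetical:

```python
import os

from experiment.util.hydra_main import hydra_main  # import path assumed

# hypothetical folder whose contents recursively override `experiment/config`
os.environ['DISENT_CONFIGS_PREPEND'] = '/path/to/my/configs'

def run(cfg):
    # receives the resolved hydra config for the experiment
    print(cfg)

if __name__ == '__main__':
    hydra_main(run)
```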

Test Changes
- Updated `tests.test_experiment` to use the new `experiment.util.hydra_main` functionality
- Added pickle tests for the frameworks
- Added tests for the torch norm functions
- Fixed the registry tests
- Added extensive tests for the new `disent.util.visualize.vis_img` functions and their returned datatypes
- Added a `temp_environ` context manager for the tests

0.3.4

Fixes
- Leftover research config values have been fixed (#23). Defaults should now just work locally.

Added
- Frameworks did not implement validation and test steps for data; this is now addressed (#22). Schedules may be unintentionally affected by this change if used with the test & validation datasets. An issue has been opened to investigate this.

0.3.3

Fixes
- `disent.util.math` was not a module, added empty `__init__.py` file

0.3.2

Fixes

- Fix `FftKernel`: tensor weights were accidentally not frozen.
- Fix callbacks logging the L1 instead of the L2 distance
- Fix callbacks failing when metrics are NaN
- Fix `dsprites_imagenet` dataset preparation on macOS

Added
- Added a `run_action=skip` experiment action, to simply test whether hydra is working.
- VAEs now log the ratios between different loss terms.

Breaking
- Renamed `experiment.run.hydra_check_cuda` to `hydra_get_gpus`. It now returns the number of GPUs to use as an integer, intended to be passed to a PyTorch Lightning Trainer.
- Removed the `XYObjectData` warning stating that the dataset has changed from previous versions

