Disent

Latest version: v0.8.0


0.3.1

Experiment Fixes
- `run_action=prepare_data` has been fixed

Experiment Additions
- new tests to ensure this continues to work properly

Experiment Changes
- correct action is now chosen via the `experiment.run.run_action(cfg)` method
+ `experiment.run.train` renamed to `action_train`
+ `experiment.run.prepare_data` renamed to `action_prepare_data`
- input config is no longer mutated

0.3.0

This release touches most of the codebase.

Major Additions
- added `XYObjectShadedData` dataset, which produces the same observations as `XYObjectData` but with a different set of ground truth factors. This can be useful for testing how metrics are affected by the ground truth representation of the factors (see the comparison sketch after this list). Note that `XYObjectData` itself differs from previous versions as a result.
- added `DSpritesImagenetData` dataset, which is the same as `DSpritesData` but masks the background or foreground depending on the mode, replacing the masked content with deterministic data from tiny-imagenet
- added `disent.frameworks.vae.AdaGVaeMinimal`, a minimal implementation of `AdaVae` configured to run in `gvae` mode
- added `disent.util.lightning.callbacks.VaeGtDistsLoggingCallback`, which logs various distance matrices computed from averaged ground truth factor traversals.
- Updated experiment files to use hydra 1.1
+ can now switch between `train` and `prepare_data` modes with the defaults group `run_action=train`
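
To make the `XYObjectData` / `XYObjectShadedData` relationship above concrete, here is a minimal comparison sketch. The import path follows the other class references in these notes; the default constructors and the `factor_names` / `factor_sizes` properties are assumptions.

```python
# Hedged sketch: default constructors and the factor properties are assumed.
from disent.dataset.data import XYObjectData, XYObjectShadedData

data_rgb = XYObjectData()
data_shaded = XYObjectShadedData()

# the observations should be identical, but the ground truth factorisation
# differs, so metrics that depend on the factors may score differently
print(data_rgb.factor_names, data_rgb.factor_sizes)
print(data_shaded.factor_names, data_shaded.factor_sizes)
```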

Other Additions
- added `shallow_copy` to `disent.dataset.DisentDataset`, enabling a shallow copy of the dataset while overriding specific properties such as the transform
- added new `disent.dataset.transform` including `ToImgTensorF32` (was `ToStandardisedTensor`) and `ToImgTensorU8`
- additions to `H5Builder`
+ `add_dataset_from_array` that constructs and fills a dataset in the hdf5 file from an array
+ converted into a context manager instead of manually opening the hdf5 file (see the sketch after this list)
- additions to `StateSpace` (and ground truth dataset child classes)
+ `normalise_factor_idx` converts the name of a ground truth factor into its numerical index
+ `normalise_factor_idxs` converts a name, an index, or a list of names or indices into the numerical indices of the ground truth factors
- `disent.dataset.util.stats` added `compute_data_mean_std(data)` to compute the mean and std of datasets
- added `disent.schedule.SingleSchedule`
- improved `disent.util.deprecate.deprecated`, now prints the stack trace for the call location of the deprecated function by default. This can be disabled.
- added `restart` method to `disent.util.profiling.Timer` for easy use within a loop
- added `disent.util.visualize.plot`, which contains various matplotlib helper code used throughout the library and in the PyTorch Lightning callbacks.
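
A hedged sketch of the `H5Builder` context-manager usage mentioned above: the import path comes from the 0.2.1 notes below, but the constructor arguments and the exact `add_dataset_from_array` signature are assumptions, since the API was not yet finalised.

```python
# Hedged sketch: constructor args and add_dataset_from_array signature are assumed.
import numpy as np
from disent.dataset.util import H5Builder

# some observations to store, eg. (N, H, W, C) uint8 images
array = np.random.randint(0, 256, size=(1000, 64, 64, 3), dtype='uint8')

# H5Builder is now used as a context manager, much like `open` or `h5py.File`
with H5Builder('example.h5', 'w') as builder:
    builder.add_dataset_from_array('data', array)
```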

Breaking Changes
- removed the confusing `observation_shape` and `obs_shape` properties from `GroundTruthData` and all child classes. Any methods across disent that relied on these properties have had their names updated too; for example, the `ArrayGroundTruthData` class now takes `x_shape`. (A short migration sketch follows this list.)
+ `observation_shape` `(H, W, C)` should be replaced with `img_shape`; you will need to update your overrides in child classes
+ `obs_shape` `(C, H, W)` should be replaced with `x_shape`
- `XYObjectData` default parameters updated to match `XYObjectShadedData`; the dataset and colour palettes differ slightly from previous versions.
- moved module `disent.nn.transform` to `disent.dataset.transform`
+ renamed `ToStandardisedTensor` to `ToImgTensorF32`
- `H5Builder` converted into a context manager, with an API similar to `open` or `h5py.File`
- `ReconLossHandlerMse` changed so that it no longer scales or centres the output; the data is now normalised instead, which is more correct
- `AdaVae` and inheriting classes have various functions renamed for clarity
- `disent.metrics` functions have `ground_truth_dataset` parameter renamed to `dataset`
- `disent.model.ae` renamed `DecoderTest` and `EncoderTest` to `DecoderLinear` and `EncoderLinear`
- `disent.registry` updated to use a new, simpler class structure and format. Some variables have been renamed, and registry names have been changed to plurals, eg. `OPTIMIZER` is now `OPTIMIZERS`
- `disent.schedule` cleaned up
+ renamed various variables and parameters `min_step` -> `start_step`, `max_step` -> `end_step`
+ removed `disent.schedule.lerp.scale()` function, as it is the same as `lerp` just not clipped
- `disent.util.lightning.callbacks.VaeDisentanglementLoggingCallback` renamed to `VaeMetricLoggingCallback`
- `docs.examples` updated to use new `XYObjectData` version and `ToImgTensorF32` transform
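
For users migrating code, a small sketch of the shape-property rename described above (the dataset used here is only illustrative):

```python
from disent.dataset.data import XYObjectData

data = XYObjectData()

# before 0.3.0 (both properties have been removed):
#   data.observation_shape  # (H, W, C)
#   data.obs_shape          # (C, H, W)

# from 0.3.0 onwards:
print(data.img_shape)  # (H, W, C), replaces `observation_shape`
print(data.x_shape)    # (C, H, W), replaces `obs_shape`
```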

Deprecations
- deprecated the `ground_truth_data` property on `DisentDataset`; it should be replaced with the shorter `gt_data` property. References to `ground_truth_data` have been replaced throughout disent.

Fixes
- Fixed `Mpi3dData` datasets, and added file hashes
- Updated requirements
- Many minor fixes, usability and error message improvements

Hydra Experiment Changes

Hydra config has finally been updated from version 1.0 to 1.1, adding support for recursive defaults and recursive instantiation. This allows us to remove all of our custom & hacky hydra helper code that previously enabled these features.
- hydra now supports recursive instantiation
- value based specialisation can now be done with recursive defaults using dummy groups
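
For reference, a minimal sketch of Hydra 1.1's recursive instantiation; the nested targets below are generic PyTorch classes, not disent's actual configs:

```python
from hydra.utils import instantiate
from omegaconf import OmegaConf

# nested `_target_` nodes are now instantiated recursively by default
cfg = OmegaConf.create({
    '_target_': 'torch.nn.Sequential',
    '_args_': [
        {'_target_': 'torch.nn.Linear', 'in_features': 8, 'out_features': 4},
        {'_target_': 'torch.nn.ReLU'},
    ],
})

model = instantiate(cfg)  # nn.Sequential(nn.Linear(8, 4), nn.ReLU())
```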

Updating hydra was a good opportunity to re-structure the configuration format.
- All settings defined in the root config that are referenced elsewhere are now in the `settings` key.
- Default settings defined in various subgroups that are referenced elsewhere are often placed in the `dsettings` key.
- Keys for various objects were renamed for clarity, eg. `augment.transform` was renamed to `augment.augment_cls`
- All datasets now require the `meta.vis_mean` and `meta.vis_std` keys, which are used both to normalise the dataset and to re-scale it to `[0, 1]` for visualisation during training (see the sketch below).
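
The `compute_data_mean_std` helper listed under Other Additions above is a natural way to obtain these values; a hedged sketch follows (the return convention is an assumption):

```python
from disent.dataset.data import XYObjectData
from disent.dataset.util.stats import compute_data_mean_std

data = XYObjectData()
mean, std = compute_data_mean_std(data)  # assumed to return per-channel statistics

# candidate values for the dataset's `meta.vis_mean` and `meta.vis_std` keys
print(mean, std)
```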

Every config file has been touched, so the best approach is probably to look at the new system directly. The general structure remains the same, but the recursive defaults from Hydra 1.1 allow us to implement various things in a cleaner way.
- new defaults group `run_launcher` to easily swap between `slurm` and `local`
- defaults group `run_location` only specifies machine resources and paths
- new defaults group `sampling` specifies details and the sampling strategy to be used by the frameworks
- new defaults group `run_action` to switch between training (`train`) and downloading & preparing datasets (`prepare_data`)

0.2.1

Under the hood, quite a lot of code has been added or changed for this release; however, the API remains very much the same.

**Additions**
- Wrapped datasets: instances of `disent.dataset.wrapper.WrappedDataset` are datasets that have some sort of mask applied to them, hiding the true state space and resizing the dataset.
+ `disent.dataset.wrapper.DitheredDataset` applies an n-dimensional dithering operation to ground truth factors
+ `disent.dataset.wrapper.MaskedDataset` applies some provided boolean mask over the dataset
- `disent.dataset.DisentDataset` now supports wrapped datasets (instances of `disent.dataset.wrapper.WrappedDataset`). New methods and properties have been added to complement this feature:
+ `is_wrapped_data` check if there is wrapped data
+ `is_wrapped_gt_data` check if there is wrapped data and the wrapped data is ground truth data
+ `wrapped_data` obtain the wrapped data, otherwise throw an error
+ `wrapped_gt_data` obtain the wrapped ground truth data, otherwise throw an error
+ `unwrapped_disent_dataset` creates a copy of the disent dataset with everything the same, except the data is unwrapped.
- `disent.util.lightning.callbacks` additions
+ Support for wrapped datasets: the callbacks automatically try to unwrap them to obtain the ground truth data, which can then be used to compute metrics and perform visualisations.
+ Support for scaling model outputs to a given range of values, fixing visualisations when using `VaeLatentCycleLoggingCallback`
- new utilities
+ `disent.util.math.dither`
+ `disent.util.math.random`
- Self contained HDF5 ground-truth datasets. These store all the information needed to construct the dataset and state space in one file, including the factor names.
+ Added `disent.dataset.data.SelfContainedHdf5GroundTruthData` to read these files
+ Added `disent.dataset.util.H5Builder` for creating these files. (API is not yet finalised)
- `disent.dataset.util.StateSpace` added helper function `iter_traversal_indices`
- `disent.nn.transform` added `ToUint8Tensor`, which acts like `ToStandardisedTensor` but loads images as `uint8` instead of `float32`. This is useful, and takes up less memory, when you need to use datasets outside of an ML model context, eg. for performing analysis. (See the sketch after this list.)
+ a corresponding functional version `to_uint_tensor` exists, complementing `to_standardised_tensor`
- Work has begun on a component & function registry, but do not use it yet as the API will change significantly.
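
A hedged sketch of the `uint8` transform described above (the 0.2.1 module path is taken from these notes; the call style is assumed to mirror torchvision-like transforms):

```python
import numpy as np
from disent.nn.transform import ToStandardisedTensor, ToUint8Tensor

# an illustrative (H, W, C) uint8 image
img = np.random.randint(0, 256, size=(64, 64, 3), dtype='uint8')

x_f32 = ToStandardisedTensor()(img)  # float32 tensor, for feeding models
x_u8 = ToUint8Tensor()(img)          # uint8 tensor, ~4x less memory, for analysis
```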

**API Breakages**
- Under the hood, implementing wrapped data and `DisentDataset` copying requires the ability to copy samplers, so each sampler implementation should have the `uninit_copy` method implemented too.
- `ArrayGroundTruthData` is stricter about `observation_shape`: it must be `(H, W, C)` or `(C, H, W)` depending on `array_chn_is_last`
- Removed reconstruction losses:
+ `ReconLossHandlerMse4` aka. `"mse4"`
+ `ReconLossHandlerMae2` aka. `"mae2"`
- Renamed `disent.util.visualize.get_factor_traversal` to `get_idx_traversal`

**Deprecations**
- `GroundTruthData` property aliases:
+ `img_shape` new property for the deprecated `observation_shape`
+ `x_shape` new property for the deprecated `obs_shape`
+ `img_channels` new property for the number of channels in the image

**Fixes**
- `disent.util.inout.files.AtomicSaveFile` minor fix to overwriting files
- `disent.util.lightning.callbacks.LoggerProgressCallback` fix to datatypes and potential crashes due to floats
- More stable experiment runs when performing sweeps. Better error handling, error messages and error catching.
- fixes to the various `requirement*.txt` files
- many other minor fixes

0.2.0

**API Breakages**
- `DisentFramework` no longer takes in a `make_optimizer_fn` callback; the optimizer is instead specified as part of the `cfg` via `optimizer` and `optimizer_kwargs`.
- `Ae` derived subclasses now take an instantiated `AutoEncoder` via the `model` param instead of the `make_model_fn` callback. (A construction sketch follows this list.)
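
A hedged construction sketch for the new style: the `model` and `cfg` parameters come from the notes above, while the specific framework, encoder and decoder classes and their arguments are illustrative and may differ between versions.

```python
from disent.frameworks.vae import BetaVae
from disent.model import AutoEncoder
from disent.model.ae import EncoderConv64, DecoderConv64  # illustrative choices

framework = BetaVae(
    # an instantiated model is now passed directly (previously `make_model_fn`)
    model=AutoEncoder(
        encoder=EncoderConv64(x_shape=(3, 64, 64), z_size=6, z_multiplier=2),
        decoder=DecoderConv64(x_shape=(3, 64, 64), z_size=6),
    ),
    # the optimizer is now specified as part of the config (previously `make_optimizer_fn`)
    cfg=BetaVae.cfg(optimizer='adam', optimizer_kwargs=dict(lr=1e-3)),
)
```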

**Additions**
- `DisentDataset` can now return observation indices in the `"idx"` field if `return_indices=True` (see the sketch after this list)
- `sample_random_obs_traversal` added to `GroundTruthData`
- new basic experiment test
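
A hedged sketch of the `return_indices` behaviour (passing it to the constructor is an assumption):

```python
from disent.dataset import DisentDataset
from disent.dataset.data import XYObjectData

dataset = DisentDataset(XYObjectData(), return_indices=True)

obs = dataset[42]
print(obs['idx'])  # the index (or indices) of the sampled observation(s)
```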

**Changes**
- python 3.8 and 3.9 support (3.7 is unsupported due to missing standard library typing features)
- `TempNumpySeed` now inherits from `contextlib.ContextDecorator`
- updated `hydra-core` to `1.0.7`

**Fixes**
- `SmallNorbData` by default now returns observations of size `(96, 96, 1)` instead of `(96, 96)`
- Removed the `Deprecated` dependency, which also couldn't be pickled, fixing hydra submitit issues
- `LoggerProgressCallback` displays more reliable information and now supports PyTorch Lightning 1.4
- `HydraDataModule` now supports PyTorch Lightning 1.4
- `merge_specializations` fixed to depend on OmegaConf not Hydra

0.1.0

Initial Release

Overview

The initial release of **Disent**.
- Please see the docs and readme for new usage examples. Changes should be easy to make to existing code, most notably the `DisentDataset` and `DisentSampler` changes.

Changes

+ Replaced sampling datasets with one common class `disent.dataset.DisentDataset`
+ Wraps other datasets (`torch.utils.data.Dataset` or `disent.dataset.data.GroundTruthData`)
+ Accepts an implemented subclass of `disent.dataset.sampling.BaseDisentSampler` which controls how many observations are sampled and returned (eg. for triplet networks).
+ eg. `disent.dataset.groundtruth.GroundTruthDatasetPairs` is now `disent.dataset.sampling.GroundTruthPairSampler`
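
A hedged sketch of the new composition; the constructor arguments are assumptions based on the class names above:

```python
from torch.utils.data import DataLoader

from disent.dataset import DisentDataset
from disent.dataset.data import XYObjectData
from disent.dataset.sampling import GroundTruthPairSampler

# the sampler decides how many observations are sampled and returned per index
dataset = DisentDataset(XYObjectData(), sampler=GroundTruthPairSampler())
loader = DataLoader(dataset, batch_size=64, shuffle=True)
```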

- Removed all experimental code & features unique to Disent. Hydra configs and runners for non-experimental features remain. These features will be cleaned up and re-added once I submit my dissertation.
- ❌ experimental frameworks
- ❌ experimental datasets
- ❌ experimental metrics
- ❌ experimental models
- ❌ experimental augmentations
- ❌ experiment files

+ Verified models
- some models had potentially diverged from their original implementations and papers.
- Added new test models: `EncoderTest` & `DecoderTest`

+ `disent.nn` Changes:
- Added `disent.nn.activations.Swish`
- Removed loss reduction mode `"sum"` in `disent.nn.loss.reduction`
- Split out the triplet mining logic from the frameworks into `disent.nn.loss.triplet_mining`
- Replaced `from disent.nn.modules import BatchView, Unsqueeze3D, Flatten3D` with pytorch 1.9 equivalents
- Backwards-compatible, opt-in enhancements to `disent.nn.transform.ToStandardisedTensor`

+ `disent.util` Refactor, grouping logic into submodules:
- `disent.util.inout`: utilities for working with paths, files and saving files.
- `disent.util.lightning`: various helper functions and callbacks for pytorch lightning, some incorporated from past experiment files.
- `disent.util.strings`: utilities for working with strings and ansi escape codes
- `disent.util.visualize`: moved `disent.visualize` into this module, separating framework logic from helper logic in disent.

+ Cleaned up `requirements.txt`
- optional requirements moved into: `requirements-test.txt` and `requirements-exp.txt`

+ New tests
- samplers
- models

+ And many bug-fixes

0.0.1.dev14

Overview

This release is mostly a large set of refactors, along with reproducibility improvements with regard to seeds and datasets.

Notable Changes
- Data now relies on `disent.data.datafile.DataFile`s, which are deterministic, hash- and cache-based file generators that can fetch or pre-process data.
- Added `XYSquaresMinimalData`, a minimal, faster version of `XYSquaresData` without any configuration options. With default parameters, data from `XYSquaresData` should equal that of `XYSquaresMinimalData`
- Added `PickleH5pyFile` that can pickle an hdf5 file and dataset. This is intended to be used with `torch` `DataLoader`s or multiprocessing.
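
A hedged sketch of why `PickleH5pyFile` helps with `torch` multiprocessing; the import path and constructor arguments are assumptions:

```python
import pickle
from disent.data.hdf5 import PickleH5pyFile  # assumed location at this version

# assumed arguments: path to the hdf5 file and the name of the dataset inside it
data = PickleH5pyFile('example.h5', 'data')

# unlike a raw h5py handle, this can be pickled, which is what DataLoader
# worker processes (num_workers > 0) and multiprocessing require
restored = pickle.loads(pickle.dumps(data))
```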

_Definitely_ Breaking Changes
- renamed classes:
+ renamed `AugmentableDataset` to `DisentDataset`
+ renamed `BaseFramework` to `DisentFramework`
+ renamed `BaseEncoderModule` to `DisentEncoder`
+ renamed `BaseDecoderModule` to `DisentDecoder`

- consolidated maths and helper functions into new submodule `disent.nn`
+ `disent.nn.weights` initialisation functions from originally `disent.model.init`
+ `disent.nn.modules` basic modules from various locations including `DisentModule`, `DisentLightningModule`, `BatchView`, `Unsqueeze3D`, `Flatten3D`
+ `disent.nn.transform` transform and augment functions and classes from `disent.transform`, still needs to be cleaned up in future releases.
+ `disent.nn.loss` various loss functions from other places including `triplet`, `kl`, `softsort` and `reduction` modules
+ `disent.nn.functional` various differentiable torch helper functions, mostly from `disent.util.math`, including functions for computing the covariance, correlation, generalised mean, PCA, DCT, channel-wise convolutions and more! Some functions, such as kernel generation, still need to be moved out of here.

- split up and consolidated utilities:
+ `disent.util.cache` caching utilities, including the `stalefile` decorator that only runs the wrapped function if the specified file is stale (hash does not match, or file does not exist); see the sketch after this list
+ `disent.util.colors` ANSI escape codes
+ `disent.util.function` wrapper, decorator and inspect utilities
+ `disent.util.hashing` compute the `full` hash of a file, or a `fast` hash based on the [imohash](https://github.com/kalafut/py-imohash) algorithm (as described in its README).
+ `disent.util.in_out` originally from `disent.data.util` for handling file retrieval/downloading/copying and saving
+ `disent.util.iters` general iterators or map functions, including `iter_chunks` and `iter_rechunk`
+ `disent.util.paths` path handling and file or directory management
+ `disent.util.profiling` timers & memory usage
+ `disent.util.seeds` seed management contexts and functions
+ `disent.util.strings` string formatting helper functions
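
A hedged sketch of the `stalefile` idea from `disent.util.cache`; the decorator's exact signature is an assumption, only the stale-check semantics come from the notes above:

```python
from disent.util.cache import stalefile  # module path from the list above

# assumed usage: the wrapped function only runs if 'stats.npz' is missing
# or its recorded hash no longer matches the file on disk
@stalefile('stats.npz')
def compute_stats():
    ...  # expensive computation that (re)generates the file
```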

- removed and cleaned up functions from:
+ `disent.data.hdf5`
+ `disent.dataset.__init__`
+ `disent.util.__init__`
+ `disent.schedule.lerp` renamed `activate` to `scale_ratio` and removed other functions.

Other Changes
- Replaced `GroundTruthData` specialisations with general loading from `DataFile`s.
- `StateSpace` now stores `factor_names` instead of `GroundTruthData`, in preparation for a rewrite of the datasets to use dependency injection and samplers.

Experiment Config & Runner Changes
- Many config fixes for refactors
- Experiment can now be seeded

New Tests
- test `PickleH5pyFile` multiprocessing support
- test `XYSquaresData` and `XYSquaresMinimalData` similarity
