LUMIN

Latest version: v0.9.1


0.8.1

Important changes

- The sparse package is now an optional dependency, to ease installation on some platforms. If required, install it manually, e.g. `pip install sparse` (as sketched below)
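Since `sparse` is no longer pulled in automatically, here is a minimal sketch of installing it and round-tripping a COO array (assuming the standard PyData `sparse` package):

```python
# Install manually on platforms where it is not already present:
#   pip install sparse
import numpy as np
import sparse

dense = np.zeros((4, 4))
dense[0, 1] = 1.0
coo = sparse.COO.from_numpy(dense)  # sparse COO representation
print(coo.todense())                # densify again when needed
```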

Breaking

- `targ_feats` argument in `HEPAugFoldYielder` renamed to `aug_targ_feats`

Additions

- `plot_feat` now shows a bar plot for categorical data
- `bootstrap_stats` added median computation
- `IdentBody` and `IdentTail` modules, which are placeholders for the body and tail modules in a network, for use when only a head is needed.
- `NodePredictor` a special `GraphCollapser` which provides a set of predictions per node in a graph, outputting either (batch x predictions x vertices) or (batch x vertices x predictions)
- `Ensemble` warns if no `ModelBuilder` is set when saving
- `agg_methods` argument for `GravNet`
- `absmax` aggregation method for `GravNet` and `GraphCollapser`
- `hard_identity` function to replace `lambda x: x` when required (see the short sketch after this list)
- `fold2foldfile`, `df2foldfile`, and `add_meta_data` can now deal with targets in the form of multi-dimensional tensors and convert them to sparse COO format
- `df2foldfile` now has the option to not shuffle data into folds and instead split it into contiguous folds
- Limited handling of PyTorch Geometric data: `TorchGeometricFoldYielder`, `TorchGeometricBatchYielder`, `TorchGeometricEvalMetric`
- Make `RunningBatchNorm` affine transformation optional
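The `hard_identity` helper presumably exists because anonymous lambdas cannot be pickled, which matters for the new `torch.package` export method mentioned under Changes. A minimal sketch of what such a named identity function looks like (not necessarily LUMIN's exact source):

```python
def hard_identity(x):
    """Named, serialisable stand-in for `lambda x: x`."""
    return x

# Usable anywhere a no-op transformation is needed but the callable must be
# picklable/exportable, unlike an anonymous lambda.
assert hard_identity(42) == 42
```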

Removals

Fixes

- `proc_cats`:
  - Typo fix affecting the detection of categories in the testing data (thanks yaourtpourtoi)
  - Docstring incorrectly stated that `cat_maps` mapped categories to codes, whereas it actually maps codes to categories
- `lr_find`:
  - Fixes to do with the number of batches to expect when running on fewer folds than the `FoldYielder` contains
  - Correctly implements leave-one-out for the training folds
  - Renamed `n_folds` to `n_repeats` to more accurately reflect its role
- `bootstrap_stats` corrected computation of the central 68% CI: was `np.percentile(np.abs(points), 68.2)`, now `(np.percentile(points, 84.135)-np.percentile(points, 15.865))/2` (see the worked example after this list)
- Error when trying to initialise `SEBlock2d` or `SEBlock3d`
- Fixed ipython display import to only run if in notebook
- Bug in multiclass classification on a batch of one data point, caused by targets being squeezed by two dimensions rather than one
- `tensor_is_sparse` argument for `df2foldfile` not functioning as expected
- Possible bug when applying data augmentation using `HEPAugFoldYielder` to target features, but not supplying target features when initialising the fold yielder
- Potential bug in `NodePredictor` when `f_final` is a `hard_identity` and `f_final_outs` is not None.
- `OffsetSelfAttention` missing from module `__all__`
- Possible bug when building ensembles from results caused by a misalignment between model index in results and model savename
- Require matplotlib <= 3.4.0
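To make the `bootstrap_stats` CI correction concrete, a small NumPy check on toy data (not from the library):

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(loc=0.0, scale=1.0, size=100_000)  # toy bootstrap sample

# Old (incorrect) estimate of the central 68% interval half-width:
old_ci = np.percentile(np.abs(points), 68.2)

# Corrected computation: half the spread between the 15.865th and 84.135th
# percentiles, i.e. the half-width of the central ~68.3% interval.
new_ci = (np.percentile(points, 84.135) - np.percentile(points, 15.865)) / 2

print(old_ci, new_ci)  # both ~1 for a zero-centred unit Gaussian, but the old
                       # form breaks down for offset or asymmetric distributions
```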

Changes

- `GravNetLayer` self-attention width corrected to `n_lr//4`; was previously `n_out//4`
- New PDPBox version finally released on PyPI, so it no longer requires separate installation; however, PDPBox is now an optional dependency
- Relaxed version requirement on statsmodels
- Removed lambda expressions and locally defined functions from NN code to make it compatible with the new `torch.package` export method
- Extended `Model` training and inference to instantiate `BatchYielder`s as prescribed by the `FoldYielder`, allowing users to provide their own `BatchYielder`s in cases where data needs to be split in specific ways
- Plotting is now optional for some plot functions

Deprecations

Comments

v0.8.0 - Mistake Not...

Important changes

- GNN architectures generalised into feature extraction and graph collapse stages, see details below and updated tutorial

Breaking

Additions

- `GravNet` GNN head and `GravNetLayer` sub-block [Qasim, Kieseler, Iiyama, & Pierini, 2019](https://link.springer.com/article/10.1140/epjc/s10052-019-7113-9)
  - Includes optional self-attention
- `SelfAttention` and `OffsetSelfAttention`
- Batchnorm:
  - `LCBatchNorm1d` to run batchnorm over length x channel data
  - Additional `bn_class` arguments to blocks, allowing the user to choose different batchnorm implementations
  - 1, 2, & 3D running batchnorm layers from fastai (https://github.com/fastai/course-v3)
- `GNNHead` encapsulating head for feature extraction, using `AbsGraphFeatExtractor` classes, and graph collapsing, using `GraphCollapser` classes
- New callbacks:
  - `AbsWeightData` to weight folds of data based on their inputs or targets
  - `EpochSaver` to save the model to a new file at the end of every epoch
  - `CycleStep` combines OneCycle and step-decay of optimiser hyper-parameters
- New CNN blocks:
  - `AdaptiveAvgMaxConcatPool1d`, `AdaptiveAvgMaxConcatPool2d`, `AdaptiveAvgMaxConcatPool3d` use average and maximum pooling to reduce data to a specified size per channel
  - `SEBlock1d`, `SEBlock2d`, `SEBlock3d` apply squeeze-excitation to data channels (a generic sketch of the idea follows this list)
- `BackwardHook` for recording telemetric data during backwards passes
- New losses:
  - `WeightedFractionalMSE`, `WeightedBinnedHuber`, `WeightedFractionalBinnedHuber`
- Options for log x & y axes in `plot_feat`
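For readers unfamiliar with squeeze-excitation, a generic PyTorch sketch of the idea behind the `SEBlock*` additions (illustrative only, not LUMIN's implementation):

```python
import torch
from torch import nn

class SqueezeExcite1d(nn.Module):
    """Illustrative squeeze-excitation over the channels of (batch, channels, length) data."""
    def __init__(self, n_channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(n_channels, n_channels // reduction), nn.ReLU(),
            nn.Linear(n_channels // reduction, n_channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.gate(x.mean(dim=-1))  # "squeeze": global average over the length dimension
        return x * w.unsqueeze(-1)     # "excite": per-channel reweighting

x = torch.randn(8, 32, 100)
print(SqueezeExcite1d(32)(x).shape)  # torch.Size([8, 32, 100])
```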

Removals

- Scheduled removal of deprecated methods and functions from the old model and callback system:
- `OldAbsCallback`
- `OldCallback`
- `OldAbsCyclicCallback`
- `OldCycleLR`
- `OldCycleMom`
- `OldOneCycle`
- `OldBinaryLabelSmooth`
- `SequentialReweight`
- `SequentialReweightClasses`
- `OldBootstrapResample`
- `OldParametrisedPrediction`
- `OldGradClip`
- `OldLsuvInit`
- `OldAbsModelCallback`
- `OldSWA`
- `OldLRFinder`
- `OldEnsemble`
- `OldAMS`
- `OldMultiAMS`
- `OldBinaryAccuracy`
- `OldRocAucScore`
- `OldEvalMetric`
- `OldRegPull`
- `OldRegAsProxyPull`
- `OldAbsModel`
- `OldModel`
- `fold_train_ensemble`
- `OldMetricLogger`
- `fold_lr_find`
- `old_plot_train_history`
- `_get_folds`
- Unnecessary `pred_cb` argument in `train_models`

Fixes

- Bug when trying to use batchnorm in `InteractionNet`
- Bug in `FoldFile.save_fold_pred` when predictions change shape and try to overwrite existing predictions

Changes

- `padding` argument in conv 1D blocks renamed to `pad`
- Graph nets: generalised into feature extraction for features per vertex and graph collapsing down to flat data (with optional self-attention)
- Renamed `FowardHook` to `ForwardHook`
- Abstract classes no longer inherit from ABC, but rather have `metaclass=ABCMeta` in order to be compatible with py>=3.7
- Updated the example of binary classification of signal & background to use the model and training resulting from https://iopscience.iop.org/article/10.1088/2632-2153/ab983a
- Also changed the multi-target regression example to use non-densely connected layers, and the multi-target classification example to use a cosine annealed cyclical LR
- Updated the single-target regression example to use `WeightedBinnedHuber` as a loss
- Changed `from torch.tensor import Tensor` to `from torch import Tensor` for compatibility with latest PyTorch

Deprecations

- `OldInteractionNet` deprecated in favour of the `InteractionNet` feature extractor. Will be removed in v0.9

Comments

0.7.2

Important changes

- Fixed bug in `Model.set_mom` which resulted in momentum never being set (affects e.g. OneCycle and CyclicalMom)
- `Model.fit` now shuffles the fold indices for training folds prior to each epoch rather than once per training; removes the periodicity in training loss which was occasionally apparent.
- Bugs found in `OneCycle`:
  - When training multiple models, the initial LR for subsequent models was the end LR of the previous model (list in partial was being mutated)
  - The model did not stop training at end of cycle
  - Momentum was never altered in the optimiser

Breaking

Additions

- Mish activation function (see the sketch after this list)
- `Model.fit_params.val_requires_grad` to control whether the validation epoch is computed with gradient tracking; disabled by default, but some losses might require it in the future
- `ParameterisedPrediction` now stores copies of values for parametrised features in case they change, or need to be changed locally during prediction
- `freeze_layers` and `unfreeze_layers` methods for `Model`
- `PivotTraining` callback implementing Learning to Pivot [Louppe, Kagan, & Cranmer, 2016](https://papers.nips.cc/paper/2017/hash/48ab2f9b45957ab574cf005eb8a76760-Abstract.html)
  - New example reimplementing the paper's jets example
- `TargReplace` callback for replacing target data in `BatchYielder` during training
- Support for loss functions being `fastcore` `partialler` objects
- `train_models` now has arguments to:
  - Exclude specific fold indices from training and validation
  - Train models on unique folds, e.g. when training 5 models on a file with 10 folds, each model would be trained on its own unique pair of folds
- Added discussion of core concepts in LUMIN to the docs
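Mish is a smooth, self-gated activation, f(x) = x * tanh(softplus(x)). A minimal PyTorch sketch of the function (illustrative, not necessarily LUMIN's exact implementation):

```python
import torch
from torch import nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation: x * tanh(softplus(x))."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.tanh(F.softplus(x))

print(Mish()(torch.tensor([-2.0, 0.0, 2.0])))
```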

Removals

Fixes

- Cases in which a NaN in the metric during training could spoil plotting and `SaveBest`
- Bug in `Model.set_mom` which resulted in momentum never being set (affects e.g. OneCycle and CyclicalMom)
- Bug in `MetricLogger.get_results` where tracking metrics could be spoilt by NaN values
- Bug in `train` when not passing any metrics
- Bug in `FoldYielder` when loading output pipe from `Path`
- Bugs found in `OneCycle`:
  - When training multiple models, the initial LR for subsequent models was the end LR of the previous model (list in partial was being mutated)
  - The model did not stop training at end of cycle
  - Momentum was never altered in the optimiser

Changes

- `Model.fit` now shuffles the fold indices for training folds prior to each epoch rather than once per training; removes the periodicity in training loss which was occasionally apparent.
- Validation and prediction forwards passes now performed without gradient tracking to save memory and time
- `MetricLogger` now records loss values on batch end rather than on forwards end
- `on_batch_end` now always called regardless of model state

Deprecations

Comments

0.7.1

Important changes

- `EvalMetrics` revised to inherit from `Callback` and be called on validation data after every epoch. User-written `EvalMetrics` will need to be adjusted to work with the new calling method: the `evaluate` method and constructor may need to be adjusted; see the existing metrics for examples.

Breaking

- `eval_metrics` argument in `train_models` renamed to `metric_partials` and now takes a list of partial `EvalMetrics`
- User-written `EvalMetrics` will need to be adjusted to work with the new calling method: the `evaluate` method and constructor may need to be adjusted; see the existing metrics for examples

Additions

- `OneCycle` now has a `cycle_ends_training` argument which, when set to `False`, allows training to continue at the final LR and momentum; keeping the default of `True` ends training once the cycle is complete, as before
- `to_np` now returns `None` when the input tensor is `None` (see the sketch after this list)
- `plot_train_history` now plots metric evolution for validation data
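A plausible reading of the `to_np` change, sketched with assumed behaviour; only the `None` pass-through is taken from the entry above, the rest is an assumption about what such a tensor-to-NumPy helper does:

```python
import torch

def to_np(x):
    """Sketch: tensors become NumPy arrays, and a None input passes straight through."""
    if x is None:
        return None
    return x.detach().cpu().numpy()

print(to_np(None))           # None
print(to_np(torch.ones(3)))  # [1. 1. 1.]
```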

Removals

Fixes

- `Model` now creates `cb_savepath` if it didn't already exist
- Bug in `PredHandler` where predictions were kept on device leading to increased memory usage
- Version issue in matplotlib affecting plot positioning

Changes

Deprecations

- V0.8:
  - All `EvalMetrics` deprecated with the new metric system. They have been copied and renamed to `Old*` for compatibility with the old model training system:
    - `OldEvalMetric`: Replaced by `EvalMetric`
    - `OldMultiAMS`: Replaced by `MultiAMS`
    - `OldAMS`: Replaced by `AMS`
    - `OldRegPull`: Replaced by `RegPull`
    - `OldRegAsProxyPull`: Replaced by `RegAsProxyPull`
    - `OldRocAucScore`: Replaced by `RocAucScore`
    - `OldBinaryAccuracy`: Replaced by `BinaryAccuracy`

Comments

0.7.0

Important changes

- Model training and callbacks have significantly changed:
  - `Model.fit` now expects to perform the entire training procedure, rather than just single epochs
  - A lot of the functionality of the old training method `fold_train_ensemble` is now delegated to `Model.fit`
  - A new ensemble training method `train_models` has replaced `fold_train_ensemble`. It provides a similar API, but aims to be more understandable to users
  - `Model.fit` is now 'stateful': a `fit_params` class is created containing all the information and data relevant to training the model, and training methods change their actions according to `fit_params.state` ('train', 'valid', and 'test')
  - Callbacks now have greater potential: they have more action points during the training cycle, where they can affect training behaviour, and they have access to `fit_params`, allowing them to modify more aspects of the training and have indirect access to all other callbacks
  - The "tick" for the training loop is now one epoch, i.e. validation loss is computed after the entire use of the training data (as opposed to after every sub-epoch), and cyclic callbacks now work on the scale of epochs, rather than sub-epochs. Due to the data being split into folds, the concept of a sub-epoch still exists, but the APIs are now simplified for the user (previously they were a mixture of sub-epoch and epoch arguments)
  - For users who do not wish to transition to the new model behaviour, the existing behaviour can still be achieved by using the `Old*` models and classes. See the deprecations section for the full list
- Input masks (present if e.g. using feature subsampling in `ModelBuilder`):
  - `BatchYielder` now takes an `input_mask` argument to filter inputs
  - `Model` prediction methods no longer take input-mask arguments; instead the input mask (if present) is used automatically. If users have already filtered their data, they should manually remove the input mask from the model (i.e. set it to `None`)
- Callbacks which take arguments related to (sub-)epochs (e.g. cycle length, scale, time to renewal, etc. for `CycleLR`, `OneCycle`, etc. and `SWA`) now take these arguments in terms of epochs. I.e. a OneCycle schedule with 9 training folds, running for 15 epochs, would previously require e.g. `lengths=(45,90)` in order to complete the cycle in 15 epochs (135 sub-epochs); now it is specified as simply `lengths=(5,10)`. Additionally, these arguments must be integers; floats will be coerced to integers with a warning (see the short example after this list)
- `lr_find` now runs over all training folds, instead of just 1
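A worked illustration of the sub-epoch to epoch conversion described above, using the numbers from that entry:

```python
# 9 training folds and 15 epochs, as in the example above.
n_trn_folds = 9
old_lengths_subepochs = (45, 90)  # previously specified in sub-epochs (135 total)
new_lengths_epochs = tuple(l // n_trn_folds for l in old_lengths_subepochs)
print(new_lengths_epochs)         # (5, 10) -> the cycle still completes in 15 epochs
```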

Breaking

- Heavy renaming of methods and classes due to changes in model training and callbacks.

Additions

- `__del__` method to `ForwardHook` class
- `BatchYielder`:
  - Now takes an `input_mask` argument to filter inputs
  - Now takes an argument allowing incomplete batches to be yielded
  - Target array can now be `None`
- `Model`:
  - Now takes a `bs` argument for `evaluate`
  - Predictions can now be modified by passing a `PredHandler` callback to `pred_cb`. The default one simply returns the model predictions, however other actions could be defined by the user, e.g. performing argmax for multiclass classifiers

Removals

- `Model`:
  - Now no longer takes `callbacks` and `mask_inputs` as arguments for `evaluate`
  - `evaluate_from_by` removed, just call `evaluate`
- Callbacks no longer take `model` and `plot_settings` arguments during initialisation. These should be added by calling the relevant setters. `Model` will call them when relevant.

Fixes

- Potential bug in convolutional models where checking the out size of the head would affect the batchnorm averaging
- Potential bug in `plot_sample_pred` to do with bin ranges
- `ForwardHook` not working with passed hook functions

Changes

- `BinaryLabelSmooth` now only applies smoothing during training and not in validation
- `Ensemble`:
  - `from_results` and `build_ensemble` now no longer take `location` as an argument. Instead, results should contain the savepath for the models
  - `_build_ensemble` is now private
- `Model`:
  - `predict_array` and `predict_folds` are now private
  - `fit` now expects to perform the entire fitting of the model, rather than just one sub-epoch. Additionally, validation loss is now computed only at the end of the epoch, rather than previously where it was computed after each fold
- `SWA` `renewal_period` should now be `None` in order to prevent a second average being tracked (previously was negative)
- Some examples have been renamed, and copies using the old model-fitting procedure and old callbacks are available in `examples/old`
- `lr_find` now runs over all training folds, instead of just 1

Deprecations

- V0.8:
- Many classes and methods deprecated with the new model. They have been copied and renamed to `Old*`.
- `OldAbsModel`: Replaced by `AbsModel`
- `OldModel`: Replaced by `Model`
- `OldAbsCallback`: Replaced by `AbsCallback`
- `OldCallback`: Replaced by `Callback`
- `OldBinaryLabelSmooth`: Replaced by `BinaryLabelSmooth`
- `OldSequentialReweight`: Will not be replaced
- `SequentialReweightClasses`: Will not be replaced
- `OldBootstrapResample`: Replaced by `BootstrapResample`
- `OldParametrisedPrediction`: Replaced by `ParametrisedPrediction`
- `OldGradClip`: Replaced by `GradClip`
- `OldLsuvInit`: Replaced by `LsuvInit`
- `OldAbsCyclicCallback`: Replaced by `AbsCyclicCallback`
- `OldCycleLR`: Replaced by `CycleLR`
- `OldCycleMom`: Replaced by `CycleMom`
- `OldOneCycle`: Replaced by `OneCycle`
- `OldLRFinder`: Replaced by `LRFinder`
- `fold_lr_find`: Replaced by `lr_find`
- `fold_train_ensemble`: Replaced by `train_models`
- `OldMetricLogger`: Replaced by `MetricLogger`
- `AbsModelCallback`: Will not be replaced
- `OldSWA`: Replaced by `SWA`
- `old_plot_train_history`: Replaced by `plot_train_history`
- `OldEnsemble`: Replaced by `Ensemble`

Comments

0.6.0

Important changes

- `auto_filter_on_linear_correlation` now examines **all** features within correlated clusters, rather than just the most correlated pair. This means that the function now only needs to be run once, rather than the previously recommended multiple rerunning.
- Moved to Scikit-learn 0.22.2, and moved, where possible, to keyword argument calls for sklearn methods in preparation for 0.25 enforcement of keyword arguments
- Fixed error in patience when using cyclical LR callbacks: the patience now specifies the number of cycles to go without improvement, whereas previously one more than that number had to be specified
- Matrix data is no longer passed through `np.nan_to_num` in `FoldYielder`. Users should ensure that all values in matrix data are not NaN or Inf
- Tensor data:
  - `df2foldfile`, `fold2foldfile`, and `add_meta_data` can now support the saving of arbitrary matrices as a matrix input
  - Pass a `numpy.array` whose first dimension matches the length of the DataFrame to the `tensor_data` argument of `df2foldfile` and a name to `tensor_name`. The array will be split along the first dimension and the sub-arrays will be saved as matrix inputs in the resulting foldfile (see the sketch after this list)
  - The matrices may also be passed in sparse format and be densified on loading by `FoldYielder`
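A hedged sketch of the workflow just described. Only `tensor_data` and `tensor_name` are taken from the entry above; the import path and the remaining `df2foldfile` arguments are assumptions to be checked against the documentation:

```python
import numpy as np
import pandas as pd
from lumin.data_processing.file_proc import df2foldfile  # assumed import path

df = pd.DataFrame({'feat_0': np.random.randn(1000),
                   'gen_target': np.random.randint(0, 2, 1000)})
matrices = np.random.randn(1000, 5, 4)  # first dimension matches len(df)

# The array is split along its first dimension and the sub-arrays are saved
# as matrix inputs in the resulting foldfile. Arguments other than tensor_data
# and tensor_name are placeholders for this sketch.
df2foldfile(df, n_folds=10, savename='data/example',
            cont_feats=['feat_0'], cat_feats=[], targ_feats='gen_target',
            tensor_data=matrices, tensor_name='my_matrix')
```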

Breaking

- `plot_rank_order_dendrogram` now returns sets of all features in cluster with distance over the threshold, rather than just the closest features in each cluster

Additions

- Addition of batch size parameter to `Ensemble.predict*`
- Lorentz Boost Network (https://arxiv.org/abs/1812.09722):
  - `LorentzBoostNet`: basic implementation which learns boosted particles from existing particles and extracts features from them using fixed kernel functions
  - `AutoExtractLorentzBoostNet`: also learns the kernel functions during training
- Classification `Eval` classes:
  - `BinaryAccuracy`: Computes and returns the accuracy of a single-output model for binary classification tasks
  - `RocAucScore`: Computes and returns the area under the Receiver Operator Characteristic curve (ROC AUC) of a classifier model
- `plot_binary_sample_feat`: a version of `plot_sample_pred` designed for plotting feature histograms with stacked contributions by sample for background
- Added compression arguments to `df2foldfile`, `fold2foldfile`, and `save_to_grp`
- Tensor data:
  - `df2foldfile`, `fold2foldfile`, and `add_meta_data` can now support the saving of arbitrary matrices as a matrix input
  - Pass a `numpy.array` whose first dimension matches the length of the DataFrame to the `tensor_data` argument of `df2foldfile` and a name to `tensor_name`. The array will be split along the first dimension and the sub-arrays will be saved as matrix inputs in the resulting foldfile
  - The matrices may also be passed in sparse format and be densified on loading by `FoldYielder`
- `plot_lr_finders` now has a `log_y` argument for a logarithmic y-axis. The default `'auto'` sets `log_y` if the maximum fractional difference between losses is greater than 50
- Added new rescaling options to `ClassRegMulti` using linear outputs and scaling by mean and std of targets
- `LsuvInit` now applies scaling to `nn.Conv3d` layers
- `plot_lr_finders` and `fold_lr_find` now have options to save the resulting LR finder plot (currently limited to png due to problems with pdf)
- Addition of AdamW as an optimiser, thanks to [kiryteo](https://github.com/kiryteo)
- Contribution guide, thanks to [kiryteo](https://github.com/kiryteo)
- OneCycle `lr_range` now supports a non-zero final LR; just supply a three-tuple to the `lr_range` argument (illustrated after this list)
- `Ensemble.from_models` classmethod for combining in-memory models into an Ensemble.
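A hedged illustration of the three-tuple `lr_range`. Only the existence of the three-element option is taken from the entry above; the `(start, max, final)` ordering, the `lengths` argument, and the import path are assumptions:

```python
from functools import partial
from lumin.nn.callbacks.cyclic_callbacks import OneCycle  # assumed import path

# A two-element range decays to zero as before; an assumed (start, max, final)
# three-element range lets training finish at a non-zero LR.
one_cycle = partial(OneCycle, lengths=(5, 10), lr_range=(1e-5, 1e-3, 1e-6))
```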

Removals

- `FeatureSubsample`
- `plots` keyword in `fold_train_ensemble`

Fixes

- Docs bug for nn.training due to missing ipython in requirements
- Bug in LSUV init when running on CUDA
- Bug in TF export based on searching for fullstops
- Bug in model_bar update during fold training
- Quiet bug in `MultiHead` when matrix features were not listed first; map construction indexed `self.matrix_feats` rather than `self.feats`
- Slowdown in `Ensemble.predict_array` which caused the array to be sent to the device during each model evaluation
- `Model.get_param_count` now includes non-trainable params when requested
- Fixed bug in `fold_lr_find` where LR finders would use different LR steps, leading to NaNs when plotting in `fold_lr_find`
- `plot_feat` used to coerce NaNs and Infs via `np.nan_to_num` prior to plotting, potentially impacting distributions, plotting scales, moments, etc. Fixed so that NaN and Inf values are removed rather than coerced
- Fixed early-stopping statement in `fold_train_ensemble` to state the number as "sub-epochs" (previously said "epochs")
- Fixed error in patience when using cyclical LR callbacks: the patience now specifies the number of cycles to go without improvement, whereas previously one more than that number had to be specified
- Unnecessary warning in `df2foldfile` when no strat-key is passed
- Saved matrices in `fold2foldfile` are now in float32
- Fixed return type of `get_layers` methods in `RNNs_CNNs_and_GNNs_for_matrix_data` example
- Bug in `model.predict_array` when predicting matrix data with a batch size
- Added missing indexing in `AbsMatrixHead` to use `torch.bool` if the PyTorch version is >= 1.2 (was `uint8`, which is now deprecated for indexing)
- Errors when running in terminal due to trying to call `.show` on fastprogress bars
- Bug due to encoding of readme when trying to install when default encoder is ascii
- Bug when running `Model.predict` in batches when the data contains less than one batch
- Include missing files in sdist, thanks to [thatch](https://github.com/thatch)
- Test path correction in example notebook, thanks to [kiryteo](https://github.com/kiryteo)
- Doc links in `hep_proc`
- Error in `MultiHead._set_feats` when `matrix_head` does not contain 'vecs' or 'feats_per_vec' keywords
- Compatibility error in numpy >= 1.18 in `bin_binary_class_pred` due to float instead of int
- Unnecessary second loading of fold data in `fold_lr_find`
- Compatibility error when working in PyTorch 1.6 based on integer and true division
- SWA not evaluating in batches when running in non-bulk-move mode
- Moved from `normed` to `density` keywords for matplotlib

Changes

- `ParametrisedPrediction` now accepts lists of parameterisation features
- `plot_sample_pred` now ensures that signal and background have the same binning
- `PlotSettings` now coerces string arguments for `savepath` to `Path`
- Added default value for `targ_name` in `EvalMetric`
- `plot_rank_order_dendrogram`:
  - Now uses "optimal ordering" for improved presentation
  - Now returns sets of all features in cluster with distance over the threshold, rather than just the closest features in each cluster
- `auto_filter_on_linear_correlation` now examines **all** features within correlated clusters, rather than just the most correlated pair. This means that the function now only needs to be run once, rather than the previously recommended multiple rerunning.
- Improved data shuffling in `BatchYielder`, now runs much quicker
- Slight speedup when loading data from foldfiles
- Matrix data is no longer passed through `np.nan_to_num` in `FoldYielder`. Users should ensure that all values in matrix data are not NaN or Inf

Deprecations

Comments

- RFPImp still imports from `sklearn.ensemble.forest`, which is deprecated and possibly part of the private API. Hopefully the package will remedy this before the deprecated module is removed; for now, future warnings are displayed.

0.5.1

Important changes

- New live plot for losses during training (`MetricLogger`):
  - Provides additional information
  - Only updates after every epoch (previously every sub-epoch), reducing training times
  - Nicer appearance and automatic log scale for y-axis

Breaking

Additions

- New live plot for losses during training (`MetricLogger`):
  - Provides additional information
  - Only updates after every epoch (previously every sub-epoch), reducing training times
  - Nicer appearance and automatic log scale for y-axis

Removals

Fixes

- Fixed error in documentation which removed the ToC for the nn module

Changes

Deprecations

- `plots` argument in `fold_train_ensemble`. The `plots` argument is now deprecated and ignored. Loss history will always be shown, LR history will no longer be shown separately, and live feedback is now controlled by the four `live_fdbk` arguments. This argument will be removed in V0.6.

Comments
