Chemprop

Latest version: v2.1.2

2.1.2

What's Changed
**Important changes**
* CLI implementation of RIGR as an option in `--multi-hot-atom-featurizer-mode` by akshatzalte in https://github.com/chemprop/chemprop/pull/1172

A new featurization scheme, **RIGR (Resonance Invariant Graph Representation)**, is now available. To access it via the CLI, use `--multi-hot-atom-featurizer-mode rigr`. This featurizer uses only resonance invariant features so it treats all resonance structures of a molecule identically. It uses a subset of the atom and bond features from the default v2 featurizer. With 60% fewer features, RIGR has shown comparable or superior performance across a variety of property prediction tasks in a forthcoming manuscript. An example Jupyter notebook is also provided.
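
As a rough illustration of the CLI usage described above, the following Python sketch assembles (without running) a `chemprop train` command that selects the RIGR featurizer. Only the `--multi-hot-atom-featurizer-mode rigr` option comes from these notes; the helper name, the file paths, and the other flags shown (`--data-path`, `--task-type`, `--output-dir`) are assumptions that should be checked against `chemprop train -h`.

```python
import shlex


def build_rigr_train_command(data_path: str, output_dir: str) -> list[str]:
    """Assemble (but do not run) a `chemprop train` invocation using RIGR.

    The paths and every flag other than --multi-hot-atom-featurizer-mode
    are illustrative assumptions, not taken from the release notes.
    """
    return [
        "chemprop", "train",
        "--data-path", data_path,      # hypothetical CSV of SMILES and targets
        "--task-type", "regression",   # assumed task type for this example
        "--output-dir", output_dir,    # hypothetical output location
        "--multi-hot-atom-featurizer-mode", "rigr",  # option named in these notes
    ]


command = build_rigr_train_command("data/mols.csv", "runs/rigr")
print(shlex.join(command))
```

In an environment with Chemprop 2.1.2 or later installed, the list could be passed to `subprocess.run(command, check=True)`.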

**Other changes**
* Apply task_weights to default loss function in CLI by craabreu in https://github.com/chemprop/chemprop/pull/1170
* Check if dropout prop needs to be restored by KnathanM in https://github.com/chemprop/chemprop/pull/1178
* Message Passing Error Message Fix by twinbrian in https://github.com/chemprop/chemprop/pull/1161
* Fix metrics problems - Cuda-> CPU, no _defaults by KnathanM in https://github.com/chemprop/chemprop/pull/1179
* Update convert script for v1.4 by KnathanM in https://github.com/chemprop/chemprop/pull/1176

New Contributors
* craabreu made his first contribution in https://github.com/chemprop/chemprop/pull/1170

**Full Changelog**: https://github.com/chemprop/chemprop/compare/v2.1.1...v2.1.2

2.1.1

Notable changes
In #1090, we started the process of integrating `logging` into the core code. This will make it easier for users to control what information Chemprop prints, and easier for developers to emit additional output for debugging.

SciPy 1.15 subtly changed how `logit` works, which caused some of our tests to fail because the reported values were slightly different from before. The expected test values have been updated (#1142).

A new example notebook has been added that demonstrates how to adapt Chemprop to work with Shapley value analysis. This is another way to lend some interpretability to Chemprop models by highlighting which atom/bond features have the most impact on the final prediction (#938).
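
For readers unfamiliar with Shapley values, here is a minimal, self-contained sketch of the underlying idea: a feature's Shapley value is its marginal contribution averaged over all orderings in which features can be added. The `toy_model` below is a made-up stand-in and the brute-force enumeration is only practical for a handful of features; this is not the method used in the notebook.

```python
from itertools import permutations


def shapley_values(value_fn, n_features: int) -> list[float]:
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        present = set()
        for feature in order:
            before = value_fn(frozenset(present))
            present.add(feature)
            after = value_fn(frozenset(present))
            totals[feature] += after - before
    return [total / len(orderings) for total in totals]


def toy_model(present: frozenset) -> float:
    """Hypothetical model: features 0 and 1 interact, feature 2 adds a constant."""
    score = 0.0
    if 0 in present and 1 in present:
        score += 2.0
    if 2 in present:
        score += 1.0
    return score


phi = shapley_values(toy_model, 3)
print(phi)
```

The values sum to the difference between the prediction with all features present and the prediction with none, which is the property that makes them useful for attributing a prediction to its inputs.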

We continue to try to make Chemprop easy to use. In #1091 and #1124 we added better warnings and error messages, and in #1151 we made it easy to open the example notebooks in Google Colab. This allows people reading the docs to jump in and try Chemprop immediately without needing to set up a Python environment.

Bug Fixes
In #1097, we fixed a bug where the transforms for scaling extra features/descriptors were turned off during validation. This caused models trained with these extra inputs to not report accurate metrics during training, which is a problem if the "best" model is selected instead of the last model, as is done in hyperparameter optimization. Training a model and using the last model was unaffected, as was inference.

#1084 fixed a bug where `R2Score` did not have the attribute `task_weights`. This attribute is not used but is needed for compatibility with other metrics.

In v2.1 we transitioned to using `torchmetrics` for our metrics and loss functions, in part because it takes care of training across multiple nodes (DDP) automatically. Our custom metric for the Matthews correlation coefficient, however, was not set up the way `torchmetrics` expected. This was fixed in #1131.
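
These notes do not say exactly what was wrong with the custom metric. As general background, the pure-Python sketch below shows one reason a metric like MCC has to fit a framework's state-accumulation model: it is not an average of per-batch values, so the confusion-matrix counts have to be summed across batches (or workers) before the final value is computed. The data and helper names are made up for illustration, and this is not Chemprop's implementation.

```python
import math


def confusion_counts(preds, targets):
    """Return (tp, fp, tn, fn) for binary 0/1 predictions and targets."""
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, targets))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, targets))
    tn = sum(p == 0 and t == 0 for p, t in zip(preds, targets))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, targets))
    return tp, fp, tn, fn


def mcc_from_counts(tp, fp, tn, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Two batches, or equivalently two shards handled by different workers.
batches = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 0, 0, 1], [1, 1, 0, 1]),
]

# Naive approach: compute MCC per batch, then average.
per_batch = [mcc_from_counts(*confusion_counts(p, t)) for p, t in batches]
mean_of_batches = sum(per_batch) / len(per_batch)

# State-accumulation approach: sum the counts first, compute MCC once.
summed = [sum(c) for c in zip(*(confusion_counts(p, t) for p, t in batches))]
mcc_from_summed = mcc_from_counts(*summed)

# Reference: MCC over the full dataset in one pass.
all_preds = [p for preds, _ in batches for p in preds]
all_targets = [t for _, targets in batches for t in targets]
mcc_full = mcc_from_counts(*confusion_counts(all_preds, all_targets))

print(mean_of_batches, mcc_from_summed, mcc_full)
```

Only the summed-counts value matches the full-dataset value, which is why such metrics are written in terms of accumulated state that can be reduced across processes.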

What's Changed
* splits file is json by KnathanM in https://github.com/chemprop/chemprop/pull/1083
* add more helpful warnings about the splitting api change by JacksonBurns in https://github.com/chemprop/chemprop/pull/1091
* Fix: Splits file can have multiple splitting schemes by KnathanM in https://github.com/chemprop/chemprop/pull/1086
* Set all transforms to train during validation by KnathanM in https://github.com/chemprop/chemprop/pull/1097
* updated warning to logger by twinbrian in https://github.com/chemprop/chemprop/pull/1090
* Add task weights to r2score by KnathanM in https://github.com/chemprop/chemprop/pull/1084
* Fix `tracking_metric` overwrite issue by shihchengli in https://github.com/chemprop/chemprop/pull/1105
* Fix `save_individual_predictions` with ensembling by shihchengli in https://github.com/chemprop/chemprop/pull/1110
* Add a helpful warning when invalid SMILES are passed by JacksonBurns in https://github.com/chemprop/chemprop/pull/1124
* Fix batch size calculation for multicomponent by KnathanM in https://github.com/chemprop/chemprop/pull/1098
* Not use `transform_variance` for unscaled targets by shihchengli in https://github.com/chemprop/chemprop/pull/1108
* Add output size to attentive hparams by KnathanM in https://github.com/chemprop/chemprop/pull/1133
* Fix test failure due to scipy logit by KnathanM in https://github.com/chemprop/chemprop/pull/1142
* fix docs about extra atom descriptors by KnathanM in https://github.com/chemprop/chemprop/pull/1139
* Fix MCC for DDP and multitask by KnathanM in https://github.com/chemprop/chemprop/pull/1131
* V2: Add Shapley Value notebook for interpretability by oscarwumit in https://github.com/chemprop/chemprop/pull/938
* add notebooks to colab and docs by KnathanM in https://github.com/chemprop/chemprop/pull/1151

**Full Changelog**: https://github.com/chemprop/chemprop/compare/v2.1.0...v2.1.1

2.1.0

The v2.1 release adds the uncertainty quantification modules, including estimation, calibration, and evaluation (#937). For more details on uncertainty quantification in Chemprop, please refer to the [documentation](https://chemprop.readthedocs.io/en/main/tutorial/cli/predict.html#uncertainty-quantification) and the [example notebook](https://chemprop.readthedocs.io/en/main/uncertainty.html). Additionally, we switched the loss functions and metrics to `torchmetrics` (#1022). With this change, the reported "val_loss" is now calculated the same way as the training loss so that the two are comparable (#1020). We also changed Chemprop to use replicates instead of cross validation (#994), and batch normalization is now disabled by default (#1058).

Core code changes
* The `validation_loss_function` is removed in #1023.
* Batch norm is disabled by default in #1058.
* A new predictor, `QuantileFFN`, is added in #963.
* `BinaryDirichletLoss` and `MulticlassDirichletLoss` are integrated into `DirichletLoss` in #1066.
* The split types `CV` and `CV_NO_VAL` are removed in #994.
* A model's list of metrics is now registered as child modules in #1020.

CLI changes
* Batch norm is disabled by default; it can be turned on with `--batch-norm` (#1058).
* Many CLI flags related to uncertainty quantification are added (#1010).
* Quantile regression is now supported via `-t regression-quantile` (#963).
* Cross validation (CV) is replaced with replicates. The number of replicates can be specified via `--num-replicates`; the flag `--num-folds` is deprecated (#994).
* `--tracking-metric` is added to select the metric tracked for early stopping and checkpointing (#1020).
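
As background for the quantile regression option above, the sketch below shows the pinball loss that is commonly used to train quantile regressors. It is a generic textbook formulation under the assumption that a single target quantile is being fitted; the loss Chemprop actually uses for `regression-quantile` may be defined differently.

```python
def pinball_loss(y_true: float, y_pred: float, quantile: float) -> float:
    """Pinball (quantile) loss for a single prediction.

    Under-prediction is weighted by `quantile`, over-prediction by
    `1 - quantile`, so minimizing it pushes y_pred toward that quantile.
    """
    error = y_true - y_pred
    return max(quantile * error, (quantile - 1.0) * error)


# For the 0.9 quantile, predicting too low costs more than predicting too high.
too_low = pinball_loss(y_true=10.0, y_pred=8.0, quantile=0.9)
too_high = pinball_loss(y_true=10.0, y_pred=12.0, quantile=0.9)
print(too_low, too_high)
```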

New notebooks
* A notebook showing interoperability of the Chemprop featurizer with other libraries (DGL and PyG) (#1063)
* Active learning (#910)
* Uncertainty quantification (#1071)

CI/CD
* Ray can be tested on Python 3.12 (#1064)
* `USE_LIBUV: 0` is added to the CI workflow (#1065)

Backwards Compatibility Note
Models trained with v2.0 will not load properly in v2.1 due to the loss functions file being moved. A conversion script is provided to convert a v2.0 model to one compatible with v2.1. Its usage is `python chemprop/utils/v2_0_to_v2_1.py <v2_0.pt> <v2_1.pt>`

`data.make_split_indices` now always returns a nested list. Previously it would only return a nested list for cross validation. We encourage you to use `data.make_split_indices(num_replicates=X)` where `X` is some number greater than 1, to train on multiple splits of your data to get a better idea of the performance of your architecture. If you do use only one replicate, you will need to unnest the list like so:

    train_indices, val_indices, test_indices = data.make_split_indices(mols)
    train_data, val_data, test_data = data.split_data_by_indices(
        all_data, train_indices, val_indices, test_indices
    )
    train_data, val_data, test_data = train_data[0], val_data[0], test_data[0]
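
When more than one replicate is requested, each returned list holds one entry per replicate, so the natural pattern is to loop over them together. The sketch below illustrates that shape with `toy_make_split_indices`, a hypothetical stand-in written for this example (it is not Chemprop's function and its splitting logic is deliberately simplistic).

```python
import random


def toy_make_split_indices(n_items, num_replicates=1, sizes=(0.8, 0.1, 0.1), seed=0):
    """Hypothetical stand-in mimicking the nested return shape described above."""
    train, val, test = [], [], []
    for replicate in range(num_replicates):
        indices = list(range(n_items))
        random.Random(seed + replicate).shuffle(indices)
        n_train = int(sizes[0] * n_items)
        n_val = int(sizes[1] * n_items)
        train.append(indices[:n_train])
        val.append(indices[n_train:n_train + n_val])
        test.append(indices[n_train + n_val:])
    return train, val, test


train_idx, val_idx, test_idx = toy_make_split_indices(20, num_replicates=3)

# One (train, val, test) triple per replicate; train a separate model on each.
for replicate, (tr, va, te) in enumerate(zip(train_idx, val_idx, test_idx)):
    print(f"replicate {replicate}: {len(tr)} train / {len(va)} val / {len(te)} test")
```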

What's Changed
* change installed torch version on windows actions again by shihchengli in https://github.com/chemprop/chemprop/pull/1062
* .pt instead of .ckpt by twinbrian in https://github.com/chemprop/chemprop/pull/1060
* add ModelCheckpointing to training.ipynb so best model is used automatically by donerancl in https://github.com/chemprop/chemprop/pull/1059
* Add ray to tests on python 3.12 by KnathanM in https://github.com/chemprop/chemprop/pull/1064
* `v2.1` Feature: Replicates Instead of Cross Validation Folds by JacksonBurns in https://github.com/chemprop/chemprop/pull/994
* disable libuv with env var rather than avoiding latest torch by JacksonBurns in https://github.com/chemprop/chemprop/pull/1065
* Add new example notebook for active learning by joelnkn in https://github.com/chemprop/chemprop/pull/910
* Fix: splits column is a string not a list by KnathanM in https://github.com/chemprop/chemprop/pull/1074
* Update chemprop to v2.1 in https://github.com/chemprop/chemprop/pull/1038
  * This PR included the following PRs:
    * Rerun notebooks for v2.1 by KnathanM in #1067
    * Refactor with torchmetrics by KnathanM in #1022
    * update train docs for v2.1 by KnathanM in #1069
    * Disable batch norm by default by jonwzheng in #1058
    * Add notebook showing interoperability of Chemprop featurizer w/other libraries by jonwzheng in #1063
    * Add tracking metric options; make metrics ModuleList; other improvements by KnathanM in #1020
    * Remove old validate-loss-function function by KnathanM in #1023
* V2: Uncertainty implementation in #937
  * This PR included the following PRs:
    * Improve the docstring for uncertainty modules by shihchengli in #986
    * Add Platt calibrator by KnathanM in #961
    * Add dropout and ensemble predictors by joelnkn in #970
    * Add NLL and Spearman Uncertainty Evaluators by am2145 in #984
    * Add quantile regression by shihchengli in #963
    * Add miscalibration area and ence evaluators by shihchengli in #1012
    * Add isotonic calibrators by KnathanM in #1053
    * V2 conformal calibrators by shihchengli in #989
    * V2 conformal evaluators by shihchengli in #1005
    * Uncertainty regression calibrators (non-conformal) by shihchengli in #1055
    * Adding Evidential, MVE, and Binary Dirichlet Uncertainty Predictors by akshatzalte in #1061
    * Cleanup the uncertainty modules by shihchengli in #1072
    * Multiclass dirichlet give uncertainty by KnathanM in #1066
    * Rename uncertainty estimator by KnathanM in #1070
    * Update uncertainty notebook by shihchengli in #1071
    * Add uncertainty quantification to the predict CLI by shihchengli in #1010

**Full Changelog**: https://github.com/chemprop/chemprop/compare/v2.0.5...v2.1.0

2.0.5

We continue to enhance and improve the functionality and usability of Chemprop. If there are things you'd like to see addressed in a future update, please open an issue or PR.

Core code changes
We discovered that our Noam learning rate scheduler does not match what was originally proposed. The current scheduler does work well though, so it was decided to not change the definition. Instead the scheduler was renamed and refactored to be more clear. By shihchengli in https://github.com/chemprop/chemprop/pull/975
Work on uncertainty quantification methods revealed that our previous prediction tensor return dimensions would cause difficulty down the line. Now we have placed uncertainty into a separate dimension. By hwpang in https://github.com/chemprop/chemprop/pull/959
The `BinaryDirichletFFN` and `MulticlassDirichletFFN` predictors were added early in the v2 development, but not tested. Now they have been tested and corrected. By shihchengli in https://github.com/chemprop/chemprop/pull/1017
The RDKit 2D molecular featurizer was added back by popular demand. The versions used in v1 are available as well as a version that uses all available molecular features in `rdkit.Chem.Descriptors`. By KnathanM in https://github.com/chemprop/chemprop/pull/877
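
To make the learning rate scheduler note above more concrete, the sketch below contrasts the schedule proposed in "Attention Is All You Need" (commonly called the Noam schedule) with a linear-warmup, exponential-decay schedule. That the Chemprop scheduler takes the latter form is an assumption not stated in these notes, and the function names and default values here are illustrative only.

```python
def noam_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Original Noam schedule: linear warmup, then decay proportional to step**-0.5."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def warmup_exp_decay_lr(
    step: int,
    warmup_steps: int,
    total_steps: int,
    init_lr: float = 1e-4,
    max_lr: float = 1e-3,
    final_lr: float = 1e-4,
) -> float:
    """Linear warmup from init_lr to max_lr, then exponential decay to final_lr."""
    if step <= warmup_steps:
        return init_lr + (max_lr - init_lr) * step / warmup_steps
    decay_steps = total_steps - warmup_steps
    gamma = (final_lr / max_lr) ** (1.0 / decay_steps)
    return max_lr * gamma ** (step - warmup_steps)


for step in (0, 100, 200, 600, 1000):
    print(step, warmup_exp_decay_lr(step, warmup_steps=200, total_steps=1000))
```

Both schedules warm up linearly and peak at the end of warmup; they differ in the shape of the decay and in whether the endpoints are set explicitly.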

CLI changes
* Log statistical summary of training, validation, and test datasets by donerancl in https://github.com/chemprop/chemprop/pull/882
* Change the default verbose level to INFO by shihchengli in https://github.com/chemprop/chemprop/pull/953
* Save both probabilities and class label for multiclass classification by shihchengli in https://github.com/chemprop/chemprop/pull/987
* Add `--remove-checkpoints` flag to opt out of saving checkpoints by shihchengli in https://github.com/chemprop/chemprop/pull/1014
* Add `--class-balance` flag to `train` CLI by shihchengli in https://github.com/chemprop/chemprop/pull/1011
* Save target column names in model for use at inference by hwpang in https://github.com/chemprop/chemprop/pull/935
* Fix `save-smiles-splits` not working with rxn. columns as column header by jonwzheng in https://github.com/chemprop/chemprop/pull/998

Transfer learning
* Add new example notebook for transfer learning by joelnkn in https://github.com/chemprop/chemprop/pull/904
* Use pre-train output scaler to scale training data in CLI by KnathanM in https://github.com/chemprop/chemprop/pull/1051
* Add `--checkpoint` and `--freeze-encoder` flags in train CLI for transfer learning by shihchengli in https://github.com/chemprop/chemprop/pull/1007

Documentation
* Fixed typos in CLI reference and standardized formatting by donerancl in https://github.com/chemprop/chemprop/pull/880
* Example Notebook for Classification by twinbrian in https://github.com/chemprop/chemprop/pull/1047
* Improve frzn-ffn-layers description and update doc for transfer learning by oscarwumit in https://github.com/chemprop/chemprop/pull/993
* add transform tests by KnathanM in https://github.com/chemprop/chemprop/pull/955
* Add documentation for how to use a separate splits file (CLI) by KnathanM in https://github.com/chemprop/chemprop/pull/1041

Other small bug fixes
* Convert v1 models trained on GPU by KnathanM in https://github.com/chemprop/chemprop/pull/978
* Fix `hpopting` Notebook and CLI for Windows by JacksonBurns in https://github.com/chemprop/chemprop/pull/1034
* Update multiclass data to be compatible with rdkit 2024.09.1 by jonwzheng in https://github.com/chemprop/chemprop/pull/1037
* Define `task_weights` if it is `None` in `MulticlassClassificationFFN` by shihchengli in https://github.com/chemprop/chemprop/pull/988
* change installed torch version on windows actions again by KnathanM in https://github.com/chemprop/chemprop/pull/1016
* Update batch norm freezing to freeze running stats by joelnkn in https://github.com/chemprop/chemprop/pull/952
* Pass `map_location` through `load_submodules()` to `torch.load()` by shihchengli in https://github.com/chemprop/chemprop/pull/1029
* fix no-header-rows in predict command error by sunhwan in https://github.com/chemprop/chemprop/pull/1001

New Contributors
* sunhwan made their first contribution in https://github.com/chemprop/chemprop/pull/1001
* twinbrian made his first contribution in https://github.com/chemprop/chemprop/pull/1047

**Full Changelog**: https://github.com/chemprop/chemprop/compare/v2.0.4...v2.0.5

2.0.4

Enhancements and New Features
This release introduces several enhancements and new features to Chemprop. A notable addition is a new notebook demonstrating Monte Carlo Tree Search for model interpretability (see [here](https://github.com/chemprop/chemprop/blob/main/examples/interpreting_monte_carlo_tree_search.ipynb)). Enhancements have been made to the output transformation and prediction saving mechanisms for `MveFFN` and `EvidentialFFN`. Additionally, users can now perform predictions on CPU even if the models were trained on GPU. Users are now also warned when not using the TensorBoard logger, helping them to be aware of available logging tools for better monitoring.

Bug Fixes
Several bugs have been fixed in this release, including issues related to Matthews Correlation Coefficient (MCC) metrics and loss calculations, and the behavior of the CGR featurizer when the bond features matrix is empty. The `task_weights` parameter has been standardized across all loss functions and moved to the correct device for MCC metrics, preventing device mismatch errors.

What's Changed
* Standardize `task_weights` in `LossFunction` across all loss functions by shihchengli in https://github.com/chemprop/chemprop/pull/941
* Improve output transformation and prediction saving for `MveFFN` and `EvidentialFFN` by shihchengli in https://github.com/chemprop/chemprop/pull/943
* Enable CPU prediction for GPU-trained models by snaeppi in https://github.com/chemprop/chemprop/pull/950
* Fix Issues in MCC Metrics and Loss Calculations by shihchengli in https://github.com/chemprop/chemprop/pull/942
* Fix docs building by pinning sphinx-argparse by jonwzheng in https://github.com/chemprop/chemprop/pull/964
* Add Monte Carlo Tree search notebook for interpretability by hwpang in https://github.com/chemprop/chemprop/pull/924
* Fix CGR featurizer behavior when bond features matrix is empty by jonwzheng in https://github.com/chemprop/chemprop/pull/958
* Fix Failing CI for `torch==2.4.0` on Windows `ray[tune]` Tests by JacksonBurns in https://github.com/chemprop/chemprop/pull/971
* warn users when not using tensorboard logger by JacksonBurns in https://github.com/chemprop/chemprop/pull/967
* Bug: Move `task_weights` to 'device' for MCC metrics by YoochanMyung in https://github.com/chemprop/chemprop/pull/973

New Contributors
* snaeppi made their first contribution in https://github.com/chemprop/chemprop/pull/950
* YoochanMyung made their first contribution in https://github.com/chemprop/chemprop/pull/973

**Full Changelog**: https://github.com/chemprop/chemprop/compare/v2.0.3...v2.0.4

2.0.3

Notable changes
The `mfs` argument of `MoleculeDatapoint` was removed in #876. This argument accepted functions which generated molecular features to use as extra datapoint descriptors. When using Chemprop in a notebook, users should first manually generate their molecule features and pass them into the datapoints using `x_d`, which stands for (extra) datapoint descriptors. This is demonstrated in the `extra_features_descriptors.ipynb` notebook under examples. CLI users will see no change, as the CLI will still automatically calculate molecule features using user-specified featurizers. The `--features-generators` flag has been deprecated, though, in favor of the more descriptive `--molecule-featurizers`. Available molecule features can be found in the help text generated by `chemprop train -h`.

The default aggregation was changed to norm in #946. This was meant to be changed in version 2.0.0 but was missed. Norm aggregation was used in all the benchmarking of version 1, as it performs better than mean aggregation when predicting properties that are extensive in the number of atoms.
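
The point about extensive properties can be seen in a small pure-Python sketch. Mean aggregation divides the summed atom vectors by the number of atoms, so the result does not grow with molecule size, whereas norm aggregation divides by a fixed constant and so scales with the number of atoms. The constant of 100 and the toy atom vectors are assumptions made for this illustration.

```python
def mean_aggregate(atom_vectors):
    """Sum atom vectors and divide by the number of atoms."""
    n_atoms = len(atom_vectors)
    return [sum(column) / n_atoms for column in zip(*atom_vectors)]


def norm_aggregate(atom_vectors, norm=100.0):
    """Sum atom vectors and divide by a fixed constant (assumed to be 100 here)."""
    return [sum(column) / norm for column in zip(*atom_vectors)]


small = [[1.0, 2.0]] * 3   # toy molecule with 3 identical atoms
large = small * 2          # same atom type, twice as many atoms

print(mean_aggregate(small), mean_aggregate(large))
print(norm_aggregate(small), norm_aggregate(large))
```

Doubling the number of atoms leaves the mean-aggregated vector unchanged but doubles the norm-aggregated one, which is the behavior an extensive property needs.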

More documentation for the CLI `hpopt` and `fingerprint` commands has been added and can be viewed [here](https://chemprop.readthedocs.io/en/main/tutorial/cli/hpopt.html) and [here](https://chemprop.readthedocs.io/en/main/tutorial/cli/fingerprint.html).

The individual predictions of an ensemble of models are now automatically averaged and the individual predictions are saved in a separate file (#919).

What's Changed
* Change the installed numpy version in pyproject by shihchengli in https://github.com/chemprop/chemprop/pull/922
* Explicitly double save scalers/criterion by KnathanM in https://github.com/chemprop/chemprop/pull/898
* Add `--show-individual-scores` CLI flag by shihchengli in https://github.com/chemprop/chemprop/pull/920
* Set Ray Train's trainer resources to 0 by hwpang in https://github.com/chemprop/chemprop/pull/928
* Save individual and average predictions into different files by shihchengli in https://github.com/chemprop/chemprop/pull/919
* Add CLI pages for hpopt and fingerprint by jonwzheng in https://github.com/chemprop/chemprop/pull/914
* Make fingerprint CLI consistent with predict CLI by hwpang in https://github.com/chemprop/chemprop/pull/927
* Fix issue related to target column for fingerprint by hwpang in https://github.com/chemprop/chemprop/pull/939
* build molecule featurizer in parsing by KnathanM in https://github.com/chemprop/chemprop/pull/875
* Remove featurizing from datapoint by KnathanM in https://github.com/chemprop/chemprop/pull/876
* change aggregation default to norm by KnathanM in https://github.com/chemprop/chemprop/pull/946
* Use mol.GetBonds() instead of for loop by KnathanM in https://github.com/chemprop/chemprop/pull/931

**Full Changelog**: https://github.com/chemprop/chemprop/compare/v2.0.2...v2.0.3
