Chemprop

Latest version: v2.1.1

1.6.0

Major New Features
* Atomic/bond targets prediction by shihchengli in https://github.com/chemprop/chemprop/pull/280

What's Changed
* Replace multiclass mcc with 1-mcc for loss by cjmcgill in https://github.com/chemprop/chemprop/pull/332
* Add chemprop logo by shihchengli in https://github.com/chemprop/chemprop/pull/339
* Add CodeQL workflow for GitHub code scanning by lgtm-com in https://github.com/chemprop/chemprop/pull/344
* Add to the description of evidential regularization by cjmcgill in https://github.com/chemprop/chemprop/pull/353
* Remove deprecated numpy float types by cjmcgill in https://github.com/chemprop/chemprop/pull/357
* Correct a bug in ENCE uncertainty evaluation by cjmcgill in https://github.com/chemprop/chemprop/pull/360
* Hyperopt Parallel Race Conditions and Manual Trial Load by cjmcgill in https://github.com/chemprop/chemprop/pull/307
* Simplified install with PyPI `rdkit` and git install in `setup.py` by JacksonBurns in https://github.com/chemprop/chemprop/pull/364
* Allow providing both loaded features and a features generator by shihchengli in https://github.com/chemprop/chemprop/pull/318
* For any multiclass task, `make_predictions` fails if option --individual_ensemble_predictions is on. by piotr-semenov in https://github.com/chemprop/chemprop/pull/354
* Save loaded molecular features into .npy files by shihchengli in https://github.com/chemprop/chemprop/pull/337
* Ignore invalid atom-mapped SMILES by shihchengli in https://github.com/chemprop/chemprop/pull/367
* Molecule fingerprinting with invalid SMILES in list by shihchengli in https://github.com/chemprop/chemprop/pull/351
* change calibration_features_path from str to List[str] by ceroth in https://github.com/chemprop/chemprop/pull/358
* Change logo style by shihchengli in https://github.com/chemprop/chemprop/pull/369
* Clamp evidential 'v' parameter by kevingreenman in https://github.com/chemprop/chemprop/pull/371
* fix colab demo by kevingreenman in https://github.com/chemprop/chemprop/pull/368
* Avoid OverflowError when setting field size to sys.maxsize by shihchengli in https://github.com/chemprop/chemprop/pull/373
* Set atom and bond constraints when loading model by shihchengli in https://github.com/chemprop/chemprop/pull/374
* Readme updates by kevingreenman in https://github.com/chemprop/chemprop/pull/385
* Remove atom map numbers for scaffold splits by shihchengli in https://github.com/chemprop/chemprop/pull/383
* update bug report template - ask for full stack trace by kevingreenman in https://github.com/chemprop/chemprop/pull/401
* Fix t-SNE script by kevingreenman in https://github.com/chemprop/chemprop/pull/403
* Fixing skipped lines in csv writing when using a windows computer by cjmcgill in https://github.com/chemprop/chemprop/pull/406

**Full Changelog**: https://github.com/chemprop/chemprop/compare/v1.5.2...v1.6.0

1.5.2

Features

Flexible hyperparameter search space
The parameters to be included in hyperparameter optimization can now be selected using the argument `--search_parameter_keywords {list-of-keywords}`. The parameters supported are: activation, aggregation, aggregation_norm, batch_size, depth, dropout, ffn_hidden_size, ffn_num_layers, final_lr, hidden_size, init_lr, max_lr, warmup_epochs. Some special keywords are also included for groups of keywords or different search behavior: basic, learning_rate, all, linked_hidden_size.
PR 299

Missing targets in uncertainty calibration datasets
Added capabilities to the uncertainty calibration and evaluation methods to allow them to handle missing target values in multitask jobs. This capability was already included in the normal training of models and is now implemented in uncertainty calibration and evaluation as well.
PR 295
Issue 292
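
As an illustration of what handling missing targets means for a multitask metric, here is a minimal sketch in plain Python. It is not Chemprop's implementation: the function name `per_task_rmse` and the use of `None` to mark a missing target are assumptions made for this example.

```python
import math


def per_task_rmse(preds, targets):
    """Compute one RMSE per task, skipping datapoints whose target is None.

    Hypothetical helper for illustration only.
    """
    n_tasks = len(targets[0])
    scores = []
    for task in range(n_tasks):
        squared_errors = [
            (p[task] - t[task]) ** 2
            for p, t in zip(preds, targets)
            if t[task] is not None  # ignore missing targets for this task
        ]
        if squared_errors:
            scores.append(math.sqrt(sum(squared_errors) / len(squared_errors)))
        else:
            scores.append(float("nan"))  # no observed targets for this task
    return scores


# Made-up predictions and targets; the first datapoint has no target for task 1.
preds = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
targets = [[1.0, None], [4.0, 4.0], [3.0, 8.0]]
print(per_task_rmse(preds, targets))
```

Each task is scored only on the datapoints where its target was observed, so a missing value in one task does not discard the whole datapoint.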

Multitask evaluation for tasks of different magnitudes
When an evaluation metric scales with the magnitude of a task (e.g., rmse), the arithmetic average of that metric across tasks has been replaced with a geometric mean. This makes the average metric in multitask regression jobs less dominated by large-magnitude targets. The arithmetic average was previously an issue for hyperparameter optimization and for selecting the optimal epoch during model training. The loss used for gradient descent is calculated on scaled targets and was already not scale dependent.
PR 290
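
The effect of switching averages can be seen in a small sketch. This is plain Python with made-up task names and RMSE values, not Chemprop code; `arithmetic_mean` and `geometric_mean` are helpers written for this example.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # exp of the mean of logs; requires strictly positive values
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-task RMSE values: one small-magnitude task, one large.
baseline = {"task_small": 0.10, "task_large": 100.0}
# Same model, except the error on the small task has been halved.
improved = {"task_small": 0.05, "task_large": 100.0}

for label, scores in [("baseline", baseline), ("improved", improved)]:
    values = list(scores.values())
    print(label, arithmetic_mean(values), geometric_mean(values))
```

Halving the error on the small task barely moves the arithmetic mean (50.05 to 50.025), while the geometric mean drops by a factor of about 0.71, so improvements on either task register in the averaged score.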

Empty test set allowed
An empty test split can now be used during training. This was previously possible only using the `cv-no-test` split method, but now it is available more widely when specifying split sizes, for example with `--split_sizes 0.8 0.2 0`.
PR 284, 260 related
Issue 279
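
A minimal sketch of what split sizes of `0.8 0.2 0` imply for a random split is shown below. The helper `split_indices` is hypothetical and written for this example; Chemprop's own splitting code may differ in rounding and shuffling details.

```python
import random


def split_indices(n_points, split_sizes, seed=0):
    """Shuffle indices and cut them into train/val/test by fraction.

    Hypothetical helper for illustration only.
    """
    if len(split_sizes) != 3 or abs(sum(split_sizes) - 1.0) > 1e-9:
        raise ValueError("split_sizes must be three fractions that sum to 1")
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    train_end = int(split_sizes[0] * n_points)
    val_end = int((split_sizes[0] + split_sizes[1]) * n_points)
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]


train, val, test = split_indices(10, (0.8, 0.2, 0.0))
print(len(train), len(val), len(test))  # 8 2 0
```

With a test fraction of zero the third slice is simply empty, and all data goes to training and validation.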

Updates to conda environment and docker file
Conda environment building will now prefer to use the pytorch channel over the conda-forge channel. The Dockerfile has been updated to use micromamba, allowing for faster environment solves than conda and removing a potential licensing issue.
PR 276

Bug Fixes

Fix MCC loss for multiclass jobs
Corrected a calculation problem in the loss function that was returning infinite loss inappropriately. Also adopted the convention of returning loss of zero when infinite loss is returned, as often happens in very unbalanced datasets. Added appropriate unit testing.
PR 309
Issue 306

Correct code error in ence uncertainty evaluation
Corrects an error in the ence uncertainty evaluation method that made that method unusable. Bug was introduced during PR 305.
PR 302
Issue 301

Fixed link to MoleculeNet website
Corrected the link to the MoleculeNet benchmark dataset website in the readme, following MoleculeNet migrating to a new site location.
PR 296

Multitarget uncertainty calibration mve weighting method
Previously, this method only worked for single-task jobs. It has now been extended to work for multitask models as well.
PR 291

Remove unused version.py file
Version tracking in Chemprop no longer uses the `__version__.py` file, so it was removed.
PR 283

Multiclass argument typo in readme
Corrected a typo where the number of classes used in multiclass regression should have been indicated as `--multiclass_num_classes`.
PR 281

Repair individual ensemble predictions
Refactoring of the prediction file during the addition of the uncertainty functions disabled the option to return the individual predictions of each member of an ensemble of models. The option is now available again.
PR 274

1.5.1

Bugfix
Inconsistent Path For Uncertainty Evaluation
Fixed a bug in uncertainty evaluation where the uncertainty evaluator was using the path name originally used to train a checkpoint. This made the uncertainty evaluator only work in the case that the test data and training data used in initial model training had the same path.

1.5.0

Features

Uncertainty Tools
Tools added for uncertainty quantification, calibration, and evaluation as part of the chemprop predict function. Uncertainty predictions are saved as part of the predictions file. Uncertainty functions and outputs are triggered using the argument `--uncertainty_method {method}`.

Uncertainty outputs can be calibrated using an outside dataset (evaluation set from training is often suitable) in order to have better uncertainty estimates on new predictions. Can be activated using `--calibration_method {method}` and `--calibration_path {path-to-csv}`. For the regression dataset type, a calibrated output can provide either a standard deviation or one-sided interval bound, as set with the options `--regression_calibrator_metric {stdev-or-interval}` and `--calibration_interval_percentile {int}`.

If the data file containing smiles for the test path also contains target values, the uncertainty performance can be evaluated using various metrics, activated with the option `--evaluation_methods {list-of-methods}`.

Internally, this PR creates several classes for carrying out prediction tasks: UncertaintyEstimator, UncertaintyPredictor, UncertaintyCalibrator, UncertaintyEvaluator. Loss functions have been added that have auxiliary uncertainty outputs, `mve` and `evidential` for regression.
PR 267
PR 269

Reaction-Solvent Option
Gives the option to train a chemprop model using one reaction and one molecule for each datapoint. Active when used with the option `--reaction_solvent`. Options for making the solvent mpnn use different parameters than that for the reaction are possible using `--bias_solvent`, `--hidden_size_solvent {int}`, and `--depth_solvent {int}`.
PR 246

Multimolecule Fingerprinting
Updated the fingerprint functions for models with multiple molecules. Models trained with a "shared-mpn" between two molecules can return an MPN fingerprint with only one molecule provided. Also, when multiple-molecule models are used for MPN fingerprint generation, the output will indicate which molecule each element belongs to.
PR 242
Issue 236

Colab Notebook Examples
Created a Jupyter notebook that runs examples of Chemprop jobs, showing how the functions can be used from Python. It is a good resource for new users, demonstrations, or tutorials. It is linked to Google Colab so that it can be run remotely, without requiring a local install of Chemprop.
PR 239
PR 273

Loss Function Options
Previously, loss functions were selected automatically based on the dataset type being used in model training. Now the loss function can be selected with `--loss_function {function}`. Some new specialty loss functions have been added with this capability.
* Matthews Correlation Coefficient (`mcc`) is a loss function for classification and multiclass that considers True Positives, True Negatives, False Positives, False Negatives separately in the loss function, avoiding domination by one class and making it well suited to unbalanced training sets.
* Bounded Mean Squared Error (`bounded_mse`) is a regression loss function that allows for training targets expressed as inequalities, e.g. ">5.0". Intended for use with experimental data with delimited ranges.
* Mean Variance Estimation (`mve`) and `evidential` loss are regression loss functions that maximize the likelihood of the target on an estimated uncertainty distribution. When used as loss functions, the outputs of these functions can be used in uncertainty estimation.
Appropriate metrics have been added along with these loss functions.
PR 238
PR 267
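
To make the `bounded_mse` idea concrete, here is a minimal sketch in plain Python. It is not Chemprop's implementation: the function names and the encoding of bounds as `">"`, `"<"`, or `None` are assumptions for this example. The sketch assumes that a prediction satisfying an inequality target contributes zero error.

```python
def bounded_squared_error(pred, target, bound=None):
    """Squared error that vanishes when the prediction satisfies the bound.

    bound is None for an exact target, ">" when the true value is known
    only to exceed `target`, and "<" when it is known only to be below it.
    """
    if bound == ">" and pred >= target:
        return 0.0
    if bound == "<" and pred <= target:
        return 0.0
    return (pred - target) ** 2


def bounded_mse(preds, targets, bounds):
    errors = [
        bounded_squared_error(p, t, b)
        for p, t, b in zip(preds, targets, bounds)
    ]
    return sum(errors) / len(errors)


# Made-up data: two targets reported as ">5.0" and one exact target of 3.0.
preds = [6.0, 4.0, 2.5]
targets = [5.0, 5.0, 3.0]
bounds = [">", ">", None]
print(bounded_mse(preds, targets, bounds))
```

The prediction of 6.0 already satisfies ">5.0" and is not penalized, while 4.0 is penalized as if the target were exactly 5.0.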

Development Environment

GitHub Addons
Added a `CONTRIBUTING.md` file with guidelines for how users can contribute to Chemprop. New templates are now available for issue submission that distinguish between different issue types: bug report, feature request, and questions. New templates also suggested for PRs. Templates stored in the `.github` directory.
PR 241

Unit Testing
Part of an ongoing effort to include a more complete set of automated tests for Chemprop. Unit tests added for data utils, uncertainty-related loss functions, and the uncertainty evaluation metrics.
PR 232
PR 267
PR 269

Flake8 Formatting
Ongoing effort to standardize the formatting of incoming code. New PRs now request/require the new code to be flake8 compliant in formatting. The utils module and files significantly associated with the new uncertainty function are flake8 compliant.
PR 241
PR 258
PR 267

Update Versioning
Changed the way that version numbers are stored and updated throughout the code.
PR 247

Remove Assertion Errors
Removed many of the assertion errors throughout Chemprop and replaced them with more easily interpretable error types and messages.
PR 257

Bug Fixes

Hyperopt Version Fix
Changed the way that random seeds are passed into hyperopt during hyperparameter optimization to avoid an error where hyperopt stopped supporting a previously supported way of passing numpy seeds.
PR 245
Issue 243
Issue 254
Issue 264

1.4.1

Features

Allow the inclusion of H atoms in message passing
Default model behavior is to treat H atoms implicitly with their neighbors. With the previously existing argument `--explicit_h`, explicit H atoms included in the SMILES string would be considered during message passing. This PR adds a new argument `--adding_h`, which makes all H atoms be treated explicitly during message passing.
PR 225 and 227

Allow splitting by different key molecules in multi-molecule models
The data-splitting methods `scaffold_balanced` and `random_with_repeated_smiles` can only consider one molecule per datapoint in adhering to the constraints of which data must share splits with each other. This PR creates an argument `--split_key_molecule {int}`, which is used to select which molecule in multi-molecule datasets will be used for the splitting determination.
PR 230
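
The role of the key molecule can be sketched as follows. This is an illustrative helper in plain Python, not Chemprop's splitting code; the function name, the zero-based `key_index`, and the two-way train/held-out split are assumptions made to keep the example short.

```python
import random
from collections import defaultdict


def split_by_key_molecule(datapoints, key_index, train_fraction=0.8, seed=0):
    """Keep all datapoints that share the key molecule in the same split.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for i, molecules in enumerate(datapoints):
        groups[molecules[key_index]].append(i)

    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    n_train_target = train_fraction * len(datapoints)
    train, held_out = [], []
    for key in keys:
        # Assign whole groups at a time so a key molecule never straddles splits.
        bucket = train if len(train) < n_train_target else held_out
        bucket.extend(groups[key])
    return train, held_out


# Each datapoint holds two SMILES strings; molecule 0 is used as the key.
data = [
    ["CCO", "O"],
    ["CCO", "CO"],
    ["CCN", "O"],
    ["c1ccccc1", "O"],
    ["CCN", "CO"],
]
train, held_out = split_by_key_molecule(data, key_index=0, train_fraction=0.6)
print(train, held_out)
```

Changing `key_index` to 1 would instead group the datapoints by the second molecule.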

Select split fractions when separate test data is provided
Previously, the split fractions for training/validation were hardcoded as 80/20 when test data was provided via `--separate_test_path`. Split fractions can now be specified in this case using `--split_sizes` as normal.
PR 230

Additional output options for make_predictions function
This change affects usage of `make_predictions` as a python function, rather than in the whole Chemprop workflow. When used as a python function, `make_predictions` would return the predictions for a set of SMILES, but would skip the invalid SMILES without indicating which ones were skipped. Now this function has two new option arguments: 1) `return_invalid_smiles` that includes invalid SMILES in the output but with "Invalid SMILES" as the prediction value and 2) `return_index_dict` that returns predictions of the model in a dictionary keyed to the original data indices.
PR 235
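
The difference between the three output shapes can be sketched without Chemprop itself. Everything in the block below is a stand-in: `predict_with_invalid_handling`, `toy_is_valid`, and `toy_predict` are hypothetical, and only the two option names and the "Invalid SMILES" placeholder come from the description above.

```python
def predict_with_invalid_handling(smiles, predict_one, is_valid,
                                  return_invalid_smiles=False,
                                  return_index_dict=False):
    """Mimic the described output options with stand-in callables."""
    results = {}
    for index, s in enumerate(smiles):
        if is_valid(s):
            results[index] = predict_one(s)
        elif return_invalid_smiles:
            results[index] = "Invalid SMILES"
    if return_index_dict:
        return results  # keyed by position in the original input
    return [results[i] for i in sorted(results)]


def toy_is_valid(s):
    # Stand-in validity check; a real check would parse the SMILES.
    return s != "" and "?" not in s


def toy_predict(s):
    # Stand-in model: "predicts" the string length.
    return float(len(s))


smiles = ["CCO", "C?C", "CCCN"]
default_out = predict_with_invalid_handling(smiles, toy_predict, toy_is_valid)
with_invalid = predict_with_invalid_handling(
    smiles, toy_predict, toy_is_valid, return_invalid_smiles=True)
as_dict = predict_with_invalid_handling(
    smiles, toy_predict, toy_is_valid, return_index_dict=True)
print(default_out, with_invalid, as_dict)
```

In the default output the skipped entry leaves no trace, which is the ambiguity the two new options address.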

New utility functions for identifying invalid SMILES
New functions have been added to chemprop/data/utils.py to allow users to identify datapoints that have invalid SMILES. These functions are `get_invalid_smiles_from_file` and `get_invalid_smiles_from_list`.
PR 235

Bug Fixes

Simultaneous use of extra atom features and extra bond features
A bug that prevented using extra atom features and extra bond features at the same time has been resolved.
PR 215
Issue 213

Fixed install error with newer versions of pip
Newer versions of pip failed to install some chemprop dependencies properly. These dependencies (flake8, pytest, parameterized) are now installed as part of the conda environment rather than by pip. Also, the environment build for testing was changed from conda to mamba for better install speed.
PR 215 and 216

Correction in tutorial file
Tutorial file changed to show the proper list of lists format for SMILES.
PR 218

Predicting for a multiclass model with an improper SMILES
When making a prediction for an improper SMILES in a multiclass model, an error would be triggered instead of returning a prediction of "Invalid SMILES". This has been corrected for this case and the parallel case of improper SMILES used with `--individual_ensemble_predictions`.
PR 229

Molecule fingerprints generated with extra atom features
Molecule fingerprints could not be predicted when extra atom features were provided as part of the model. This and the parallel issue with extra bond features have been addressed.
PR 234
Issue 233

1.4.0

Features

Spectra training
Introduces `spectra` as a new dataset type available for training, in which each target in a multitarget regression refers to a positive intensity value in one position of a spectrum. Training methods are consistent with https://github.com/gfm-collab/chemprop-IR. Default loss function is spectral information divergence (SID), but Wasserstein loss (earthmover distance) is also supported with `--metric wasserstein --alternative_loss_function wasserstein`.
PR 197

Preloading model in predictions
Refactored `make_predictions` into smaller functions to make chemprop functions easier to use as a python library. The refactoring is specifically designed to allow a model to be loaded a single time using the function `chemprop.train.load_model` and then used for multiple instances of predictions by feeding that model as an argument to `chemprop.train.make_predictions`.
PR 200

Improved hyperparameter optimization
Added several new features to hyperparameter optimization, many related to hyperparameter checkpoints saved in the location specified by `--hyperopt_checkpoint_dir <dir_path>`. The new functionalities:
* Restarting failed hyperparameter optimization jobs by selecting the same checkpoint directory.
* Parallelizing multiple instances of hyperparameter optimization by setting a shared checkpoint directory among instances.
* Seeding hyperparameter optimizations with previously run jobs by indicating an old checkpoint directory and/or by specifying the save directories of relevant jobs trained with `train.py` using `--manual_trial_dirs <list-of-directories>`.
* Manually setting the number of hyperparameter trials that use randomized parameters before directed TPE search begins using `--startup_random_iters <int, default=10>`.
PR 208

Return results from all ensemble models
When making predictions from an ensemble of models, returns the mean prediction but also the individual predictions from the individual models when `--individual_ensemble_predictions` is specified.
PR 190
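
A minimal sketch of the two return shapes, using toy callables in place of trained models. The function `ensemble_predict` is hypothetical; only the option name is taken from the description above.

```python
def ensemble_predict(models, x, individual_ensemble_predictions=False):
    """Return the ensemble mean, optionally with each member's prediction."""
    individual = [model(x) for model in models]
    mean = sum(individual) / len(individual)
    if individual_ensemble_predictions:
        return mean, individual
    return mean


# Toy "models": each is just a function of the input.
models = [lambda x: x + 1.0, lambda x: x + 2.0, lambda x: x + 3.0]

mean_only = ensemble_predict(models, 1.0)
mean, members = ensemble_predict(
    models, 1.0, individual_ensemble_predictions=True)
print(mean_only, mean, members)
```

Keeping the member predictions makes it possible to inspect the spread across the ensemble rather than only its average.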

Latent representations for ensembles and from FFN layers
Allows for the calculation of latent fingerprints from an ensemble of models by concatenating them together. Also allows for the return of either a latent representation from the MPNN output or from the next-to-last FFN layer using the argument `--fingerprint_type <MPN or last_FFN>`.
PR 193

Target imputation for sklearn multitask models
Sklearn multitask training cannot proceed when targets are missing from the data. Previously, such datasets had to be run as multiple single-task models. This PR introduces target imputation for missing data to allow multitask sklearn training even when some data is missing, with the argument `--impute_mode <model/linear/median/mean/frequent>` indicating which method to use for imputation.
PR 210
Issue 211
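
The simpler imputation modes can be sketched in plain Python. This is an illustration only, not the code Chemprop uses: the helpers `impute_column` and `impute_targets` are hypothetical, missing targets are represented as `None`, and the `model` and `linear` modes are not covered.

```python
import statistics


def impute_column(values, mode="median"):
    """Fill None entries of one task's targets with a summary statistic."""
    observed = [v for v in values if v is not None]
    if mode == "median":
        fill = statistics.median(observed)
    elif mode == "mean":
        fill = statistics.mean(observed)
    elif mode == "frequent":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unsupported mode in this sketch: {mode}")
    return [fill if v is None else v for v in values]


def impute_targets(rows, mode="median"):
    """Impute each task (column) independently and return rows again."""
    columns = [list(col) for col in zip(*rows)]
    imputed = [impute_column(col, mode) for col in columns]
    return [list(row) for row in zip(*imputed)]


# Made-up two-task dataset with one missing target per task.
rows = [[1.0, None], [3.0, 10.0], [None, 20.0], [8.0, 10.0]]
print(impute_targets(rows, mode="median"))
```

After imputation every row has a value for every task, which is what allows a single multitask model to be fit.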

Reaction balancing
Adds options in reaction training for how to handle situations where reactants and products are not balanced. The argument `--reaction_mode` now also has the options `reac_diff_balance`, `prod_diff_balance`, and `reac_prod_balance` (in addition to the current options `reac_diff`, `prod_diff`, and `reac_prod`). Also fixes an error where atomic numbers are incorrect when an atom is present in the products but not in the reactants.
PR 212
Issue 204

Bug Fixes

Interactions with git repos
Resolves a problem with TAP (typed-argument-parser) where running Chemprop from inside a different git repo would trigger an error related to the generation of a reproducibility hash. In this situation the reproducibility hash is not generated, but it logs the issue and does not stop Chemprop from running.
PR 195

Global features structure
Changes the way that global variables related to model construction and feature vector size are handled. Resolves a problem in pytest where these variables wouldn't reset between runs.
PR 206
