Papyrus-scripts

Latest version: v2.1.1


2.0.0

New feature:
The `PapyrusDataset` class allows for object-oriented 'pandas-style' querying.

Changes:
- `reader.read_papyrus`: now raises an error when trying to load the Papyrus++ set with stereochemistry
- `preprocess.keep_source`: argument `source` now uses regex matching
- `preprocess.keep_organism`: argument `organism` is now case insensitive when `generic_regex=False`
- `download.download_papyrus`: now also downloads the README files
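
The two matching changes above can be illustrated without the library. The following is a minimal sketch in plain Python, not the Papyrus-scripts implementation: the helper functions, the record fields and the sample values are all hypothetical and only show the difference between regex matching and case-insensitive exact matching.

```python
import re

# Hypothetical records; field names and values are illustrative only,
# not the actual Papyrus column names or contents.
records = [
    {"source": "ChEMBL30", "organism": "Homo sapiens (Human)"},
    {"source": "ExCAPE-DB", "organism": "Rattus norvegicus (Rat)"},
    {"source": "ChEMBL30;Sharma2016", "organism": "homo sapiens (human)"},
]


def filter_by_regex(rows, field, pattern):
    """Keep rows whose field matches the regular expression anywhere."""
    compiled = re.compile(pattern)
    return [row for row in rows if compiled.search(row[field])]


def filter_case_insensitive(rows, field, value):
    """Keep rows whose field equals the value, ignoring case."""
    wanted = value.casefold()
    return [row for row in rows if row[field].casefold() == wanted]


# Regex matching: one pattern covers every source whose name contains "ChEMBL<digits>".
chembl_rows = filter_by_regex(records, "source", r"ChEMBL\d+")

# Case-insensitive matching: the capitalisation of the query no longer matters.
human_rows = filter_case_insensitive(records, "organism", "HOMO SAPIENS (HUMAN)")
```

Both calls keep the first and third records. With exact, case-sensitive matching the second call would return nothing.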

Additions:
- `preprocess.keep_not_match`: keep records whose specified column does not match the specified value
- `preprocess.keep_not_contains`: keep records whose specified column does not contain the specified value
- `preprocess.keep_dissimilar`: keep records whose molecules are not similar to the provided molecule
- `preprocess.keep_not_substructure`: keep records whose molecules are not substructures of the provided molecule
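
The first two additions are the negations of exact-match and substring filters. The sketch below shows that idea in plain Python under stated assumptions: the `*_sketch` helpers and the sample rows are hypothetical and do not reproduce the library's signatures. The similarity and substructure variants need a cheminformatics toolkit and are not covered here.

```python
def keep_not_match_sketch(rows, field, values):
    """Keep rows whose field is not exactly equal to any of the given values."""
    excluded = set(values)
    return [row for row in rows if row[field] not in excluded]


def keep_not_contains_sketch(rows, field, value):
    """Keep rows whose field does not contain the given substring."""
    return [row for row in rows if value not in row[field]]


# Hypothetical rows used only to exercise the two helpers.
rows = [
    {"name": "alpha", "note": "binding assay"},
    {"name": "beta", "note": "functional assay"},
    {"name": "gamma", "note": "binding assay, repeated"},
]

not_alpha = keep_not_match_sketch(rows, "name", ["alpha"])
not_binding = keep_not_contains_sketch(rows, "note", "binding")
```

Here `not_alpha` holds the `beta` and `gamma` rows, while `not_binding` holds only the `beta` row.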

**Full Changelog**: https://github.com/OlivierBeq/Papyrus-scripts/compare/1.0.3...2.0.0

1.0.3

Bug fixes:
- ***keep_source*** now returns an empty dataframe for chunks in which the desired source does not appear

New features:
- ***qsar*** and ***pcm***'s **split_by** argument now supports **'custom-cluster'**, which splits training and test sets according to a custom cluster assignment rather than an explicit train/test label (as is the case when its value is **'custom'**).
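
To make the distinction concrete, here is a minimal sketch of a cluster-based split in plain Python. It is an assumption about the general technique, not the library's code: the function `split_by_cluster`, its `test_fraction` and `seed` parameters, and the way clusters are allocated to the test set are all hypothetical. The only property it demonstrates is that records sharing a cluster label never end up on both sides of the split.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) such that no cluster spans both sets."""
    members = defaultdict(list)
    for index, cluster in enumerate(cluster_ids):
        members[cluster].append(index)

    # Shuffle the cluster labels reproducibly, then assign whole clusters.
    clusters = sorted(members)
    random.Random(seed).shuffle(clusters)

    target = test_fraction * len(cluster_ids)
    train_idx, test_idx = [], []
    for cluster in clusters:
        # Fill the test set with whole clusters until the target size is reached.
        if len(test_idx) < target:
            test_idx.extend(members[cluster])
        else:
            train_idx.extend(members[cluster])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical cluster assignment: one label per record.
cluster_ids = ["a", "a", "b", "c", "c", "c", "d", "e", "e", "f"]
train_idx, test_idx = split_by_cluster(cluster_ids, test_fraction=0.3)
```

With **'custom'**, by contrast, each record would carry the train/test label directly and no allocation step would be needed.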

1.0.2

- Made the download disclaimer and the errors caused by low disk space more prominent
- `papyrus_scripts.utils.IO.process_data_version` now raises an exception stating "Papyrus data not available (did you download it first?)"

1.0.1

The Papyrus++ datasets contained duplicated data wrongly associated with multiple assay types (i.e. Ki, KD, EC50, IC50).

The datasets have been updated, and the links of this release and of the `db-links` branch have been updated accordingly.

1.0

Version 1.0 of the Papyrus-scripts library.

Allows one to:
- download the Papyrus dataset
- convert it between XZ and GZIP compression
- match the data to structures of the Protein Data Bank
- create FPSubSim2 (extension of FPSim2) files for similarity and substructure searches
- filter the Papyrus data
- model it with QSAR and PCM models
- remove the data files
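
As an illustration of the XZ/GZIP conversion listed above, the following sketch uses only the Python standard library. It is not the library's own conversion routine: the function names and the placeholder file are hypothetical, and the point is simply that the data can be streamed from one compression format to the other without loading the whole file into memory.

```python
import gzip
import lzma
import shutil
import tempfile
from pathlib import Path


def recompress(src_path, dst_path, open_src, open_dst, chunk_size=1024 * 1024):
    """Stream src into dst, decompressing and recompressing chunk by chunk."""
    with open_src(src_path, "rb") as src, open_dst(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)


def xz_to_gzip(xz_path, gz_path):
    """Convert an XZ-compressed file to GZIP."""
    recompress(xz_path, gz_path, lzma.open, gzip.open)


def gzip_to_xz(gz_path, xz_path):
    """Convert a GZIP-compressed file to XZ."""
    recompress(gz_path, xz_path, gzip.open, lzma.open)


# Round-trip a small placeholder file (not real Papyrus data).
with tempfile.TemporaryDirectory() as tmp:
    xz_file = Path(tmp) / "example.tsv.xz"
    gz_file = Path(tmp) / "example.tsv.gz"
    payload = b"col_a\tcol_b\n1\t2\n"
    with lzma.open(xz_file, "wb") as handle:
        handle.write(payload)
    xz_to_gzip(xz_file, gz_file)
    with gzip.open(gz_file, "rb") as handle:
        round_tripped = handle.read()
```

After the conversion, `round_tripped` contains the same bytes as `payload`; only the compression container has changed.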
