Oncodrive3d

Latest version: v1.0.5


1.0.5

[Oncodrive3D](https://github.com/bbglab/oncodrive3d) is a fast and accurate 3D-clustering algorithm for driver gene discovery.

Key Updates and Features

This release fixes bugs in the build annotations and plotting modules, enhances the association plots, updates the documentation, and includes general code cleanup.

Bug Fixes
- Fixed bug in the plotting module: [1](https://github.com/bbglab/oncodrive3d/commit/9d7e24dd09007891537f627ff59eb09e5e495e4c#diff-b86d3eaa758475db5641acbefe4fabe5b91f7d15066baeff1f920960884bc0e9) [2](https://github.com/bbglab/oncodrive3d/commit/ec4530c5afdffc5df4efe5d4a0209d3c329d9477#diff-839d6a342d33d32af3bd72796438a58810354dab73ac376cbb0388890797ead2)
- Fixed bug in the build annotations module: [1](https://github.com/bbglab/oncodrive3d/commit/ce29d7983af591533ff8228a121f29aab264d29f#diff-f23e27ea6f58f7179ecee98b7f4829b9047d2e3d9e2d8e0737b84300602471b0) [2](https://github.com/bbglab/oncodrive3d/commit/c8329d9178829db433c2e2c90fcf2d6c76dd48cd#diff-524958f317b10c174945239c72e9a57b7ae79067649b5cdd90d33821bb123e93)

Features in association plots
- Added FDR to the logistic regression analysis of associations between clusters and annotations [1](https://github.com/bbglab/oncodrive3d/commit/7ca36fff59394a63466d3d084912f1b716549638#diff-c4c324d443977ebacc39c903e2b7e3e4d30f95f9da67fb76469053bbada71549)
- Added association plots to the Nextflow pipeline [1](https://github.com/bbglab/oncodrive3d/commit/1320359cfef8e5d0e3c2b64ac27323b057b91859#diff-380c6e809f12929d4d812ca47b070779a1983987f213f81cbadd1760619b7cdb)
- Removed comparative plots [1](https://github.com/bbglab/oncodrive3d/commit/2c3f0fcd6be477ae6b7da6ade3c43c7741d3debc#diff-c4c324d443977ebacc39c903e2b7e3e4d30f95f9da67fb76469053bbada71549)
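
The release note does not say which FDR procedure is applied to the logistic regression p-values. As a minimal sketch, assuming the widely used Benjamini-Hochberg adjustment (an assumption, not confirmed by the source), the correction over a set of hypothetical p-values could look like this:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, one per cluster-annotation association test.
raw = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(raw)
```

Each adjusted value is at least as large as its raw p-value, so fewer associations pass a fixed significance cutoff once multiple testing is taken into account.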

Others
- Documentation update
- Linting

1.0.4

Second release of Oncodrive3D, a fast and accurate 3D-clustering algorithm for driver gene discovery. It identifies mutation-enriched volumes by analyzing missense somatic mutations, leveraging AlphaFold's structural predictions to define residue contacts and mutation profiles to simulate neutral mutagenesis. The tool uses rank-based statistics and can process mutations from duplex sequencing studies, enabling the analysis of both cancer and normal tissue datasets across potentially any organism.

Key Updates and Features

This release mainly updates the `README` with important information and fixes a bug in the `oncodrive3d build-datasets` step.

Documentation Updates
- Improved the documentation overall for clarity and usability.
- Added steps to fulfill software requirements, addressing installation failures on older machines lacking updated C libraries.
- Provided detailed information on input and output data formats, including:
  - How to obtain the required input files.
  - In-depth descriptions of the main outputs, including gene-level and residue-level clustering results.

Bug Fixes and Refactoring
- Fixed bug in `scripts/datasets/build_datasets.py` and `scripts/datasets/seq_for_mut_prob.py`:
  - Disabled downloading and integrating MANE structures if the `--mane` flag is not enabled.
  - Removed usage of files related to the MANE downloads when computing the `seq_for_mut_prob.py` for a non-MANE Human proteome.
- Updated `scripts/datasets/utils.py` to increase the timeout for `sock_read` in PyPdl, preventing errors during the download of AlphaFold structures.
- Refactored `scripts/main.py` by moving the import of specific modules into their corresponding functions for better modularity and efficiency.
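
The refactoring of `scripts/main.py` follows the general pattern of deferred (function-level) imports. The sketch below illustrates the pattern only; the function names are hypothetical and `statistics` stands in for a heavy dependency, so it does not reproduce the actual Oncodrive3D code.

```python
def run_plot(values):
    # Import inside the function: the module is loaded only when this
    # command is actually invoked, not when the CLI starts up.
    import statistics  # stand-in for a heavy plotting dependency

    return statistics.mean(values)


def run_version():
    # This command needs no heavy dependency, so it stays fast to start.
    return "1.0.4"
```

With this layout, a lightweight command never pays the import cost of modules that only other commands use.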

1.0.3

First release of Oncodrive3D, a fast and accurate 3D-clustering algorithm for driver gene discovery. It identifies mutation-enriched volumes by analyzing missense somatic mutations, leveraging AlphaFold's structural predictions to define residue contacts and mutation profiles to simulate neutral mutagenesis. The tool uses rank-based statistics and can process mutations from duplex sequencing studies, enabling the analysis of both cancer and normal tissue datasets across potentially any organism.

Key Updates and Features

Packaging and Linting
- Added Python package build using `uv`.
- Published the package to `PyPI`, enabling installation via `pip install oncodrive3d`.
- Updated the `Dockerfile`.
- Applied code linting to improve code quality and maintainability.
- Added `LICENCE`.

Nextflow Pipeline Updates
- Restructured the pipeline according to best practices for enhanced performance and maintainability, and moved it to `oncodrive3d_pipeline/`.

Documentation Updates
- Updated the `README` file:
  - Added instructions for installation.
  - Added instructions for running the provided Nextflow pipeline.

Bug Fixes, Refactoring, and Others
- Removed preprocessing scripts in `build/preprocessing`.
- Updated URLs in `scripts/datasets/seq_for_mut_prob.py` and `scripts/plotting/pfam.py` to use the January 2024 Ensembl archive.
- Changed output column from `Cluster` to `Clump` in the residue-level output (`<cohort>.3d_clustering_pos.csv`).
- Changed the `oncodrive3d run` input argument from `input_maf_path` to `input_path` in `scripts/main.py`.
- Refactored `scripts/datasets/utils.py` to improve download functionality and logging.
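
Because the residue-level column was renamed from `Cluster` to `Clump`, downstream scripts that read results from both older and newer runs may need to accept either header. The following is a minimal sketch of such a reader; the helper name and the two-column excerpts are hypothetical, and real output files contain more columns.

```python
import csv
import io


def read_positions(handle):
    """Read residue-level rows, normalizing the old `Cluster` header to `Clump`."""
    rows = []
    for row in csv.DictReader(handle):
        if "Cluster" in row and "Clump" not in row:
            row["Clump"] = row.pop("Cluster")
        rows.append(row)
    return rows


# Hypothetical excerpts standing in for `<cohort>.3d_clustering_pos.csv`.
old_style = io.StringIO("Gene,Cluster\nTP53,0\n")
new_style = io.StringIO("Gene,Clump\nTP53,0\n")
old_rows = read_positions(old_style)
new_rows = read_positions(new_style)
```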

1.0.2

This is the second pre-release of Oncodrive3D, a fast and accurate 3D-clustering algorithm for driver gene discovery. It identifies mutation-enriched volumes by analyzing missense somatic mutations, leveraging AlphaFold's structural predictions to define residue contacts and mutation profiles to simulate neutral mutagenesis. The tool uses rank-based statistics and can process mutations from duplex sequencing studies, enabling the analysis of both cancer and normal tissue datasets across potentially any organism.

Key Updates and Features

New Modules for Annotation and Plotting:
- Introduced a comprehensive plotting module, including summary plots, gene plots, comparative plots, association plots, and ChimeraX plots.

Nextflow Pipeline:
- Added a minimal Nextflow pipeline to perform 3D clustering analysis across multiple cohorts and generate all relevant plots.

MANE Transcripts Support:
- Built datasets prioritizing MANE AF-predicted structures.
- Tracked transcript IDs from input data, including mismatch, match, or missing status compared to Oncodrive3D datasets.

Mutation Filtering:
- Filtered mutations with wild-type (WT) structure-AA mismatches and genes exceeding a threshold ratio of mapping issues.
- Added an option to disable WT AA mismatch filtering, particularly useful for mouse data where VEP and Uniprot isoform inconsistencies occur.
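
The two filters described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name, the tuple layout, and the `max_mismatch_ratio` default are hypothetical and are not taken from the Oncodrive3D code base.

```python
def filter_mutations(mutations, structure_seq, max_mismatch_ratio=0.1, check_wt=True):
    """Drop mismatching mutations and genes with too many mapping issues.

    mutations: list of (gene, position, wt_aa) tuples with 1-based positions.
    structure_seq: dict mapping gene -> amino-acid sequence of the structure.
    """
    if not check_wt:
        # Filtering disabled, e.g. for data with known isoform inconsistencies.
        return list(mutations)
    kept, total, bad = [], {}, {}
    for gene, pos, wt_aa in mutations:
        total[gene] = total.get(gene, 0) + 1
        seq = structure_seq.get(gene, "")
        if 1 <= pos <= len(seq) and seq[pos - 1] == wt_aa:
            kept.append((gene, pos, wt_aa))
        else:
            bad[gene] = bad.get(gene, 0) + 1
    # Remove every mutation of a gene whose mismatch ratio exceeds the threshold.
    return [m for m in kept if bad.get(m[0], 0) / total[m[0]] <= max_mismatch_ratio]


# Hypothetical toy data: gene B has one mismatch out of two mutations.
structures = {"A": "MKV", "B": "MAL"}
muts = [("A", 1, "M"), ("A", 2, "K"), ("B", 1, "M"), ("B", 2, "K")]
filtered = filter_mutations(muts, structures, max_mismatch_ratio=0.4)
unfiltered = filter_mutations(muts, structures, check_wt=False)
```

In this toy case gene B is removed entirely because half of its mutations disagree with the structure sequence, while disabling the check keeps all four mutations.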

Direct VEP Output Support:
- Enabled direct VEP output processing, allowing filtering of transcripts based on Oncodrive3D-built datasets.

Enhanced Outputs:
- Included processed input mutations (`<cohort>.mutations.processed.tsv`), missense mutation probabilities (`<cohort>.miss_prob.processed.tsv`), and Oncodrive3D sequence dataframes (`<cohort>.seq_df.processed.tsv`).

Mouse Data Support:
- Fully enabled and tested processing of mouse data (mm39) across all steps, including dataset building, annotations, and plotting.

Bug Fixes and Improvements:
- Resolved bug affecting the identification of the most significant volume per gene.
- Changed sorting of position-level results from rank-based (Gene, Rank) to significance-based (Gene, p-value, Score).
- Refactored `main.py`, offloading unnecessary code to module-specific scripts for better organization.
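
The significance-based ordering can be sketched as a multi-key sort. The column names `pval` and `Score` and the toy rows are hypothetical, and sorting ties by descending score is an assumption; the release note only lists the keys (Gene, p-value, Score).

```python
# Hypothetical position-level rows.
rows = [
    {"Gene": "KRAS", "pval": 0.001, "Score": 8.0},
    {"Gene": "KRAS", "pval": 0.001, "Score": 12.0},
    {"Gene": "BRAF", "pval": 0.020, "Score": 5.0},
    {"Gene": "KRAS", "pval": 0.300, "Score": 2.0},
]

# Sort by gene, then ascending p-value, then descending score for ties.
ordered = sorted(rows, key=lambda r: (r["Gene"], r["pval"], -r["Score"]))
```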

Example usage

To run the examples provided, the `<input_path>` directory should be organized as follows:

<input_path>/
├── vep/
│   ├── <cohort_1>.vep.tsv.gz
│   └── <cohort_2>.vep.tsv.gz
└── mut_profile/
    ├── <cohort_1>.sig.json
    └── <cohort_2>.sig.json

- `vep/`: Contains the [VEP](https://www.ensembl.org/info/docs/tools/vep/index.html) output files for each cohort, compressed as `.tsv.gz`.
- `mut_profile/`: Contains the [Bgsignature](https://pypi.org/project/bgsignature/) output files (mutation profile in trinucleotide context) for each cohort, saved as `.sig.json`.
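
Before launching a run, it can help to verify that every cohort has both files. The helper below is a hypothetical sketch (it is not part of Oncodrive3D); it relies only on the directory names and file suffixes described above and demonstrates itself on a temporary directory.

```python
import tempfile
from pathlib import Path

VEP_SUFFIX = ".vep.tsv.gz"
SIG_SUFFIX = ".sig.json"


def list_cohorts(input_path):
    """Return (cohorts with both files, cohorts missing one of the two)."""
    root = Path(input_path)
    vep = {p.name[: -len(VEP_SUFFIX)] for p in (root / "vep").glob("*" + VEP_SUFFIX)}
    sig = {p.name[: -len(SIG_SUFFIX)] for p in (root / "mut_profile").glob("*" + SIG_SUFFIX)}
    return sorted(vep & sig), sorted(vep ^ sig)


# Demonstration on a throwaway layout: cohort_2 lacks its mutation profile.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "vep").mkdir()
    (root / "mut_profile").mkdir()
    for name in ("cohort_1", "cohort_2"):
        (root / "vep" / (name + VEP_SUFFIX)).touch()
    (root / "mut_profile" / ("cohort_1" + SIG_SUFFIX)).touch()
    complete, incomplete = list_cohorts(root)
```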

Human MANE
- `build_datasets -o <datasets_path> --mane`
- `build_annotations -o <annotations_path> -d <datasets_path>`
- `nextflow run main.nf --indir <input_path> --outdir <output_path> --data_dir <datasets_path> --annotations_dir <annotations_path> --vep_input true --verbose true --plot true --chimerax_plot true --mane true --seed 64 -profile container`

Mouse
- `build_datasets -o <datasets_path> --organism mouse`
- `build_annotations -o <annotations_path> -d <datasets_path> --organism mouse`
- `nextflow run main.nf --indir <input_path> --outdir <output_path> --data_dir <datasets_path> --annotations_dir <annotations_path> --ignore_mapping_issues true --plot true --chimerax_plot true --vep_input true -profile container`

1.0.1

This is the first pre-release of Oncodrive3D, a fast and accurate novel 3D-clustering algorithm for driver gene discovery. The approach analyses patterns of observed missense somatic mutations (in cancer or normal tissue) to identify volumes with a higher frequency of mutations than expected under neutral mutagenesis. Oncodrive3D leverages AlphaFold's structure predictions and Predicted Aligned Error (PAE) to construct contact probability maps. If provided, it uses the mutation profile of the cohort to simulate neutral mutagenesis, and it employs rank-based statistics to determine empirical p-values for the volumes of each mutated residue. It can also process a mutation profile integrated with sequencing depth information, provided as a mutability file. This allows the tool to process mutations obtained from duplex sequencing studies, which are commonly used in normal tissue sequencing at the time of this release.
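
To make the notion of an empirical p-value concrete, here is a generic sketch that compares an observed score with scores simulated under a neutral model. It is not Oncodrive3D's actual scoring or ranking scheme: the toy neutral model, the volume size, and the add-one correction are all assumptions chosen for illustration.

```python
import random


def empirical_p_value(observed, simulated):
    """Fraction of simulated scores at least as extreme as the observed one."""
    hits = sum(1 for s in simulated if s >= observed)
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (len(simulated) + 1)


rng = random.Random(64)
# Hypothetical neutral model: 5 mutations placed uniformly over 100 residues,
# scored by how many land inside a 10-residue volume.
simulated = [sum(rng.randrange(100) < 10 for _ in range(5)) for _ in range(999)]
p = empirical_p_value(4, simulated)
```

A small `p` indicates that the observed number of mutations in the volume is rarely reached when mutations are placed according to the neutral model.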

Input

- **input.maf** (`required`): Mutation Annotation Format (MAF) file annotated with consequences (e.g., by using [Ensembl Variant Effect Predictor (VEP)](https://www.ensembl.org/info/docs/tools/vep/index.html)).

- **mut_profile.json** (`optional`): Dictionary including the normalized frequencies of mutations (*values*) in every possible trinucleotide context (*keys*), such as 'ACA>A', 'ACC>A', and so on.

- **mut_config.json** (`optional`): Dictionary including the path and parsing information for the mutability file, which includes information about mutation profile integrated with sequencing depth.
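
As an illustration of the `mut_profile.json` layout, the sketch below builds a flat profile whose keys follow the 'ACA>A' pattern quoted above. It assumes the common 96-channel, pyrimidine-centered convention (reference base C or T); this channel set and the uniform frequencies are assumptions for the example, not a statement of what Oncodrive3D requires.

```python
import itertools
import json

BASES = "ACGT"


def uniform_profile():
    """Build a flat 96-channel profile keyed like 'ACA>A'."""
    keys = [
        f"{five}{ref}{three}>{alt}"
        for ref in "CT"
        for five, three in itertools.product(BASES, repeat=2)
        for alt in BASES
        if alt != ref
    ]
    # Normalized frequencies: every channel gets the same weight.
    return {key: 1 / len(keys) for key in keys}


profile = uniform_profile()
text = json.dumps(profile, indent=2)
```

A real profile would replace the uniform values with the normalized frequencies observed in the cohort.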

Output

- **cohort_filename.3d_clustering_genes.csv**: This is a Comma-Separated Values (CSV) file containing the results of the analysis at the gene level.

- **cohort_filename.3d_clustering_pos.csv**: This is a Comma-Separated Values (CSV) file containing the results of the analysis at the level of mutated positions.
