Somaticseq

Latest version: v3.7.4


3.6.2

* Handle cases where the variant position is at the beginning or the end of a contig/chromosome, such that the 80-bp window spanning or adjacent to the variant position cannot be obtained to calculate sequence complexity.
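The edge case can be illustrated with a small sketch. The release note does not say how the shortened window is handled, so the clipping strategy below is only an assumption, and `flanking_window` is a hypothetical helper, not a SomaticSeq function. Positions are assumed to be 0-based.

```python
def flanking_window(position, contig_length, window=80):
    """Return a 0-based, half-open (start, end) window around `position`.

    Assumption: the window is simply clipped at the contig boundaries,
    so it can be shorter than `window` near either end of the contig.
    """
    half = window // 2
    start = max(0, position - half)
    end = min(contig_length, position + half)
    return start, end


def is_full_window(start, end, window=80):
    """Tell the caller whether the full-size window was available."""
    return (end - start) == window
```

A caller could use `is_full_window` to decide whether to compute the complexity on the shorter sequence or to report a missing value instead.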

3.6.1

* No change to somaticseq algorithm, i.e., classifiers are identical to v3.6.0.
* Users can invoke the `makeSomaticScripts.py` module/command to run a parallelized somatic mutation detection workflow starting from BAM files.
* Users can invoke `makeAlignmentScripts.py` to run a parallelized alignment workflow starting from .fastq/.fastq.gz files.

3.6.0

* Rewrote the XGBoost routine to use the xgboost library in Python (somaticseq/somatic_xgboost.py, which also requires the pandas library). Also made it the default algorithm for SomaticSeq because xgboost in Python is orders of magnitude faster than AdaBoost in R. You can still use ada in R by invoking `-algo ada` in the command.
* Worked around VarDict's latest output VCF files, which are incompatible with bedtools, by removing the incompatible lines (i.e., when ALT has \<DUP\>, \<DEL\>, or \<INV\> but there is no END field in the INFO column). An extra step (which may be removed later if it becomes unnecessary) was added to somaticseq/combine_callers.py.
* Finally removed the legacy SomaticSeq.Wrapper.sh and ssSomaticSeq.Wrapper.sh scripts (replaced by somaticseq_parallel.py since v3.0.0).
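The VarDict workaround in this release drops VCF records whose ALT is a symbolic allele but whose INFO column has no END field. A minimal sketch of such a filter follows. It is not the code in somaticseq/combine_callers.py; the function name and the exact matching rules are assumptions made for illustration.

```python
# Symbolic alleles named in the release note.
SYMBOLIC_ALTS = {"<DUP>", "<DEL>", "<INV>"}


def is_bedtools_compatible(vcf_line):
    """Return False for records with a symbolic ALT and no END in INFO.

    Hypothetical helper: header lines are always kept, and data lines
    are assumed to be tab-delimited with at least 8 VCF columns.
    """
    if vcf_line.startswith("#"):
        return True
    fields = vcf_line.rstrip("\n").split("\t")
    alt, info = fields[4], fields[7]
    has_symbolic_alt = any(a in SYMBOLIC_ALTS for a in alt.split(","))
    has_end = any(entry.startswith("END=") for entry in info.split(";"))
    return not (has_symbolic_alt and not has_end)


def filter_vcf_lines(lines):
    """Keep only the lines that pass the compatibility check."""
    return [line for line in lines if is_bedtools_compatible(line)]
```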

3.5.1

* Fixed a minor bug where num_caller in somaticseq/somatic_vcf2tsv.py and somaticseq/single_sample_vcf2tsv.py did not reset properly when there were multiple variant calls at the same genomic position. As a result, some variant calls that should not have been written to the .tsv file (because num_caller=0) were written anyway, because num_caller kept adding to the count from the previous variant call at the same genomic coordinate. However, the features were still reported correctly, so the classification results should stay the same.
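The pattern behind this fix can be sketched as follows: the counter has to be reset for every variant call, not once per genomic position. This is only an illustration; `rows_to_write` and its input layout are hypothetical and do not mirror the actual vcf2tsv code.

```python
def rows_to_write(variant_calls):
    """Select the variant calls that should reach the .tsv output.

    variant_calls: iterable of (chrom, pos, alt, caller_flags) tuples,
    where caller_flags holds 1 if a caller reported the call, else 0.
    """
    rows = []
    for chrom, pos, alt, caller_flags in variant_calls:
        # Reset per variant call. Initializing this once per position
        # would carry counts over to the next call at the same coordinate.
        num_caller = 0
        for flag in caller_flags:
            num_caller += flag
        if num_caller > 0:
            rows.append((chrom, pos, alt, num_caller))
    return rows
```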

3.5.0

* Replaced z-scores from scipy's ranksums with p-values from scipy's mannwhitneyu, mostly because mannwhitneyu corrects for discrete values. **Thus, models built prior to this version are no longer compatible with it due to different features.**
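To make the difference concrete, the sketch below computes both statistics on the same data. `scipy.stats.ranksums` returns a z-statistic and does not handle ties, whereas `scipy.stats.mannwhitneyu` applies a tie correction and an optional continuity correction in its normal approximation. The scores are made up, and the arguments passed to `mannwhitneyu` are assumptions, not necessarily the ones SomaticSeq uses.

```python
from scipy.stats import mannwhitneyu, ranksums

# Hypothetical per-read scores (e.g., base qualities) with many ties,
# as is typical for discrete values.
ref_scores = [30, 30, 31, 32, 32, 33, 35]
alt_scores = [20, 22, 30, 30, 31]

# Feature before v3.5.0: the z-score from the Wilcoxon rank-sum test.
z_score, _ = ranksums(ref_scores, alt_scores)

# Feature from v3.5.0 on: the p-value from the Mann-Whitney U test.
_, p_value = mannwhitneyu(
    ref_scores, alt_scores, use_continuity=True, alternative="two-sided"
)

print(f"ranksums z-score: {z_score:.3f}")
print(f"mannwhitneyu p-value: {p_value:.3f}")
```

Because one feature is a signed z-score and the other is a p-value between 0 and 1, a classifier trained on one cannot be applied to the other, which is why older models are incompatible.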

3.4.2

* Modified the linguistic sequence complexity calculation to limit the substring length to 20 bp. This decreases runtime with no loss of accuracy.
* Fixed a bug where the indels nearest to a position were not calculated properly when there were additional insertions and soft-clipped bases in a read.
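The effect of capping the substring length can be seen in a small sketch of linguistic sequence complexity. It uses one common definition (distinct substrings observed divided by the maximum possible, summed over substring lengths), which may differ from the exact formula in SomaticSeq. The function name and the `max_substring_length` parameter are hypothetical.

```python
def linguistic_complexity(sequence, max_substring_length=20):
    """Ratio of observed to maximum possible distinct substrings.

    Only substring lengths 1..max_substring_length are considered,
    which bounds the work for long windows. Assumes a 4-letter alphabet.
    """
    sequence = sequence.upper()
    n = len(sequence)
    if n == 0:
        return 0.0
    observed_total = 0
    possible_total = 0
    for k in range(1, min(max_substring_length, n) + 1):
        # Distinct k-mers actually present in the sequence.
        observed_total += len({sequence[i:i + k] for i in range(n - k + 1)})
        # Upper bound: limited by the alphabet and by the number of positions.
        possible_total += min(4 ** k, n - k + 1)
    return observed_total / possible_total
```

A homopolymer run such as `"A" * 80` scores close to 0, while a short sequence with no repeated substrings such as `"ACGT"` scores 1.0.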
