Snp-pipeline

Latest version: v2.2.1


0.7.0

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Added a new script to the pipeline: ``snp_filter.py`` removes snps from the ends of contigs and
  from regions where the snp density is abnormally high. This is an important change to the
  pipeline with additional processing and new output files. See :ref:`snp-filtering-label`.
* NOTE: You cannot re-use an old configuration file when running SNP Pipeline version 0.7.0. You
  must create a new configuration file. See :ref:`configuration-label`.
* Fixed compatibility with bcftools 1.2 and higher.
* Updated the result files in the included data sets with the results obtained using bcftools v1.3.1
  and bowtie2 v2.2.9. Note: upgrading from bowtie 2.2.2 to 2.2.9 did not change the snp matrix
  on any of the included datasets.
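
As a rough illustration of the two filtering rules described above, the following Python sketch drops
snps that lie near the ends of a contig and snps that sit in a densely populated window. It is not the
code in ``snp_filter.py``. The function name ``filter_snps``, the parameters ``edge_length``,
``window_size``, and ``max_snps``, their default values, and the use of a window centered on each snp
are all assumptions made for this example.

```python
from bisect import bisect_left, bisect_right


def filter_snps(positions, contig_length, edge_length=500, window_size=1000, max_snps=3):
    """Return the 1-based snp positions that pass both filters.

    All parameter names and defaults are hypothetical; they are not taken
    from the SNP Pipeline configuration file.
    """
    positions = sorted(positions)
    half_window = window_size // 2
    kept = []
    for pos in positions:
        # Rule 1: discard snps within edge_length bases of either contig end.
        if pos <= edge_length or pos > contig_length - edge_length:
            continue
        # Rule 2: discard snps in dense regions. Count every snp (including
        # this one) inside a window centered on the current position.
        first = bisect_left(positions, pos - half_window)
        last = bisect_right(positions, pos + half_window)
        if last - first > max_snps:
            continue
        kept.append(pos)
    return kept
```

For example, on a 10,000-base contig with the defaults above, a snp at position 100 would be dropped by
the first rule, and four snps packed between positions 5000 and 5030 would all be dropped by the second.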

0.6.1
~~~~~~~~~~~~~~~~~~

* Fixed compatibility with SAMtools 1.3.
* Changed the expected results data sets to match the results obtained using SAMtools
  version 1.3.1. Starting with SAMtools version 1.0, the samtools mpileup command implemented
  a feature to avoid double counting the read depth when the two ends of a paired-end read
  overlap. If you use this feature of SAMtools, the pileup depth will be noticeably reduced.
  You can still count the overlapping read sections twice by using SAMtools v0.1.19 or by using
  a configuration file specifying the ``-x`` option in ``SamtoolsMpileup_ExtraParams``.
* Removed the obsolete ``reads.snp.pileup`` files from the included results data sets.

0.6.0
~~~~~~~~~~~~~~~~~~

**Bug fixes:**

* Fixed compatibility with the newly released PyVCF 0.6.8 package.

**Other Changes:**

* A new configuration parameter, ``MaxSnps``, controls the maximum number of snps
  allowed for each sample. Samples with excessive snps exceeding this limit are excluded
  from the snp list and snp matrix.
  See :ref:`excessive-snps-label`.
* A new column in the metrics.tsv file, ``Excluded_Sample``, indicates when a sample has been
  excluded from the snp matrix. This column is normally blank.
* Added a new script to the pipeline: ``calculate_snp_distances.py`` computes the SNP distances between
  all pairs of samples. The SNP distances are written to the output files ``snp_distance_pairwise.tsv``
  and ``snp_distance_matrix.tsv``.
* Changed Sun Grid Engine execution to use array-slot dependency where possible, resulting
  in less idle time waiting for job steps to complete.
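
To make the idea of pairwise SNP distances concrete, here is a minimal Python sketch that counts the
differing positions between every pair of samples. It is not the implementation of
``calculate_snp_distances.py``. The function name ``snp_distances`` and the input layout (a dictionary
mapping each sample name to its row of the snp matrix as a string) are assumptions, as is the choice to
count every mismatching character, including gaps, as one difference.

```python
from itertools import combinations


def snp_distances(sequences):
    """Return {(sample_a, sample_b): distance} for all sample pairs.

    `sequences` is a hypothetical input format: sample name -> string holding
    the base called at each snp position. All strings must have equal length.
    """
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) > 1:
        raise ValueError("all sequences must have the same length")

    distances = {}
    # Visit each unordered pair once, in sorted name order.
    for name_a, name_b in combinations(sorted(sequences), 2):
        seq_a, seq_b = sequences[name_a], sequences[name_b]
        # Assumption: any mismatching character counts as one difference.
        distances[(name_a, name_b)] = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return distances
```

The resulting dictionary holds one entry per pair, which corresponds to the pairwise layout. A square
matrix layout could be derived from it by filling both ``(a, b)`` and ``(b, a)`` and setting the
diagonal to zero.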

0.5.2
~~~~~~~~~~~~~~~~~~

**Bug fixes:**

* An empty snplist.txt file should not cause errors when creating the referenceSNP.fasta.
* An empty snplist.txt file should not preclude re-running subsequent steps of the pipeline.
* When configured to ignore single-sample errors, a missing var.flt.vcf file should not
  preclude rebuilding the snplist.txt file during a pipeline re-run.
* The metrics file did not properly capture the total number of snps per sample. See below for the details.

**Other Changes:**

* Capture separate metrics counting phase 1 snps (varscan) and phase 2 snps (consensus). Previously, the
  metrics only included phase 1 snps. This changes the contents of both the ``metrics`` and ``metrics.tsv``
  files. The metrics file now contains a new tag ``phase1Snps``. The old tag ``snps`` now correctly counts
  the total number of snps. The metrics.tsv file now has separate column headers for phase 1 snps and
  phase 2 snps. Any code that parses those files may need modifications to work properly with v0.5.2.
* Added the ``Average Insert Size`` metric.
* The metrics.tsv column headings now contain underscores instead of spaces for better interoperability
  with some downstream analysis tools. Column headings with spaces can be generated by specifying the
  combineSampleMetrics.sh ``-s`` option in the configuration file.
* Remove the dependence on the snp matrix when collecting sample metrics.
* Improve the speed of metrics calculation when rerunning the pipeline. Reuse the previously computed metrics
  when recalculation would be slow.
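
Code that parses ``metrics.tsv`` can be made tolerant of both heading styles by normalizing the header
row before use. The Python sketch below shows one way to do that. The function name ``read_metrics``
and the sample rows are illustrative assumptions, and the real file contains more columns than shown
here.

```python
import csv
import io


def read_metrics(handle):
    """Read a tab-separated metrics file into a list of dicts.

    Spaces in column headings are replaced with underscores, so files
    written with either heading style yield the same keys.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = [name.strip().replace(" ", "_") for name in next(reader)]
    return [dict(zip(header, row)) for row in reader]


# Hypothetical two-column excerpt using the older, space-separated heading.
SAMPLE_TEXT = "Sample\tAverage Insert Size\nsampleA\t250\n"
rows = read_metrics(io.StringIO(SAMPLE_TEXT))
```

With this approach, downstream code can always look up ``Average_Insert_Size`` regardless of whether
the ``-s`` option was used when the file was generated.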

0.5.1
~~~~~~~~~~~~~~~~~~

**Bug fixes:**

* Do not shut down the pipeline when the generated snplist is empty because there are no snps.
* Do not attempt to merge VCF files when there are fewer than two VCF files to merge.

**Other Changes:**

* Added the ``vcfFailedSnpGt`` option to the call_consensus.py script to control how the VCF file GT data
  element is emitted when the snp is failed because of depth, allele frequency, or some other filter. If
  not specified, the GT element will contain a dot. Prior to this release, the behavior was to emit the
  ALT allele index. The old behavior can be retained by setting ``--vcfFailedSnpGt 1``.
* Changed the setup to require PyVCF version 0.6.7 or higher. It will automatically upgrade if necessary.
* Added error checking after running SAMtools and VarScan to detect missing, empty, or erroneous output files.

0.5.0
~~~~~~~~~~~~~~~~~~

**Bug fixes:**

* Changed VCF file generator to not emit multiple alleles when the reference base is lowercase.

**Other Changes:**

* Trap errors, shut down the pipeline, and prevent execution of subsequent steps when earlier processing
  steps fail. A summary of errors is written to the ``error.log`` file.
  See :ref:`error-handling-label`.
* Check for the necessary software tools (bowtie, samtools, etc.) on the path at the start of each
  pipeline run.
* Check for missing or empty input files at the start of each processing step.
* Added two new parameters, ``GridEngine_QsubExtraParams`` and ``Torque_QsubExtraParams``, to the
  configuration file to pass options to qsub when running the SNP Pipeline on an HPC cluster.
  Among other things, you can control which queue the snp-pipeline will use when executing on an HPC
  with multiple queues. See :ref:`configuration-label`.
* Removed the "job." prefix to shorten job names when running on an HPC.
* Changed the vcf file generator to emit reference bases in uppercase. Added the ``vcfPreserveRefCase``
  flag to the call_consensus.py script to cause the vcf file generator to emit each reference base in
  uppercase/lowercase as it appears in the original reference sequence file. If not specified, the
  reference bases are emitted in uppercase. Prior to this release, the behavior was to always preserve the
  original case.
* Added support for Python 3.3, 3.4, 3.5.
* Implemented a regression test suite for the bash shell scripts, using the shUnit2 package.
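
The check for required tools on the path can be pictured with a short Python sketch like the one
below. This is only an illustration of the idea and not how the pipeline performs the check. The
function name ``find_missing_tools`` is hypothetical, and the executable names in the usage comment
are assumptions.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    # shutil.which returns None when no matching executable is found.
    return [tool for tool in tools if shutil.which(tool) is None]


# Possible usage (executable names are assumptions):
#   missing = find_missing_tools(["bowtie2", "samtools"])
#   if missing:
#       raise SystemExit("Missing required tools: " + ", ".join(missing))
```

Running such a check once at startup lets a run fail immediately with a clear message instead of
partway through a long job.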

