- Use MultiQC (github.com/ewels/MultiQC) as the main package for processing all
QC metrics.
- New install procedure for data: `--datatarget` allows installing subsets
of supplemental data, giving smaller installs for small RNA-only analysis. Also
provides a consistent framework for installing larger data types.
- VEP data is no longer installed by default. Installing it requires `--datatarget vep`.
- During install, `--toolplus` is only used for third-party tools like GATK and
MuTect, not for data installation, which moved to `--datatarget`.
- Provide `data_versions.csv` in the output folder, listing the versions of
reference data used in the analysis.
- Use sample description for BAM read group IDs, instead of lane index. This
allows remixing of samples after processing without potential collisions. Thanks
to Neill Gibson.
- Use sample description for file names instead of lane/flowcell information.
Makes re-runs more stable when using templates and makes files easier to interpret.
Backwards compatible with re-runs of old work directories.
- Finalize support for MuTect2 with validation against the DREAM synthetic 4
dataset (http://imgur.com/CLqJlNF). Thanks to Alessandro (apastore).
- Do not bgzip inputs when they are already gzipped and do not require
parallelization or format conversion. Thanks to Miika Ahdesmaki.
- Use new snpEff annotations (ANN) instead of older approach (EFF). The
new annotations are more interoperable and supported by GEMINI.
- Lazy import of matplotlib libraries to avoid slow startup times.
- Only apply ploidyfix to all-female batches, to remove the Y chromosome. Avoids
confusion from files produced in other cases without any changes.
- Improvement to bcbio CWL integration: support parallel alignment and variant
calling.
- Support for Salmon and RapMap added.
- FastRNA-seq pipeline implemented that does nothing but run Salmon with no QC.
- Single-cell RNA-seq pipeline implemented that uses https://github.com/vals/umis
to handle the UMI and cellular barcode, aligns with RapMap, and quantitates
by counting, down-weighting each ambiguous read by the number of transcripts it
could have come from.
- Migrate bowtie and bowtie2 to handle split input alignments, bgzipped inputs,
and produce sorted, de-duplicated BAM files. This allows use in additional
standard pipelines. Thanks to Luca Beltrame.
- Switch final upload directories for salmon and sailfish results to be of the
form `samplename/salmon` instead of `samplename/salmon/samplename`.
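
As a rough illustration of the counting approach mentioned for the single-cell RNA-seq pipeline above, the sketch below gives each read a total weight of 1, split evenly across the transcripts it could have come from. The function name `count_with_ambiguity` and the input layout are hypothetical, chosen for this example only. This is not the bcbio or umis implementation.

```python
from collections import defaultdict


def count_with_ambiguity(read_hits):
    """Count reads per transcript, splitting ambiguous reads evenly.

    read_hits: iterable where each item lists the transcript IDs one read
    is compatible with (hypothetical input layout for this sketch).
    """
    counts = defaultdict(float)
    for transcripts in read_hits:
        unique = set(transcripts)  # ignore repeated hits to the same transcript
        if not unique:
            continue  # unmapped read contributes nothing
        weight = 1.0 / len(unique)  # scale by the number of candidate transcripts
        for tx in unique:
            counts[tx] += weight
    return dict(counts)


# One unique read, two ambiguous reads, one unmapped read.
hits = [["tx1"], ["tx1", "tx2"], ["tx2", "tx3", "tx3"], []]
counts = count_with_ambiguity(hits)
```

Under this scheme the per-transcript counts always sum to the number of mapped reads, so ambiguous reads are neither dropped nor double counted.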