- single-cell RNA-seq: add built-in support for 10x_v2.
- Fix UMI support for small RNA. Compatible with Qiagen UMI small RNA protocol.
- Ignore .Renviron when running Rscript to head off PATH conflicts.
- Support SRR IDs to download samples with the bcbio_prepare_samples script.
- tumor-only prioritization: do not apply the LowPriority filter by default;
instead, annotate with external databases. Use `tumoronly_germline_filter` to
re-enable the previous behavior.
- UMIs: apply default filtering based on de-duplicated read depth. Uses
`--min-reads 2` when raw de-duplicated coverage is 800 or more, and
`--min-reads 1` otherwise. Allows error correction with UMIs for higher depth
samples.
- gemini: databases no longer created by default. Use `tools_on: [gemini]` or
`tools_on: [gemini_orig]` to create a database. We now use a reduced database
for build 37 to match build 38 and make this forward compatible with CWL.
- vcfanno: run gemini and somatic annotations by default, producing annotated
VCFs with external information.
- alignment preparation: support a list of split files from multiple sequencing
lanes, merging them into a single fastq.
- variant: support octopus variant caller for germline and somatic samples.
- peddy: fix bug where not all files were uploaded on the first pipeline run.
- peddy: For somatic analyses use separate germline calls for tumor/normal, if
available, or extracted germline calls from supported callers, instead of
somatic variants.
- GATK: support ploidy specification during joint calling.
- GATK BQSR: bin qualities into static groups (10, 20, 30) to match GATK4
recommendations. Thanks to Severine Catreux.
- GATK: support 4.0.10.0, which does not use UCSC 2bit references for Spark tools.
- variant calling: support bcftools 1.9, which is stricter about duplicated
key names in INFO and FORMAT.
- seq2c: Upload global calls, coverage and read_mapping files to project directory.
- RNA-seq variant calling: Apply annotations after joint calling for GATK to
avoid import errors with GenomicsDB. Thanks to Komal Rathi.
- CWL: add `--cwl` target to bcbio_nextgen.py upgrade to add and maintain bcbio-vm.
- CWL: use standard null instead of string "null" for representing None values.
- CWL: support for heterogeneity and structural variant callers that make
use of variant inputs.
- CWL: support ensemble calling for combining multiple variant callers.
- ensemble: remove no-ALT ref calls that contribute to incorrect ensemble outputs
- RNA-seq: output a matrix of un-deduped UMI counts when doing single-cell/DGE
for quality control purposes. This is called `tagcounts-dupes.mtx` in the
final directory.
- single-cell RNA-seq: allow pre-transformed FASTQ files as input to DGE/single-cell
pipeline.
- single-cell RNA-seq: only create one index per specified genome instead of one
per sample.
- fgbio: backward compatibility for the older quality setting
`--min-consensus-base-quality`.
- RNA-seq: fix for `fusion_caller` getting interpreted as a path, leading to
memoization/upload issues.
- RNA-seq: memoize rRNA quality calculations, speeding up reruns.
- RNA-seq: prefix `description` with an X if it starts with a number, for R
compatibility.
Thanks to Avinash Reddy and Dan Stetson at AstraZeneca.
- single-cell RNA-seq: respect `--positional` flag with the new tag counting. Thanks to
Babak Alaei at AstraZeneca.
- RNA-seq: turn on `--seqBias` flag by default for Salmon as early-version overfitting
issues have been fixed.
- RNA-seq: report insert size from Salmon fragment distribution, not samtools stats.
- RNA-seq: when processing explant samples, produce a combined tx2gene.csv file from
all organisms processed.
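
Two of the rules above are simple enough to sketch in code: the depth-based choice of `--min-reads` for UMI filtering, and the `X` prefix added to a `description` that starts with a number. The following is a minimal illustration only, not bcbio's implementation. The function names `umi_min_reads` and `r_safe_description` and the `depth_cutoff` parameter are hypothetical, and the cutoff is read from the entry above as "800 or more".

```python
def umi_min_reads(dedup_depth, depth_cutoff=800):
    """Return the assumed --min-reads value for a raw de-duplicated depth.

    Hypothetical helper: uses 2 at or above the cutoff, 1 otherwise.
    """
    return 2 if dedup_depth >= depth_cutoff else 1


def r_safe_description(description):
    """Prefix a sample description with X when it starts with a digit.

    Hypothetical helper mirroring the R-compatibility entry above.
    """
    if description[:1].isdigit():
        return "X" + description
    return description


if __name__ == "__main__":
    for depth in (250, 800, 1500):
        print(depth, umi_min_reads(depth))
    for name in ("1234_tumor", "tumor_1234"):
        print(name, r_safe_description(name))
```

Keeping the cutoff as a parameter makes the assumed boundary explicit, so the sketch stays valid if the actual threshold is configured differently.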