Bcbio-nextgen

Latest version: v1.1.5

1.1.5

- Fixes for Python3 incompatibilities on distributed IPython runs.
- Numerous smaller Python3 incompatibilities with strings/unicode and types.
Thanks to the community for reporting these.
- GATK HaplotypeCaller: correctly apply skipping of marked duplicates only
for amplicon runs. Thanks to Ben Liesfeld.
- Fix format detection for bzip2 fastq inputs.
- Support latest GATK4 MuTect2 (4.1.1.0) with changes to ploidy and reference
parameters.
- Support changes to GATK4 for VQSR --resource specification in 4.1.1.0. Thanks
to Timothee Cezard.
- Support latest bedtools (2.28.0) which expects SAM headers for bgzipped BED
inputs.

1.1.4

- Move to Python 3.6. A python2 environment in the install runs programs that
are not python3 compatible. The codebase is still compatible with python 2.7 but
will only get run and tested on python 3 for future releases.
- RNA-seq: fix for race condition when creating the pizzly cache
- RNA-seq: Add Salmon to multiqc report.
- RNA-seq single-cell/DGE: Properly strip transcript versions from GENCODE GTFs.
- RNA-seq: Faster and more flexible rRNA biotype lookup.
- Move to R3.5.1, including updates to all CRAN and Bioconductor packages.
- tumor-only germline prioritization: provide more useful germline filtering
based on prioritization INFO tag (EPR) rather than filter field.
- Install: do not require fabric for tool and data installs, making full codebase
compatible with python 3.
- variant: Filter out variants with missing ALT alleles output by GATK4.
- GATK: enable specification of spark specific parameters with `gatk-spark`
resources.
- RNA-seq single-cell/DGE: added `demultiplexed` option. If set to True, treat the
data as if it has already been demultiplexed into cells/wells.
- Multiple orders of magnitude faster templating with thousands of input files.
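
As an illustration of the GENCODE transcript version stripping mentioned above, here is a minimal sketch, assuming identifiers of the form `ENST00000456328.2`. The function name `strip_version` is hypothetical and this is not bcbio's implementation.

```python
import re

# A version suffix is a dot followed by digits, e.g. the ".2" in "ENST00000456328.2".
_VERSION = re.compile(r"\.\d+")


def strip_version(identifier):
    """Remove the first version suffix from a GENCODE-style identifier.

    Hypothetical helper for illustration; not a bcbio function.
    """
    return _VERSION.sub("", identifier, count=1)


if __name__ == "__main__":
    for name in ["ENST00000456328.2", "ENST00000456328", "ENST00000431238.7_PAR_Y"]:
        print(name, "->", strip_version(name))
```

Removing only the `.N` part keeps any trailing text, such as a `_PAR_Y` suffix, so those identifiers stay distinct from their unsuffixed counterparts. Whether bcbio treats such suffixes this way is not stated in the notes above.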

1.1.3

- CNV: support background inputs for CNVkit, GATK4 CNV and seq2c. Allows
pre-computed panel of normals for tumor-only or single sample CNV calling.
- variant: avoid race condition on processing input BED files for variant
calling when no pre-specified variant_regions available.
- structural variation upload: avoid uploading multiple batched calls into
sample directories. lumpy now has a single output per batch in a
sample folder.
- install: respect pre-specified bioconda and conda-forge in condarc
configuration. Allows use of custom package mirrors.
- seq2c: move specialized pre-call calculation upstream to coverage estimation.
Allows use of seq2c in CWL runs.
- MultiQC upload: fix bug where results from parallel run not moved to final
directory.
- GATK4 CNV: fix for standardized VCF output, correcting number of columns.
- RNA-seq variation: fix for over-filtering variants near splice junctions with
STAR.
- Structural variant gene annotation: simplify and handle issues with
multidirectional comparisons. Handle issues with out of order start/end from CNVkit.
- Catch and report unicode characters in templating or YAML descriptions.
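
The unicode check in the last item can be sketched as a scan for non-ASCII characters in a description string. This is a minimal illustration under that assumption; `find_non_ascii` is a hypothetical name and bcbio's actual check may differ.

```python
def find_non_ascii(text):
    """Return (index, character) pairs for every non-ASCII character in text.

    Hypothetical helper for illustration; not a bcbio function.
    """
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


if __name__ == "__main__":
    # "\u00f6" is an o with a diaeresis, written as an escape to keep this file ASCII.
    for description in ["sample_1", "tum\u00f6r_1"]:
        problems = find_non_ascii(description)
        if problems:
            print("non-ASCII characters in %r: %s" % (description, problems))
        else:
            print("%r is clean" % description)
```

Reporting the position of each offending character lets a user find it in a long sample sheet or YAML file.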

1.1.2

- VarDict low frequency somatic filters: generalize the strand and mismatch
filter using cross-validation to avoid over filtering on high depth panels.
- strelka2 joint calling: switch to improved gvcfgenotyper approach for calling
from gVCFs.
- Heterogeneity: initial support for PureCN and TitanCNA heterogeneity analysis
including reporting on LOH in HLA for human samples. Work in progress validations:
https://github.com/bcbio/bcbio_validations/tree/master/TCGA-heterogeneity
- CNV: initial support for GATK4 CNV calling as alternative to CNVkit for
tumor normal analyses
- VarDict RNA-seq variant calling: avoid structural variants with recent vardict-java.
- RNA-seq variation: filter RNA-seq variants close to splice junctions,
supporting STAR and hisat2.
- RNA-seq variation: add snpEff effects to output variant calls. Thanks to Manasa Surakala.
- RNA-seq: gzip/bgzip FASTQ files in `work/fastq` instead of the original directory.
- use biobambam2 BAM to FASTQ conversion instead of Picard in all cases.
- Trimming: add built-in support for adapters from the SMARTer Universal Low Input RNA Kit
(truseq2) and the Illumina NEXTera DNA prep kit from NEB (nextera2).
- ChIP/ATAC-seq: allow skipping duplicate marking.
- joint calling: ensure correct upload to final directory when no annotations present
- Logging: fix logging in parallel runs with new joblib loky backend. Thanks to
Ben Liesfeld and Roland Ewald.

1.1.1

- single-cell RNA-seq: add built-in support for 10x_v2.
- Fix UMI support for small RNA. Compatible with Qiagen UMI small RNA protocol.
- Ignore .Renviron when running Rscript to head off PATH conflicts.
- Support SRR ids to download samples with bcbio_prepare_samples script.
- tumor-only prioritization: do not apply LowPriority filter by default, instead
annotate with external databases. Use `tumoronly_germline_filter` to re-enable
previous behavior.
- UMIs: apply default filtering based on de-duplicated read depth. Uses
`--min-reads 2` with raw de-duplicated coverage of 800 or more or `--min-reads 1`
otherwise. Allows error correction with UMIs for higher depth samples.
- gemini: databases no longer created by default. Use `tools_on: [gemini]` or
`tools_on: [gemini_orig]` to create a database. We now use a reduced database
for build 37 to match build 38 and make this forward compatible with CWL.
- vcfanno: run gemini and somatic annotations by default, producing annotated
VCFs with external information.
- alignment preparation: support a list of split files from multiple sequencing
lanes, merging into a single fastq
- variant: support octopus variant caller for germline and somatic samples.
- peddy: fix bug where not all files uploaded on first pipeline run
- peddy: For somatic analyses use separate germline calls for tumor/normal, if
available, or extracted germline calls from supported callers, instead of
somatic variants.
- GATK: support ploidy specification during joint calling.
- GATK BQSR: bin qualities into static groups (10, 20, 30) to match GATK4
recommendations. Thanks to Severine Catreux.
- GATK: support 4.0.10.0 which does not use UCSC 2bit references for Spark tools
- variant calling: support bcftools 1.9 which is more strict about duplicated
key names in INFO and FORMAT.
- seq2c: Upload global calls, coverage and read_mapping files to project directory.
- RNA-seq variant calling: Apply annotations after joint calling for GATK to
avoid import errors with GenomicsDB. Thanks to Komal Rathi.
- CWL: add `--cwl` target to bcbio_nextgen.py upgrade to add and maintain bcbio-vm.
- CWL: use standard null instead of string "null" for representing None values.
- CWL: support for heterogeneity and structural variant callers that make
use of variant inputs.
- CWL: support ensemble calling for combining multiple variant callers.
- ensemble: remove no-ALT ref calls that contribute to incorrect ensemble outputs
- RNA-seq: output a matrix of un-deduped UMI counts when doing single-cell/DGE
for quality control purposes. This is called `tagcounts-dupes.mtx` in the
final directory.
- single-cell RNA-seq: allow pre-transformed FASTQ files as input to DGE/single-cell
pipeline.
- single-cell RNA-seq: only create one index per specified genome instead of per
sample
- fgbio: back compatibility for older quality setting `--min-consensus-base-quality`
- RNA-seq: fix for `fusion_caller` getting interpreted as a path, leading to
memoization/upload issues.
- RNA-seq: memoize rRNA quality calculations, speeding up reruns.
- RNA-seq: prefix `description` with an X if it starts with a number, for R
compatibility.
Thanks to Avinash Reddy and Dan Stetson at AstraZeneca.
- single-cell RNA-seq: respect `--positional` flag with the new tag counting. Thanks to
Babak Alaei at AstraZeneca.
- RNA-seq: turn on `--seqBias` flag by default for Salmon as early-version overfitting
issues have been fixed.
- RNA-seq: report insert size from Salmon fragment distribution, not samtools stats.
- RNA-seq: when processing explant samples, produce a combined tx2gene.csv file from
all organisms processed.
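
Two of the rules in this release are simple enough to sketch directly: the default UMI `--min-reads` choice based on de-duplicated depth, and the `X` prefix for descriptions that start with a number. The function names and the `threshold` parameter are hypothetical, introduced for illustration only; the value 800 and the prefix rule come from the notes above.

```python
def umi_min_reads(dedup_depth, threshold=800):
    """Pick the --min-reads value from raw de-duplicated coverage.

    Uses 2 at a depth of `threshold` or more and 1 otherwise.
    Hypothetical helper for illustration; not a bcbio function.
    """
    return 2 if dedup_depth >= threshold else 1


def r_safe_description(description):
    """Prefix a description with "X" if it starts with a digit.

    Hypothetical helper for illustration; not a bcbio function.
    """
    if description[:1].isdigit():
        return "X" + description
    return description


if __name__ == "__main__":
    for depth in (250, 800, 5000):
        print(depth, "-> --min-reads", umi_min_reads(depth))
    for name in ("1234_tumor", "tumor_1234"):
        print(name, "->", r_safe_description(name))
```

R prefixes syntactically invalid column names in a similar way, so applying the prefix up front keeps sample names consistent between the pipeline and downstream R analyses.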

1.1.0

- Germline calls: rename outputs to `samplename-germline` to provide easier
to understand outputs in final directory.
- Add bcbioRNASeq object creation and automatic quality report generation
with `tools_on: [bcbiornaseq]`
- CWL: Support germline/somatic calling for tumor samples.
- CNVkit: improve whole genome runs. Better speed in normalize_sv_coverage
through parallelization and avoiding logging. Avoid memory errors in segmentation.
- UMI: upload prepared UMI bam file (pre-consensus) to final output directory
- Add support for bbmap as an aligner
- RNA-seq variant calling: parallelize GATK HaplotypeCaller over regions to
avoid memory and timeout issues.
- Support joint calling with GATK using pre-prepared gVCF inputs.
- RNA-seq variant calling: allow annotation of output variants with vcfanno
- Support hg38 builds with peddy QC
- QC: support VerifyBamID2 for contamination detection
- CWL: adjust defaults for align_split_size and nomap_split_targets to match
different parallelization and overhead for these runs
- CWL: support for Cromwell runner
- custom genomes: Unzip GTF file prior to installation.
- Avoid making variant_regions required during processing (by filling with
coverage) to differentiate targeted and non-targeted analyses downstream.
- Avoid attempts to download pre-installed S3 genomes, providing better
errors with missing genome installs.
- Trimming: add explicit `polyg` option for removing 3' G stretches in NovaSeq
and NextSeq data. Now defaults to no polyG trimming unless turned on.
- Chip-seq: Add RiP calculation for chip-seq data.
- DeepVariant and Strelka2 support for customized targeted/genome calling models
per region to handle heterogeneous inputs.
- STAR: enable passing custom options for alignment.
- Add `tools_off: [coverage_qc]` option to skip calculating coverage stats (samtools-stats and picard).
- small RNA-seq: add a BAM file for each sample to the output, and add samtools
and qualimap QC metrics to the multiqc report.
- Allow arbitrary genomes for ChIP-seq. Thanks to evchambers for pointing out the issue.
