This release contains a new command `import-rna` to infer coarse-grained copy number from RNA expression data. (151)
Three new HMM-based segmentation methods are offered: 'hmm', 'hmm-germline', and 'hmm-tumor'. These should be considered experimental and used with caution; the implementations are likely to change in the next release.
The option `--male-reference` in the commands `batch`, `reference`, `fix`, `call`, and `export` (at least) has been renamed to `--haploid-x-reference` everywhere to reduce user confusion. A shim is in place so `--male-reference` will continue to work.
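As a rough illustration of how such a shim can work, the sketch below registers both spellings on one `argparse` option so they set the same attribute. The parser, the `dest` name, and the help text are hypothetical and are not taken from CNVkit's own command-line code.

```python
import argparse

# Hypothetical parser for illustration only; not CNVkit's actual CLI setup.
parser = argparse.ArgumentParser(prog="example")
parser.add_argument(
    "--haploid-x-reference",  # new, preferred spelling
    "--male-reference",       # old spelling, kept as an alias
    dest="haploid_x_reference",
    action="store_true",
    help="Treat the reference as having a single copy of chrX (assumed wording).",
)

# Both spellings set the same attribute; omitting the flag leaves it False.
new_style = parser.parse_args(["--haploid-x-reference"])
old_style = parser.parse_args(["--male-reference"])
default = parser.parse_args([])
```

With this arrangement, existing pipelines that still pass `--male-reference` keep working while new documentation can use the clearer name.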
Documentation, logging, and some error messages are improved.
Thanks to chapmanb, MajoroMask, and others for contributing to this release.
Dependencies
------------
- 'pandas' version 0.22 is supported.
- 'pysam' version 0.13.0 is supported.
- 'hmmlearn' version 0.2 is a run-time requirement to use the new HMM-based segmentation methods. The rest of CNVkit can be run without it. To ensure the right version is installed, install CNVkit with conda as usual, then install hmmlearn with pip within the CNVkit conda environment.
- Assume and require pip/setuptools for installation. (This is included with stock Python 2.7 and later.)
Scripts
-------
- New script "skg_convert.py" to convert between BED, GATK interval list, GFF, VCF, and tabular formats using the 'skgenome.tabio' sub-package, with options for simple post-processing.
- Removed the deprecated script refFlat2bed.py. (Use skg_convert.py instead.)
Commands
--------
`access`:
- Drop noncanonical, untargeted contigs/chromosomes by default. This affects analyses run from scratch with `batch`, too. (169, 299)
`segment`:
- Three new methods can be specified with `-m`: `hmm`, `hmm-germline`, and `hmm-tumor`.
- With `-m flasso`, force a breakpoint at centromeres, as was already done for the default 'cbs' method.
`reference`:
- The option `--antitargets` is no longer required to build a flat reference. Previously, building a flat reference for WGS or TAS required creating an empty file to use as antitargets alongside the target BED.
- Print a warning if the sample sex inferred from targets does not match that of antitargets. (281)
`scatter`:
- Removed the deprecated, invisible option `--background-marker`. (Use `--antitarget-marker` instead.)
- Trendlines should reflect small CNVs better, while preserving overall smoothing. The implementation now uses the Savitzky-Golay method instead of a Kaiser window, and the smoothing bandwidth is better-tuned. (This can also slightly improve outlier filtering in `segment`.)
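To give a feel for the difference between the two smoothers, here is a minimal sketch that applies a Savitzky-Golay filter and a Kaiser-window moving average to the same simulated log2 ratios. The window size, polynomial order, Kaiser beta, and the simulated data are all assumptions chosen for illustration; they are not the values CNVkit uses.

```python
import numpy as np
from scipy.signal import savgol_filter

# Simulated bin-level log2 ratios with a small gain spanning 20 bins.
rng = np.random.default_rng(0)
log2 = rng.normal(0.0, 0.2, size=200)
log2[90:110] += 1.0

window = 31  # assumed bandwidth in bins; must be odd for this example

# Savitzky-Golay: fit a local polynomial in each window and evaluate it
# at the window's center.
trend_savgol = savgol_filter(log2, window_length=window, polyorder=3)

# Kaiser window: a weighted moving average with normalized weights.
kaiser = np.kaiser(window, 14.0)  # beta=14 is an arbitrary choice
kaiser /= kaiser.sum()
trend_kaiser = np.convolve(log2, kaiser, mode="same")
```

Plotting `trend_savgol` and `trend_kaiser` against `log2` is one way to compare how each trendline follows the short gain.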
`export seg`:
- Add option `--enumerate-chroms` to replace chromosome or contig names with sequential integers. Previously, this renumbering was always done, following some version of the SEG format. But since most tools don't require the contigs to be sequential integers, and this behavior causes trouble for users, it's now disabled by default. (282)
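The renumbering that `--enumerate-chroms` enables can be pictured with the sketch below, which maps each distinct chromosome name to a sequential integer. The function name and the choice to number names in order of first appearance are assumptions for illustration; CNVkit's actual ordering may differ.

```python
def enumerate_chroms(chrom_names):
    """Map each distinct name to a 1-based integer, in order of first appearance."""
    mapping = {}
    for name in chrom_names:
        if name not in mapping:
            mapping[name] = len(mapping) + 1
    return mapping


# Hypothetical segments as (chromosome, start, end) tuples.
segments = [
    ("chr1", 100, 500),
    ("chr1", 500, 900),
    ("chr2", 0, 400),
    ("chrX", 10, 80),
]
ids = enumerate_chroms(chrom for chrom, _, _ in segments)
renumbered = [(ids[chrom], start, end) for chrom, start, end in segments]
```

Note that under this scheme `chrX` becomes 3 rather than 23, which shows how the integers can lose their meaning for downstream readers and why leaving names unchanged is the safer default.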
`gainloss`/`genemetrics`:
- Rename `gainloss` command to `genemetrics`. A shim is in place so `cnvkit.py gainloss` will continue to work. (278)
- Report segment- and bin-level weight and probes separately. (107, 278)
Bug fixes
---------
- autobin: Require -g/--access for WGS (289)
- batch: Use the "access" regions for the WGS workflow to choose bin size; these were previously being ignored, so bin sizes were too large, being based on the size of the whole genome, not just sequencing-accessible regions.
- call: Safely handle bins with zero weight when running `call --filter cn`. (chapmanb/bcbio-nextgen#2112; thanks chapmanb)
- coverage, guess_baits.py: Handle input BED files containing >4 columns. (301)
- gainloss: Without `-s`, make 'depth' the weighted mean of bins, not just the first bin's value.
- segment: Ensure the .cns output file's columns are sorted properly (291)
- vcfio: Don't crash if a record has no ALT values (279)
- tabio:
  - Recognize BED format with decimal in chromosome name (293)
  - Improvements to GFF/GTF/GFF3 parsing. The new options are mostly accessible through the Python API and the script 'skg_convert.py'. (311)
  - In 'read_auto' (and all CNVkit commands that take regions as input), determine the file format first by checking the file extension and verifying the format of the first(-ish) line. Only if that doesn't work, fall back to the original method of testing the first(-ish) line against a brittle series of regular expressions. (315)
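The two-step format detection described for 'read_auto' can be sketched roughly as follows: trust the file extension only if the first line agrees with it, and otherwise try each known pattern in turn. The extension table, the regular expressions, and the function name are hypothetical simplifications, not the ones in 'skgenome.tabio'.

```python
import os
import re

# Hypothetical lookup tables; the real implementation covers more formats.
EXTENSION_FORMATS = {
    ".bed": "bed",
    ".gff": "gff",
    ".gtf": "gff",
    ".gff3": "gff",
    ".vcf": "vcf",
}

# Ordered from most to least specific for the fallback scan.
LINE_PATTERNS = {
    "vcf": re.compile(r"^##fileformat=VCF"),
    "gff": re.compile(r"^##gff-version|^\S+\t\S+\t\S+\t\d+\t\d+\t"),
    "bed": re.compile(r"^(?:track|browser)\b|^\S+\t\d+\t\d+(?:\t|$)"),
}


def detect_format(filename, first_line):
    """Guess a region file's format from its extension, then its first line."""
    ext = os.path.splitext(filename)[1].lower()
    guess = EXTENSION_FORMATS.get(ext)
    # Step 1: accept the extension's answer only if the first line confirms it.
    if guess is not None and LINE_PATTERNS[guess].match(first_line):
        return guess
    # Step 2: fall back to testing the line against every known pattern.
    for fmt, pattern in LINE_PATTERNS.items():
        if pattern.match(first_line):
            return fmt
    raise ValueError("Unrecognized format: %r" % filename)


by_extension = detect_format("regions.bed", "chr1\t100\t200\tGENE")
by_content = detect_format("regions.txt", "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=g1")
mislabeled = detect_format("calls.bed", "##fileformat=VCFv4.2")
```

Checking the extension first means the common case needs only one pattern test, while the fallback still handles files with missing or misleading extensions.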
Python API
----------
- cnvlib.write: Newly available at the top level to write tabular files (like .cnr and .cns), symmetric with 'cnvlib.read()'. The 'cnvlib.tabio' alias to 'skgenome.tabio' has been removed; to read and write formats other than TSV-with-header ('tab'), import and use 'skgenome.tabio' directly.
- CopyNumArray.squash_genes: remove deprecated keyword argument 'squash_background'. Use 'squash_antitarget' instead.
- segmetrics: Move the functions supporting this command from 'cnvlib.command' to a new module 'cnvlib.segmetrics'.