Cnvkit

Latest version: v0.9.11


Page 2 of 7

0.9.5

Minor bugfixes and usability improvements.

`autobin`:
- Ensure targets are non-empty and match BAM chrom names (371)

`segment`:
- Suppress help text for deprecated --rlibpath (317)
- Fix help text display (380)

0.9.4

Performance improvements and bug fixes. Improved automated testing (254) and documentation (334).

Optimized performance of selecting genomic intervals, in particular speeding up `call`, `segment`, and `segmetrics` for whole genome and exome datasets. (340, 346)

Added script snpfilter.sh to help create T/N VCFs suitable for use with CNVkit. (364)


Commands
========

`batch`, `segment`:

- Add option `--rscript-path` to specify the preferred Rscript installation to use in case it is not in the default path. Deprecate the similar option `--rscriptpath`. (317, 321, 322; thanks MajoroMask and chapmanb)


`reference`:

- Only print the rejected targets if there are fewer than 500 of them; otherwise, just print the number that were rejected. (354)


`segment`:

- Tighten 'flasso' p-value threshold from .005 to .0001. The more lenient threshold had led to over-segmentation.


`segmetrics`:

- Optimize bootstrapping procedure for ~10x speedup and lower memory usage. (346)
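
For readers unfamiliar with the procedure being optimized, the following is a minimal sketch of a percentile bootstrap for a segment mean, using only the standard library. It is not the optimized code from 346; the function name `bootstrap_mean_ci` and its parameters are hypothetical and chosen for illustration.

```python
import random


def bootstrap_mean_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`.

    Hypothetical helper for illustration; not part of the CNVkit API.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample the bins with replacement and record each resample's mean.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    # Pick the empirical alpha/2 and 1 - alpha/2 quantiles, clamped to range.
    lo_idx = max(0, int(n_boot * alpha / 2))
    hi_idx = min(n_boot - 1, int(n_boot * (1 - alpha / 2)) - 1)
    return means[lo_idx], means[hi_idx]


# Example: bin-level log2 ratios within one segment.
low, high = bootstrap_mean_ci([0.42, 0.51, 0.47, 0.55, 0.49, 0.44])
```

Each resample here builds a new Python list, which is the kind of per-iteration copying a vectorized implementation would avoid.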


`call`:

- Add option `--drop-low-coverage`, matching the other commands.


`import-rna`:

- Implement `-n/--normal` option. (362)
- Add `--max-log2` option, default +3.0.
- Add options `--no-gc`, `--no-txlen` to disable bias corrections.


`export bed`:

- Add option `--label-genes`. By default, the 4th column is filled with the sample ID, which is undesirable if only one sample (.cns file) is being exported to BED. This option keeps the gene labels.


Python API
==========

- Changed default intersection mode from 'inner' to 'outer'. For the CNVkit command line operations this shouldn't have a visible effect.
- BED file parser handles (i.e. skips) initial "browser position" line.
- Add method `GenomicArray.iter_ranges_of()` to iterate over intervals retrieving values of a specified column, without copying chunks of the entire GenomicArray table.
- Add method `GenomicArray.intersection()` (340)
- tabio: Add 'vcf-simple' and 'vcf-sites' reader formats (WIP; 231)


Bug fixes
=========

- `scatter`: Avoid an error in smoothing (369; thanks mpschr)
- `sex`: Don't crash if chrX or chrY is missing; just print "NA"
- `import-rna`: Avoid a crash if -n is not used.
- Script `cnv_expression_correlate.py`: Avoid a crash on Py3
- Script `cnv_annotate.py`: Fix command-line option parsing (367)

0.9.3

A quick bugfix release to fix a potential crash in the `segmetrics` command (325).

0.9.2

This release contains a new command `import-rna` to infer coarse-grained copy number from RNA expression data. (151)

Three new HMM-based segmentation methods are offered: 'hmm', 'hmm-germline', and 'hmm-tumor'. These should be considered experimental and used with caution; the implementations are likely to change in the next release.

The option `--male-reference` in the commands `batch`, `reference`, `fix`, `call`, and `export` (at least) has been renamed to `--haploid-x-reference` everywhere to reduce user confusion. A shim is in place so `--male-reference` will continue to work.
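
One common way to build such a shim with `argparse` is to register the old spelling as a second, hidden option that writes to the same destination. This is a sketch of that pattern, not CNVkit's actual parser code; the program name `cnvkit-sketch` is a placeholder.

```python
import argparse

parser = argparse.ArgumentParser(prog="cnvkit-sketch")
# New, documented spelling.
parser.add_argument(
    "--haploid-x-reference",
    action="store_true",
    dest="haploid_x_reference",
    help="Assume the reference has a single copy of chrX.",
)
# Old spelling kept as a hidden alias: same destination, no help text.
parser.add_argument(
    "--male-reference",
    action="store_true",
    dest="haploid_x_reference",
    help=argparse.SUPPRESS,
)

old_style = parser.parse_args(["--male-reference"])
new_style = parser.parse_args(["--haploid-x-reference"])
```

Both invocations set `haploid_x_reference` to `True`, so existing scripts keep working while only the new name appears in `--help`.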

Documentation, logging, and some error messages are improved.

Thanks to chapmanb, MajoroMask, and others for contributing to this release.


Dependencies
------------

- 'pandas' version 0.22 is supported.
- 'pysam' version 0.13.0 is supported.
- 'hmmlearn' version 0.2 is a run-time requirement to use the new HMM-based segmentation methods. The rest of CNVkit can be run without it. To ensure the right version is installed, install CNVkit with conda as usual, then install hmmlearn with pip within the CNVkit conda environment.
- Assume and require pip/setuptools for installation. (This is included with stock Python 2.7 and later.)


Scripts
-------

- New script "skg_convert.py" to convert between BED, GATK interval list, GFF, VCF, and tabular formats using the 'skgenome.tabio' sub-package, with options for simple post-processing.
- Removed the deprecated script refFlat2bed.py. (Use skg_convert.py instead.)


Commands
--------

`access`:

- Drop noncanonical, untargeted contigs/chromosomes by default. This affects analyses run from scratch with `batch`, too. (169, 299)


`segment`:

- Three new methods can be specified with `-m`: `hmm`, `hmm-germline`, and `hmm-tumor`.
- With `-m flasso`, force a breakpoint at centromeres, as was already done for the default 'cbs' method.


`reference`:

- The option `--antitargets` is no longer required to build a flat reference. Previously, building a flat reference for WGS or TAS required creating an empty file to use as antitargets alongside the target BED.
- Print a warning if the sample sex inferred from targets does not match that of antitargets. (281)


`scatter`:

- Removed the deprecated, invisible option `--background-marker`. (Use `--antitarget-marker` instead.)
- Trendlines should reflect small CNVs better, while preserving overall smoothing. The implementation now uses the Savitzky-Golay method instead of a Kaiser window, and the smoothing bandwidth is better-tuned. (This can also slightly improve outlier filtering in `segment`.)
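
To illustrate the Savitzky-Golay idea mentioned for `scatter`, here is a minimal pure-Python sketch using the standard 5-point quadratic coefficients (-3, 12, 17, 12, -3)/35. The window length, polynomial order, and edge handling that CNVkit actually uses are not stated in these notes, so those choices and the function name `savgol5` are assumptions.

```python
# Standard Savitzky-Golay smoothing coefficients for a 5-point window
# and a quadratic (or cubic) local polynomial fit.
SG5_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0


def savgol5(values):
    """Smooth `values` with a 5-point quadratic Savitzky-Golay filter.

    Illustrative only: the first and last two points are returned
    unchanged, which is a simplification of real edge handling.
    """
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG5_COEFFS, window)) / SG5_NORM
    return smoothed


# A narrow bump keeps more of its height than it would under a
# 5-point moving average, which would flatten 35 down to 7.
bump = savgol5([0.0, 0.0, 0.0, 35.0, 0.0, 0.0, 0.0])
```

Because the filter fits a local polynomial rather than averaging, short features such as small CNVs are attenuated less, which matches the motivation given above.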


`export seg`:

- Add option `--enumerate-chroms` to replace chromosome or contig names with sequential integers. Previously, this renumbering was always done, following some version of the SEG format. But since most tools don't require the contigs to be sequential integers, and this behavior causes trouble for users, it's now disabled by default. (282)
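
The renumbering that `--enumerate-chroms` enables can be pictured as follows. This is a sketch assuming 1-based indices assigned in order of first appearance; the helper name `enumerate_chroms` is hypothetical, and the real writer may order chromosomes differently.

```python
def enumerate_chroms(chrom_names):
    """Replace each chromosome name with a 1-based sequential integer.

    Indices are assigned in order of first appearance (an assumption).
    """
    index = {}
    for name in chrom_names:
        if name not in index:
            index[name] = len(index) + 1
    return [index[name] for name in chrom_names]


# One entry per segment row in the SEG output.
seg_chroms = enumerate_chroms(["chr1", "chr1", "chr2", "chrX"])
```

Note that the mapping discards the original names (here `chrX` becomes 3), which is one reason the behavior can cause trouble when it is applied unconditionally.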


`gainloss`/`genemetrics`:

- Rename `gainloss` command to `genemetrics`. A shim is in place so `cnvkit.py gainloss` will continue to work. (278)
- Report segment- and bin-level weight and probes separately. (107, 278)
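
A command rename with a backward-compatible shim can be sketched with `argparse` subparser aliases. This is only one possible approach and is not taken from CNVkit's source; the `command` default and the `resolve` helper are hypothetical names.

```python
import argparse

parser = argparse.ArgumentParser(prog="cnvkit-sketch")
subparsers = parser.add_subparsers(dest="invoked_as")

# Register the new name, with the old name accepted as an alias.
genemetrics = subparsers.add_parser("genemetrics", aliases=["gainloss"])
# Record the canonical command name regardless of which spelling was typed.
genemetrics.set_defaults(command="genemetrics")


def resolve(argv):
    """Return the canonical command name for a command line."""
    return parser.parse_args(argv).command


old_name = resolve(["gainloss"])
new_name = resolve(["genemetrics"])
```

Both calls resolve to `genemetrics`. Unlike a hidden shim, an alias registered this way is still listed in the help output.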


Bug fixes
---------

- autobin: Require -g/--access for WGS (289)
- batch: Use the "access" regions for the WGS workflow to choose bin size; these were previously being ignored, so bin sizes were too large, being based on the size of the whole genome, not just sequencing-accessible regions.
- call: Safely handle bins with zero weight when running `call --filter cn`. (chapmanb/bcbio-nextgen#2112; thanks chapmanb)
- coverage, guess_baits.py: Handle input BED files containing >4 columns. (301)
- gainloss: Without `-s`, make 'depth' the weighted mean of bins, not just the first bin's value.
- segment: Ensure the .cns output file's columns are sorted properly (291)
- vcfio: Don't crash if a record has no ALT values (279)
- tabio:
  - Recognize BED format with decimal in chromosome name (293)
  - Improvements to GFF/GTF/GFF3 parsing. The new options are mostly accessible through the Python API and the script 'skg_convert.py'. (311)
  - In 'read_auto' (and all CNVkit commands that take regions as input), determine the file format first by checking the file extension and verifying the format of the first(-ish) line. Only if that doesn't work, fall back to the original method of testing the first(-ish) line against a brittle series of regular expressions. (315)
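
The two-stage detection described for 'read_auto' might look roughly like the sketch below: trust the file extension if the first line is consistent with it, and only otherwise try each pattern in turn. The extension table, the regular expressions, and the function name `sniff_format` are all hypothetical and much simpler than whatever `skgenome.tabio` actually uses.

```python
import os
import re

# Hypothetical extension table; the real mapping may differ.
EXT_FORMATS = {".bed": "bed", ".gff": "gff", ".gtf": "gff", ".vcf": "vcf"}

# Hypothetical, simplified checks for the first data line of each format.
LINE_CHECKS = {
    "bed": re.compile(r"^\S+\t\d+\t\d+(\t|$)"),
    "gff": re.compile(r"^\S+\t\S+\t\S+\t\d+\t\d+\t"),
    "vcf": re.compile(r"^\S+\t\d+\t\S+\t[ACGTN]+\t"),
}


def sniff_format(path, first_line):
    """Guess a region file's format from its name and first data line."""
    # Stage 1: use the extension, but verify it against the first line.
    fmt = EXT_FORMATS.get(os.path.splitext(path)[1].lower())
    if fmt and LINE_CHECKS[fmt].match(first_line):
        return fmt
    # Stage 2: fall back to trying every pattern in order.
    for name, pattern in LINE_CHECKS.items():
        if pattern.match(first_line):
            return name
    return None


bed_guess = sniff_format("targets.bed", "chr1\t100\t200\tGENE")
gff_guess = sniff_format("odd.bed", "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=x")
```

The second call shows why the verification step matters: a misleading extension does not override what the first line actually looks like.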


Python API
----------

- cnvlib.write: Newly available at the top level to write tabular files (like .cnr and .cns), symmetric with 'cnvlib.read()'. The 'cnvlib.tabio' alias to 'skgenome.tabio' has been removed; to read and write formats other than TSV-with-header ('tab'), import and use 'skgenome.tabio' directly.
- CopyNumArray.squash_genes: remove deprecated keyword argument 'squash_background'. Use 'squash_antitarget' instead.
- segmetrics: Move the functions supporting this command from 'cnvlib.command' to a new module 'cnvlib.segmetrics'.

0.9.1

Highlights: Useful enhancements and changes to plotting and segmentation, and a new script for single-exon CNV testing.

Plus, bug fixes and usability improvements to avoid unexpected errors. (250, 255, 262, etc.)


Dependencies
------------

- Compatible with the most recent pandas version 0.21.0 (273, 274; thanks chapmanb)
- R dependencies were reduced to simplify installation.

Scripts
-------

- Renamed "cnn_\*.py" to "cnv_\*.py".
- New script "cnv_ztest.py" to detect single-bin (e.g. single exon) deep deletions and high-level amplifications.
- In "cnv_updater.py", rename "Background" (i.e. off-target) bins to "Antitarget", in addition to adding a "depth" column if it's missing.

Commands
--------

`autobin`:

- Raise the maximum target/antitarget bin sizes to 50kb/1Mb.

`fix`:

- Allow specifying sample_id via `--sample-id`/`-i`, in case the input coverage filenames do not have the expected form "sample_id.targetcoverage.cnn" and "sample_id.antitargetcoverage.cnn". (269; thanks chapmanb)

`segment`:

- Process each chromosome arm separately (with 'cbs' and 'haar', but not 'flasso'). Centromere locations are guessed from the largest gap between sequencing-accessible regions, and are not necessarily the true locations, although they do match fairly well on the human genome.
- Logging of dropped bins is streamlined somewhat.
- New method `-m none` to only calculate arm-level segment means (for testing and experimentation).
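
The centromere guess described for `segment` (the largest gap between sequencing-accessible regions) can be sketched as below. The input layout and the function name `guess_centromere` are assumptions for illustration, not CNVkit's internal representation.

```python
def guess_centromere(regions):
    """Return the (start, end) of the largest gap between accessible regions.

    `regions` is assumed to be a sorted, non-overlapping list of
    (start, end) tuples on a single chromosome. Returns None if there
    are fewer than two regions, since no gap exists.
    """
    best = None
    for (_, prev_end), (next_start, _) in zip(regions, regions[1:]):
        gap = next_start - prev_end
        if best is None or gap > best[1] - best[0]:
            best = (prev_end, next_start)
    return best


accessible = [(0, 100), (150, 400), (1400, 2000), (2050, 3000)]
centromere = guess_centromere(accessible)
```

Bins on either side of the returned interval would then be segmented as separate arms. As noted above, this is a heuristic, so the gap need not coincide with the true centromere.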

`scatter`:

- Highlight non-neutral segments from .call.cns. If segments have the columns 'cn' and potentially also 'cn1' and 'cn2' (as added by the `call` command), use those fields to display copy number alterations, LOH and allelic imbalance with colorized segments (orange by default), and use gray for neutral segments. If a VCF is also given, the same is done for SNVs in the lower panel. Otherwise, all segments are colorized as before. (18, 157)
- New option `--by-bins` to display x-axis positions by sequential bin number on each chromosome, rather than genomic coordinates. This makes the plots much more useful with targeted amplicon sequencing data, or very small gene panels. (63)
- Trend line (`--trend`) now accounts for bin weights, which generally results in a better fit.
- Improved interaction of -c and -g options:
  - Only apply the window margin (-w) if -g is used alone, or -c specifies a small chromosomal region with no genes.
  - Allow an empty gene list (-g '' or -g ',') to prevent highlighting and labeling of any genes / small non-genic "Selection" in the -c region.
  - If any gene in -g is not fully within the region specified by -c, name that gene and its coordinates in the error message.
  - If the -c region has size <=0, show a specific error message.
- Handle NaN log2 values when calculating y-axis limits.

`heatmap`:

- Incorporate the `--by-bins` argument to match `scatter`. (63)
- Warn if selected region contains no data for a sample. This helps troubleshoot if a chromosome name was mis-specified on the command line. (268)

`export seg`:

- Change column headers to match DNAcopy output. The column headers generally don't matter in the SEG format, but the DNAcopy dataframe is considered the canonical form.


Python API
----------

- cnvlib.do_segment -- new keyword argument min_weight to drop bins with 'weight' below the specified value. If not used, then only bins with weight 0 will be dropped. This feature is not recommended for normal usage and is not available on the command line.
- cnvlib.do_scatter -- Remove deprecated keyword argument 'background_marker' in favor of 'antitarget_marker', corresponding to `scatter` options deprecated in v0.9.0.
- cnvlib.cnary.CopyNumArray: Add method 'smoothed', which calculates the trendline displayed by the `scatter` command.
- skgenome.tabio: Add read support for samtools 'dict' format, which resembles the plain-text SAM header and can contain chromosome names and sizes.
- skgenome.gary.GenomicArray: Add magic methods `__bool__` (Py3) and `__nonzero__` (Py2) to ensure an empty GenomicArray, i.e. 0 rows, is treated as false-ish on both Python 2.7 and 3.x.
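
The dual magic-method pattern from the last item can be sketched with a stand-in class. `TinyArray` is hypothetical and holds a plain list rather than a data table; it only shows how one definition can serve both Python 2.7 and 3.x.

```python
class TinyArray(object):
    """Hypothetical stand-in for a table-backed array of rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __bool__(self):
        # Python 3 truth test: an array with 0 rows is false-ish.
        return len(self.rows) > 0

    # Python 2 looks up __nonzero__ instead, so point it at the same method.
    __nonzero__ = __bool__


empty = TinyArray([])
filled = TinyArray([("chr1", 100, 200)])
status = "has data" if filled else "empty"
```

With this in place, `if not arr:` works as an emptiness check on either interpreter.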

0.9.0

In addition to bug fixes, documentation updates, and usability improvements, this release includes some larger changes:

- The off-target bins in .cnn and .cnr files are now assigned the label "Antitarget" instead of "Background" in the "gene" column.

The label "Background" in existing files will still be handled the same way, but new output files generated with CNVkit 0.9.0 and later will use the "Antitarget" label -- so, earlier versions of CNVkit may have problems with files produced by CNVkit 0.9.0. Some command line options and API keyword arguments similarly replace "background" with "antitarget", with shims in place for compatibility with existing scripts. (171)

- The sub-packages 'genome' and 'tabio' are now in a separate top-level package 'skgenome', still included in the CNVkit distribution. (See "Python API" below.)

This does not affect the command-line usage of CNVkit, but clears the way to extract a scikit-genome package that can be installed and used separately from CNVkit for computing with genomic intervals.


Documentation
-------------

- Link to an example VCF file that contains matched tumor and normal samples and will work nicely with CNVkit.
- Describe the `breaks` command's output columns. (220)
- Show a Python code example customizing a plot with matplotlib.pyplot. (196)


Dependencies
------------

- pysam: Raise minimum to 0.10; support new version 0.11.2.1 (218; thanks chapmanb)
- pandas: Support new version 0.20.1 (215)
- numpy: Support new version 1.13 (235, 238)


Commands
--------

`batch`:

- Log the CNVkit version number at the start of the run.
- Print a message at the end if no tumor/test samples were specified. (214)
- Clarify error messages for bad option combinations. (216)
- Removed the deprecated, suppressed/invisible option `--split`. It was a shim in the 0.8 series to support old scripts.

`reference`:

- Ensure the inferred chromosomal sex matches between the targets and antitargets for the same sample. If the inferences do not match, prefer antitargets. (234, 237)

`fix`:

- Warn & don't reweight bins if most antitargets have no/low coverage. This avoids a variety of surprising downstream problems when the input was specified as hybrid capture (the default), but is actually from targeted amplicon sequencing, or otherwise has no reads mapped to most off-target bins.

`segment`:

- Log the segmentation method and p-value/q-value threshold.

`call`:

- Add option `--center-at`, for re-centering log2 values at a user-specified neutral value.
- The option `--center` can be used without an argument, in which case it uses the default centering method 'median'.
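
The effect of the two centering options can be sketched as a shift of every log2 value. This is a simplified illustration: it uses an unweighted median, while CNVkit's own estimator may weight bins or segments, and the function name `recenter` is hypothetical.

```python
from statistics import median


def recenter(log2_values, center_at=None):
    """Shift log2 ratios so the chosen neutral value becomes 0.

    With `center_at=None`, mimic `--center` with no argument by using the
    median; otherwise subtract the user-specified value, as with
    `--center-at`. Unweighted median is an assumption of this sketch.
    """
    shift = median(log2_values) if center_at is None else center_at
    return [value - shift for value in log2_values]


by_median = recenter([0.5, 1.0, 1.5])
by_value = recenter([0.5, 1.0], center_at=0.5)
```

Here `by_median` is centered on the middle value 1.0, and `by_value` treats 0.5 as the neutral level.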

`diagram`:

- New option `--title` to add a custom title to the top of the generated figure. (239; thanks micknudsen)


`export vcf`:

- When given a .cnr file corresponding to the usual segmented input file (.cns), emit the CIPOS and CIEND tags in the generated VCF. These indicate the "fuzzy" coordinates of segment breakpoints. Here, the ranges are simply the widths of the underlying bins adjacent to each segment breakpoint. These tags can help meta-methods aggregate/harmonize CNVkit's calls with those of other structural variant callers. (72)

`import-picard`:

- Don't accept directory as an argument (was deprecated).
- Be a little more flexible in filenames accepted: instead of requiring input files to be named `*.targetcoverage.???` or `*.antitargetcoverage.???`, strip the full suffix and default to 'targetcoverage.cnn' output suffix, or 'antitargetcoverage.cnn' if input filename contains 'antitarget'. Works the same for filenames following the earlier convention, but now is pretty safe for amplicon targets with arbitrary filenames, and behavior is generally less surprising.
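
The filename handling described for `import-picard` can be sketched as follows. The helper name `output_name` is hypothetical, and treating everything after the first dot as "the full suffix" is an assumption of this sketch.

```python
import os


def output_name(input_path):
    """Derive a .cnn output filename from a Picard coverage filename."""
    base = os.path.basename(input_path)
    # Strip the full suffix: keep only the part before the first dot.
    sample_id = base.split(".", 1)[0]
    # Default to target coverage unless the name mentions 'antitarget'.
    if "antitarget" in base:
        suffix = "antitargetcoverage.cnn"
    else:
        suffix = "targetcoverage.cnn"
    return sample_id + "." + suffix


conventional = output_name("/data/Sample1.antitargetcoverage.csv")
arbitrary = output_name("amplicon_run.hs_metrics.txt")
```

A conventionally named file keeps its role, while an arbitrarily named amplicon file falls through to the target-coverage default.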


Bug fixes
---------

- `antitarget`: Don't crash if `-g`/`--access` is not given (207)
- `batch`: Don't crash in 'wgs' mode when given just targets (`-t`) without a FASTA reference genome sequence (`-f`)
- `call --filter ampdel`: Drop segments with copy number (`cn` field) between 0 and 5, exclusive, as the documentation indicates. Previously, it was just merging adjacent segments with copy number 1--4, but not dropping them. (222)
- `export cdt`: Match the CDT spec. Fix a regression in which columns could be swapped/misaligned versus the header. Add a dummy "EWEIGHT" row to ensure Java TreeView starts reading data from the correct line in the file.
- `export theta`: Don't crash on bins where reference is NaN. (168)
- `metrics`, `descriptives`: Handle degenerate/trivial cases consistently. (202)
- `segment`: Handle sample names that are integers with leading zeros. (213)
- `sex`: Don't crash if chromosomes X and Y are both missing. (236)
- VCF parsing (`call`, `scatter`, `segment`):
  - Safely handle small or empty VCF files that previously could trigger a crash during BAF calculation. Now, with an empty VCF an all-blank "baf" will be emitted. (218, 224; thanks chapmanb)
  - Improve handling of Mutect2 VCF files, somewhat. Mutect2 VCFs are still not recommended as input to CNVkit; try FreeBayes or GATK HaplotypeCaller instead. (195)
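
The corrected `call --filter ampdel` rule from the list above can be sketched as a simple predicate on the `cn` field. The dictionary layout and the function name `filter_ampdel` are assumptions; the merging of adjacent segments that the real filter also performs is omitted here.

```python
def filter_ampdel(segments):
    """Keep only deep deletions (cn == 0) and amplifications (cn >= 5).

    Segments with copy number strictly between 0 and 5 are dropped.
    """
    return [seg for seg in segments if seg["cn"] == 0 or seg["cn"] >= 5]


segments = [
    {"chromosome": "chr1", "start": 0, "end": 1000, "cn": 2},
    {"chromosome": "chr1", "start": 1000, "end": 2000, "cn": 0},
    {"chromosome": "chr1", "start": 2000, "end": 3000, "cn": 4},
    {"chromosome": "chr1", "start": 3000, "end": 4000, "cn": 7},
]
kept = filter_ampdel(segments)
```

Only the homozygous deletion and the high-level amplification remain in `kept`.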


Python API
----------

Moved sub-packages 'genome' and 'tabio' to separate top-level package 'skgenome'
(201). The top-level `cnvlib` API is mostly the same otherwise, but supporting
modules were refactored to decouple `skgenome` from `cnvlib` and remove
redundancies. In particular:

- Split module `cnvlib.core` into `skgenome.tabio` and `cnvlib.cmdutil`
- Remove GenomicArray static method `row2label` in favor of functions `to_label` and `from_label` in new module `skgenome.rangelabel`.
- The SEG writer in 'tabio' now replaces chromosome names with 1-based integer indices, per SEG spec/convention. The `export seg` command now uses this writer directly.


Scripts
-------

- Remove the script `coverage_bin_size.py`, previously deprecated in favor of the `autobin` command.
- Add `skg_convert.py` to convert between tabular formats (including BED and UCSC RefFlat).
- Deprecate `refFlat2bed.py` in favor of `skg_convert.py`.
- Add `cnn_annotate.py` to replace the "gene" field for each bin in a .cnn or .cnr file, given a gene annotation database like refFlat.txt. The need for this comes up occasionally when users notice at the end of an analysis that vendor-annotated targets are not the desired gene names.
