Cnvkit

Latest version: v0.9.11


0.6.1

Small fixes in segmentation, affecting the output of `segment` and preventing crashes in `segmetrics`:
- Exclude fewer low-coverage bins from segmentation (using a lower minimum coverage threshold).
- In case the first or last bins on a chromosome were excluded from segmentation, adjust the first and last segments on each chromosome so that their endpoints match the first and last bins.
- If no bins on a chromosome passed the coverage filter, instead of omitting the chromosome from segmentation output, generate a single segment covering the full chromosome, with segment log2 ratio 0.0. (So, all chromosomes in the .cnr file will be present in the .cns file, too.)
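The endpoint adjustment and the fallback segment described above can be sketched roughly as follows. This is only an illustration under assumed data structures (plain dicts of tuples); the function name `patch_segments` is hypothetical and is not CNVkit's internal API.

```python
def patch_segments(bins_by_chrom, segs_by_chrom):
    """Illustrative post-processing of segmentation output.

    bins_by_chrom: {chrom: [(start, end), ...]}, every bin in the .cnr, sorted.
    segs_by_chrom: {chrom: [(start, end, log2), ...]}, sorted; a chromosome is
                   missing or empty if none of its bins passed the coverage filter.
    """
    patched = {}
    for chrom, bins in bins_by_chrom.items():
        first_start, last_end = bins[0][0], bins[-1][1]
        segs = list(segs_by_chrom.get(chrom, []))
        if not segs:
            # No usable bins: emit one neutral segment spanning all bins.
            patched[chrom] = [(first_start, last_end, 0.0)]
            continue
        # Stretch the first and last segments to the outermost bin edges.
        _, end, log2 = segs[0]
        segs[0] = (first_start, end, log2)
        start, _, log2 = segs[-1]
        segs[-1] = (start, last_end, log2)
        patched[chrom] = segs
    return patched


bins = {"chr1": [(0, 100), (100, 200), (200, 300)], "chr2": [(0, 50), (50, 100)]}
segs = {"chr1": [(100, 200, -0.5)]}  # chr1 lost its edge bins; chr2 has no segments
result = patch_segments(bins, segs)
```

With this input, every chromosome present in `bins` also appears in `result`, mirroring the .cnr/.cns guarantee stated above.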

0.6.0

Added two new commands, `call` and `segmetrics`, and a new `export` format, BED.

`segmetrics`:
- Calculates summary statistics of the residual bin-level log2 ratio estimates from the segment means, similar to the existing `metrics` command, but for each segment individually. Results are output in the same format as the CNVkit segmentation file (.cns), with the stat names and calculated values printed in the "gene" column.
- Supported stats:
  - standard deviation, median absolute deviation, inter-quartile range, Tukey's biweight midvariance (as in `metrics`);
  - confidence interval, estimated by bootstrap;
  - prediction interval, estimated by the range between the 2.5-97.5 percentiles of bin-level log2 ratio values within the segment.
- Thanks to mjafin for suggesting this feature (28).
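As a rough sketch of what these per-segment statistics mean, the following standard-library code computes several of them for the bins of one segment. The helper names, the bootstrap settings, and the percentile interpolation are assumptions made for illustration, not CNVkit's implementation; Tukey's biweight midvariance is left out for brevity.

```python
import random
import statistics


def percentile(values, pct):
    """Percentile (pct in 0..100) with linear interpolation between ranks."""
    xs = sorted(values)
    pos = (len(xs) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def segment_stats(bin_log2, n_boot=200, seed=0):
    """Summary statistics for the bin-level log2 ratios within one segment."""
    seg_mean = statistics.fmean(bin_log2)
    residuals = [x - seg_mean for x in bin_log2]
    med = statistics.median(residuals)
    rng = random.Random(seed)
    # Bootstrap: resample the bins with replacement and recompute the mean.
    boot_means = [
        statistics.fmean(rng.choices(bin_log2, k=len(bin_log2)))
        for _ in range(n_boot)
    ]
    return {
        "stdev": statistics.stdev(residuals),  # sample standard deviation
        "mad": statistics.median([abs(r - med) for r in residuals]),
        "iqr": percentile(residuals, 75) - percentile(residuals, 25),
        "ci": (percentile(boot_means, 2.5), percentile(boot_means, 97.5)),
        "pi": (percentile(bin_log2, 2.5), percentile(bin_log2, 97.5)),
    }


stats = segment_stats([-0.1, 0.0, 0.1, 0.2, -0.2])
```

The confidence interval describes uncertainty in the segment mean, while the prediction interval describes the spread of the individual bins, so the latter is normally the wider of the two.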

`call`:
- Given segmented log2 ratio estimates (.cns file), round the copy ratio estimates to integer values using either:
  - A list of threshold log2 values for each copy number state, or
  - Some algebra, given known tumor cell fraction and normal ploidy. (This was previously available through the `export freebayes` command, see below.)
- The output is another .cns file, where the values in the `log2` column are still log2-transformed, but represent integers in log2 scale -- e.g. a neutral diploid state is represented as "0.0", not the integer 2. These output files are still compatible with the other CNVkit commands that accept .cns files, and can be plotted the same way.
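The two rounding strategies, and the "integers in log2 scale" output, might look something like the sketch below. The threshold values, the function names, and the simple two-component mixture model (tumor cells plus normal cells at the reference ploidy) are assumptions for illustration; they are not taken from the CNVkit source.

```python
import bisect
import math


def call_by_thresholds(log2_ratio, thresholds):
    """thresholds[i] is the highest log2 ratio still called as i copies.

    Ratios above the last threshold are all assigned len(thresholds) copies.
    """
    return bisect.bisect_left(thresholds, log2_ratio)


def call_by_purity(log2_ratio, purity, ploidy=2):
    """Solve observed = purity * tumor + (1 - purity) * ploidy for tumor copies."""
    observed = ploidy * 2.0 ** log2_ratio  # average copies in the mixed sample
    tumor = (observed - ploidy * (1.0 - purity)) / purity
    return max(0, round(tumor))


def copies_to_log2(copies, ploidy=2):
    """Express an integer copy number on the log2-ratio scale."""
    if copies <= 0:
        # Placeholder for a complete loss; a finite floor may suit file output better.
        return float("-inf")
    return math.log2(copies / ploidy)


example_thresholds = (-1.1, -0.25, 0.2, 0.7)  # illustrative values only
neutral = call_by_thresholds(0.0, example_thresholds)
loss = call_by_purity(math.log2(0.75), purity=0.5)
```

Here a neutral diploid call maps back to `copies_to_log2(2) == 0.0`, which is the representation described above for the `log2` column.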

`export bed`:
- New `bed` format supporting the same features as `export freebayes` that were not moved into the `call` command (see above). The output BED file is still compatible with the FreeBayes `--cnv-map` option. In addition, `export bed` has the new option `--show-neutral` to also output neutral-CN segments/regions, in addition to the CNV regions output by default.
- The `export freebayes` sub-command is deprecated but still available in this release; it will be removed in the next release. This command supported the tumor-purity adjustment now implemented in the `call` command. The recommended approach is to instead run `call` first on each .cns file, and then `export bed` on all the adjusted .cns files to get an equivalent BED file compatible with the FreeBayes `--cnv-map` option.

Smaller changes:
- `gainloss`: Reduced the default log2 ratio threshold from 0.5 to 0.2.
- `import-picard`: Use the un-normalized mean coverage instead of the normalized coverage of each target as the log2 coverage values in the output .cnn file. This matches the output of the `coverage` command; CNVkit normalizes coverages later in the pipeline.
- Some internal refactoring. Please report any bugs, real or perceived, on our GitHub issue tracker.

0.5.1

Bug fixes for two edge cases in whole genome analyses (thanks chapmanb):
- reference: Merging target and antitarget .cnn files where antitargets are empty
- diagram: Avoid trying to plot segments over the start or end of chromosomes

0.5.0

This release includes a variety of improvements to CNVkit's calling accuracy and robustness. All CNVkit files built with previous versions will continue to work with this version, but for best results, I recommend rebuilding your reference.cnn file(s) from the targetcoverage.cnn and antitargetcoverage.cnn files.

`coverage`:
- Output target/antitarget coverage (.cnn) files are no longer median-centered. Read depths in each bin are still log2-scaled, but the observed read depth can now be easily recovered from .cnn files.
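To illustrate why dropping the median-centering step matters, here is a small sketch. It assumes the log2 value stored for a bin is simply the base-2 logarithm of its mean read depth, with no additional offset; the function names are hypothetical.

```python
import math
import statistics


def depth_from_log2(log2_depth):
    """Recover read depth from an un-centered log2 coverage value."""
    return 2.0 ** log2_depth


def median_center(log2_values):
    """The older behaviour: subtract the sample-wide median from every bin."""
    mid = statistics.median(log2_values)
    return [v - mid for v in log2_values]


depths = [50.0, 100.0, 200.0]
log2_depths = [math.log2(d) for d in depths]
recovered = [depth_from_log2(v) for v in log2_depths]
centered = median_center(log2_depths)
```

After centering, the values only describe each bin relative to the sample median (here roughly -1, 0 and 1), so the absolute depth cannot be recovered without also knowing that median.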

`reference`, `fix`:
- Include a "flat pseudocount" in addition to the given normals, making paired tumor-normal calling much more robust and accurate.
- Perform bias corrections on the input normal samples before calculating the average and spread of log2 values.

`fix`:
- Do bias corrections before subtracting the reference, instead of after, because the reference already includes bias corrections now.
- In addition to weighting bins by spread (which can only be observed with a pooled reference), also weight by bin size and deviation of reference log2 values in each bin from the global median. So, useful bin weights are now derived from "flat" and single-normal-sample references, too.

`segment`:
- Recalculate CBS segment means using bin weights (in the R library this is simply the mean, arguably a bug).
- Set CBS segment start/end positions to match the underlying bin start/end positions.
- Improved centromere detection -- only exclude one "large gap", if any, from each chromosome.
- Tuned CBS calling parameters to improve accuracy (see benchmarks in the repo etal/cnvkit-examples).
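The first two points can be sketched as below: a weighted mean over the bins in a segment, and a helper that widens segment coordinates to the edges of the bins they overlap. Both functions are hypothetical illustrations with assumed inputs (half-open `(start, end)` tuples), not the code used by `segment`.

```python
def weighted_segment_mean(log2_values, weights):
    """Mean of bin-level log2 ratios, weighted by per-bin reliability."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all bin weights are zero")
    return sum(v * w for v, w in zip(log2_values, weights)) / total


def snap_to_bins(seg_start, seg_end, bins):
    """Extend a segment to the start/end of the bins it overlaps.

    bins: sorted list of (start, end) tuples on the same chromosome.
    """
    overlapping = [(s, e) for s, e in bins if s < seg_end and e > seg_start]
    if not overlapping:
        return seg_start, seg_end  # nothing to snap to; leave unchanged
    return overlapping[0][0], overlapping[-1][1]


seg_mean = weighted_segment_mean([0.0, 1.0], [3.0, 1.0])
seg_coords = snap_to_bins(120, 380, [(100, 200), (200, 300), (300, 400)])
```

In this example the low-weight bin pulls the mean much less than an unweighted average would (0.25 rather than 0.5), and the segment is widened from 120-380 to 100-400.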

`diagram`:
- Label genes using the same criteria as the `gainloss` command: if segments are given, use the segment value at each gene, otherwise calculate the weighted average of bin-level log2 values within each gene.
- New option `-m`/`--min-probes` to match `gainloss`.
- Guess gender from chrX more reliably, so that the same gender is called from the bin-level (.cnr) and segmented (.cns) values given.

`scatter`, `loh`:
- When plotting allele frequencies from a VCF, if segments are given (.cns), also apply those segments to allele frequencies to show LOH regions that match CNVs.
- Skip somatic variants identified in a VCF, and try to retain only germline variants, when plotting LOH. (This is not very well standardized across callers, so please watch for bad behavior from callers other than FreeBayes and MuTect, and let me know about it!)
- `scatter` only: Added options `--y-min`, `--y-max` to set y-axis limits on the plot.
- Removed the deprecated `-r` option. Use `-c` instead.

The long-deprecated `cbs` command has been removed. Use `segment` instead.

Bugs in parsing and writing empty and 1-line VCF, BED and CNVkit files, and other VCF quirks, have now been fixed (Thanks chapmanb!)

0.4.1

New features:
- `scatter` command: Option `-c` can now take coordinate ranges like `-r`, so `-r` is deprecated and will be removed in the next release.
- `genome2access.py` script: New `-x` option to exclude additional regions. Added a new file "data/access-5k-mappable.hg19.bed" which used this option to exclude the ENCODE "Duke" and "DAC" low-mappability regions.

Also:
- Improved the help/usage messages for several commands. Added a "version" command that prints the current CNVkit version. (Thanks HenrikBengtsson)
- Tuned CBS calling parameters to improve segmentation accuracy according to some benchmarks.
- Sped up a few slow functions identified by profiling. In particular, `metrics` is much faster now.
- Fixed bugs/incompatibilities in plotting commands and cleaned up the source code (Thanks chapmanb and roryk)

CNVkit can now be obtained and run as a Docker container:
https://registry.hub.docker.com/u/etal/cnvkit/

0.4.0

New features:
- Plotting (`scatter` and `loh` commands):
  - Support VCFs from more callers, including MuTect, VarScan and FreeBayes. Support multi-sample VCFs; the sample in the VCF can be selected by name with the `-i` option, and will also be shown as the plot title. Thanks to Brad Chapman (chapmanb) for this contribution. (11)
  - Enable highlighting of selected regions other than genes using the `-r` and `-w` options. The plot title (sample ID) can also be specified with `-i`/`--sample-id`. Thanks to Brad Chapman (chapmanb) for this contribution. (9)
  - New `-l`/`--range-list` option to plot a BED file of regions, each in its own plot, and combine the generated plots into a single multi-page PDF file. Thanks to Rory Kirchner (roryk) for this contribution. (21)
- FreeBayes export format can now handle multiple samples (.cns files).

Changes:
- Renamed `--male-normal` option to `--male-reference` (but kept `-y` alias) in all commands that had it.
- `export` options: Specify sample name with `-i`/`--sample-id` option instead of `-n`.
- `scatter` plotting command: added `--min-variant-depth` option to match `loh`. (10)
- The `loh` plot command does not attempt significance testing anymore; we're working on a better solution. (10, 18)

Bug fixes:
- Handle empty BED/region/interval_list files, so that an empty "antitarget" file can be used when analyzing WGS or targeted amplicon capture datasets. (19)
- Ignore "." labels for genes, the same way we already ignore "-" labels, for better interoperability with BEDtools. Thanks to Brad Chapman (chapmanb) for this contribution. (12)
- Accept "sample.bai" as index for "sample.bam". (8)
- SEG import: The option `--from-log10` now works to convert log10 ratio values to log2 scale.
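The conversion behind `--from-log10` is a change of logarithm base, log2(x) = log10(x) / log10(2). A minimal sketch, with a hypothetical function name rather than CNVkit's own code:

```python
import math


def log10_to_log2(log10_ratio):
    """Convert a log10-scaled copy ratio to log2 scale via change of base."""
    return log10_ratio / math.log10(2)


doubled = log10_to_log2(math.log10(2.0))  # a two-fold gain
halved = log10_to_log2(math.log10(0.5))   # a two-fold loss
```

A two-fold gain or loss therefore converts to approximately +1 or -1 on the log2 scale, and a neutral ratio of 0 stays at 0.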

Documentation has also improved substantially, including the installation instructions. The built-in help text for each command now shows default values for each option, where applicable.
