Kmerdb

Latest version: v0.8.10

0.2

Uh, a few basics. Starting type inference. Changing to uint32 all around. Uint32, Uint64, float32, float64. Humor?

Thanks to all. Happy New Year.

Goodies...

Most excellent.

0.1.2

Improvements to memory usage, parallelization, and runtime.

Adds a `-d|--dtype` command-line parameter.
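
To see why the element type matters for memory usage, here is a rough sketch that estimates the size of a dense count array with one entry per possible k-mer (4^k entries). The dense layout and the `profile_bytes` helper are assumptions made for illustration; they are not taken from kmerdb's storage code.

```python
# Bytes per element for the dtypes named in these release notes.
ITEMSIZE = {"uint32": 4, "uint64": 8, "float32": 4, "float64": 8}


def profile_bytes(k: int, dtype: str = "uint32") -> int:
    """Size in bytes of a dense array holding one value per possible k-mer."""
    return (4 ** k) * ITEMSIZE[dtype]


if __name__ == "__main__":
    for dtype in ("uint32", "uint64"):
        mib = profile_bytes(12, dtype) / 2 ** 20
        print(f"k=12, {dtype}: {mib:.0f} MiB")
```

Under these assumptions, halving the element width halves the footprint of every profile held in memory.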

0.1.1

This fixes the `.fastq` parsing so that it handles block sizes less than or equal to the block size parameter `b`. It has been acceptance tested.
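
As an illustration of the edge case, the sketch below groups FASTQ records (assumed to be exactly four lines each) into blocks of at most `b` records, so a short final block, or an input smaller than `b`, is still yielded. The names `fastq_records` and `fastq_blocks` are hypothetical and this is not kmerdb's parser.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (read id, sequence, quality string)


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one record per four non-blank lines: @id, sequence, +, quality."""
    it = (line.rstrip("\n") for line in lines if line.strip())
    while True:
        chunk = list(islice(it, 4))
        if not chunk:
            return
        if len(chunk) < 4 or not chunk[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield chunk[0][1:], chunk[1], chunk[3]


def fastq_blocks(lines: Iterable[str], b: int) -> Iterator[List[Record]]:
    """Yield lists of at most b records; the last block may be shorter."""
    records = fastq_records(lines)
    while True:
        block = list(islice(records, b))
        if not block:
            return
        yield block
```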

0.1.0

This is the first release with the backwards-incompatible change to the PostgreSQL database (technically, 0.0.11 or 0.0.12 was the first release that contained the changes). The program is now streamlined to use in-memory k-mer counting and aggregation.
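
For readers new to the idea, in-memory k-mer counting and aggregation can be sketched with a plain dictionary-style counter. This is a minimal illustration using the standard library, with hypothetical function names; it says nothing about how kmerdb implements it internally.

```python
from collections import Counter
from typing import Iterable


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in one sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def aggregate(seqs: Iterable[str], k: int) -> Counter:
    """Sum the k-mer counts of several sequences into one in-memory profile."""
    total: Counter = Counter()
    for seq in seqs:
        total.update(count_kmers(seq, k))
    return total


if __name__ == "__main__":
    print(aggregate(["ACGTACGT", "acgt"], k=3).most_common(2))
```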

0.0.10

The true alpha release.

Fixes 46 and makes everything fully pipeable by standardizing on `nargs="*"` instead of `nargs="+"`, which permits implicit reading from STDIN:

```bash
# Generate count matrix | Normalize NB counts | nxn distance matrix | k-means clustering
kmerdb matrix Unnormalized *.kdb | kmerdb matrix Normalized | kmerdb distance correlation | kmerdb kmeans -k 10 sklearn
```
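
The `nargs` change refers to Python's `argparse`: a positional declared with `nargs="+"` requires at least one value, while `nargs="*"` also accepts none, which leaves room to fall back to STDIN. A minimal sketch follows; the `demo` parser and the `open_inputs` helper are hypothetical and not kmerdb's actual command-line code.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    # nargs="*" accepts zero or more paths; nargs="+" would exit with an
    # error when no path is given, which breaks use at the end of a pipe.
    parser.add_argument("inputs", nargs="*",
                        help="input files; STDIN is read when none are given")
    return parser


def open_inputs(paths):
    """Return readable text streams: the named files, or STDIN if paths is empty."""
    if not paths:
        return [sys.stdin]
    return [open(path) for path in paths]


if __name__ == "__main__":
    args = build_parser().parse_args()
    for stream in open_inputs(args.inputs):
        for line in stream:
            sys.stdout.write(line)
```

Run with no arguments, the script copies STDIN to STDOUT, so it can sit in the middle of a pipeline.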



Again, the 4 main steps in this data processing workflow are:
1. Profile generation
```bash
kmerdb profile genome.fa genome1.kdb
kmerdb profile input1.fa input2.fa compound1.kdb
```

2. Data matrix (X) generation
```bash
# 4 options for matrix generation/processing:
# Normalization vs PCA vs tSNE
kmerdb matrix Unnormalized *.kdb > X.tsv
kmerdb matrix Normalized *.kdb  # Uses DESeq2 normalization (via rpy2), suitable for NB-distributed count matrices
kmerdb matrix PCA -n 20 *.kdb
kmerdb matrix tSNE -n 20 *.kdb
```

3. Distance matrix generation
- 22 distance metrics available
```bash
kmerdb distance spearman X.tsv
kmerdb matrix Normalized *.kdb | kmerdb distance euclidean
```

4. Hierarchical/kmeans clustering
```bash
kmerdb kmeans -k 10 -i input.tsv sklearn  # Biopython k-means implementation also available
kmerdb hierarchical -i input.tsv -m ward
```
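
To give a feel for what steps 2 and 3 compute, here is a small pure-Python sketch. It uses median-of-ratios size factors, the idea behind DESeq2's normalization, followed by a pairwise Euclidean distance matrix between samples. kmerdb itself calls DESeq2 through rpy2, so this is a stand-in for illustration, not its code; the function names and the rows-as-k-mers, columns-as-samples layout are assumptions.

```python
import math
from statistics import median
from typing import List

Matrix = List[List[float]]  # assumed layout: rows are k-mers, columns are samples


def size_factors(counts: Matrix) -> List[float]:
    """Median-of-ratios size factor for each sample (column)."""
    n_samples = len(counts[0])
    # Keep only k-mers observed in every sample so that the logs are defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no k-mer is observed in every sample")
    # Log of the per-k-mer geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: Matrix) -> Matrix:
    """Divide each sample's counts by that sample's size factor."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


def euclidean_matrix(counts: Matrix) -> Matrix:
    """n x n matrix of Euclidean distances between samples (columns)."""
    samples = list(zip(*counts))
    return [[math.dist(a, b) for b in samples] for a in samples]


if __name__ == "__main__":
    X = [[10, 20, 10], [100, 200, 100], [5, 10, 5], [0, 4, 1]]
    for row in euclidean_matrix(normalize(X)):
        print("\t".join(f"{d:.3f}" for d in row))
```

In the example matrix the second sample is simply sequenced twice as deeply as the first, so normalization removes most of the distance between them that the raw counts would show.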

0.0.7

This has been used in a test sweep over 30 genomes and has been deemed suitable for academic use.
The new release features IUPAC support and a full graph database (in bgzf format only) with the `--all-metadata` flag on the `kmerdb profile` command.
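
For context, IUPAC nucleotide codes let a single letter stand for several bases (for example `R` is A or G, and `N` is any base). The sketch below expands a k-mer containing such codes into every concrete k-mer it could represent. How kmerdb actually counts ambiguous k-mers is not stated here, so treat `expand_kmer` as a hypothetical helper rather than the tool's behaviour.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_kmer(kmer: str) -> List[str]:
    """List every unambiguous k-mer matched by a k-mer with IUPAC codes."""
    choices = (IUPAC[base] for base in kmer.upper())
    return ["".join(bases) for bases in product(*choices)]


if __name__ == "__main__":
    print(expand_kmer("AYG"))
```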
