Uh, a few basics. Starting type inference. Changing to uint32 all around. Uint32, Uint64, float32, float64. Humor?
Thanks to all. Happy New Year.
Goodies...
Most excellent.
0.1.2
Improvements to memory usage, parallelization, and runtime.
Adds a `-d|--dtype` command-line parameter.
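To show why a dtype option affects memory usage, here is a minimal sketch rather than kmerdb's actual implementation. It assumes a profile is stored as one dense array with an entry for each of the 4^k possible k-mers. The `-k` option, the `uint32` default, and the helper names are hypothetical; only the four type names come from these notes.

```python
import argparse

# Bytes per element for the four type names mentioned in these notes.
ITEMSIZE = {"uint32": 4, "uint64": 8, "float32": 4, "float64": 8}


def build_parser():
    """Hypothetical parser exposing a -d/--dtype option (not kmerdb's own parser)."""
    parser = argparse.ArgumentParser(prog="dtype-sketch")
    parser.add_argument("-d", "--dtype", choices=sorted(ITEMSIZE), default="uint32",
                        help="element type of the count array (default is an assumption)")
    parser.add_argument("-k", type=int, default=12, help="k-mer length (hypothetical option)")
    return parser


def dense_profile_bytes(k, dtype):
    """Bytes needed for one dense array with an entry for each of the 4**k k-mers."""
    return (4 ** k) * ITEMSIZE[dtype]


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["-d", "uint64", "-k", "12"])
print(args.dtype, dense_profile_bytes(args.k, args.dtype))
```

Under this assumption, halving the element width halves the array size, which is one plausible reason to make the type selectable.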
0.1.1
This fixes a bug in the `.fastq` parser so that it handles blocks smaller than or equal to the block size parameter `b`. It has been acceptance tested.
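The short-block case can be illustrated with a minimal sketch. This is not kmerdb's parser: it assumes plain four-line FASTQ records with no trailing blank lines, and the function names are hypothetical. The point is that the final block may hold fewer than `b` records, and an input with at most `b` records yields a single block.

```python
import io
from itertools import islice


def read_fastq_records(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return  # clean end of input
        if len(lines) < 4 or not lines[0].startswith("@") or not lines[2].startswith("+"):
            raise ValueError("truncated or malformed FASTQ record")
        yield lines[0][1:], lines[1], lines[3]


def blocks(records, b):
    """Group records into lists of at most b records; the last block may be shorter."""
    if b < 1:
        raise ValueError("block size must be positive")
    records = iter(records)
    while True:
        block = list(islice(records, b))
        if not block:
            return
        yield block


# Three records in total, so b=2 gives a full block followed by a short one.
FASTQ = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n"
sizes = [len(block) for block in blocks(read_fastq_records(io.StringIO(FASTQ)), 2)]
print(sizes)
```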
0.1.0
This is the first release of the backwards-incompatible change to the PostgreSQL database (technically, 0.0.11 or 0.0.12 was the first release to contain the changes). The program is now streamlined to use in-memory k-mer counting and aggregation.
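As a rough illustration of in-memory counting and aggregation, here is a minimal sketch using only the standard library. It is not kmerdb's implementation: it counts on a single strand, ignores reverse complements and ambiguous bases, and the function names are hypothetical.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer in one sequence (single strand only)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def aggregate(sequences, k):
    """Sum k-mer counts over many sequences into one in-memory table."""
    total = Counter()
    for seq in sequences:
        total.update(count_kmers(seq, k))
    return total


profile = aggregate(["ACGT", "ACGA"], 2)
print(profile.most_common())
```

Sequences shorter than `k` contribute no k-mers, because the range of start positions is empty.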
0.0.10
The true alpha release.
Fixes bug 46 and makes everything fully pipeable by standardizing on `nargs="*"` instead of `nargs="+"` to permit implicit STDIN reading.
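The difference between the two `nargs` settings can be sketched with `argparse` directly. This is an illustrative example, not kmerdb's own argument parser; the program name, the `inputs` argument, and `open_inputs` are hypothetical. With `nargs="*"` an empty argument list is accepted and yields `[]`, which the program can treat as "read STDIN". With `nargs="+"` the same call exits with a usage error.

```python
import argparse
import sys


def build_parser(nargs="*"):
    """Hypothetical parser with a positional list of input files."""
    parser = argparse.ArgumentParser(prog="pipe-sketch")
    parser.add_argument("inputs", nargs=nargs, help="input files; omit to read STDIN")
    return parser


def open_inputs(paths, stdin=sys.stdin):
    """Yield open text handles, falling back to STDIN when no paths are given."""
    if not paths:
        yield stdin
        return
    for path in paths:
        with open(path) as handle:
            yield handle


# nargs="*" accepts zero positionals, so the program can be fed through a pipe.
piped = build_parser("*").parse_args([])
print(piped.inputs)

# nargs="+" rejects the same call: argparse prints a usage error and exits.
try:
    build_parser("+").parse_args([])
    strict_ok = True
except SystemExit:
    strict_ok = False
print(strict_ok)
```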
Again, the 4 main steps in this data processing workflow:

1. Profile generation

```bash
kmerdb profile genome.fa genome1.kdb
kmerdb profile input1.fa input2.fa compound1.kdb
```

2. Data matrix (X) generation

```bash
# 4 options for matrix generation/processing: Normalization vs PCA vs tSNE
kmerdb matrix Unnormalized *.kdb > X.tsv
kmerdb matrix Normalized *.kdb   # Uses DESeq2 normalization suitable for NB distributed count matrices via rpy2
kmerdb matrix PCA -n 20 *.kdb
kmerdb matrix tSNE -n 20 *.kdb
```
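To make the unnormalized matrix step concrete, here is a minimal sketch that joins several k-mer count profiles into one tab-separated table. It makes several assumptions: the layout (one row per k-mer, one column per profile, a leading `kmer` column) is a guess rather than the documented output of `kmerdb matrix`, the profiles are plain dictionaries rather than `.kdb` files, and the function names are hypothetical.

```python
import csv
import io


def build_matrix(profiles):
    """Join {name: {kmer: count}} profiles into (column names, rows); missing counts become 0."""
    names = list(profiles)
    kmers = sorted(set().union(*(p.keys() for p in profiles.values())))
    rows = [[kmer] + [profiles[name].get(kmer, 0) for name in names] for kmer in kmers]
    return names, rows


def write_tsv(names, rows, out):
    """Write the matrix as tab-separated text with a header row."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["kmer"] + names)
    writer.writerows(rows)


names, rows = build_matrix({"a": {"AC": 2, "CG": 1}, "b": {"CG": 3}})
buffer = io.StringIO()
write_tsv(names, rows, buffer)
print(buffer.getvalue(), end="")
```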
This has been used in a test sweep over 30 genomes and is deemed suitable for academic use. The new release features IUPAC support and a full graph database (in bgzf format only) with the `--all-metadata` flag on the `kmerdb profile` command.