Kmerdb

Latest version: v0.8.15


0.7.8

`kmerdb graph` introduced, producing a new file format, `.kdbg`, which is an edge list. The new format comes with a new metadata schema. `kmerdb view` and `kmerdb header` are compatible with the new format.

The goal is to create a weighted graph. Support for assembly and graph visualization is planned for the future.

After 0.7.6 the `.kdb` spec will be *loosely* deprecated. While the `.kdb` format may remain unchanged (not yet decided), the goal is to produce an adjacency list structure from only the k-mer counts and the 'neighbor' k-mer ids. After the format revision (mostly to the `--all-metadata` option), a new command, `kmerdb graph`, will be used to generate an on-disk representation of an adjacency list.

* What does this mean?

At this point, the new feature is in the planning stage, and it is not known if backwards compatibility (< 0.7.7) will be supported. One goal is to create an adjacency list structure on disk from the `--all-metadata` augmented `.kdb` format. It is not clear yet if cycles will be permitted in the graph structure, or if a distinct "offset" flag will be used. An example follows.

- 0.7.6 `.kdb` format
col1 is the row number, col2 is the sort order, col3 is the k-mer id, and col4 is the k-mer count. col5 (`--all-metadata`) featured a loosely specified, poorly implemented 'neighbor' JSON field: a dictionary with "A", "C", "T", "G", etc. keys that provided the neighboring (left side and right side) k-mer ids.

1 1 1 123
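The 'neighbor' idea above can be sketched in a few lines. This is only an illustration: the 2-bit `ACGT` encoding and the function names `kmer_to_id`, `id_to_kmer`, and `neighbors` are assumptions made for this example, not kmerdb's actual id scheme or API.

```python
# Hypothetical sketch: ids of the left and right neighbors of a k-mer.
# The A=0, C=1, G=2, T=3 encoding is an assumption, not kmerdb's spec.
LETTERS = "ACGT"


def kmer_to_id(kmer):
    """Encode a k-mer as a base-4 integer (assumed encoding)."""
    idx = 0
    for ch in kmer:
        idx = idx * 4 + LETTERS.index(ch)
    return idx


def id_to_kmer(idx, k):
    """Decode a base-4 integer back into a k-mer of length k."""
    chars = []
    for _ in range(k):
        chars.append(LETTERS[idx % 4])
        idx //= 4
    return "".join(reversed(chars))


def neighbors(kmer):
    """Return ids of the k-mers that overlap this one by k-1 bases."""
    # Left neighbors: prepend a letter, drop the last base.
    left = {c: kmer_to_id(c + kmer[:-1]) for c in LETTERS}
    # Right neighbors: drop the first base, append a letter.
    right = {c: kmer_to_id(kmer[1:] + c) for c in LETTERS}
    return {"left": left, "right": right}
```

Under these assumptions, `neighbors("ACG")` returns a dictionary keyed by letter on each side, which resembles the kind of structure the col5 JSON field carried.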

- 0.7.7+ `.kdbg`
col1 is a unique row number, col2 is the k-mer id (may be repeated), and col3 is a comma-separated field of possible adjacent *row-ids*, corresponding to the neighbors of the k-mer id (col2) in k-mer space. col4 represents a possible solution for a graph traversal: a Hamiltonian-style walk through the graph that recapitulates either the exact (`.fasta`) assembly solution *or* a potential solution to the assembly from the available data. A feasible solution might come from `networkx` or from a custom graph traversal algorithm that minimizes the penalty of omitting rows/k-mers, guided by the shortest path that visits each k-mer once, while possibly also maximizing the number of rows visited. How this will be implemented is not yet settled, as the `.kdbg` format is the first step.

1 1234 2345,3456,... 3
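As a rough illustration of the row layout described above, the following sketch reads rows of this shape into an in-memory adjacency list. The whitespace-separated layout, the reading of col4 as an integer, the sample rows, and the helper names `parse_kdbg_row` and `build_adjacency` are all assumptions for this example; the real `.kdbg` format may differ.

```python
# Hypothetical sketch: parse rows shaped like the example above.
# The sample rows below are made up for illustration only.


def parse_kdbg_row(line):
    """Split one row into (row_id, kmer_id, neighbor_row_ids, traversal)."""
    fields = line.split()
    row_id = int(fields[0])
    kmer_id = int(fields[1])
    # col3: comma-separated row ids of possible neighbors
    neighbor_rows = [int(x) for x in fields[2].split(",") if x]
    # col4: treated here as a plain integer (an assumption)
    traversal = int(fields[3]) if len(fields) > 3 else None
    return row_id, kmer_id, neighbor_rows, traversal


def build_adjacency(lines):
    """Map each row id to the list of row ids it may connect to."""
    adjacency = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row_id, _kmer_id, neighbor_rows, _traversal = parse_kdbg_row(line)
        adjacency[row_id] = neighbor_rows
    return adjacency


rows = ["1 1234 2,3 3", "2 2345 3 1", "3 3456 1 2"]
adjacency = build_adjacency(rows)
```

A dictionary of lists like this could be handed to a graph library such as `networkx` for traversal experiments, though nothing here reflects how kmerdb itself does it.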

0.7.4

Build system migration completed. Closes 109. Completes triage of the hotfix rebase. Old master abrogated.

0.7.1alpha

Migrating to a new wheel generation process using `python -m build` and `pyproject.toml`. This generates a working installed module, `kmerdb`.

This can be confirmed by running:

    $ python
    >>> import kmerdb

Similarly, direct module invocation works fine. Note that this minor release is bugged because of the `pyproject.toml`.

    python -m kmerdb -h

0.7.0

Adds parallelization support to the Cythonized distance function `pearson`.

`kmerdb distance -p 10 pearson inputs.tsv`
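For readers unfamiliar with the metric, here is a minimal pure-Python sketch of pairwise Pearson correlation computed over a worker pool. It is not kmerdb's Cython implementation: the function names `pearson` and `pairwise_pearson`, the `workers` parameter, and the use of threads are assumptions for illustration (threads keep the sketch self-contained but give no real speedup for pure-Python arithmetic).

```python
# Hypothetical sketch: pairwise Pearson correlation with a worker pool.
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Raises ZeroDivisionError if either profile has zero variance.
    return cov / math.sqrt(var_x * var_y)


def pairwise_pearson(profiles, workers=2):
    """Compute the correlation for every pair of named count profiles."""
    pairs = list(combinations(sorted(profiles), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = pool.map(
            lambda pair: pearson(profiles[pair[0]], profiles[pair[1]]), pairs
        )
        return dict(zip(pairs, values))
```

Each pair is independent of the others, which is why this kind of distance computation parallelizes naturally.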

0.6.61

Minor documentation changes, some refactoring/renaming.

0.6.9

Minor bug fixes, patch to the Cython extension to use an 80-bit floating point accumulator.
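Why accumulator precision matters can be shown with a small sketch. Python's standard library has no 80-bit float, so this example swaps in `math.fsum`, which tracks exact partial sums, as a stand-in for a wider accumulator. It is not the Cython patch itself, and the names `naive_sum` and `careful_sum` are made up for illustration.

```python
# Hypothetical sketch: rounding error in a plain double accumulator
# versus a more careful one (math.fsum used as a stand-in).
import math


def naive_sum(values):
    """Accumulate in a plain 64-bit float, rounding at every step."""
    total = 0.0
    for v in values:
        total += v
    return total


def careful_sum(values):
    """Accumulate without intermediate rounding loss via math.fsum."""
    return math.fsum(values)


values = [0.1] * 10
naive = naive_sum(values)
careful = careful_sum(values)
```

Here the plain accumulator drifts slightly away from 1.0 while the careful one does not. A wider accumulator reduces the same kind of drift in long sums such as those inside a correlation calculation.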
