Metasbt

Latest version: v0.1.3

0.1.3

New features

- New option `--uniform-strand` available with the `index` and `update` modules for processing the input sequences all on the same strand. Mainly used for viral sequences;
- New option `--use-representatives` available with the `index` module to use only three representative genomes at the species level;
- New option `--resume` available with the `index` and `update` modules to resume the index and update processes after unexpected errors;
- New `expand_fasta.py` utility in `scripts` to expand input fasta files into multiple files, one fasta file for each read. Mainly used for viral sequences;
- New `fastcluster.py` utility in `scripts` to compute an average-linkage hierarchical clustering of a set of genomes based on their Mash distances;
- Both the `index` and `update` modules now display a warning message if the configuration file used with `--resume` was generated with a different version of MetaSBT;
- Both the `index` and `update` modules now integrate `CheckV` and `EukCC` for assessing the quality of viruses and eukaryotes;
- `CheckM` has been upgraded to `CheckM2`;
- The `cluster()` function in `utils` now runs in parallel;
- The `howdesbt bfdistance` command for computing the distances between bloom filters now runs in parallel.
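
As an illustration of the average-linkage clustering mentioned for `fastcluster.py`, here is a minimal pure-Python sketch. It is not the code of the utility itself: the function name `average_linkage`, the distance threshold, and the pairwise distances are all hypothetical, and the distances are assumed to have been computed beforehand (for example with Mash).

```python
# Minimal sketch of average-linkage agglomerative clustering.
# Assumption: pairwise distances are precomputed and keyed by
# frozenset({genome_a, genome_b}); all names and values are made up.

def average_linkage(names, dist, threshold):
    """Merge the two closest clusters until the smallest average
    inter-cluster distance exceeds `threshold`."""
    clusters = [[name] for name in names]

    def average_distance(c1, c2):
        # Mean of all pairwise distances between members of c1 and c2
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        # Merge cluster j into cluster i
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical distances between four genomes
distances = {
    frozenset(("A", "B")): 0.02,
    frozenset(("A", "C")): 0.30,
    frozenset(("A", "D")): 0.32,
    frozenset(("B", "C")): 0.28,
    frozenset(("B", "D")): 0.34,
    frozenset(("C", "D")): 0.04,
}

result = average_linkage(["A", "B", "C", "D"], distances, threshold=0.1)
print(result)
```

With these made-up values, genomes `A` and `B` end up in one cluster and `C` and `D` in another, since the average distance between the two groups is well above the threshold. This naive version recomputes every average at each step; a production tool would more likely rely on an optimized library.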

Fixes

- MetaSBT now correctly checks for new framework versions when starting a new `metasbt` instance;
- Fixed genome quality filtering on completeness and contamination during the `update`;
- Improved docstrings by adopting the [numpydoc](https://numpydoc.readthedocs.io/en/latest/) documentation format.

0.1.2

First public stable release of MetaSBT.

It is composed of the following modules:

- `index`: build a MetaSBT database by building a series of Sequence Bloom Trees at different taxonomic levels;
- `boundaries`: define taxonomy-specific boundaries as the minimum and maximum number of kmers in common between all the genomes under a specific cluster;
- `profile`: taxonomically profile a genome by querying a MetaSBT database at different taxonomic levels;
- `report`: build a report table describing the content of a MetaSBT database;
- `update`: update a MetaSBT database with new genomes;
- `tar`: pack a MetaSBT database into a ready-to-be-distributed tarball;
- `install`: install a MetaSBT database tarball locally under a specific location of the file system.
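
The idea behind the `boundaries` module can be sketched as follows. This is only an illustration under stated assumptions, not MetaSBT's implementation: it assumes the boundaries are taken over all pairs of genomes in a cluster, it works on plain Python sets of kmers rather than bloom filters, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations

# Minimal sketch: min and max number of shared kmers over all
# pairs of genomes in one cluster. Names and data are made up.

def kmers(sequence, k):
    """Return the set of kmers of length k found in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def cluster_boundaries(genomes, k):
    """Return (minimum, maximum) number of kmers shared by any
    two genomes in the cluster."""
    kmer_sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    shared = [
        len(kmer_sets[a] & kmer_sets[b])
        for a, b in combinations(kmer_sets, 2)
    ]
    if not shared:
        raise ValueError("at least two genomes are required")
    return min(shared), max(shared)


# Toy cluster with three very short "genomes"
genomes = {
    "g1": "ACGTACGTGA",
    "g2": "ACGTACGTCC",
    "g3": "TTGTACGTGA",
}

boundaries = cluster_boundaries(genomes, k=4)
print(boundaries)
```

For this toy cluster the pairwise counts of shared 4-mers are 4, 5, and 3, so the sketch reports `(3, 5)` as the lower and upper boundary.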

The framework also comes with a set of utilities:

- `bf_sketch.py`: build minimal bloom filter sketches with cluster-specific marker kmers;
- `esearch_txid.sh`: retrieve GCAs from NCBI GenBank given a specific taxonomic ID;
- `get_ncbi_genomes.py`: retrieve reference genomes and metagenome-assembled genomes under a specific superkingdom and kingdom from NCBI GenBank;
- `howdesbt_index.sh`: index genomes with HowDeSBT;
- `uniform_inputs.sh`: make the file extensions of the input genome files uniform.
