SemiBin

Latest version: v2.2.0



0.6.0

* Provide pretrained models for soil, cat gut, human oral,
pig gut, mouse gut, built environment, wastewater and global (trained
on all environments).
* Add `check_install` command and run it before the `easy*` commands
* The user can now specify a pre-computed contig annotation table in
mmseqs format
* Fix bug with non-standard characters in sample names (issue 68)
* Better subcommand names (`generate_sequence_features_*` and
`generate_cannot_links`)

0.5.0

* Faster `SemiBin --version`
* Lower memory usage and faster speed for `bin` subcommand
* Reclustering is now the default (due to improved speed). Added
`--no-recluster` option to disable it
* Output of bedtools is now processed as a stream instead of using a
(potentially large) intermediate file
* Fix bug with `--min-len`. Previously, only contigs strictly longer than the
given minimal length were used (instead of contigs at least that long).
* Fix bugs downloading GTDB
* Respect $XDG_CACHE_DIR if set
* Implement CACHEDIR.TAG protocol for the SemiBin cache directory
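
The two cache-related entries above can be illustrated with a minimal sketch. It is not SemiBin's actual code: the helper names, the `SemiBin` subdirectory, and the `~/.cache` fallback are assumptions made for illustration, and the environment variable name is copied verbatim from the entry above. Only the signature line is fixed by the CACHEDIR.TAG convention.

```python
import os
from pathlib import Path

# Fixed first line required by the CACHEDIR.TAG convention.
CACHEDIR_TAG_SIGNATURE = "Signature: 8a477f597d28d172789f06886806bc55"


def resolve_cache_dir(env=None):
    """Pick a cache directory (hypothetical helper, not SemiBin's code)."""
    env = os.environ if env is None else env
    # Variable name taken verbatim from the changelog entry.
    base = env.get("XDG_CACHE_DIR")
    if base:
        return Path(base) / "SemiBin"  # subdirectory name is an assumption
    # Assumed fallback when the variable is unset.
    return Path.home() / ".cache" / "SemiBin"


def ensure_tagged_cache_dir(cache_dir):
    """Create cache_dir if needed and mark it with a CACHEDIR.TAG file."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    tag = cache_dir / "CACHEDIR.TAG"
    if not tag.exists():
        # Backup and archiving tools that honour the convention skip
        # directories containing a file that starts with this signature.
        tag.write_text(
            CACHEDIR_TAG_SIGNATURE + "\n"
            "# This file marks the directory as a cache directory.\n"
        )
    return cache_dir
```

Calling `ensure_tagged_cache_dir(resolve_cache_dir())` would create the directory on first use and leave an existing tag file untouched.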

0.4.0

* Add support for .xz FASTA files as inputs
* Removed BioPython dependency
* Fixed bug in FASTA unzipping
* Fixed bug in multi-sample data splitting
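
As a rough illustration of reading `.xz` FASTA input without BioPython, the sketch below uses only the Python standard library. The function names are hypothetical and this is not SemiBin's implementation; handling `.gz` and `.bz2` in the same helper is an assumption added for completeness.

```python
import bz2
import gzip
import lzma


def open_maybe_compressed(path):
    """Open a text file, choosing a decompressor from the file extension."""
    path = str(path)
    if path.endswith(".xz"):
        return lzma.open(path, "rt")
    if path.endswith(".gz"):  # assumption: not stated in the changelog
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):  # assumption: not stated in the changelog
        return bz2.open(path, "rt")
    return open(path, "rt")


def fasta_iter(path):
    """Yield (identifier, sequence) pairs from a FASTA file."""
    header = None
    chunks = []
    with open_maybe_compressed(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                # Keep only the identifier, up to the first whitespace.
                fields = line[1:].split()
                header = fields[0] if fields else ""
                chunks = []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

For example, `for name, seq in fasta_iter("contigs.fa.xz"): ...` would stream records one at a time, so the whole file never has to be decompressed to disk first.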

0.3.0

* Support training from several samples
* Remove `output_bin_path` if `output_bin_path` exists
* Make several internal parameters configurable: (1) minimum length of
contigs to bin (`--min-len` parameter); (2) minimum length of contigs to
break up in order to generate _must-link_ constraints (`--ml-threshold`
parameter); (3) a ratio threshold: if the ratio of the number of base pairs
in contigs between 1000 and 2500 bp is smaller than this value, the minimal
length is set to 1000 bp, otherwise to 2500 bp
* Add `-p` argument for `predict_taxonomy` mode
* Fix `np.concatenate` warning
* Remove redundant matrix when clustering
* Better pretrained models
* Faster depth calculation using NumPy
* Use correct number of threads in `kneighbors_graph()`
* Respect number of threads (`-p` argument) when training (issue 34)
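
The minimal-length rule described in item (3) of the configurable parameters above can be sketched as follows. This is a hedged illustration, not SemiBin's code: the function name is hypothetical, and two details are assumptions because the entry does not state them, namely that the ratio is taken against the total base pairs of all contigs and that the 1000 and 2500 bp bounds are inclusive.

```python
def choose_min_len(contig_lengths, ratio):
    """Return 1000 or 2500 as the minimal contig length to bin.

    Hypothetical helper. Assumes the ratio is
    (bp in contigs of 1000-2500 bp) / (bp in all contigs).
    """
    total_bp = sum(contig_lengths)
    if total_bp == 0:
        raise ValueError("no contigs given")
    # Base pairs contributed by short contigs (bounds assumed inclusive).
    short_bp = sum(n for n in contig_lengths if 1000 <= n <= 2500)
    # Per the changelog: a ratio smaller than the threshold gives 1000 bp,
    # otherwise 2500 bp.
    return 1000 if short_bp / total_bp < ratio else 2500
```

With contigs of 1000, 2000 and 10000 bp, short contigs hold 3000 of 13000 bp (about 0.23), so a threshold of 0.5 selects 1000 bp and a threshold of 0.05 selects 2500 bp.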

0.2.0

* Change name to `SemiBin`
* Add support for training with several samples
* Test with Python 3.9
* Download mmseqs database with `--remove-tmp-file 1`
* Better output names
* Fix bugs when paths have spaces
* Fix installation issues by listing all the dependencies
* Add `download_GTDB` command
* Add `--no-recluster` option
* Add `--environment` option
* Add `--mode` option
* All around more robust code by including more error checking & testing
* Better built-in models

0.1.1

* Fix bug with --minfasta-kbs
