Omamer

Latest version: v2.0.3

Safety actively analyzes 622420 Python packages for vulnerabilities to keep your Python projects secure.

Scan your dependencies

Page 1 of 2

2.0.3

2.0.2

- changed method for hiding taxa in build process. Now takes a file containing taxa to hide on separate lines.
- checks and improved feedback for root taxon and requested taxa to hide.
- root taxon set by default to the root level in speciestree.nwk (previously hard-coded to default to LUCA)

2.0.1

What's Changed
- remove dependency for filehash library
- return better error message if build dependencies are not met, but trying to building an omamer database
- minor fixes
- Bump actions/checkout from 3 to 4 by dependabot in https://github.com/DessimozLab/omamer/pull/24


**Full Changelog**: https://github.com/DessimozLab/omamer/compare/v2.0.0...v2.0.1

2.0.0

- Major update of database format and search code to improve overall memory useage. Most standard runs with LUCA-level database will run on a machine with 16GB RAM.
- Update to the scoring algorithm for root-level HOG / family assignments, to allow for significance testing. This estimates a binomial distribution for each family, so that we can compute thee probability of matching at least as many _k_-mers as we have observed by chance, for each family that has a match to a given query.
- UX improvements - more feedback during interactive search runs, whilst maintaining small log files.

---
Brief overview of major changes to OMAmer

The OMAmer placement algorithm consists of two steps: placing a query sequence into a protein family (root level HOG in OMA), before placing it into a sub-family. The original OMAmer publication focused on providing better and faster subfamily-level assignments than methods based on closest-sequence. Recently, the group has developed [OMArk](https://github.com/DessimozLab/OMark), a software package for proteome (protein-coding gene repertoire) quality assessment. The original OMAmer method was developed using a smaller taxonomic range than required for OMArk, which meant that the largest gene families were much smaller and less diverse in _k_-mer content. The largest HOG in OMA (November 2022 release) contains over 101,000 proteins and represents 53.9% of the _k_-mer index, based on the 6-mers that OMAmer uses by default. This means that a random protein sequence is very likely to be associated with this HOG.

In order to allow for this, we developed a scoring mechanism based on the binomial distribution. For each family, we estimated the probability of a random _k_-mer matching. We can then compute the $\textrm{Binomial}(N_{\textrm{query}}, P_{\textrm{family}})$ distribution for each family with matches (with probability $P_{\textrm{family}}$), with the number of draws ($N_{\textrm{query}}$) being the number of _k_-mers in the query sequence. Computing the complementary CDF (survival function), we can compute the probability of matching at least as many _k_-mers matches as we have observed by chance, for each family that has a match. Note: the results of this test are computed in negative-log units (natural log) for accuracy.

This is then used to filter the list of families which have an overlap with the query sequence (argument “``--family-alpha``”, default $10^{-6}$), to give us a list of candidate families. Candidates are then ordered by a normalised _k_-mer count, similar to the original algorithm, with any ties broken based on the proportion of the query sequence covered by matching _k_-mers, and then by the p-value computed above. By default, only the top family is taken forward. Sub-family placement is as in the original manuscript. Further software optimisation was performed, but did not affect the underlying method. As an example, it is now possible to run using the LUCA database in under 12GB of memory, whereas before this was using in excess of 40GB.

0.2.6

What's Changed
* support for numpy>1.23

**Full Changelog**: https://github.com/DessimozLab/omamer/compare/v0.2.5...v0.2.6

0.2.5

small patch release that fixes an issue with the previous version when building a new database from scratch.

**Full Changelog**: https://github.com/DessimozLab/omamer/compare/v0.2.4...v0.2.5

Page 1 of 2

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.