Scikit-bio

Latest version: v0.6.2

Safety actively analyzes 681812 Python packages for vulnerabilities to keep your Python projects secure.

Scan your dependencies

Page 1 of 4

0.6.2

Features

* Added support for Microsoft Windows operating system. ([2071](https://github.com/scikit-bio/scikit-bio/pull/2071), [#2068](https://github.com/scikit-bio/scikit-bio/pull/2068),
[2067](https://github.com/scikit-bio/scikit-bio/pull/2067), [#2061](https://github.com/scikit-bio/scikit-bio/pull/2061), [#2046](https://github.com/scikit-bio/scikit-bio/pull/2046),
[2040](https://github.com/scikit-bio/scikit-bio/pull/2040), [#2036](https://github.com/scikit-bio/scikit-bio/pull/2036), [#2034](https://github.com/scikit-bio/scikit-bio/pull/2034),
[2032](https://github.com/scikit-bio/scikit-bio/pull/2032), [#2005](https://github.com/scikit-bio/scikit-bio/pull/2005))
* Added alpha diversity metrics: Hill number (`hill`), Renyi entropy (`renyi`) and Tsallis entropy (`tsallis`) ([2074](https://github.com/scikit-bio/scikit-bio/pull/2074)).
* Added `rename` method for `OrdinationResults` and `DissimilarityMatrix` classes ([2027](https://github.com/scikit-bio/scikit-bio/pull/2027), [#2085](https://github.com/scikit-bio/scikit-bio/pull/2085)).
* Added `nni` function for phylogenetic tree rearrangement using nearest neighbor interchange (NNI) ([2050](https://github.com/scikit-bio/scikit-bio/pull/2050)).
* Added method `TreeNode.unrooted_move`, which resembles `TreeNode.unrooted_copy` but rearranges the tree in place, thus avoid making copies of the nodes ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).
* Added method `TreeNode.root_by_outgroup`, which reroots a tree according to a given outgroup ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).
* Added method `TreeNode.unroot`, which converts a rooted tree into unrooted by trifucating its root ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).
* Added method `TreeNode.insert`, which inserts a node into the branch connecting self and its parent ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).

Performance enhancements

* The time and memory efficiency of `TreeNode` has been significantly improved by making its caching mechanism lazy ([2082](https://github.com/scikit-bio/scikit-bio/pull/2082)).
* `Treenode.copy` and `TreeNode.unrooted_copy` can now perform shallow copy of a tree in addition to deep copy.
* `TreeNode.unrooted_copy` can now copy all attributes of the nodes, in addition to name and length ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).
* Paremter `above` was added to `TreeNode.root_at`, such that the user can root the tree within the branch connecting the given node and its parent, thereby creating a rooted tree ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).
* Parameter `branch_attrs` was added to the `unrooted_copy`, `root_at`, and `root_at_midpoint` methods of `TreeNode`, such that the user can customize which node attributes should be considered as branch attributes and treated accordingly
during the rerooting operation. The default behavior is preserved but is subject ot change in version 0.7.0 ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).
* Parameter `root_name` was added to the `unrooted_copy`, `root_at`, and `root_at_midpoint` methods of `TreeNode`, such that the user can customize (or omit) the name to be given to the root node. The default behavior is preserved but is subject ot change in version 0.7.0 ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).

Bug fixes

* Cleared the internal node references after performing midpoint rooting (`TreeNode.root_at_midpoint`), such that a deep copy of the resulting tree will not result in infinite recursion ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).
* Fixed the Zenodo link in the README to always point to the most recent version ([2078](https://github.com/scikit-bio/scikit-bio/pull/2078)).

Miscellaneous
* Added statsmodels as a dependency of scikit-bio. It replaces some of the from-scratch statistical analyses in scikit-bio, including Welch's t-test (with confidence intervals), Benjamini-Hochberg FDR correction, and Holm-Bonferroni FDR correction ([2049](https://github.com/scikit-bio/scikit-bio/pull/2049), ([#2063](https://github.com/scikit-bio/scikit-bio/pull/2063))).

Deprecated functionality

* Methods `deepcopy` and `unrooted_deepcopy` of `Treenode` are deprecated. Use `copy` and `unrooted_copy` instead.

0.6.1

Features

* NumPy 2.0 is now supported ([2051](https://github.com/scikit-bio/scikit-bio/pull/2051])). We thank rgommers 's advice on this ([#1964](https://github.com/scikit-bio/scikit-bio/issues/1964)).
* Added module `skbio.embedding` to provide support for storing and manipulating embeddings for biological objects, such as protein embeddings outputted from protein language models ([2008](https://github.com/scikit-bio/scikit-bio/pull/2008])).
* Added an efficient sequence alignment path data structure `AlignPath` and its derivative `PairAlignPath` to provide a uniform interface for various multiple and pariwise alignment formats ([2011](https://github.com/scikit-bio/scikit-bio/pull/2011)).
* Added `simpson_d` as an alias for `dominance` (Simpson's dominance index, a.k.a. Simpson's D) ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)).
* Added `inv_simpson` (inverse Simpson index), which is equivalent to `enspie` ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)).
* Added parameter `exp` to `shannon` to calculate the exponential of Shannon index (i.e., perplexity, or effective number of species) ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)).
* Added parameter `finite` to Simpson's _D_ (`dominance`) and derived metrics (`simpson`, `simpson_e` and `inv_simpson`) to correct for finite samples ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)).
* Added support for dictionary and pandas DataFrame as input for `TreeNode.from_taxonomy` ([2042](https://github.com/scikit-bio/scikit-bio/pull/2042)).

Performance enhancements

* `subsample_counts` now uses an optimized method from `biom-format` ([2016](https://github.com/scikit-bio/scikit-bio/pull/2016)).
* Improved efficiency of counts matrix and vector validation prior to calculating community diversity metrics ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)).

Miscellaneous

* Default logarithm base of Shannon index (`shannon`) was changed from 2 to e. This is to ensure consistency with other Shannon-based metrics (`pielou_e`), and with literature and implementations in the field. Meanwhile, parameter `base` was added to `pielou_e` such that the user can control this behavior ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)). See discussions in [1884](https://github.com/scikit-bio/scikit-bio/issues/1884) and [2014](https://github.com/scikit-bio/scikit-bio/issues/2014).
* Improved treatment of empty communities (i.e., all taxa have zero counts, or there is no taxon) when calculating alpha diversity metrics. Most metrics will return `np.nan` and do not raise a warning due to zero division. Exceptions are metrics that describe observed counts, includng `sobs`, `singles`, `doubles` and `osd`, which return zero ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)). See discussions in [#2014](https://github.com/scikit-bio/scikit-bio/issues/2014).
* Return values of `pielou_e` and `heip_e` were set to 1.0 for one-taxon communities, such that NaN is avoided, while honoring the definition (evenness of taxon abundance(s)) and the rationale (ratio between observed and maximum) ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)).
* Removed hdmedians as a dependency by porting its `geomedian` function (geometric median) into scikit-bio ([2003](https://github.com/scikit-bio/scikit-bio/pull/2003)).
* Removed 98% warnings issued during the test process ([2045](https://github.com/scikit-bio/scikit-bio/pull/2045) and [#2037](https://github.com/scikit-bio/scikit-bio/pull/2037)).

0.6.0

Performance enhancements

* Launched the new scikit-bio website: https://scikit.bio. The previous domain names _scikit-bio.org_ and _skbio.org_ continue to work and redirect to the new website.
* Migrated the scikit-bio website repo from the `gh-pages` branch of the `scikit-bio` repo to a standalone repo: [`scikit-bio.github.io`](https://github.com/scikit-bio/scikit-bio.github.io).
* Replaced the [Bootstrap theme](https://sphinx-bootstrap-theme.readthedocs.io/en/latest/) with the [PyData theme](https://pydata-sphinx-theme.readthedocs.io/en/stable/) for building documentation using Sphinx. Extended this theme to the website. Customized design elements ([#1934](https://github.com/scikit-bio/scikit-bio/pull/1934)).
* Improved the calculation of Fisher's alpha diversity index (`fisher_alpha`). It is now compatible with optimizers in SciPy 1.11+. Edge cases such as all singletons can be handled correctly. Handling of errors and warnings was improved. Documentation was enriched ([1890](https://github.com/scikit-bio/scikit-bio/pull/1890)).
* Allowed `delimiter=None` which represents whitespace of arbitrary length in reading lsmat format matrices ([1912](https://github.com/scikit-bio/scikit-bio/pull/1912)).

Features

* Added biom-format Table import and updated corresponding requirement files ([1907](https://github.com/scikit-bio/scikit-bio/pull/1907)).
* Added biom-format 2.1.0 IO support ([1984](https://github.com/scikit-bio/scikit-bio/pull/1984)).
* Added `Table` support to `alpha_diversity` and `beta_diversity` drivers ([1984](https://github.com/scikit-bio/scikit-bio/pull/1984)).
* Implemented a mechanism to automatically build documentation and/or homepage and deploy them to the website ([1934](https://github.com/scikit-bio/scikit-bio/pull/1934)).
* Added the Benjamini-Hochberg method as an option for FDR correction (in addition to the existing Holm-Bonferroni method) for `ancom` and `dirmult_ttest` ([1988](https://github.com/scikit-bio/scikit-bio/pull/1988)).
* Added function `dirmult_ttest`, which performs differential abundance test using a Dirichilet multinomial distribution. This function mirrors the method provided by ALDEx2 ([1956](https://github.com/scikit-bio/scikit-bio/pull/1956)).
* Added method `Sequence.to_indices` to convert a sequence into a vector of indices of characters in an alphabet (can be from a substitution matrix) or unique characters observed in the sequence. Supports gap masking and wildcard substitution ([1917](https://github.com/scikit-bio/scikit-bio/pull/1917)).
* Added class `SubstitutionMatrix` to support subsitution matrices for nucleotides, amino acids are more general cases ([1913](https://github.com/scikit-bio/scikit-bio/pull/1913)).
* Added alpha diversity metric `sobs`, which is the observed species richness (S_{obs}) of a sample. `sobs` will replace `observed_otus`, which uses the historical term "OTU". Also added metric `observed_features` to be compatible with the QIIME 2 terminology. All three metrics are equivalent ([1902](https://github.com/scikit-bio/scikit-bio/pull/1902)).
* `beta_diversity` now supports use of Pandas a `DataFrame` index, issue [1808](https://github.com/scikit-bio/scikit-bio/issues/1808).
* Added alpha diversity metric `phydiv`, which is a generalized phylogenetic diversity (PD) framework permitting unrooted or rooted tree, unweighted or weighted by abundance, and an exponent parameter of the weight term ([1893](https://github.com/scikit-bio/scikit-bio/pull/1893)).
* Adopted NumPy's new random generator `np.random.Generator` (see [NEP 19](https://numpy.org/neps/nep-0019-rng-policy.html)) ([#1889](https://github.com/scikit-bio/scikit-bio/pull/1889)).
* SciPy 1.11+ is now supported ([1887](https://github.com/scikit-bio/scikit-bio/pull/1887)).
* Removed IPython as a dependency. Scikit-bio continues to support displaying plots in IPython, but it no longer requires importing IPython functionality ([1901](https://github.com/scikit-bio/scikit-bio/pull/1901)).
* Made Matplotlib an optional dependency. Scikit-bio no longer requires Matplotlib except for plotting, during which it attempts to import Matplotlib if it is present in the system, and raises an error if not ([1901](https://github.com/scikit-bio/scikit-bio/pull/1901)).
* Ported the QIIME 2 metadata object into skbio. ([1929](https://github.com/scikit-bio/scikit-bio/pull/1929))
* Python 3.12+ is now supported, thank you actapia ([1930](https://github.com/scikit-bio/scikit-bio/pull/1930))
* Introduced native character conversion ([1971])(https://github.com/scikit-bio/scikit-bio/pull/1971)

Backward-incompatible changes [experimental]

* Beta diversity metric `kulsinski` was removed. This was motivated by that SciPy replaced this distance metric with `kulczynski1` in version 1.11 (see SciPy issue [2009](https://github.com/scipy/scipy/issues/2009)), and that both metrics do not return 0 on two identical vectors ([#1887](https://github.com/scikit-bio/scikit-bio/pull/1887)).

Bug fixes

* Fixed documentation interface of `vlr` and relevant functions ([1934](https://github.com/scikit-bio/scikit-bio/pull/1934)).
* Fixed broken link in documentation of Simpson's evenness index. See issue [1923](https://github.com/scikit-bio/scikit-bio/issues/1923).
* Safely handle `Sequence.iter_kmers` where `k` is greater than the sequence length ([1723](https://github.com/scikit-bio/scikit-bio/issues/1723))
* Re-enabled OpenMP support, which has been mistakenly disabled in 0.5.8 ([1874](https://github.com/scikit-bio/scikit-bio/pull/1874))
* `permanova` and `permdist` operate on a `DistanceMatrix` and a grouping object. Element IDs must be synchronized to compare correct sets of pairwise distances. This failed in case the grouping was provided as a `pandas.Series`, because it was interpreted as an ordered `list` and indices were ignored (see issue [1877](https://github.com/scikit-bio/scikit-bio/issues/1877) for an example). Note: `pandas.DataFrame` was handled correctly. This behavior has been fixed with PR [#1879](https://github.com/scikit-bio/scikit-bio/pull/1879)
* Fixed slicing for `TabularMSALoc` on Python 3.12. See issue [1926](https://github.com/scikit-bio/scikit-bio/issues/1926).

Miscellaneous

* Replaced the historical term "OTU" with the more generic term "taxon" (plural: "taxa"). As a consequence, the parameter "otu_ids" in phylogenetic alpha and beta diversity metrics was replaced by "taxa". Meanwhile, the old parameter "otu_ids" is still kept as an alias of "taxa" for backward compatibility. However it will be removed in a future release.
* Revised contributor's guidelines.
* Renamed function `multiplicative_replacement` as `multi_replace` for conciseness ([1988](https://github.com/scikit-bio/scikit-bio/pull/1988)).
* Renamed parameter `multiple_comparisons_correction` as `p_adjust` of function `ancom` for conciseness ([1988](https://github.com/scikit-bio/scikit-bio/pull/1988)).
* Enabled code coverage reporting via Codecov. See [1954](https://github.com/scikit-bio/scikit-bio/pull/1954).
* Renamed the default branch from "master" to "main". See [1953](https://github.com/scikit-bio/scikit-bio/pull/1953).
* Enabled subclassing of DNA, RNA and Protein classes to allow secondary development.
* Dropped support for NumPy < 1.17.0 in order to utilize the new random generator.
* Use CYTHON by default during build ([1874](https://github.com/scikit-bio/scikit-bio/pull/1874))
* Implemented augmented assignments proposed in issue [1789](https://github.com/scikit-bio/scikit-bio/issues/1789)
* Incorporated Ruff's formatting and linting via pre-commit hooks and GitHub Actions. See PR [1924](https://github.com/scikit-bio/scikit-bio/pull/1924).
* Improved docstrings for functions accross the entire codebase. See [1933](https://github.com/scikit-bio/scikit-bio/pull/1933) and [#1940](https://github.com/scikit-bio/scikit-bio/pull/1940)
* Removed API lifecycle decorators in favor of deprecation warnings. See [1916](https://github.com/scikit-bio/scikit-bio/issues/1916)

0.5.9

Features

* Adding Variance log ratio estimators in `skbio.stats.composition.vlr` and `skbio.stats.composition.pairwise_vlr` ([1803](https://github.com/scikit-bio/scikit-bio/pull/1803))
* Added `skbio.stats.composition.tree_basis` to construct ILR bases from `TreeNode` objects. ([1862](https://github.com/scikit-bio/scikit-bio/pull/1862))
* `IntervalMetadata.query` now defaults to obtaining all results, see [1817](https://github.com/scikit-bio/scikit-bio/issues/1817).

Backward-incompatible changes [experimental]
* With the introduction of the `tree_basis` object, the ILR bases are now represented in log-odds coordinates rather than in probabilities to minimize issues with numerical stability. Furthermore, the `ilr` and `ilr_inv` functions now takes the `basis` input parameter in terms of log-odds coordinates. This affects the `skbio.stats.composition.sbp_basis` as well. ([1862](https://github.com/scikit-bio/scikit-bio/pull/1862))

Important

* Complex multiple axis indexing operations with `TabularMSA` have been removed from testing due to incompatibilities with modern versions of Pandas. ([1851](https://github.com/scikit-bio/scikit-bio/pull/1851))
* Pinning `scipy <= 1.10.1` ([1851](https://github.com/scikit-bio/scikit-bio/pull/1867))

Bug fixes

* Fixed a bug that caused build failure on the ARM64 microarchitecture due to floating-point number handling. ([1859](https://github.com/scikit-bio/scikit-bio/pull/1859))
* Never let the Gini index go below 0.0, see [1844](https://github.com/scikit-bio/scikit-bio/issue/1844).
* Fixed bug [1847](https://github.com/scikit-bio/scikit-bio/issues/1847) in which the edge from the root was inadvertantly included in the calculation for `descending_branch_length`

Miscellaneous

* Replaced dependencies `CacheControl` and `lockfile` with `requests` to avoid a dependency inconsistency issue of the former. (See [1863](https://github.com/scikit-bio/scikit-bio/pull/1863), merged in [#1859](https://github.com/scikit-bio/scikit-bio/pull/1859))
* Updated installation instructions for developers in `CONTRIBUTING.md` ([1860](https://github.com/scikit-bio/scikit-bio/pull/1860))

0.5.8

Features

* Added NCBI taxonomy database dump format (`taxdump`) ([1810](https://github.com/scikit-bio/scikit-bio/pull/1810)).
* Added `TreeNode.from_taxdump` for converting taxdump into a tree ([1810](https://github.com/scikit-bio/scikit-bio/pull/1810)).
* scikit-learn has been removed as a dependency. This was a fairly heavy-weight dependency that was providing minor functionality to scikit-bio. The critical components have been implemented in scikit-bio directly, and the non-criticial components are listed under "Backward-incompatible changes [experimental]".
* Python 3.11 is now supported.

Backward-incompatible changes [experimental]
* With the removal of the scikit-learn dependency, three beta diversity metric names can no longer be specified. These are `wminkowski`, `nan_euclidean`, and `haversine`. On testing, `wminkowski` and `haversine` did not work through `skbio.diversity.beta_diversity` (or `sklearn.metrics.pairwise_distances`). The former was deprecated in favor of calling `minkowski` with a vector of weights provided as kwarg `w` (example below), and the latter does not work with data of this shape. `nan_euclidean` can still be accessed fron scikit-learn directly if needed, if a user installs scikit-learn in their environment (example below).


counts = [[23, 64, 14, 0, 0, 3, 1],
[0, 3, 35, 42, 0, 12, 1],
[0, 5, 5, 0, 40, 40, 0],
[44, 35, 9, 0, 1, 0, 0],
[0, 2, 8, 0, 35, 45, 1],
[0, 0, 25, 35, 0, 19, 0],
[88, 31, 0, 5, 5, 5, 5],
[44, 39, 0, 0, 0, 0, 0]]

new mechanism of accessing wminkowski
from skbio.diversity import beta_diversity
beta_diversity("minkowski", counts, w=[1,1,1,1,1,1,2])

accessing nan_euclidean through scikit-learn directly
import skbio
from sklearn.metrics import pairwise_distances
sklearn_dm = pairwise_distances(counts, metric="nan_euclidean")
skbio_dm = skbio.DistanceMatrix(sklearn_dm)


Deprecated functionality [experimental]
* `skbio.alignment.local_pairwise_align_ssw` has been deprecated ([1814](https://github.com/scikit-bio/scikit-bio/issues/1814)) and will be removed or replaced in scikit-bio 0.6.0.

Bug fixes
* Use `oldest-supported-numpy` as build dependency. This fixes problems with environments that use an older version of numpy than the one used to build scikit-bio ([1813](https://github.com/scikit-bio/scikit-bio/pull/1813)).

0.5.7

Features

* Introduce support for Python 3.10 ([1801](https://github.com/scikit-bio/scikit-bio/pull/1801)).
* Tentative support for Apple M1 ([1709](https://github.com/scikit-bio/scikit-bio/pull/1709)).
* Added support for reading and writing a binary distance matrix object format. ([1716](https://github.com/scikit-bio/scikit-bio/pull/1716))
* Added support for `np.float32` with `DissimilarityMatrix` objects.
* Added support for method and number_of_dimensions to permdisp reducing the runtime by 100x at 4000 samples, [issue 1769](https://github.com/scikit-bio/scikit-bio/pull/1769).
* OrdinationResults object is now accepted as input for permdisp.

Performance enhancements

* Avoid an implicit data copy on construction of `DissimilarityMatrix` objects.
* Avoid validation on copy of `DissimilarityMatrix` and `DistanceMatrix` objects, see [PR 1747](https://github.com/scikit-bio/scikit-bio/pull/1747)
* Use an optimized version of symmetry check in DistanceMatrix, see [PR 1747](https://github.com/scikit-bio/scikit-bio/pull/1747)
* Avoid performing filtering when ids are identical, see [PR 1752](https://github.com/scikit-bio/scikit-bio/pull/1752)
* center_distance_matrix has been re-implemented in cython for both speed and memory use. Indirectly speeds up pcoa [PR 1749](https://github.com/scikit-bio/scikit-bio/pull/1749)
* Use a memory-optimized version of permute in DistanceMatrix, see [PR 1756](https://github.com/scikit-bio/scikit-bio/pull/1756).
* Refactor pearson and spearman skbio.stats.distance.mantel implementations to drastically improve memory locality. Also cache intermediate results that are invariant across permutations, see [PR 1756](https://github.com/scikit-bio/scikit-bio/pull/1756).
* Refactor permanova to remove intermediate buffers and cythonize the internals, see [PR 1768](https://github.com/scikit-bio/scikit-bio/pull/1768).

Bug fixes

* Fix windows and 32bit incompatibility in `unweighted_unifrac`.

Miscellaneous

* Python 3.6 has been removed from our testing matrix.
* Specify build dependencies in pyproject.toml. This allows the package to be installed without having to first manually install numpy.
* Update hdmedians package to a version which doesn't require an initial manual numpy install.
* Now buildable on non-x86 platforms due to use of the [SIMD Everywhere](https://github.com/simd-everywhere/simde) library.
* Regenerate Cython wrapper by default to avoid incompatibilities with installed CPython.
* Update documentation for the `skbio.stats.composition.ancom` function. ([1741](https://github.com/scikit-bio/scikit-bio/pull/1741))

Page 1 of 4

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.