Scikit-bio

Latest version: v0.6.3

0.6.3

Features

* Python 3.13+ is now supported ([2146](https://github.com/scikit-bio/scikit-bio/pull/2146)).
* Added Balanced Minimum Evolution (BME) function for phylogenetic reconstruction and `balanced` option for NNI ([2105](https://github.com/scikit-bio/scikit-bio/pull/2105) and [#2169](https://github.com/scikit-bio/scikit-bio/pull/2169)).
* Added functions `rf_dists`, `wrf_dists` and `path_dists` under `skbio.tree` to calculate multiple pairwise distance metrics among an arbitrary number of trees. They correspond to `TreeNode` methods `compare_rfd`, `compare_wrfd` and `compare_cophenet` for two trees ([2166](https://github.com/scikit-bio/scikit-bio/pull/2166)).
* Added `height` and `depth` methods under `TreeNode` to calculate the height and depth of a given node.
* Added `TreeNode.compare_wrfd` to calculate the weighted Robinson-Foulds distance or its variants between two trees ([2144](https://github.com/scikit-bio/scikit-bio/pull/2144)).
* Wrapped UPGMA and WPGMA from SciPy's linkage method ([2094](https://github.com/scikit-bio/scikit-bio/pull/2094)).
* Added `TreeNode` methods: `bipart`, `biparts` and `compare_biparts` to encode and compare bipartitions in a tree ([2144](https://github.com/scikit-bio/scikit-bio/pull/2144)).
* Added `TreeNode.has_caches` to check if a tree has caches ([2103](https://github.com/scikit-bio/scikit-bio/pull/2103)).
* Added `TreeNode.is_bifurcating` to check if a tree is bifurcating (i.e., binary) ([2117](https://github.com/scikit-bio/scikit-bio/pull/2117)).
* Added support for Python's `pathlib` module in the IO system ([2119](https://github.com/scikit-bio/scikit-bio/pull/2119)).
* Added `TreeNode.path` to return a list of nodes representing the path from one node to another ([2131](https://github.com/scikit-bio/scikit-bio/pull/2131)).
* Exposed `vectorize_counts_and_tree` function from the `diversity` module to allow use for improving ML accuracy in downstream pipelines ([2173](https://github.com/scikit-bio/scikit-bio/pull/2173))
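
To give a feel for what a pairwise Robinson-Foulds computation among several trees involves, here is a minimal pure-Python sketch. It does not call scikit-bio: trees are written as nested tuples of tip names, the helper names `clades` and `rf_matrix` are hypothetical, and the trees are assumed to be rooted and to share the same tips.

```python
from itertools import combinations


def clades(tree):
    """Return the non-trivial clades (frozensets of tip names) of a rooted tree.

    A tree is a nested tuple of tip names, e.g. (("a", "b"), ("c", "d")).
    """
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    full = tips(tree)
    found.discard(full)  # the full tip set carries no topological information
    return found


def rf_matrix(trees):
    """Symmetric matrix of rooted RF distances (sizes of symmetric differences)."""
    sets = [clades(t) for t in trees]
    n = len(trees)
    dm = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dm[i][j] = dm[j][i] = len(sets[i] ^ sets[j])
    return dm


trees = [
    (("a", "b"), ("c", "d")),
    (("a", "c"), ("b", "d")),
    ((("a", "b"), "c"), "d"),
]
dm = rf_matrix(trees)
```

Computing the clade sets once per tree and then comparing them pairwise is what makes a multi-tree function cheaper than repeated two-tree comparisons.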

Performance enhancements

* Significantly improved the performance of the neighbor joining (NJ) algorithm (`nj`) ([2147](https://github.com/scikit-bio/scikit-bio/pull/2147)) and the greedy minimum evolution (GME) algorithm (`gme`) for phylogenetic reconstruction, and the NNI algorithm for tree rearrangement ([#2169](https://github.com/scikit-bio/scikit-bio/pull/2169)).
* Significantly improved the performance of `TreeNode.cophenet` (renamed from `tip_tip_distances`) for computing a patristic distance matrix among all or selected tips of a tree ([2152](https://github.com/scikit-bio/scikit-bio/pull/2152)).
* Supported Robinson-Foulds distance calculation (`TreeNode.compare_rfd`) based on bipartitions (equivalent to `compare_biparts`). This is automatically enabled when the input tree is unrooted. Otherwise the calculation is still based on subsets (equivalent to `compare_subsets`). The user can override this behavior using the `rooted` parameter ([2144](https://github.com/scikit-bio/scikit-bio/pull/2144)).
* Re-wrote the underlying algorithm of `TreeNode.compare_subsets` because it is equivalent to the Robinson-Foulds distance on rooted trees. Added parameter `proportion`. Renamed parameter `exclude_absent_taxa` as `shared_only` ([2144](https://github.com/scikit-bio/scikit-bio/pull/2144)).
* Added parameter `include_self` to `TreeNode.subset`. Added parameters `within`, `include_full` and `include_tips` to `TreeNode.subsets` ([2144](https://github.com/scikit-bio/scikit-bio/pull/2144)).
* Improved the performance and customizability of `TreeNode.total_length` (renamed from `descending_branch_length`). Added parameters `include_stem` and `include_self`.
* Improved the performance of `TreeNode.lca` ([2132](https://github.com/scikit-bio/scikit-bio/pull/2132)).
* Improved the performance of `TreeNode` methods: `ancestors`, `siblings`, and `neighbors` ([2133](https://github.com/scikit-bio/scikit-bio/pull/2133), [#2135](https://github.com/scikit-bio/scikit-bio/pull/2135)).
* Improved the performance of tree traversal algorithms ([2093](https://github.com/scikit-bio/scikit-bio/pull/2093)).
* Improved the performance of tree copying ([2103](https://github.com/scikit-bio/scikit-bio/pull/2103)).
* Further improved the caching mechanism of `TreeNode`. Specifically: 1. Node attribute caches are only registered at the root node, which improves memory efficiency. 2. Method `clear_caches` can be customized to clear node attribute and/or lookup caches, or specified attribute caches ([2099](https://github.com/scikit-bio/scikit-bio/pull/2099)). 3. Added parameter `uncache` to multiple methods that involve tree manipulation. Default is True. When one knows that caches are not present or relevant, one may set this parameter as False to skip cache clearing to significantly improve performance ([#2103](https://github.com/scikit-bio/scikit-bio/pull/2103)).
* Expanded the functionality of `TreeNode.cache_attr`. It can now take a custom function to combine children and self attributes. This makes it possible to cache multiple useful clade properties such as node count and total branch length. Also enriched the method's docstring to provide multiple examples of caching clade properties ([2099](https://github.com/scikit-bio/scikit-bio/pull/2099)).
* Added parameter `inplace` to methods `shear`, `root_at`, `root_at_midpoint` and `root_by_outgroup` of `TreeNode` to enable manipulating the tree in place (True), which is more efficient than making a manipulated copy of the tree (False, default) ([2103](https://github.com/scikit-bio/scikit-bio/pull/2103)).
* `TreeNode.extend` can accept any iterable type of nodes as input ([2103](https://github.com/scikit-bio/scikit-bio/pull/2103)).
* Added parameter `strict` to `TreeNode.shear` ([2103](https://github.com/scikit-bio/scikit-bio/pull/2103)).
* Added parameter `exclude_attrs` to `TreeNode.unrooted_copy` ([2103](https://github.com/scikit-bio/scikit-bio/pull/2103)).
* Added support for legacy random generator to `get_rng`, such that outputs of scikit-bio functions become reproducible with code that starts with `np.random.seed` or uses `RandomState` ([2130](https://github.com/scikit-bio/scikit-bio/pull/2130)).
* Allowed `shuffle` and `compare_cophenet` (renamed from `compare_tip_distances`) of `TreeNode` to accept a random seed or random generator to generate the shuffling function, which ensures output reproducibility ([2118](https://github.com/scikit-bio/scikit-bio/pull/2118)).
* Replaced `accumulate_to_ancestor` with `depth` under `TreeNode`. The latter has expanded functionality which covers the default behavior of the former.
* Added beta diversity metric `jensenshannon`, which calculates Jensen-Shannon distance. Thanks to quliping for suggesting this in [2125](https://github.com/scikit-bio/scikit-bio/pull/2125).
* Added parameter `include_self` to `TreeNode.ancestors` to optionally include the initial node in the path (default: False) ([2135](https://github.com/scikit-bio/scikit-bio/pull/2135)).
* Added parameter `seed` to functions `pcoa`, `anosim`, `permanova`, `permdisp`, `randdm`, `lladser_pe`, `lladser_ci`, `isubsample`, `subsample_power`, `subsample_paired_power`, `paired_subsamples` and `hommola_cospeciation` to accept a random seed or random generator to ensure output reproducibility ([2120](https://github.com/scikit-bio/scikit-bio/pull/2120) and [#2129](https://github.com/scikit-bio/scikit-bio/pull/2129)).
* Made the `IORegistry` sniffer only attempt file formats which are logical given a specific object, thus improving reading efficiency.
* Allowed the `number_of_dimensions` parameter in the function `pcoa` to accept float values between 0 and 1 to capture fractional cumulative variance.
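
For readers unfamiliar with the Jensen-Shannon distance mentioned in this list, the sketch below computes it in plain Python. It is an illustration rather than scikit-bio's code: the function name `js_distance` is hypothetical, and the natural logarithm is assumed (other bases only rescale the result).

```python
import math


def js_distance(u, v):
    """Jensen-Shannon distance: square root of the Jensen-Shannon divergence."""
    su, sv = sum(u), sum(v)
    p = [x / su for x in u]  # normalize counts to proportions
    q = [x / sv for x in v]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # pointwise mean distribution

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the Kullback-Leibler sum.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative round-off


same = js_distance([1, 2, 3], [2, 4, 6])  # identical proportions
disjoint = js_distance([1, 0], [0, 1])    # no shared features
```

Two samples with identical proportions are at distance 0, and two samples with no shared features reach the maximum, `sqrt(ln 2)` under the natural-log assumption.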

Bug fixes

* Fixed a bug in `TreeNode.find` which returns the input node object even if it's not in the current tree ([2153](https://github.com/scikit-bio/scikit-bio/pull/2153)).
* Fixed a bug in `TreeNode.get_max_distance` which returns tip names instead of tip instances when there are single-child nodes in the tree ([2144](https://github.com/scikit-bio/scikit-bio/pull/2144)).
* Fixed an issue in `subsets` and `cophenet` (renamed from `tip_tip_distances`) of `TreeNode` which leaves remnant attributes at each node after execution ([2144](https://github.com/scikit-bio/scikit-bio/pull/2144)).
* Fixed a bug in `TreeNode.compare_rfd` which raises an error if taxa of the two trees are not subsets of each other ([2144](https://github.com/scikit-bio/scikit-bio/pull/2144)).
* Fixed a bug in `TreeNode.compare_subsets` which includes the full set (not a subset) of shared taxa between two trees if a basal clade of either tree consists of entirely unshared taxa ([2144](https://github.com/scikit-bio/scikit-bio/pull/2144)).
* Fixed a bug in `TreeNode.lca` which returns the parent of input node X instead of X itself if X is ancestral to other input nodes ([2132](https://github.com/scikit-bio/scikit-bio/pull/2132)).
* Fixed a bug in `TreeNode.find_all` which does not look for other nodes with the same name if a `TreeNode` instance is provided, contrary to what the documentation claims ([2099](https://github.com/scikit-bio/scikit-bio/pull/2099)).
* Fixed a bug in `skbio.io.format.embed` which was not correctly updating the idptr sizing. ([2100](https://github.com/scikit-bio/scikit-bio/pull/2100)).
* Fixed a bug in `TreeNode.unrooted_move` which does not respect specified branch attributes ([2103](https://github.com/scikit-bio/scikit-bio/pull/2103)).
* Fixed a bug in `skbio.diversity.get_beta_diversity_metrics` which does not display metrics other than UniFrac ([2126](https://github.com/scikit-bio/scikit-bio/pull/2126)).
* An error is now raised when beta diversity metric `mahalanobis` is called but the sample number is smaller than or equal to the feature number in the data. Thanks to quliping for noting this in [2125](https://github.com/scikit-bio/scikit-bio/pull/2125).
* Fixed a bug in `io.format.fasta` that improperly handled sequences containing spaces. ([2156](https://github.com/scikit-bio/scikit-bio/pull/2156))
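
The `TreeNode.lca` fix in this list is easier to picture with a small example. The sketch below is a toy model, not scikit-bio's implementation: `Node`, `path_to_root` and `lca` are hypothetical names. The point it illustrates is that each node's path to the root must include the node itself, so an input that is ancestral to the other inputs is returned rather than its parent.

```python
class Node:
    """Tiny stand-in for a tree node; not scikit-bio's TreeNode."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def path_to_root(node):
    """Nodes from `node` up to the root, including `node` itself."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def lca(nodes):
    """Lowest common ancestor of one or more nodes of the same tree."""
    common = path_to_root(nodes[0])
    for node in nodes[1:]:
        seen = set(map(id, path_to_root(node)))
        common = [n for n in common if id(n) in seen]
    return common[0]  # lowest-first order is inherited from the first path


root = Node("root")
x = Node("x", root)
a, b = Node("a", x), Node("b", x)
c = Node("c", root)
```

With this tree, `lca([x, a])` is `x` itself, while starting each path at the parent instead would wrongly give `root`.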

Miscellaneous

* Added a parameter `warn_neg_eigval` to `pcoa` and `permdisp` to control when to raise a warning when negative eigenvalues are encountered. The default setting is more relaxed than the previous behavior, so warnings are not raised when the negative eigenvalues are small in magnitude, which is the case in many real-world scenarios ([2154](https://github.com/scikit-bio/scikit-bio/pull/2154)).
* Refactored `dirmult_ttest` to use a separate function for fitting data to Dirichlet-multinomial distribution ([2113](https://github.com/scikit-bio/scikit-bio/pull/2113))
* Remodeled documentation. Special methods (previously referred to as built-in methods) and inherited methods of a class no longer have separate stub pages. This significantly reduced the total number of webpages in the documentation ([2110](https://github.com/scikit-bio/scikit-bio/pull/2110)).
* Renamed `invalidate_caches` as `clear_caches` under `TreeNode`, because the caches are indeed deleted rather than marked as obsolete. The old name is preserved as an alias ([2099](https://github.com/scikit-bio/scikit-bio/pull/2099)).
* Renamed `remove_deleted` as `remove_by_func` under `TreeNode`. The old name is preserved as an alias ([2103](https://github.com/scikit-bio/scikit-bio/pull/2103)).
* Renamed `descending_branch_length` as `total_length` under `TreeNode`. The old name is preserved as an alias.
* Under `TreeNode`, renamed `get_max_distance` as `maxdist`. Renamed `tip_tip_distances` as `cophenet`. Renamed `compare_tip_distances` as `compare_cophenet`. The new names are consistent with SciPy's relevant functions and the main body of the literature. The old names are preserved as aliases.
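
One common way to keep an old method name working after a rename is to bind it to the same function in the class body. The sketch below shows that pattern on a toy class; whether scikit-bio implements its aliases exactly this way is an assumption, and `Tree` and `_cache` are hypothetical names.

```python
class Tree:
    """Toy class used only to illustrate method aliasing."""

    def __init__(self):
        self._cache = {"tip_count": 4}

    def clear_caches(self):
        """Delete cached values (the new, preferred name)."""
        self._cache.clear()

    # Old name preserved as an alias: both names refer to the same function.
    invalidate_caches = clear_caches


tree = Tree()
tree.invalidate_caches()  # existing callers keep working
```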

Deprecated functionality

* Method `TreeNode.subtree` is deprecated. It will become a private member in version 0.7.0 ([2103](https://github.com/scikit-bio/scikit-bio/pull/2103)).

Backward-incompatible changes

* Dropped support for Python 3.8 as it has reached end-of-life (EOL). scikit-bio may still be installed under Python 3.8 and will likely work, but the development team no longer guarantees that all functionality will work as intended.
* Removed `skbio.util.SkbioWarning`. There are now no warnings specific to scikit-bio.
* Removed `skbio.util.EfficiencyWarning`. Previously it was only used in the Python implementations of pairwise sequence alignment algorithms. The new code replaced it with `PendingDeprecationWarning`.
* Removed `skbio.util.RepresentationWarning`. Previously it was only used in `TreeNode.tip_tip_distances` when a node has no branch length. The new code removed this behavior ([2152](https://github.com/scikit-bio/scikit-bio/pull/2152)).

0.6.2

Features

* Added Greedy Minimum Evolution (GME) function for phylogenetic reconstruction ([2087](https://github.com/scikit-bio/scikit-bio/pull/2087)).
* Added support for Microsoft Windows operating system ([2071](https://github.com/scikit-bio/scikit-bio/pull/2071), [#2068](https://github.com/scikit-bio/scikit-bio/pull/2068), [2067](https://github.com/scikit-bio/scikit-bio/pull/2067), [#2061](https://github.com/scikit-bio/scikit-bio/pull/2061), [#2046](https://github.com/scikit-bio/scikit-bio/pull/2046), [2040](https://github.com/scikit-bio/scikit-bio/pull/2040), [#2036](https://github.com/scikit-bio/scikit-bio/pull/2036), [#2034](https://github.com/scikit-bio/scikit-bio/pull/2034), [2032](https://github.com/scikit-bio/scikit-bio/pull/2032), [#2005](https://github.com/scikit-bio/scikit-bio/pull/2005)).
* Added alpha diversity metrics: Hill number (`hill`), Renyi entropy (`renyi`) and Tsallis entropy (`tsallis`) ([2074](https://github.com/scikit-bio/scikit-bio/pull/2074)).
* Added `rename` method for `OrdinationResults` and `DissimilarityMatrix` classes ([2027](https://github.com/scikit-bio/scikit-bio/pull/2027), [#2085](https://github.com/scikit-bio/scikit-bio/pull/2085)).
* Added `nni` function for phylogenetic tree rearrangement using nearest neighbor interchange (NNI) ([2050](https://github.com/scikit-bio/scikit-bio/pull/2050)).
* Added method `TreeNode.unrooted_move`, which resembles `TreeNode.unrooted_copy` but rearranges the tree in place, thus avoiding making copies of the nodes ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).
* Added method `TreeNode.root_by_outgroup`, which reroots a tree according to a given outgroup ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).
* Added method `TreeNode.unroot`, which converts a rooted tree into unrooted by trifurcating its root ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).
* Added method `TreeNode.insert`, which inserts a node into the branch connecting self and its parent ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).
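
The three alpha diversity metrics added in this release are closely related, which the following plain-Python sketch tries to make visible. It uses the textbook definitions and is not scikit-bio's code; the parameter name `order` and the natural-log convention are assumptions, and empty communities are not handled.

```python
import math


def _proportions(counts):
    """Relative abundances of the taxa that were actually observed."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def hill(counts, order=2):
    """Hill number (effective number of taxa) of a given order."""
    p = _proportions(counts)
    if order == 1:
        # Limit as order -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** order for x in p) ** (1 / (1 - order))


def renyi(counts, order=2):
    """Renyi entropy: natural log of the Hill number of the same order."""
    return math.log(hill(counts, order))


def tsallis(counts, order=2):
    """Tsallis entropy of a given order (Shannon entropy in the limit order -> 1)."""
    p = _proportions(counts)
    if order == 1:
        return -sum(x * math.log(x) for x in p)
    return (1 - sum(x ** order for x in p)) / (order - 1)


even = [5, 5, 5, 5]  # four equally abundant taxa
```

For a perfectly even community of four taxa, the Hill number is 4 at every order, which is why it is read as an effective number of taxa.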

Performance enhancements

* The time and memory efficiency of `TreeNode` has been significantly improved by making its caching mechanism lazy ([2082](https://github.com/scikit-bio/scikit-bio/pull/2082)).
* `TreeNode.copy` and `TreeNode.unrooted_copy` can now perform shallow copy of a tree in addition to deep copy.
* `TreeNode.unrooted_copy` can now copy all attributes of the nodes, in addition to name and length ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).
* Parameter `above` was added to `TreeNode.root_at`, such that the user can root the tree within the branch connecting the given node and its parent, thereby creating a rooted tree ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).
* Parameter `branch_attrs` was added to the `unrooted_copy`, `root_at`, and `root_at_midpoint` methods of `TreeNode`, such that the user can customize which node attributes should be considered as branch attributes and treated accordingly during the rerooting operation. The default behavior is preserved but is subject to change in version 0.7.0 ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).
* Parameter `root_name` was added to the `unrooted_copy`, `root_at`, and `root_at_midpoint` methods of `TreeNode`, such that the user can customize (or omit) the name to be given to the root node. The default behavior is preserved but is subject to change in version 0.7.0 ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).

Bug fixes

* Cleared the internal node references after performing midpoint rooting (`TreeNode.root_at_midpoint`), such that a deep copy of the resulting tree will not result in infinite recursion ([2073](https://github.com/scikit-bio/scikit-bio/pull/2073)).
* Fixed the Zenodo link in the README to always point to the most recent version ([2078](https://github.com/scikit-bio/scikit-bio/pull/2078)).

Miscellaneous

* Added statsmodels as a dependency of scikit-bio. It replaces some of the from-scratch statistical analyses in scikit-bio, including Welch's t-test (with confidence intervals), Benjamini-Hochberg FDR correction, and Holm-Bonferroni FDR correction ([2049](https://github.com/scikit-bio/scikit-bio/pull/2049), [#2063](https://github.com/scikit-bio/scikit-bio/pull/2063)).
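
As background on one of the procedures named above, the sketch below implements the standard Benjamini-Hochberg adjustment in plain Python. It is meant only to show what the correction computes; it is neither the statsmodels implementation nor scikit-bio's former one, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each p-value is scaled by `m / rank`, and the running minimum ensures that a smaller raw p-value never ends up with a larger adjusted value.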

Deprecated functionality

* Methods `deepcopy` and `unrooted_deepcopy` of `TreeNode` are deprecated. Use `copy` and `unrooted_copy` instead.

0.6.1

Features

* NumPy 2.0 is now supported ([2051](https://github.com/scikit-bio/scikit-bio/pull/2051)). We thank rgommers for advice on this ([#1964](https://github.com/scikit-bio/scikit-bio/issues/1964)).
* Added module `skbio.embedding` to provide support for storing and manipulating embeddings for biological objects, such as protein embeddings outputted from protein language models ([2008](https://github.com/scikit-bio/scikit-bio/pull/2008)).
* Added an efficient sequence alignment path data structure `AlignPath` and its derivative `PairAlignPath` to provide a uniform interface for various multiple and pairwise alignment formats ([2011](https://github.com/scikit-bio/scikit-bio/pull/2011)).
* Added `simpson_d` as an alias for `dominance` (Simpson's dominance index, a.k.a. Simpson's D) ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)).
* Added `inv_simpson` (inverse Simpson index), which is equivalent to `enspie` ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)).
* Added parameter `exp` to `shannon` to calculate the exponential of Shannon index (i.e., perplexity, or effective number of species) ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)).
* Added parameter `finite` to Simpson's _D_ (`dominance`) and derived metrics (`simpson`, `simpson_e` and `inv_simpson`) to correct for finite samples ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)).
* Added support for dictionary and pandas DataFrame as input for `TreeNode.from_taxonomy` ([2042](https://github.com/scikit-bio/scikit-bio/pull/2042)).
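
The relationship between Simpson's _D_ and its derived metrics can be sketched in a few lines. The functions below are plain-Python stand-ins named after the metrics, not scikit-bio's implementations, and the finite-sample formula shown (sampling without replacement) is the commonly used one, assumed here to be what `finite` refers to.

```python
def dominance(counts, finite=False):
    """Simpson's dominance index D."""
    n = sum(counts)
    if finite:
        # Finite-sample form: probability that two individuals drawn
        # without replacement belong to the same taxon.
        return sum(c * (c - 1) for c in counts) / (n * (n - 1))
    return sum((c / n) ** 2 for c in counts)


def simpson(counts, finite=False):
    """Simpson's diversity index, 1 - D."""
    return 1 - dominance(counts, finite)


def inv_simpson(counts, finite=False):
    """Inverse Simpson index, 1 / D."""
    return 1 / dominance(counts, finite)


counts = [4, 4]  # two equally abundant taxa, eight individuals
```

With two equally abundant taxa, `dominance` is 0.5 and `inv_simpson` is 2, while the finite-sample form gives a slightly smaller _D_ of 24/56.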

Performance enhancements

* `subsample_counts` now uses an optimized method from `biom-format` ([2016](https://github.com/scikit-bio/scikit-bio/pull/2016)).
* Improved efficiency of counts matrix and vector validation prior to calculating community diversity metrics ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)).

Miscellaneous

* Default logarithm base of Shannon index (`shannon`) was changed from 2 to e. This is to ensure consistency with other Shannon-based metrics (`pielou_e`), and with literature and implementations in the field. Meanwhile, parameter `base` was added to `pielou_e` such that the user can control this behavior ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)). See discussions in [1884](https://github.com/scikit-bio/scikit-bio/issues/1884) and [2014](https://github.com/scikit-bio/scikit-bio/issues/2014).
* Improved treatment of empty communities (i.e., all taxa have zero counts, or there is no taxon) when calculating alpha diversity metrics. Most metrics will return `np.nan` and do not raise a warning due to zero division. Exceptions are metrics that describe observed counts, including `sobs`, `singles`, `doubles` and `osd`, which return zero ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)). See discussions in [#2014](https://github.com/scikit-bio/scikit-bio/issues/2014).
* Return values of `pielou_e` and `heip_e` were set to 1.0 for one-taxon communities, such that NaN is avoided, while honoring the definition (evenness of taxon abundance(s)) and the rationale (ratio between observed and maximum) ([2024](https://github.com/scikit-bio/scikit-bio/pull/2024)).
* Removed hdmedians as a dependency by porting its `geomedian` function (geometric median) into scikit-bio ([2003](https://github.com/scikit-bio/scikit-bio/pull/2003)).
* Removed 98% of the warnings issued during the test process ([2045](https://github.com/scikit-bio/scikit-bio/pull/2045) and [#2037](https://github.com/scikit-bio/scikit-bio/pull/2037)).
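
The Shannon and Pielou changes in this list can be illustrated with a short plain-Python sketch. These are simplified stand-ins for the behavior described above (natural log by default, NaN for empty communities, 1.0 for a one-taxon community), not scikit-bio's code.

```python
import math


def shannon(counts, base=math.e):
    """Shannon index; natural log by default, NaN for an empty community."""
    total = sum(counts)
    if total == 0:
        return math.nan
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(base)  # change of logarithm base


def pielou_e(counts, base=math.e):
    """Pielou's evenness: observed Shannon index over its maximum."""
    observed = sum(1 for c in counts if c > 0)
    if observed == 0:
        return math.nan
    if observed == 1:
        return 1.0  # a single taxon is treated as perfectly even
    max_h = math.log(observed) / math.log(base)
    return shannon(counts, base) / max_h
```

For two equally abundant taxa, `shannon([1, 1])` is `ln 2` (about 0.693) with the new default, while `shannon([1, 1], base=2)` reproduces the previous default of 1 bit. The base cancels out in `pielou_e`, so it only matters there for consistency with `shannon`.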

0.6.0

Performance enhancements

* Launched the new scikit-bio website: https://scikit.bio. The previous domain names _scikit-bio.org_ and _skbio.org_ continue to work and redirect to the new website.
* Migrated the scikit-bio website repo from the `gh-pages` branch of the `scikit-bio` repo to a standalone repo: [`scikit-bio.github.io`](https://github.com/scikit-bio/scikit-bio.github.io).
* Replaced the [Bootstrap theme](https://sphinx-bootstrap-theme.readthedocs.io/en/latest/) with the [PyData theme](https://pydata-sphinx-theme.readthedocs.io/en/stable/) for building documentation using Sphinx. Extended this theme to the website. Customized design elements ([#1934](https://github.com/scikit-bio/scikit-bio/pull/1934)).
* Improved the calculation of Fisher's alpha diversity index (`fisher_alpha`). It is now compatible with optimizers in SciPy 1.11+. Edge cases such as all singletons can be handled correctly. Handling of errors and warnings was improved. Documentation was enriched ([1890](https://github.com/scikit-bio/scikit-bio/pull/1890)).
* Allowed `delimiter=None` which represents whitespace of arbitrary length in reading lsmat format matrices ([1912](https://github.com/scikit-bio/scikit-bio/pull/1912)).

Features

* Added biom-format Table import and updated corresponding requirement files ([1907](https://github.com/scikit-bio/scikit-bio/pull/1907)).
* Added biom-format 2.1.0 IO support ([1984](https://github.com/scikit-bio/scikit-bio/pull/1984)).
* Added `Table` support to `alpha_diversity` and `beta_diversity` drivers ([1984](https://github.com/scikit-bio/scikit-bio/pull/1984)).
* Implemented a mechanism to automatically build documentation and/or homepage and deploy them to the website ([1934](https://github.com/scikit-bio/scikit-bio/pull/1934)).
* Added the Benjamini-Hochberg method as an option for FDR correction (in addition to the existing Holm-Bonferroni method) for `ancom` and `dirmult_ttest` ([1988](https://github.com/scikit-bio/scikit-bio/pull/1988)).
* Added function `dirmult_ttest`, which performs differential abundance test using a Dirichlet-multinomial distribution. This function mirrors the method provided by ALDEx2 ([1956](https://github.com/scikit-bio/scikit-bio/pull/1956)).
* Added method `Sequence.to_indices` to convert a sequence into a vector of indices of characters in an alphabet (can be from a substitution matrix) or unique characters observed in the sequence. Supports gap masking and wildcard substitution ([1917](https://github.com/scikit-bio/scikit-bio/pull/1917)).
* Added class `SubstitutionMatrix` to support substitution matrices for nucleotides, amino acids and more general cases ([1913](https://github.com/scikit-bio/scikit-bio/pull/1913)).
* Added alpha diversity metric `sobs`, which is the observed species richness (S_{obs}) of a sample. `sobs` will replace `observed_otus`, which uses the historical term "OTU". Also added metric `observed_features` to be compatible with the QIIME 2 terminology. All three metrics are equivalent ([1902](https://github.com/scikit-bio/scikit-bio/pull/1902)).
* `beta_diversity` now supports use of a pandas `DataFrame` index, issue [1808](https://github.com/scikit-bio/scikit-bio/issues/1808).
* Added alpha diversity metric `phydiv`, which is a generalized phylogenetic diversity (PD) framework permitting unrooted or rooted tree, unweighted or weighted by abundance, and an exponent parameter of the weight term ([1893](https://github.com/scikit-bio/scikit-bio/pull/1893)).
* Adopted NumPy's new random generator `np.random.Generator` (see [NEP 19](https://numpy.org/neps/nep-0019-rng-policy.html)) ([#1889](https://github.com/scikit-bio/scikit-bio/pull/1889)).
* SciPy 1.11+ is now supported ([1887](https://github.com/scikit-bio/scikit-bio/pull/1887)).
* Removed IPython as a dependency. Scikit-bio continues to support displaying plots in IPython, but it no longer requires importing IPython functionality ([1901](https://github.com/scikit-bio/scikit-bio/pull/1901)).
* Made Matplotlib an optional dependency. Scikit-bio no longer requires Matplotlib except for plotting, during which it attempts to import Matplotlib if it is present in the system, and raises an error if not ([1901](https://github.com/scikit-bio/scikit-bio/pull/1901)).
* Ported the QIIME 2 metadata object into skbio. ([1929](https://github.com/scikit-bio/scikit-bio/pull/1929))
* Python 3.12+ is now supported, thank you actapia ([1930](https://github.com/scikit-bio/scikit-bio/pull/1930))
* Introduced native character conversion ([1971](https://github.com/scikit-bio/scikit-bio/pull/1971)).

Backward-incompatible changes [experimental]

* Beta diversity metric `kulsinski` was removed. This is because SciPy replaced this distance metric with `kulczynski1` in version 1.11 (see SciPy issue [2009](https://github.com/scipy/scipy/issues/2009)), and because neither metric returns 0 on two identical vectors ([#1887](https://github.com/scikit-bio/scikit-bio/pull/1887)).

Bug fixes

* Fixed documentation interface of `vlr` and relevant functions ([1934](https://github.com/scikit-bio/scikit-bio/pull/1934)).
* Fixed broken link in documentation of Simpson's evenness index. See issue [1923](https://github.com/scikit-bio/scikit-bio/issues/1923).
* Safely handle `Sequence.iter_kmers` where `k` is greater than the sequence length ([1723](https://github.com/scikit-bio/scikit-bio/issues/1723))
* Re-enabled OpenMP support, which has been mistakenly disabled in 0.5.8 ([1874](https://github.com/scikit-bio/scikit-bio/pull/1874))
* `permanova` and `permdisp` operate on a `DistanceMatrix` and a grouping object. Element IDs must be synchronized to compare the correct sets of pairwise distances. This failed when the grouping was provided as a `pandas.Series`, because it was interpreted as an ordered `list` and its index was ignored (see issue [1877](https://github.com/scikit-bio/scikit-bio/issues/1877) for an example). Note: `pandas.DataFrame` was handled correctly. This behavior has been fixed in PR [#1879](https://github.com/scikit-bio/scikit-bio/pull/1879).
* Fixed slicing for `TabularMSALoc` on Python 3.12. See issue [1926](https://github.com/scikit-bio/scikit-bio/issues/1926).
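
The grouping fix in this list comes down to looking up group labels by sample ID instead of by position. The sketch below illustrates the idea with a plain dictionary standing in for an indexed grouping; it is not scikit-bio's code, and `align_grouping` is a hypothetical helper.

```python
def align_grouping(ids, grouping):
    """Return group labels ordered to match `ids`, looked up by ID, not position."""
    missing = [i for i in ids if i not in grouping]
    if missing:
        raise ValueError(f"IDs missing from grouping: {missing}")
    return [grouping[i] for i in ids]


dm_ids = ["s1", "s2", "s3", "s4"]  # order of samples in the distance matrix
# Same samples, listed in a different order than in the distance matrix.
grouping = {"s3": "treated", "s1": "control", "s4": "treated", "s2": "control"}

aligned = align_grouping(dm_ids, grouping)
positional = list(grouping.values())  # what ignoring the IDs would produce
```

Here `aligned` assigns `s1` and `s2` to the control group as intended, whereas `positional` would silently label `s1` as treated.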

Miscellaneous

* Replaced the historical term "OTU" with the more generic term "taxon" (plural: "taxa"). As a consequence, the parameter "otu_ids" in phylogenetic alpha and beta diversity metrics was replaced by "taxa". Meanwhile, the old parameter "otu_ids" is still kept as an alias of "taxa" for backward compatibility. However it will be removed in a future release.
* Revised contributor's guidelines.
* Renamed function `multiplicative_replacement` as `multi_replace` for conciseness ([1988](https://github.com/scikit-bio/scikit-bio/pull/1988)).
* Renamed parameter `multiple_comparisons_correction` as `p_adjust` of function `ancom` for conciseness ([1988](https://github.com/scikit-bio/scikit-bio/pull/1988)).
* Enabled code coverage reporting via Codecov. See [1954](https://github.com/scikit-bio/scikit-bio/pull/1954).
* Renamed the default branch from "master" to "main". See [1953](https://github.com/scikit-bio/scikit-bio/pull/1953).
* Enabled subclassing of DNA, RNA and Protein classes to allow secondary development.
* Dropped support for NumPy < 1.17.0 in order to utilize the new random generator.
* Use CYTHON by default during build ([1874](https://github.com/scikit-bio/scikit-bio/pull/1874))
* Implemented augmented assignments proposed in issue [1789](https://github.com/scikit-bio/scikit-bio/issues/1789)
* Incorporated Ruff's formatting and linting via pre-commit hooks and GitHub Actions. See PR [1924](https://github.com/scikit-bio/scikit-bio/pull/1924).
* Improved docstrings for functions across the entire codebase. See [1933](https://github.com/scikit-bio/scikit-bio/pull/1933) and [#1940](https://github.com/scikit-bio/scikit-bio/pull/1940)
* Removed API lifecycle decorators in favor of deprecation warnings. See [1916](https://github.com/scikit-bio/scikit-bio/issues/1916)

0.5.9

Features

* Added variance log ratio estimators in `skbio.stats.composition.vlr` and `skbio.stats.composition.pairwise_vlr` ([1803](https://github.com/scikit-bio/scikit-bio/pull/1803))
* Added `skbio.stats.composition.tree_basis` to construct ILR bases from `TreeNode` objects. ([1862](https://github.com/scikit-bio/scikit-bio/pull/1862))
* `IntervalMetadata.query` now defaults to obtaining all results, see [1817](https://github.com/scikit-bio/scikit-bio/issues/1817).

Backward-incompatible changes [experimental]
* With the introduction of the `tree_basis` object, the ILR bases are now represented in log-odds coordinates rather than in probabilities to minimize issues with numerical stability. Furthermore, the `ilr` and `ilr_inv` functions now take the `basis` input parameter in terms of log-odds coordinates. This affects the `skbio.stats.composition.sbp_basis` as well. ([1862](https://github.com/scikit-bio/scikit-bio/pull/1862))

Important

* Complex multiple axis indexing operations with `TabularMSA` have been removed from testing due to incompatibilities with modern versions of Pandas. ([1851](https://github.com/scikit-bio/scikit-bio/pull/1851))
* Pinning `scipy <= 1.10.1` ([1851](https://github.com/scikit-bio/scikit-bio/pull/1867))

Bug fixes

* Fixed a bug that caused build failure on the ARM64 microarchitecture due to floating-point number handling. ([1859](https://github.com/scikit-bio/scikit-bio/pull/1859))
* Never let the Gini index go below 0.0, see [1844](https://github.com/scikit-bio/scikit-bio/issue/1844).
* Fixed bug [1847](https://github.com/scikit-bio/scikit-bio/issues/1847) in which the edge from the root was inadvertently included in the calculation for `descending_branch_length`.

Miscellaneous

* Replaced dependencies `CacheControl` and `lockfile` with `requests` to avoid a dependency inconsistency issue of the former. (See [1863](https://github.com/scikit-bio/scikit-bio/pull/1863), merged in [#1859](https://github.com/scikit-bio/scikit-bio/pull/1859))
* Updated installation instructions for developers in `CONTRIBUTING.md` ([1860](https://github.com/scikit-bio/scikit-bio/pull/1860))

0.5.8

Features

* Added NCBI taxonomy database dump format (`taxdump`) ([1810](https://github.com/scikit-bio/scikit-bio/pull/1810)).
* Added `TreeNode.from_taxdump` for converting taxdump into a tree ([1810](https://github.com/scikit-bio/scikit-bio/pull/1810)).
* scikit-learn has been removed as a dependency. This was a fairly heavy-weight dependency that was providing minor functionality to scikit-bio. The critical components have been implemented in scikit-bio directly, and the non-critical components are listed under "Backward-incompatible changes [experimental]".
* Python 3.11 is now supported.

Backward-incompatible changes [experimental]
* With the removal of the scikit-learn dependency, three beta diversity metric names can no longer be specified. These are `wminkowski`, `nan_euclidean`, and `haversine`. On testing, `wminkowski` and `haversine` did not work through `skbio.diversity.beta_diversity` (or `sklearn.metrics.pairwise_distances`). The former was deprecated in favor of calling `minkowski` with a vector of weights provided as kwarg `w` (example below), and the latter does not work with data of this shape. `nan_euclidean` can still be accessed from scikit-learn directly if needed, if a user installs scikit-learn in their environment (example below).

```python
counts = [[23, 64, 14, 0, 0, 3, 1],
          [0, 3, 35, 42, 0, 12, 1],
          [0, 5, 5, 0, 40, 40, 0],
          [44, 35, 9, 0, 1, 0, 0],
          [0, 2, 8, 0, 35, 45, 1],
          [0, 0, 25, 35, 0, 19, 0],
          [88, 31, 0, 5, 5, 5, 5],
          [44, 39, 0, 0, 0, 0, 0]]

# new mechanism of accessing wminkowski
from skbio.diversity import beta_diversity
beta_diversity("minkowski", counts, w=[1,1,1,1,1,1,2])

# accessing nan_euclidean through scikit-learn directly
import skbio
from sklearn.metrics import pairwise_distances
sklearn_dm = pairwise_distances(counts, metric="nan_euclidean")
skbio_dm = skbio.DistanceMatrix(sklearn_dm)
```

Deprecated functionality [experimental]
* `skbio.alignment.local_pairwise_align_ssw` has been deprecated ([1814](https://github.com/scikit-bio/scikit-bio/issues/1814)) and will be removed or replaced in scikit-bio 0.6.0.

Bug fixes
* Use `oldest-supported-numpy` as build dependency. This fixes problems with environments that use an older version of numpy than the one used to build scikit-bio ([1813](https://github.com/scikit-bio/scikit-bio/pull/1813)).
