Major Features and Improvements
This is a major release which adds substantial new functionality and introduces several improvements.
Persistent homology of directed flag complexes via ``pyflagser``
- The ``pyflagser`` package (`source <https://github.com/giotto-ai/pyflagser>`_, `docs <https://docs-pyflagser.giotto.ai/>`_) is now an official dependency of ``giotto-tda``.
- The ``FlagserPersistence`` transformer has been added to ``gtda.homology`` (`#339 <https://github.com/giotto-ai/giotto-tda/pull/339>`_). It wraps ``pyflagser.flagser_weighted`` to allow for computations of persistence diagrams from directed or undirected weighted graphs. A `new notebook <https://giotto-ai.github.io/gtda-docs/0.3.0/notebooks/persistent_homology_graphs.html>`_ demonstrates its use.
Edge collapsing and performance improvements for persistent homology
- GUDHI C++ components have been updated to the state of GUDHI v3.3.0, yielding performance improvements in ``SparseRipsPersistence``, ``EuclideanCechPersistence`` and ``CubicalPersistence`` (`#468 <https://github.com/giotto-ai/giotto-tda/pull/468>`_).
- Bindings for GUDHI's `edge collapser <https://hal.inria.fr/hal-02395227>`_ have been created and can now be used as an optional preprocessing step via the keyword argument ``collapse_edges`` in ``VietorisRipsPersistence`` and in ``gtda.externals.ripser`` (`#469 <https://github.com/giotto-ai/giotto-tda/pull/469>`_ and `#483 <https://github.com/giotto-ai/giotto-tda/pull/483>`_). When ``collapse_edges=True`` and the input data and/or the number of required homology dimensions is sufficiently large, the resulting runtimes for Vietoris–Rips persistent homology are state of the art.
- The performance of the Ripser bindings has otherwise been improved by avoiding unnecessary data copies, managing memory better, and using more efficient matrix routines (`#501 <https://github.com/giotto-ai/giotto-tda/pull/501>`_ and `#507 <https://github.com/giotto-ai/giotto-tda/pull/507>`_).
New transformers and functionality in ``gtda.homology``
- The ``WeakAlphaPersistence`` transformer has been added to ``gtda.homology`` (`#464 <https://github.com/giotto-ai/giotto-tda/pull/464>`_). Like ``VietorisRipsPersistence``, ``SparseRipsPersistence`` and ``EuclideanCechPersistence``, it computes persistent homology from point clouds, but its runtime can scale much better with size in low dimensions.
- ``VietorisRipsPersistence`` now accepts sparse input when ``metric="precomputed"`` (`#424 <https://github.com/giotto-ai/giotto-tda/pull/424>`_).
- ``CubicalPersistence`` now accepts lists of 2D arrays (`#503 <https://github.com/giotto-ai/giotto-tda/pull/503>`_).
- A ``reduced_homology`` parameter has been added to all persistent homology transformers. When ``True``, one infinite bar in the H0 barcode is removed for the user automatically. Previously, it was not possible to *keep* these bars in the simplicial homology transformers. The default is always ``True``, which implies a breaking change in the case of ``CubicalPersistence`` (`#467 <https://github.com/giotto-ai/giotto-tda/pull/467>`_).
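To illustrate what ``reduced_homology=True`` does to the output, the following pure-Python sketch drops one infinite bar from the H0 part of a diagram stored as a list of ``(birth, death, homology_dimension)`` triples. The helper ``drop_one_infinite_h0_bar`` is hypothetical and is not part of ``giotto-tda``. It models infinite deaths with ``math.inf``, whereas the transformers may represent them differently in their outputs.

```python
import math


def drop_one_infinite_h0_bar(diagram):
    """Return a copy of `diagram` with one infinite H0 bar removed.

    `diagram` is a list of (birth, death, homology_dimension) triples.
    Only the first infinite bar in dimension 0 is dropped; any other
    infinite bars (in H0 or higher dimensions) are kept unchanged.
    """
    result = []
    removed = False
    for birth, death, dim in diagram:
        if not removed and dim == 0 and math.isinf(death):
            removed = True  # skip exactly one essential H0 class
            continue
        result.append((birth, death, dim))
    return result


# Two infinite H0 bars: the reduced version keeps only one of them
diagram = [(0.0, 0.5, 0), (0.0, math.inf, 0), (0.0, math.inf, 0), (0.3, 0.9, 1)]
print(drop_one_infinite_h0_bar(diagram))
# [(0.0, 0.5, 0), (0.0, inf, 0), (0.3, 0.9, 1)]
```

Setting the parameter to ``False`` corresponds to skipping this step and keeping every infinite bar.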
Persistence diagrams
- A ``ComplexPolynomial`` feature extraction transformer has been added (`#479 <https://github.com/giotto-ai/giotto-tda/pull/479>`_).
- A ``NumberOfPoints`` feature extraction transformer has been added (`#496 <https://github.com/giotto-ai/giotto-tda/pull/496>`_).
- An option to normalize the entropy in ``PersistenceEntropy`` according to a heuristic has been added, and a ``nan_fill_value`` parameter allows replacing any NaN produced by the entropy calculation with a fixed constant (`#450 <https://github.com/giotto-ai/giotto-tda/pull/450>`_).
- The computations in ``HeatKernel``, ``PersistenceImage`` and in the pairwise distances and amplitudes related to them have been changed to yield the continuum limit when ``n_bins`` tends to infinity; ``sigma`` is now measured in the same units as the filtration parameter and defaults to 0.1 (`#454 <https://github.com/giotto-ai/giotto-tda/pull/454>`_).
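As a rough guide to what ``PersistenceEntropy`` computes, here is a minimal pure-Python sketch of persistence entropy for a single subdiagram. It uses the base-2 logarithm and the NaN replacement value listed among the backwards-incompatible changes for this release. The function name is hypothetical and this is not the library's implementation. The normalization heuristic is deliberately left out because its exact form is not stated here.

```python
import math


def persistence_entropy(pairs, nan_fill_value=-1.0):
    """Base-2 persistence entropy of a list of (birth, death) pairs.

    With lifetimes l_i = death_i - birth_i and total L = sum(l_i), the
    entropy is -sum((l_i / L) * log2(l_i / L)). When L is zero (e.g. an
    empty or trivial subdiagram) the entropy is undefined (NaN) and is
    replaced by `nan_fill_value`, unless that is None.
    """
    lifetimes = [death - birth for birth, death in pairs]
    total = sum(lifetimes)
    if total <= 0:
        return math.nan if nan_fill_value is None else nan_fill_value
    entropy = 0.0
    for lifetime in lifetimes:
        if lifetime > 0:  # the term 0 * log2(0) is taken to be 0
            p = lifetime / total
            entropy -= p * math.log2(p)
    return entropy


print(persistence_entropy([(0.0, 1.0), (0.0, 1.0)]))  # 1.0
print(persistence_entropy([]))  # -1.0
```

With base 2, a subdiagram with two bars of equal lifetime has an entropy of exactly one bit.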
New ``curves`` subpackage
A new ``curves`` subpackage has been added to preprocess, and extract features from, collections of multi-channel curves such as those returned by ``BettiCurve``, ``PersistenceLandscape`` and ``Silhouette`` (`#480 <https://github.com/giotto-ai/giotto-tda/pull/480>`_). It contains:
- A ``StandardFeatures`` transformer that can extract features channel-wise in a generic way.
- A ``Derivative`` transformer that computes channel-wise derivatives of any order by discrete differences (`#492 <https://github.com/giotto-ai/giotto-tda/pull/492>`_).
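The idea behind ``Derivative`` can be sketched by applying repeated forward differences channel by channel. The function below is a hypothetical pure-Python illustration. It operates on lists of samples taken on a uniform grid with spacing ``step``, which is an assumed parameter. The transformer's own handling of grid spacing and output length may differ.

```python
def channelwise_derivative(curves, order=1, step=1.0):
    """Approximate derivatives of each channel by repeated forward differences.

    `curves` is a list of channels, each a list of values sampled on a
    uniform grid with spacing `step`. Each differencing pass shortens a
    channel by one sample, so the result of order k has k fewer samples.
    """
    result = []
    for channel in curves:
        values = list(channel)
        for _ in range(order):
            values = [(b - a) / step for a, b in zip(values, values[1:])]
        result.append(values)
    return result


# Two channels: a linear ramp and a parabola sampled at x = 0, 1, 2, 3, 4
curves = [[0, 1, 2, 3, 4], [0, 1, 4, 9, 16]]
print(channelwise_derivative(curves, order=1))  # [[1.0, 1.0, 1.0, 1.0], [1.0, 3.0, 5.0, 7.0]]
print(channelwise_derivative(curves, order=2))  # [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
```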
New ``metaestimators`` subpackage
A new ``metaestimators`` subpackage has been added with a ``CollectionTransformer`` meta-estimator which converts any transformer instance into a fit-transformer acting on collections (`#495 <https://github.com/giotto-ai/giotto-tda/pull/495>`_).
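The pattern implemented by ``CollectionTransformer`` can be sketched as follows: each element of the collection gets its own fresh copy of the wrapped transformer, fitted on that element alone. The classes ``CollectionFitTransformer`` and ``MeanCenterer`` are hypothetical, and ``copy.deepcopy`` stands in for ``scikit-learn``'s ``clone``. The real meta-estimator may offer options not shown here.

```python
import copy


class CollectionFitTransformer:
    """Hypothetical sketch: apply a fresh copy of `transformer` to each item.

    For every element of the input collection, a deep copy of the wrapped
    transformer is fitted on that element alone and used to transform it.
    The wrapped instance itself is never fitted.
    """

    def __init__(self, transformer):
        self.transformer = transformer

    def fit_transform(self, collection):
        outputs = []
        for item in collection:
            estimator = copy.deepcopy(self.transformer)  # stand-in for sklearn.base.clone
            outputs.append(estimator.fit(item).transform(item))
        return outputs


class MeanCenterer:
    """Toy transformer: subtract the mean learned in `fit`."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


ct = CollectionFitTransformer(MeanCenterer())
print(ct.fit_transform([[1, 2, 3], [10, 20]]))  # [[-1.0, 0.0, 1.0], [-5.0, 5.0]]
```

Each element is centred using its own mean, which is the key difference from fitting a single transformer on the whole collection.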
Images
- A ``DensityFiltration`` for collections of binary images has been added (`#473 <https://github.com/giotto-ai/giotto-tda/pull/473>`_).
- ``Padder`` and ``Inverter`` have been extended to greyscale images (`#489 <https://github.com/giotto-ai/giotto-tda/pull/489>`_).
Time series
- ``TakensEmbedding`` is now a transformer acting on collections of time series (`#460 <https://github.com/giotto-ai/giotto-tda/pull/460>`_).
- The former ``TakensEmbedding`` acting on a single time series has been renamed ``SingleTakensEmbedding``, and the internal logic employed in its ``fit`` for computing optimal hyperparameters is now available via a ``takens_embedding_optimal_parameters`` convenience function (`#460 <https://github.com/giotto-ai/giotto-tda/pull/460>`_).
- The ``_slice_windows`` method of ``SlidingWindow`` has been made public and renamed to ``slice_windows`` (`#460 <https://github.com/giotto-ai/giotto-tda/pull/460>`_).
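The two building blocks mentioned above can be sketched in pure Python. The first is a time-delay embedding of a single series, as in ``SingleTakensEmbedding``. The second is the slicing of a series into windows of ``size`` samples, as in ``SlidingWindow.slice_windows``. Both functions are hypothetical stand-ins. In particular, they do not model how ``giotto-tda`` aligns windows when ``stride`` does not divide the series length evenly, nor its hyperparameter search.

```python
def takens_embedding(series, time_delay, dimension):
    """Time-delay embedding of a 1D sequence.

    Returns the points (x[i], x[i + time_delay], ..., x[i + (dimension - 1) * time_delay])
    for every start index i for which all coordinates exist.
    """
    span = (dimension - 1) * time_delay
    n_points = len(series) - span
    if n_points <= 0:
        raise ValueError("series too short for this time_delay and dimension")
    return [
        tuple(series[i + k * time_delay] for k in range(dimension))
        for i in range(n_points)
    ]


def sliding_windows(series, size, stride=1):
    """Consecutive windows of `size` samples, advancing `stride` samples at a time."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, stride)]


series = [0, 1, 2, 3, 4, 5, 6]
print(takens_embedding(series, time_delay=2, dimension=3))
# [(0, 2, 4), (1, 3, 5), (2, 4, 6)]
print(sliding_windows(series, size=3, stride=2))
# [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
```

Applying ``takens_embedding`` to every window returned by ``sliding_windows`` mimics, in spirit, what the new collection-level ``TakensEmbedding`` does for each time series in a collection.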
Graphs
- ``GraphGeodesicDistance`` has been improved as follows (`#422 <https://github.com/giotto-ai/giotto-tda/pull/422>`_):
  - The new parameters ``directed``, ``unweighted`` and ``method`` have been added.
  - The rules on the role of zero entries, infinity entries, and non-stored values have been made clearer.
  - Masked arrays are now supported.
- A ``mode`` parameter has been added to ``KNeighborsGraph``; as in ``scikit-learn``, it can be set to either ``"distance"`` or ``"connectivity"`` (`#478 <https://github.com/giotto-ai/giotto-tda/pull/478>`_).
- List input is now accepted by all transformers in ``gtda.graphs``, and outputs are consistently either lists or 3D arrays (`#478 <https://github.com/giotto-ai/giotto-tda/pull/478>`_).
- Sparse matrices returned by ``KNeighborsGraph`` and ``TransitionGraph`` now have int dtype (0-1 adjacency matrices), and are not necessarily symmetric (`#478 <https://github.com/giotto-ai/giotto-tda/pull/478>`_).
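As a reference point for what ``GraphGeodesicDistance`` computes, here is a pure-Python sketch of all-pairs shortest paths using Dijkstra's algorithm, including a ``directed`` switch. It is not the library's code path, which relies on ``scipy``'s ``shortest_path`` as noted under Bug Fixes. Its treatment of zero and infinite entries is an assumption of the sketch, not a statement of ``giotto-tda``'s rules.

```python
import heapq
import math


def geodesic_distances(adjacency, directed=False):
    """All-pairs shortest-path lengths via Dijkstra's algorithm.

    `adjacency[i][j]` is the non-negative weight of the edge i -> j, or
    math.inf if there is no such edge (diagonal entries are ignored).
    In this sketch a zero off-diagonal entry is an edge of length zero.
    If `directed` is False, every edge can be traversed both ways.
    """
    n = len(adjacency)
    neighbours = [[] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            w = adjacency[i][j]
            if i != j and not math.isinf(w):
                neighbours[i].append((j, w))
                if not directed:
                    neighbours[j].append((i, w))
    distances = []
    for source in range(n):
        dist = [math.inf] * n
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in neighbours[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        distances.append(dist)
    return distances


inf = math.inf
adjacency = [
    [0, 1, inf],
    [inf, 0, 2],
    [inf, inf, 0],
]
print(geodesic_distances(adjacency, directed=True))
# [[0.0, 1.0, 3.0], [inf, 0.0, 2.0], [inf, inf, 0.0]]
print(geodesic_distances(adjacency, directed=False))
# [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
```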
Mapper
- Pullback cover set labels and partial cluster labels have been added to Mapper node hovertexts (`#445 <https://github.com/giotto-ai/giotto-tda/pull/445>`_).
- The functionality of ``Nerve`` and ``make_mapper_pipeline`` has been greatly extended (`#447 <https://github.com/giotto-ai/giotto-tda/pull/447>`_ and `#456 <https://github.com/giotto-ai/giotto-tda/pull/456>`_):
  - Node and edge metadata are now accessible in output ``igraph.Graph`` objects by means of the ``VertexSeq`` and ``EdgeSeq`` attributes ``vs`` and ``es`` (respectively). Graph-level dictionaries are no longer used.
  - Available node metadata can be accessed by ``graph.vs[attr_name]``, where ``attr_name`` is one of ``"pullback_set_label"``, ``"partial_cluster_label"``, or ``"node_elements"``.
  - Sizes of intersections are automatically stored as edge weights, accessible by ``graph.es["weight"]``.
  - A ``"store_intersections"`` keyword argument has been added to ``Nerve`` and ``make_mapper_pipeline`` for storing the indices defining node intersections as edge attributes, accessible via ``graph.es["edge_elements"]``.
  - A ``contract_nodes`` optional parameter has been added to both ``Nerve`` and ``make_mapper_pipeline``; when it is set to ``True``, nodes which are subsets of other nodes are removed from the graph.
  - A ``graph_`` attribute is stored during ``Nerve.fit``.
- Two of the ``Nerve`` parameters (``min_intersection`` and the new ``contract_nodes``) are now available in the widgets generated by ``plot_interactive_mapper_graph``, and the layout of these widgets has been improved (`#456 <https://github.com/giotto-ai/giotto-tda/pull/456>`_).
- ``ParallelClustering`` and ``Nerve`` have been exposed in the documentation and in ``gtda.mapper``'s ``__init__`` (`#447 <https://github.com/giotto-ai/giotto-tda/pull/447>`_).
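The nerve construction behind these features can be sketched in pure Python. Nodes are sets of data-point indices, and an edge joins two nodes whose intersection has at least ``min_intersection`` elements. The edge weight is the intersection size, and ``contract_nodes=True`` drops nodes that are subsets of other nodes. The function ``nerve_edges`` is hypothetical and returns plain Python objects rather than an ``igraph.Graph``. Two details are assumptions of the sketch: the order in which contraction and edge computation happen, and the tie-breaking between identical nodes.

```python
from itertools import combinations


def nerve_edges(node_elements, min_intersection=1, contract_nodes=False):
    """Hypothetical sketch of a Mapper nerve computation.

    `node_elements` is a list of sets of data-point indices, one per node.
    Returns the surviving node ids and a dict mapping each edge (i, j) to
    its weight, i.e. the size of the intersection of the two nodes.
    """
    nodes = list(range(len(node_elements)))
    if contract_nodes:
        kept = []
        for i in nodes:
            contained = any(
                node_elements[i] < node_elements[j]  # proper subset of another node
                or (node_elements[i] == node_elements[j] and j < i)  # keep first duplicate
                for j in nodes
                if j != i
            )
            if not contained:
                kept.append(i)
        nodes = kept
    edges = {}
    for i, j in combinations(nodes, 2):
        weight = len(node_elements[i] & node_elements[j])
        if weight >= min_intersection:
            edges[(i, j)] = weight
    return nodes, edges


node_elements = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {4, 5}]
print(nerve_edges(node_elements))
# ([0, 1, 2, 3], {(0, 1): 1, (1, 2): 1, (2, 3): 2})
print(nerve_edges(node_elements, contract_nodes=True))
# ([0, 1, 2], {(0, 1): 1, (1, 2): 1})
```

In the library, the weights computed here correspond to what is exposed through ``graph.es["weight"]``.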
Plotting
- A ``plot_params`` kwarg is available in plotting functions and methods throughout to allow user customisability of output figures. The user must pass a dictionary with keys ``"layout"`` and/or ``"trace"`` (or ``"traces"`` in some cases) (`#441 <https://github.com/giotto-ai/giotto-tda/pull/441>`_).
- Several plots produced by ``plot`` class methods now have default titles (`#453 <https://github.com/giotto-ai/giotto-tda/pull/453>`_).
- Infinite deaths are now plotted by ``plot_diagrams`` (`#461 <https://github.com/giotto-ai/giotto-tda/pull/461>`_).
- Possible multiplicities of persistence pairs in persistence diagram plots are now indicated in the hovertext (`#454 <https://github.com/giotto-ai/giotto-tda/pull/454>`_).
- ``plot_heatmap`` now accepts boolean array input (`#444 <https://github.com/giotto-ai/giotto-tda/pull/444>`_).
New tutorials and examples
The following new tutorials have been added:
- `Topology of time series <https://giotto-ai.github.io/gtda-docs/0.3.0/notebooks/time_series_classification.html>`_, which explains the theory of the Takens time-delay embedding and its use with persistent homology, demonstrates the new API of several components in ``gtda.time_series``, and shows how to construct time series *classification* pipelines in ``giotto-tda`` by partially reproducing `arXiv:1910.08245 <https://arxiv.org/abs/1910.08245>`_.
- `Topology in time series forecasting <https://giotto-ai.github.io/gtda-docs/0.3.0/notebooks/time_series_forecasting.html>`_, which explains how to set up time series *forecasting* pipelines in ``giotto-tda`` via ``TransformerResamplerMixin`` objects and the ``giotto-tda`` ``Pipeline`` class.
- `Topological feature extraction from graphs <https://giotto-ai.github.io/gtda-docs/0.3.0/notebooks/persistent_homology_graphs.html>`_, which explains what the features extracted from directed or undirected graphs by ``VietorisRipsPersistence``, ``SparseRipsPersistence`` and ``FlagserPersistence`` are.
- `Classifying handwritten digits <https://giotto-ai.github.io/gtda-docs/0.3.0/notebooks/MNIST_classification.html>`_, which presents a fully-fledged machine learning pipeline in which cubical persistent homology is applied to the classification of handwritten digit images from the MNIST dataset, partially reproducing `arXiv:1910.08345 <https://arxiv.org/abs/1910.08345>`_.
Utils
- A ``check_collection`` input validation function has been added (`#491 <https://github.com/giotto-ai/giotto-tda/pull/491>`_).
- ``validate_params`` now accepts ``"in"`` and ``"of"`` keys simultaneously in the ``references`` dictionaries, with ``"in"`` used for non-list-like types and ``"of"`` otherwise (`#502 <https://github.com/giotto-ai/giotto-tda/pull/502>`_).
Installation improvements
- ``pybind11`` is now treated as a standard git submodule in the developer installation (`#459 <https://github.com/giotto-ai/giotto-tda/pull/459>`_).
- ``pandas`` is now part of the testing requirements when installing from source (`#508 <https://github.com/giotto-ai/giotto-tda/pull/508>`_).
Bug Fixes
- A bug has been fixed which could lead to features with negative lifetime in persistent homology transformers when ``infinity_values`` was set too low (`#339 <https://github.com/giotto-ai/giotto-tda/pull/339>`_).
- By relying on ``scipy``'s ``shortest_path`` instead of ``scikit-learn``'s ``graph_shortest_path``, some errors in computing ``GraphGeodesicDistance`` (e.g. when some edges are zero) have been fixed (`#422 <https://github.com/giotto-ai/giotto-tda/pull/422>`_).
- A bug in the handling of COO matrices by the ``ripser`` interface has been fixed (`#465 <https://github.com/giotto-ai/giotto-tda/pull/465>`_).
- A bug which led to the incorrect handling of the ``homology_dimensions`` parameter in ``Filtering`` has been fixed (`#439 <https://github.com/giotto-ai/giotto-tda/pull/439>`_).
- An issue with the use of ``joblib.Parallel``, which led to errors when attempting to run ``HeatKernel``, ``PersistenceImage``, and the corresponding amplitudes and distances on large datasets, has been fixed (`#428 <https://github.com/giotto-ai/giotto-tda/pull/428>`_ and `#481 <https://github.com/giotto-ai/giotto-tda/pull/481>`_).
- A bug leading to plots of persistence diagrams not showing points with negative births or deaths has been fixed, as has a bug with the computation of the range to be shown in the plot (`#437 <https://github.com/giotto-ai/giotto-tda/pull/437>`_).
- A bug in the handling of persistence pairs with negative death values by ``Filtering`` has been fixed (`#436 <https://github.com/giotto-ai/giotto-tda/pull/436>`_).
- A bug in the handling of ``homology_dimension_ix`` (now renamed to ``homology_dimension_idx``) in the ``plot`` methods of ``HeatKernel`` and ``PersistenceImage`` has been fixed (`#452 <https://github.com/giotto-ai/giotto-tda/pull/452>`_).
- A bug in the labelling of axes in ``HeatKernel`` and ``PersistenceImage`` plots has been fixed (`#453 <https://github.com/giotto-ai/giotto-tda/pull/453>`_ and `#454 <https://github.com/giotto-ai/giotto-tda/pull/454>`_).
- ``PersistenceLandscape`` plots now show all homology dimensions, instead of just the first (`#454 <https://github.com/giotto-ai/giotto-tda/pull/454>`_).
- A bug in the computation of amplitudes and pairwise distances based on persistence images has been fixed (`#454 <https://github.com/giotto-ai/giotto-tda/pull/454>`_).
- ``Silhouette`` no longer creates NaNs when a subdiagram is trivial (`#454 <https://github.com/giotto-ai/giotto-tda/pull/454>`_).
- ``CubicalPersistence`` no longer creates pairs with negative persistence when ``infinity_values`` is set too low (`#467 <https://github.com/giotto-ai/giotto-tda/pull/467>`_).
- Warnings are no longer thrown by ``KNeighborsGraph`` when ``metric="precomputed"`` (`#506 <https://github.com/giotto-ai/giotto-tda/pull/506>`_).
- A bug in ``Labeller.resample`` affecting cases in which ``n_steps_future >= size - 1`` has been fixed (`#460 <https://github.com/giotto-ai/giotto-tda/pull/460>`_).
- A bug in ``validate_params`` affecting the case of tuples of allowed types has been fixed (`#502 <https://github.com/giotto-ai/giotto-tda/pull/502>`_).
Backwards-Incompatible Changes
- The minimum required versions of most of the dependencies have been bumped. The updated dependencies are ``numpy >= 1.19.1``, ``scipy >= 1.5.0``, ``joblib >= 0.16.0``, ``scikit-learn >= 0.23.1``, ``python-igraph >= 0.8.2``, ``plotly >= 4.8.2``, and ``pyflagser >= 0.4.1`` (`#457 <https://github.com/giotto-ai/giotto-tda/pull/457>`_).
- ``GraphGeodesicDistance`` now returns either lists or 3D dense ndarrays for compatibility with the homology transformers (`#422 <https://github.com/giotto-ai/giotto-tda/pull/422>`_).
- The output of ``PairwiseDistance`` has been transposed to match the ``scikit-learn`` convention ``(n_samples_transform, n_samples_fit)`` (`#420 <https://github.com/giotto-ai/giotto-tda/pull/420>`_).
- ``plot`` class methods now return figures instead of showing them (`#441 <https://github.com/giotto-ai/giotto-tda/pull/441>`_).
- Mapper node and edge attributes are no longer stored as graph-level dictionaries, ``"node_id"`` is no longer an available node attribute, and the attributes ``nodes_`` and ``edges_`` previously stored by ``Nerve.fit`` have been removed in favour of a ``graph_`` attribute (`#447 <https://github.com/giotto-ai/giotto-tda/pull/447>`_).
- The ``homology_dimension_ix`` parameter available in some transformers in ``gtda.diagrams`` has been renamed to ``homology_dimension_idx`` (`#452 <https://github.com/giotto-ai/giotto-tda/pull/452>`_).
- The base of the logarithm used by ``PersistenceEntropy`` is now 2 instead of *e*, and NaN values are replaced with -1 instead of 0 by default (`#450 <https://github.com/giotto-ai/giotto-tda/pull/450>`_ and `#474 <https://github.com/giotto-ai/giotto-tda/pull/474>`_).
- The outputs of ``PersistenceImage``, ``HeatKernel`` and of the pairwise distances and amplitudes based on them are now different due to the improvements described above.
- Weights are no longer stored in the ``effective_metric_params_`` attribute of ``PairwiseDistance``, ``Amplitude`` and ``Scaler`` objects when the metric is persistence-image–based; only the weight function is (`#454 <https://github.com/giotto-ai/giotto-tda/pull/454>`_).
- The ``homology_dimensions_`` attributes of several transformers have been converted from lists to tuples. When possible, homology dimensions stored as parts of attributes are now presented as ints (`#454 <https://github.com/giotto-ai/giotto-tda/pull/454>`_).
- ``gaussian_filter`` (used to make heat– and persistence-image–based representations/pairwise distances/amplitudes) is now called with ``mode="constant"`` instead of ``"reflect"`` (`#454 <https://github.com/giotto-ai/giotto-tda/pull/454>`_).
- The default value of ``order`` in ``Amplitude`` has been changed from ``2.`` to ``None``, giving vector instead of scalar features (`#454 <https://github.com/giotto-ai/giotto-tda/pull/454>`_).
- The meaning of the default ``None`` for ``weight_function`` in ``PersistenceImage`` (and in ``Amplitude`` and ``PairwiseDistance`` when ``metric="persistence_image"``) has been changed from the identity function to the function returning a vector of ones (`#454 <https://github.com/giotto-ai/giotto-tda/pull/454>`_).
- Due to the updates in the GUDHI components, some of the bindings and Python interfaces to the GUDHI C++ components in ``gtda.externals`` have changed (`#468 <https://github.com/giotto-ai/giotto-tda/pull/468>`_).
- ``Labeller.transform`` now returns a 1D array instead of a column array (`#475 <https://github.com/giotto-ai/giotto-tda/pull/475>`_).
- ``PersistenceLandscape`` now returns 3D arrays instead of 4D ones, for compatibility with the new ``curves`` subpackage (`#480 <https://github.com/giotto-ai/giotto-tda/pull/480>`_).
- By default, ``CubicalPersistence`` now removes one infinite bar in H0 (`#467 <https://github.com/giotto-ai/giotto-tda/pull/467>`_, and see above).
- The former ``width`` parameter in ``SlidingWindow`` and ``Labeller`` has been replaced with a more intuitive ``size`` parameter. The relation between the two is: ``size = width + 1`` (`#460 <https://github.com/giotto-ai/giotto-tda/pull/460>`_).
- ``clusterer`` is now a required parameter in ``ParallelClustering`` (`#508 <https://github.com/giotto-ai/giotto-tda/pull/508>`_).
- The ``max_fraction`` parameter in ``FirstSimpleGap`` and ``FirstHistogramGap`` now indicates the floor of ``max_fraction * n_samples``; its default value has been changed from ``None`` to ``1`` (`#412 <https://github.com/giotto-ai/giotto-tda/pull/412>`_).
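The sketch below makes the change to the default ``order`` in ``Amplitude`` concrete. It assumes that ``Amplitude`` first computes one amplitude per homology dimension. It further assumes that, when ``order`` is not ``None``, the transformer reduces that vector to its p-norm with p equal to ``order``. The function ``combine_amplitudes`` and its input values are illustrative only and are not part of ``giotto-tda``.

```python
def combine_amplitudes(amplitudes_per_dimension, order=None):
    """Mimic the assumed effect of `order` on per-dimension amplitudes.

    With order=None the vector of amplitudes (one per homology dimension)
    is returned unchanged; otherwise its p-norm with p = order is returned.
    """
    if order is None:
        return list(amplitudes_per_dimension)
    return sum(abs(a) ** order for a in amplitudes_per_dimension) ** (1.0 / order)


amplitudes = [3.0, 4.0]  # e.g. amplitudes of the H0 and H1 subdiagrams
print(combine_amplitudes(amplitudes))  # [3.0, 4.0] (new default, vector features)
print(combine_amplitudes(amplitudes, order=2.0))  # 5.0 (old default, scalar feature)
```

Under this reading, users who relied on the old scalar output can recover it by passing ``order=2.`` explicitly.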
Thanks to our Contributors
This release contains contributions from many people:
Umberto Lupo, Guillaume Tauzin, Julian Burella Pérez, Wojciech Reise, Lewis Tunstall, Nick Sale, and Anibal Medina-Mardones.
We are also grateful to all who filed issues or helped resolve them, asked and answered questions, and were part of inspiring discussions.