In this release, we have completely refactored the logic behind our data handling strategy (i.e.
`setup_anndata`) to allow for:
1. Readable data handling for existing models.
1. Modular code for easy addition of custom data fields to incorporate into models.
1. Avoidance of unexpected edge cases when more than one model is instantiated in one session.
**Important Note:** This change will not break pipelines for model users (with the exception of a
small change to {class}`~scvi.model.SCANVI`). However, there are several breaking changes for model
developers. The data handling tutorial goes over these changes in detail.
This refactor is centered around the new {class}`~scvi.data.AnnDataManager` class which
orchestrates any data processing necessary for scvi-tools and stores necessary information, rather
than adding additional fields to the AnnData input.
:::{figure} docs/\_static/img/anndata_manager_schematic.svg
:align: center
:alt: Schematic of data handling strategy with AnnDataManager
:class: img-fluid
Schematic of data handling strategy with {class}`~scvi.data.AnnDataManager`
:::
We also have an exciting new experimental Jax-based scVI implementation via
{class}`~scvi.model.JaxSCVI`. While this implementation has limited functionality, we have found it
to be substantially faster than the PyTorch-based implementation. For example, on a 10-core Intel
CPU, Jax on only a CPU can be as fast as PyTorch with a GPU (RTX3090). We will be planning further
Jax integrations in the next releases.
Changes
- Major refactor to data handling strategy with the introduction of
{class}`~scvi.data.AnnDataManager` ([1237]).
- Prevent clobbering between models using the same AnnData object with model instance specific
{class}`~scvi.data.AnnDataManager` mappings ([1342]).
- Add `size_factor_key` to {class}`~scvi.model.SCVI`, {class}`~scvi.model.MULTIVI`,
{class}`~scvi.model.SCANVI`, and {class}`~scvi.model.TOTALVI` ([1334]).
- Add references to the scvi-tools journal publication to the README ([1338], [1339]).
- Addition of {func}`scvi.model.utils.mde` ([1372]) for accelerated visualization of scvi-tools
embeddings.
- Documentation and user guide fixes ([1364], [1361])
- Fix for {class}`~scvi.external.SOLO` when {class}`~scvi.model.SCVI` was setup with a `labels_key`
([1354])
- Updates to tutorials ([1369], [1371])
- Furo docs theme ([1290])
- Add {class}`scvi.model.JaxSCVI` and {class}`scvi.module.JaxVAE`, drop Numba dependency for
checking if data is count data ([1367]).
Breaking changes
- The keyword argument `run_setup_anndata` has been removed from built-in datasets since there is
no longer a model-agnostic `setup_anndata` method ([1237]).
- The function `scvi.model._metrics.clustering_scores` has been removed due to incompatbility with
new data handling ([1237]).
- {class}`~scvi.model.SCANVI` now takes `unlabeled_category` as an argument to
{meth}`~scvi.model.SCANVI.setup_anndata` rather than on initialization ([1237]).
- `setup_anndata` is now a class method on model classes and requires specific function calls to
ensure proper {class}`~scvi.data.AnnDataManager` setup and model save/load. Any model
inheriting from {class}`~scvi.model.base.BaseModelClass` will need to re-implement this method
([1237]).
- To adapt existing custom models to v0.15.0, one can references the guidelines below. For
some examples of how this was done for the existing models in the codebase, please
reference the following PRs: ([1301], [1302]).
- `scvi._CONSTANTS` has been changed to `scvi.REGISTRY_KEYS`.
- `setup_anndata()` functions are now class functions and follow a specific structure. Please
refer to {meth}`~scvi.model.SCVI.setup_anndata` for an example.
- `scvi.data.get_from_registry()` has been removed. This method can be replaced by
{meth}`scvi.data.AnnDataManager.get_from_registry`.
- The setup dict stored directly on the AnnData object, `adata["_scvi"]`, has been deprecated.
Instead, this information now lives in {attr}`scvi.data.AnnDataManager.registry`.
- The data registry can be accessed at {attr}`scvi.data.AnnDataManager.data_registry`.
- Summary stats can be accessed at {attr}`scvi.data.AnnDataManager.summary_stats`.
- Any field-specific information (e.g. `adata.obs["categorical_mappings"]`) now lives in
field-specific state registries. These can be retrieved via the function
{meth}`~scvi.data.AnnDataManager.get_state_registry`.
- `register_tensor_from_anndata()` has been removed. To register tensors with no relevant
`AnnDataField` subclass, create a new a new subclass of
{class}`~scvi.data.fields.BaseAnnDataField` and add it to appropriate model's
`setup_anndata()` function.
Contributors
- [jjhong922]
- [adamgayoso]
- [watiss]