Enhancements
Online updates of {class}`~scvi.model.SCVI`, {class}`~scvi.model.SCANVI`, and {class}`~scvi.model.TOTALVI` with the scArches method <!-- markdownlint-disable -->
It is now possible to iteratively update these models with new samples, without altering the model
for the "reference" population. Here we use the
[scArches method](https://github.com/theislab/scarches). For usage, please see the tutorial in the
user guide.
To enable scArches in our models, we added a few new options. The first is `encode_covariates`,
which is an `SCVI` option to encode the one-hotted batch covariate. We also allow users to exchange
batch norm in the encoder and decoder with layer norm, which can be thought of as batch norm but per
cell. As the layer norm we use has no parameters, it's a bit faster than models with batch norm. We
don't find many differences between using batch norm or layer norm in our models, though we have
kept the defaults unchanged. To run scArches effectively, batch norm should be exchanged
with layer norm.
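To make the "batch norm but per cell" description concrete, here is a minimal sketch in plain Python of parameter-free normalization along each axis of a small cells-by-features matrix. The function names, the toy values, and the `EPS` constant are illustrative assumptions and are not taken from the scvi-tools code.

```python
from statistics import fmean, pvariance

EPS = 1e-5  # assumed small constant for numerical stability


def standardize(values, eps=EPS):
    """Shift to zero mean and scale to unit variance, with no learned parameters."""
    mean = fmean(values)
    var = pvariance(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def layer_norm(matrix):
    """Normalize each row (cell) across its features."""
    return [standardize(row) for row in matrix]


def batch_norm(matrix):
    """Normalize each column (feature) across the cells in the minibatch."""
    columns = [standardize(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


# Toy hidden activations: 3 cells (rows) x 4 features (columns).
hidden = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
    [0.0, 0.0, 5.0, 5.0],
]

per_cell = layer_norm(hidden)
per_feature = batch_norm(hidden)
```

In this sketch, the `layer_norm` output for a cell depends only on that cell, while the `batch_norm` output for the same cell changes with the other cells in the minibatch.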
Empirical initialization of protein background parameters with totalVI
Previously, the learned prior parameters for the protein background were randomly initialized. Now, they can be
set with the `empirical_protein_background_prior` option in {class}`~scvi.model.TOTALVI`. This
option fits a two-component Gaussian mixture model per cell, separating those proteins that are
background for the cell from those that are foreground, and aggregates the learned mean and variance
of the component with the smaller mean across cells. If a `batch_key` was registered, this
computation is done per batch. We emphasize that this is just for the initialization of a learned
parameter in the model.
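The idea behind this initialization can be sketched in plain Python. This is not the scvi-tools implementation: the small EM routine, the `log1p` transform, the choice of the lower-mean component as background, averaging as the aggregation step, and all function names are assumptions made for illustration.

```python
import math
from statistics import fmean


def fit_two_gaussians(values, n_iter=50):
    """Fit a 1D two-component Gaussian mixture with a basic EM loop.

    Returns a list of (mean, variance) pairs sorted by mean.
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]  # crude initialization at the extremes
    variances = [max((hi - lo) ** 2 / 4, 1e-6)] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [
                w * math.exp(-((v - m) ** 2) / (2 * s2)) / math.sqrt(2 * math.pi * s2)
                for w, m, s2 in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            variances[k] = max(
                sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk,
                1e-6,
            )
    return sorted(zip(means, variances))


def empirical_background_prior(protein_counts):
    """Average the lower-mean component's mean and variance across cells."""
    background = []
    for cell in protein_counts:
        log_counts = [math.log1p(c) for c in cell]  # assumed transform
        background.append(fit_two_gaussians(log_counts)[0])
    return fmean(m for m, _ in background), fmean(v for _, v in background)


# Toy data: 2 cells x 8 proteins, five near background and three clearly expressed.
toy_counts = [
    [1, 2, 1, 3, 2, 150, 200, 180],
    [0, 1, 2, 1, 0, 90, 120, 110],
]
prior_mean, prior_var = empirical_background_prior(toy_counts)
```

With a registered `batch_key`, the same routine would be applied to the cells of each batch separately, giving one pair of initial values per batch.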
Use observed library size option
Many of our models, such as `SCVI`, `SCANVI`, and {class}`~scvi.model.TOTALVI`, learn a latent
library size variable. The option `use_observed_lib_size`, which uses the observed library size
instead, may now be passed on model initialization. We have set this to `True` by default, as we
see no regression in performance, and training is a bit faster.
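As a rough illustration of what an observed library size is, the sketch below computes the log of the total counts per cell for a toy count matrix. Treating the observed library size as log total counts is an assumption here, and the function name is hypothetical.

```python
import math


def observed_log_library_size(counts):
    """Return log(total counts) for each cell (row) of a count matrix.

    Assumes every cell has at least one count; log(0) would raise ValueError.
    """
    return [math.log(sum(cell)) for cell in counts]


# Toy count matrix: 2 cells x 4 genes.
toy_counts = [
    [3, 0, 7, 10],  # 20 counts in total
    [1, 1, 0, 0],   # 2 counts in total
]
log_library = observed_log_library_size(toy_counts)
```

Because this quantity is computed directly from the data, there is nothing to learn for it, which is consistent with the slightly faster training noted above.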
Important changes
- To facilitate these enhancements, saved {class}`~scvi.model.TOTALVI` models from previous
versions will not load properly. This is due to an architecture change of the totalVI encoder,
related to latent library size handling.
- The default latent distribution for {class}`~scvi.model.TOTALVI` is now `"normal"`.
- Autotune was removed from this release. We could not maintain the code given the new API changes
and we will soon have alternative ways to tune hyperparameters.
- Protein names during `setup_anndata` are now stored in `adata.uns["_scvi"]["protein_names"]`,
instead of `adata.uns["scvi_protein_names"]`.
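For code that reads the protein names directly, a small lookup that tolerates both storage locations may ease the transition. The helper below is a hypothetical sketch that works on a plain dictionary standing in for `adata.uns`. It is not part of scvi-tools, and the protein names are made up.

```python
def get_protein_names(uns):
    """Look up protein names under the new key, falling back to the old one."""
    try:
        return uns["_scvi"]["protein_names"]
    except KeyError:
        # Layout written by earlier versions of `setup_anndata`.
        return uns["scvi_protein_names"]


# Plain dictionaries standing in for `adata.uns` in each layout.
new_style_uns = {"_scvi": {"protein_names": ["CD3", "CD4"]}}
old_style_uns = {"scvi_protein_names": ["CD3", "CD4"]}

names_new = get_protein_names(new_style_uns)
names_old = get_protein_names(old_style_uns)
```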
Bug fixes
- Fixed an issue where the unlabeled category affected the SCANVI architecture prior distribution.
Unfortunately, as a result of this fix, loading previously trained (\<v0.8.0)
{class}`~scvi.model.SCANVI` models will fail.