Doubletdetection

Latest version: v4.2

2.1

- Adds the option to skip downsampling when creating synthetic doublets by setting `new_lib_as=None`, which is equivalent to `new_lib_as=np.sum`.
  - **This is now the default**, as we have found its performance to be universally superior.

- Adds plotting utility methods:
  - Convergence plot: number of doublets called by iteration.
  - tSNE plot of the original cells, clustered, with detected doublets marked in black.
  - See the updated notebook for examples of both.

- Adds data loading utility methods for 10x's mtx and h5 file formats:
  - `load_10x_h5()` and `load_mtx()`

- Now requires matplotlib, tables, and multicore tsne. Be sure to upgrade per the instructions in the README.
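The `new_lib_as` option described above can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the helper name `make_synthetic_doublets`, the random pairing, and the use of multivariate hypergeometric sampling for the downsampling step are all assumptions made for illustration. It only shows the difference between keeping the exact sum of two parent cells (`new_lib_as=None`) and downsampling to a target library size (for example `np.max`).

```python
import numpy as np


def make_synthetic_doublets(raw_counts, n_doublets, new_lib_as=None, seed=0):
    """Hypothetical helper: combine random pairs of cells into doublets.

    raw_counts: (n_cells, n_genes) array of non-negative integer counts.
    new_lib_as: None keeps the exact sum of both parents. Otherwise a
        callable (e.g. np.max) that maps the two parent library sizes to
        a target library size; the summed profile is downsampled to it.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = raw_counts.shape
    doublets = np.empty((n_doublets, n_genes), dtype=np.int64)
    for i in range(n_doublets):
        # Pick two distinct parent cells.
        a, b = rng.choice(n_cells, size=2, replace=False)
        combined = raw_counts[a] + raw_counts[b]
        if new_lib_as is not None:
            sizes = [raw_counts[a].sum(), raw_counts[b].sum()]
            target = int(new_lib_as(sizes))
            # Assumed downsampling scheme: draw `target` counts without
            # replacement from the combined profile.
            combined = rng.multivariate_hypergeometric(combined, target)
        doublets[i] = combined
    return doublets
```

In this sketch, passing `new_lib_as=np.sum` sets the target to the full combined library size, so the draw returns the summed profile unchanged, which is why it matches `new_lib_as=None`.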

2.0

Finalizes all public-facing parameters, methods, and attributes, including many removals.
Standard usage is now to call `.fit()` on the classifier followed by `.predict()`, which returns the labels for the input to `.fit()`.
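The call pattern can be sketched with a toy stand-in. `ToyDoubletClassifier` below is hypothetical and is not the package's classifier: it labels cells by library size purely so the example runs without the real dependencies. Having `.fit()` return the instance so the two calls can be chained is also an assumption of this sketch.

```python
import numpy as np


class ToyDoubletClassifier:
    """Hypothetical stand-in that only illustrates the fit/predict flow."""

    def __init__(self, size_quantile=0.9):
        self.size_quantile = size_quantile

    def fit(self, raw_counts):
        # Toy score: the library size of each cell (not the real method).
        self.scores_ = np.asarray(raw_counts).sum(axis=1)
        return self  # allows chaining: clf.fit(X).predict()

    def predict(self):
        # One label per cell passed to fit(): 1 = doublet, 0 = singlet.
        cutoff = np.quantile(self.scores_, self.size_quantile)
        self.labels_ = (self.scores_ > cutoff).astype(int)
        return self.labels_


counts = np.array([[5, 1], [4, 2], [20, 15], [3, 3]])
labels = ToyDoubletClassifier(size_quantile=0.5).fit(counts).predict()
```

The point is the shape of the interface: `.predict()` takes no data of its own and returns one label per row of the matrix given to `.fit()`.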

See the in-line docstrings and the example notebook in the docs folder for more usage details.
See the merge commit message for a list of salient changes.

1.4

Adds a full installation script, `setup.py`. It uses `pip3` and automatically retrieves required dependencies when needed.

Adds an example usage vignette in the docs folder.

Corrects an omitted default parameter update from v1.3 (`knn=30` now).

1.3.1

1.3

* Uses a voting scheme over the results of multiple runs for the final classification: cells with p-value > `p_thresh` (default 0.99) in at least a fraction `voter_thresh` (default 0.9) of iterations are called doublets. Both thresholds are optional parameters to `.fit`.

* Updates the defaults for `n_iters` (25), `n_top_var_genes` (10000), and `phenograph_parameters` (`prune=True`) to be in line with the updated methodology.

* Scores and p-values for cells assigned to the -1 "cluster" (i.e. not in any cluster) by Phenograph are always set to `np.nan`.
When classifying over multiple iterations, any `np.nan` iterations are ignored. If a cell is assigned -1 (and so `np.nan`) on all iterations, its final label is also set to `np.nan`.

* We strongly recommend setting `prune=False` when running only a few iterations. With a single iteration, many labels will be `np.nan`, in effect too ambiguous to make any call.
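The voting and NaN-handling rules above can be sketched in NumPy as follows. The function name `vote_labels` and the `(n_iters, n_cells)` array layout are assumptions for illustration, as is the choice to compute the vote fraction over non-NaN iterations only (one reading of "any np.nan iterations are ignored").

```python
import numpy as np


def vote_labels(p_values, p_thresh=0.99, voter_thresh=0.9):
    """Hypothetical helper: combine per-iteration p-values into labels.

    p_values: (n_iters, n_cells) array; np.nan marks iterations in which
        the cell was not assigned to any cluster.
    Returns a float array of 1.0 (doublet), 0.0 (singlet) or np.nan.
    """
    p_values = np.asarray(p_values, dtype=float)
    valid = ~np.isnan(p_values)
    n_valid = valid.sum(axis=0)

    # Replace NaN by -inf so those iterations can never count as a vote.
    filled = np.where(valid, p_values, -np.inf)
    votes = (filled > p_thresh).sum(axis=0)

    # Cells that were NaN on every iteration keep a NaN label.
    labels = np.full(p_values.shape[1], np.nan)
    has_votes = n_valid > 0
    fraction = votes[has_votes] / n_valid[has_votes]
    labels[has_votes] = (fraction >= voter_thresh).astype(float)
    return labels
```

With few iterations and pruning enabled, more cells end up with no valid iteration at all, which is the situation the `prune=False` recommendation is meant to avoid.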

1.2

* Adds a hypergeometric `p_values_` attribute, providing a confidence value that each cell is a doublet.

* Allows multiple runs to reduce p-value variance.
Reduced p-value variance improves the accuracy of classification labels.

Adds the `n_iters` parameter to control the number of runs. Defaults to `n_iters=5`.
When `n_iters` > 1, many public class attributes change. For the most part, changed attributes either no longer exist or gain an additional dimension to hold the results from each run. See the updated class docstrings for details.
Note that `.labels_` does not change.

When `n_iters` > 1, `.labels_` is set by `.p_values_`, otherwise set by `.suggested_cutoff_` as in previous releases.

* Some parameter defaults were changed to reflect our current best performance estimate.
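As background for the hypergeometric `p_values_` attribute, here is a minimal pure-Python sketch of a hypergeometric tail probability. How the library maps cells, clusters, and synthetic doublets onto the test, and which tail it reports, is not stated in these notes, so the framing below (synthetic doublets as the "marked" items, a cluster as the "draw") is an assumption.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: population size (assumed: original cells plus synthetic doublets)
    K: marked items in the population (assumed: synthetic doublets)
    n: number of items drawn (assumed: size of one cluster)
    k: marked items observed in the draw (assumed: synthetic doublets
       found in that cluster)
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability mass from k up to the largest possible value.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# A cluster of 4 cells drawn from 10, of which 5 are synthetic doublets:
# all 4 being synthetic happens with probability 5/210.
p = hypergeom_sf(4, N=10, K=5, n=4)
```

Under this framing, a small tail probability means the cluster holds more synthetic doublets than chance would suggest, so its original cells are more likely to be doublets themselves.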
