NS_Forest(adata, clusterLabelcolumnHeader = "louvain", rfTrees = 1000, Median_Expression_Level = 0, Genes_to_testing = 6, betaValue = 0.5)
* adata = AnnData object, as used by scanpy
* rfTrees = number of trees in each Random Forest model
* clusterLabelcolumnHeader = column header in adata.obs['header_here!'] where cluster assignments reside. Typically 'louvain' if louvain clustering was used.
* Median_Expression_Level = median expression level for removing negative markers
* Genes_to_testing = number of top genes, ranked by binary score, whose combinations are evaluated by f-beta score
* betaValue = beta used to weight the f-beta score. 1 gives the standard f-measure; values close to zero weight toward precision; values greater than 1 weight toward recall
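To make the effect of betaValue concrete, here is a minimal sketch of the standard f-beta formula in plain Python. The helper name `fbeta` and the precision/recall numbers are illustrative assumptions and are not taken from the NS-Forest code.

```python
def fbeta(precision: float, recall: float, beta: float) -> float:
    """Standard f-beta score: weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        # No precision and no recall (or beta == 0 with zero precision).
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical marker set: precise, but it misses half of the target cells.
precision, recall = 0.9, 0.5
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: f-beta={fbeta(precision, recall, beta):.3f}")
```

With these made-up numbers the score falls as beta grows, because a larger beta puts more weight on the low recall. At the default betaValue of 0.5, a precise marker set is penalized less for missing some cells.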
Description
Necessary and Sufficient Forest is a method that takes cluster results from single cell/nuclei RNAseq experiments
and generates lists of minimal markers needed to define each “cell type cluster”.
The method first re-encodes the cluster labels into binary classifications and builds a Random Forest model for each cluster
versus all other clusters. The top fifteen genes from each model are then reranked by a score measuring how binary their expression is:
a gene expressed in the target cluster but not in the other clusters has a high binary score. Expression cutoffs for the top six genes ranked
by binary score are then determined by generating individual decision trees and extracting the decision path information. Finally, all combinations
of the top six most binary genes are evaluated using the f-beta score as the objective function (beta defaults to 0.5, which weights the
score toward precision rather than recall).
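The final step, scoring every combination of candidate genes with f-beta, can be sketched roughly as follows. This is not the NS-Forest implementation: the function names, the toy expression values, the cutoffs, and the rule that a cell is called positive only when every gene in the combination exceeds its cutoff are all assumptions made for illustration.

```python
from itertools import combinations


def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """f-beta computed from true positives, false positives, false negatives."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_marker_combination(expr, in_cluster, cutoffs, beta=0.5):
    """Score every non-empty combination of genes and return (score, genes).

    expr:       dict mapping gene -> list of expression values, one per cell
    in_cluster: list of bool, True if the cell belongs to the target cluster
    cutoffs:    dict mapping gene -> expression cutoff (assumed precomputed)
    """
    genes = list(cutoffs)
    n_cells = len(in_cluster)
    best = (0.0, ())
    for k in range(1, len(genes) + 1):
        for combo in combinations(genes, k):
            # Assumed rule: positive only if all genes exceed their cutoffs.
            predicted = [
                all(expr[g][i] > cutoffs[g] for g in combo)
                for i in range(n_cells)
            ]
            tp = sum(p and y for p, y in zip(predicted, in_cluster))
            fp = sum(p and not y for p, y in zip(predicted, in_cluster))
            fn = sum((not p) and y for p, y in zip(predicted, in_cluster))
            score = fbeta_from_counts(tp, fp, fn, beta)
            if score > best[0]:
                best = (score, combo)
    return best


# Toy data: six cells, the first three belong to the target cluster.
in_cluster = [True, True, True, False, False, False]
expr = {
    "geneA": [5, 4, 0, 3, 0, 0],
    "geneB": [2, 3, 4, 0, 0, 3],
    "geneC": [0, 0, 0, 0, 1, 0],
}
cutoffs = {"geneA": 0.5, "geneB": 0.5, "geneC": 0.5}

score, markers = best_marker_combination(expr, in_cluster, cutoffs, beta=0.5)
print(markers, round(score, 3))
```

On this toy data, beta = 0.5 selects the pair geneA + geneB, which has no false positives but misses one target cell. Rerunning with beta = 2 selects geneB alone, which recovers every target cell at the cost of one false positive.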
See code for detailed comments.
Versioning
This is version 3.0. Earlier releases are described in the publications below.
Version 2
Aevermann BD, Zhang Y, Novotny M, Keshk M, Bakken TE, Miller JA, Hodge RD, Lelieveldt B, Lein ES, Scheuermann RH. A machine learning method for the discovery of minimum marker gene combinations for cell-type identification from single-cell RNA sequencing. Genome Res. 2021 Jun 4:gr.275569.121. doi: 10.1101/gr.275569.121. Epub ahead of print. PMID: 34088715.