Nsforest

Latest version: v3.9.2.5


3.0

NS_Forest(adata, clusterLabelcolumnHeader = "louvain", rfTrees = 1000, Median_Expression_Level = 0, Genes_to_testing = 6, betaValue = 0.5)
* adata = AnnData object (the data structure used by Scanpy)
* rfTrees = number of trees in each Random Forest
* clusterLabelcolumnHeader = column header in adata.obs where the cluster assignments reside; typically 'louvain' if Louvain clustering was used
* Median_Expression_Level = median expression threshold used to remove negative markers
* Genes_to_testing = number of top genes, ranked by binary score, whose permutations are evaluated by f-beta score
* betaValue = beta for f-beta weighting: 1 gives the standard f-measure, values close to zero weight toward precision, and values greater than 1 weight toward recall
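The betaValue parameter controls how the f-beta objective trades precision against recall. As a rough illustration in plain Python (independent of NS-Forest's own code), the standard formula F = (1 + β²)·P·R / (β²·P + R) can be evaluated for one hypothetical marker set at several β values; the `fbeta` helper and the precision/recall numbers are illustrative only.

```python
def fbeta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical marker set: precise (0.9) but misses half the cluster (recall 0.5).
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: F={fbeta(0.9, 0.5, beta):.3f}")
# beta=0.5: F=0.776
# beta=1.0: F=0.643
# beta=2.0: F=0.549
```

Lower β rewards the precise but incomplete marker set, consistent with values near zero weighting toward precision.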

Description

Necessary and Sufficient Forest is a method that takes cluster results from single cell/nuclei RNAseq experiments
and generates lists of minimal markers needed to define each “cell type cluster”.

The method begins by re-encoding the cluster labels as binary classifications (each cluster versus all others) and building
a Random Forest model for each cluster. The top fifteen genes are then reranked by a score measuring how binary their expression is;
for example, a gene expressed in the target cluster but not in any other cluster would have a high binary score. Expression cutoffs
for the top six genes ranked by binary score are then determined by generating individual decision trees and extracting the decision
path information. Finally, all combinations of the top six most binary genes are evaluated using the f-beta score as the objective
function (beta defaults to 0.5, which weights the score toward precision rather than recall).
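The binary-score reranking can be sketched as follows. NS-Forest's exact formula is defined in its source code; the version below is an assumption for illustration: for each non-target cluster it compares that cluster's median expression with the target cluster's median, clips negative contributions to zero, and averages. The function name, data layout, and gene names are hypothetical, and the upstream Random Forest ranking is assumed to have already produced the candidate genes.

```python
from statistics import median


def binary_score(expr_by_cluster: dict[str, list[float]], target: str) -> float:
    """Score close to 1.0 when a gene is expressed in `target` but silent elsewhere.

    Assumed formula (illustrative): the mean, over all non-target clusters j,
    of max(0, 1 - median_j / median_target).
    """
    m_target = median(expr_by_cluster[target])
    others = [c for c in expr_by_cluster if c != target]
    if m_target <= 0 or not others:
        return 0.0  # not expressed in the target cluster (or nothing to compare)
    terms = [max(0.0, 1 - median(expr_by_cluster[c]) / m_target) for c in others]
    return sum(terms) / len(others)


# Toy per-cluster expression for two candidate genes (hypothetical data).
genes = {
    "GeneA": {"Cl1": [5, 6, 7], "Cl2": [0, 0, 1], "Cl3": [0, 0, 0]},  # Cl1-specific
    "GeneB": {"Cl1": [5, 6, 7], "Cl2": [4, 5, 6], "Cl3": [0, 1, 0]},  # shared with Cl2
}
ranked = sorted(genes, key=lambda g: binary_score(genes[g], "Cl1"), reverse=True)
print(ranked)  # ['GeneA', 'GeneB']
```

GeneA scores 1.0 because both other clusters have median expression 0, while GeneB is pulled down to about 0.58 by its expression in Cl2.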

See code for detailed comments.

Versioning

This is version 3.0. The earlier releases were described in the publications below.

Version 2

Aevermann BD, Zhang Y, Novotny M, Keshk M, Bakken TE, Miller JA, Hodge RD, Lelieveldt B, Lein ES, Scheuermann RH. A machine learning method for the discovery of minimum marker gene combinations for cell-type identification from single-cell RNA sequencing. Genome Res. 2021 Jun 4:gr.275569.121. doi: 10.1101/gr.275569.121. Epub ahead of print. PMID: 34088715.

2.0

Necessary and Sufficient Forest (NS-Forest) for Cell Type Marker Determination from cell type clusters

Getting Started

Install Jupyter Notebook and Python 2.7.

Prerequisites

* The script is a Jupyter notebook written in Python 2.7. Required libraries: NumPy, pandas, scikit-learn (sklearn), graphviz, numexpr.
* The input data is a tab-delimited Cell x Gene expression matrix with one column containing the cluster assignments.
* The cluster-label column must be named "Clusters", and the labels must be non-numeric (if they are currently numbers, add a prefix such as "Cl"; any text will work).
* Gene identifiers must not contain special characters such as ".", "-", or "/", and must not begin with a number (I prefix identifiers that begin with numbers and replace all special characters with "_").
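The identifier and label rules above can be applied before exporting the matrix. This is a minimal sketch written in Python 3 rather than the notebook's Python 2.7; `sanitize_gene_id`, `label_cluster`, and the `G_` prefix are hypothetical choices, not part of NS-Forest.

```python
import re


def sanitize_gene_id(gene: str) -> str:
    """Replace special characters with '_' and prefix IDs that start with a digit."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", gene)
    if cleaned[:1].isdigit():
        cleaned = "G_" + cleaned  # hypothetical prefix; any leading letter works
    return cleaned


def label_cluster(label) -> str:
    """Prefix numeric cluster labels with 'Cl' so the Clusters column is non-numeric."""
    text = str(label)
    try:
        float(text)
    except ValueError:
        return text  # already non-numeric
    return "Cl" + text


header = ["Clusters", "HLA-DRB1", "5-HT3A", "RP11-34P13.7"]
print(["Clusters"] + [sanitize_gene_id(g) for g in header[1:]])
# ['Clusters', 'HLA_DRB1', 'G_5_HT3A', 'RP11_34P13_7']
print([label_cluster(c) for c in [1, 2, "Astro"]])
# ['Cl1', 'Cl2', 'Astro']
```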

Description

Necessary and Sufficient Forest is a method that takes cluster results from single cell/nuclei RNAseq experiments
and generates lists of minimal markers needed to define each “cell type cluster”.

The method begins by re-encoding the cluster labels as binary classifications (each cluster versus all others) and building
a Random Forest model for each cluster. The top fifteen genes are then reranked by a score measuring how binary their expression is;
for example, a gene expressed in the target cluster but not in any other cluster would have a high binary score. Expression cutoffs
for the top six genes ranked by binary score are then determined by generating individual decision trees and extracting the decision
path information. Finally, all permutations of the top six most binary genes are evaluated using the f-beta score as the objective
function (beta defaults to 0.5, which weights the score toward precision rather than recall).
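The cutoff and evaluation steps can be sketched in plain Python. Two substitutions are assumptions: a hand-written decision stump (a depth-1 tree that minimizes weighted Gini impurity) stands in for the individual decision trees NS-Forest builds, and it assumes positive markers (higher expression in the target cluster). Because a rule requiring every gene to exceed its cutoff does not depend on gene order, the sketch enumerates combinations rather than ordered permutations. All function names and data are hypothetical.

```python
from itertools import combinations


def gini(pos: int, n: int) -> float:
    """Gini impurity of a node holding n cells, pos of them in the target cluster."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2 * p * (1 - p)


def stump_cutoff(values: list[float], labels: list[bool]) -> float:
    """Threshold a depth-1 decision tree would choose (minimum weighted Gini).

    The split sits midway between adjacent distinct values; cells with
    value > threshold fall on the 'high' side.
    """
    pairs = sorted(zip(values, labels))
    n, total_pos = len(pairs), sum(labels)
    best_thr, best_imp, left_pos = pairs[0][0], float("inf"), 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no split possible between equal values
        imp = (i * gini(left_pos, i) + (n - i) * gini(total_pos - left_pos, n - i)) / n
        if imp < best_imp:
            best_thr, best_imp = (pairs[i - 1][0] + pairs[i][0]) / 2, imp
    return best_thr


def fbeta_counts(tp: int, fp: int, fn: int, beta: float) -> float:
    """F-beta from confusion counts: (1+b2)TP / ((1+b2)TP + b2*FN + FP)."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_marker_combo(expr, labels, genes, beta=0.5):
    """Score every gene subset; a cell is called positive only if all genes pass."""
    cutoffs = {g: stump_cutoff(expr[g], labels) for g in genes}
    best_combo, best_score = (), -1.0
    for k in range(1, len(genes) + 1):
        for combo in combinations(genes, k):
            calls = [all(expr[g][i] > cutoffs[g] for g in combo) for i in range(len(labels))]
            tp = sum(c and y for c, y in zip(calls, labels))
            fp = sum(c and not y for c, y in zip(calls, labels))
            fn = sum(y and not c for c, y in zip(calls, labels))
            score = fbeta_counts(tp, fp, fn, beta)
            if score > best_score:
                best_combo, best_score = combo, score
    return best_combo, best_score


labels = [True, True, True, False, False, False]  # first three cells: target cluster
expr = {"GeneA": [5, 6, 7, 0, 0, 6], "GeneB": [3, 4, 3, 4, 0, 0]}  # hypothetical
print(best_marker_combo(expr, labels, ["GeneA", "GeneB"]))
# (('GeneA', 'GeneB'), 1.0)
```

Each gene alone lets one off-target cell through (f-beta about 0.79), but requiring both genes removes both false positives, so the pair wins.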

See code for detailed comments.

Versioning

This is version 2.0. The initial release was version 1.3. Version 1.0 was described in:

Aevermann BD, Novotny M, Bakken T, Miller JA, Diehl AD, Osumi-Sutherland D, Lasken RS, Lein ES, Scheuermann RH.
Cell type discovery using single-cell transcriptomics: implications for ontological representation.
Hum Mol Genet. 2018 May 1;27(R1):R40-R47. doi: 10.1093/hmg/ddy100.

Authors

* Brian Aevermann baeverma@jcvi.org and Richard Scheuermann RScheuermann@jcvi.org

License

This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details

Acknowledgments

* Allen Institute for Brain Science
* Chan Zuckerberg Initiative
* California Institute for Regenerative Medicine

1.3

Necessary and Sufficient Forest (NS-Forest) for Cell Type Marker Determination from cell type clusters

Getting Started

Install Jupyter Notebook and Python 2.7.

Prerequisites

* The script is a Jupyter notebook written in Python 2.7. Required libraries: NumPy, pandas, scikit-learn (sklearn), graphviz, numexpr.
* The input data is a tab-delimited Cell x Gene expression matrix with one column containing the cluster assignments.
* The cluster-label column must be named "Clusters", and the labels must be non-numeric (if they are currently numbers, add a prefix such as "Cl"; any text will work).
* Gene identifiers must not contain special characters such as ".", "-", or "/", and must not begin with a number (I prefix identifiers that begin with numbers and replace all special characters with "_").

Description

Necessary and Sufficient Forest is a method that takes cluster results from single cell/nuclei RNAseq experiments
and generates lists of minimal markers needed to define each “cell type cluster”.

The method begins by re-encoding the cluster labels as binary classifications (each cluster versus all others) and building
a Random Forest model for each cluster. The top ten ranked features from the Random Forest are then tested using the f-measure as an
objective function. In the first step, each of the top ten features is evaluated independently for its discriminatory power at an
expression value where 75% of the cells have greater than or equal expression. Because 25% of the cells are excluded by construction,
recall is capped at about 0.75, so even with perfect precision the maximum f-measure for the first step is around 0.86
(2 × 1 × 0.75 / (1 + 0.75) ≈ 0.857); there will be cases where it is higher or lower, such as when expression is equal across all
cells. After the best single-gene f-measure is found, the remaining nine genes are tested in combination with that first gene, again
using an expression value where 75% of the cells have greater than or equal expression. After the best pair of genes is found, the
remaining eight genes are tested in the third position, and so on until the combination reaches six genes.
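The v1.3 forward-selection procedure can be sketched as below. This is a hedged reconstruction from the description, not the original notebook: it assumes the Random Forest ranking is already available as `ranked_genes`, takes each gene's cutoff as the value at which at least 75% of the target cluster's cells have greater than or equal expression (a nearest-rank assumption), and scores with the F1 measure. Names and data are hypothetical.

```python
def cutoff_75(values: list[float]) -> float:
    """Observed value at which at least 75% of the values are >= it (nearest rank)."""
    s = sorted(values)
    return s[int(0.25 * len(s))]


def f1_for(genes, expr, labels, cutoffs) -> float:
    """F-measure when a cell is called positive only if every gene meets its cutoff."""
    calls = [all(expr[g][i] >= cutoffs[g] for g in genes) for i in range(len(labels))]
    tp = sum(c and y for c, y in zip(calls, labels))
    fp = sum(c and not y for c, y in zip(calls, labels))
    fn = sum(y and not c for c, y in zip(calls, labels))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def greedy_markers(expr, labels, ranked_genes, max_genes=6):
    """Forward selection over RF-ranked genes; returns the best set at each step."""
    target = [i for i, y in enumerate(labels) if y]
    cutoffs = {g: cutoff_75([expr[g][i] for i in target]) for g in ranked_genes}
    chosen, remaining, history = [], list(ranked_genes), []
    while remaining and len(chosen) < max_genes:
        # max() returns the first maximal gene, so ties favour the higher RF rank
        best = max(remaining, key=lambda g: f1_for(chosen + [g], expr, labels, cutoffs))
        chosen.append(best)
        remaining.remove(best)
        history.append((tuple(chosen), f1_for(chosen, expr, labels, cutoffs)))
    return history


labels = [True] * 4 + [False] * 4  # hypothetical: 4 target cells, 4 others
expr = {
    "GeneA": [5, 5, 6, 7, 6, 6, 0, 0],
    "GeneB": [3, 3, 4, 4, 0, 0, 3, 3],
    "GeneC": [1, 0, 1, 0, 1, 0, 1, 0],
}
for genes, f in greedy_markers(expr, labels, ["GeneA", "GeneB", "GeneC"]):
    print(genes, round(f, 3))
# ('GeneA',) 0.8
# ('GeneA', 'GeneB') 1.0
# ('GeneA', 'GeneB', 'GeneC') 1.0
```

Unlike the exhaustive search in later versions, this greedy scheme never revisits an earlier choice, so it evaluates far fewer gene sets but can miss a better combination that does not include the best single gene.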

See code for detailed comments.

Versioning

The initial release is version 1.3. Version 1.0 was described in:

Aevermann BD, Novotny M, Bakken T, Miller JA, Diehl AD, Osumi-Sutherland D, Lasken RS, Lein ES, Scheuermann RH.
Cell type discovery using single-cell transcriptomics: implications for ontological representation.
Hum Mol Genet. 2018 May 1;27(R1):R40-R47. doi: 10.1093/hmg/ddy100.

