Modisco

Latest version: v0.5.16.4.1


0.5.13.0

Corresponds to PR https://github.com/kundajelab/tfmodisco/pull/85.

Leiden is run with several different random seeds, and the best partition is kept, for robustness. Prior to this PR, those runs were not parallelized, because naively parallelizing `leidenalg.find_partition` via joblib fails with `TypeError: cannot pickle 'PyCapsule' object`. In this PR, parallelism is achieved by calling a dedicated script that runs Leiden community detection; the script is launched using `subprocess.Popen`.
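The general pattern of launching one child process per seed and keeping the best result can be sketched as follows. This is a minimal illustration, not the code from the PR: the worker source, the `run_seeds_in_parallel` helper, and the "quality" score are hypothetical stand-ins, and the worker draws a random number where a real script would run Leiden and report the partition quality.

```python
import subprocess
import sys

# Hypothetical stand-in for the dedicated clustering script. It takes a seed
# on the command line and prints "<seed> <quality>". A real worker would run
# Leiden community detection here instead of drawing a random number.
WORKER_SOURCE = """
import random, sys
seed = int(sys.argv[1])
print(seed, random.Random(seed).random())
"""


def run_seeds_in_parallel(seeds):
    # Popen returns immediately, so all child processes run concurrently.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, str(seed)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for seed in seeds
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()  # wait for this child and read its output
        if proc.returncode != 0:
            raise RuntimeError("worker process failed")
        seed_str, quality_str = out.split()
        results.append((int(seed_str), float(quality_str)))
    # Keep the run with the highest (stand-in) quality score.
    return max(results, key=lambda pair: pair[1])


best_seed, best_quality = run_seeds_in_parallel(range(5))
```

Because each run lives in its own process and communicates only through text on stdout, nothing has to be pickled, which sidesteps the `PyCapsule` error described above.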

Results on bpnet nanog task are here (gives the same results as before, but spends noticeably less time on the Leiden clustering steps): http://nbviewer.jupyter.org/github/kundajelab/tfmodisco_bio_experiments/blob/b3b4d7b240b8e398597100581ae791eec0a13b61/bpnet/trial1/TryBpNet_v0.5.13.0.ipynb
(Contrast with https://nbviewer.jupyter.org/github/kundajelab/tfmodisco_bio_experiments/blob/2ba855b85eddc4c4d7b5e3296c6e12cce04a705d/bpnet/trial1/TryBpNet_v0.5.11.0_reducemem.ipynb)

0.5.12.0

Corresponds to PR https://github.com/kundajelab/tfmodisco/pull/84

- Performs density-adapted clustering (Leiden) and computes a t-SNE embedding. By default, a perplexity of 30 is used for both. If TF-MoDISco is run with this version from the beginning, the subclustering is computed automatically for each motif. Otherwise, motifs computed with a previous version can be loaded and the subclustering computed post-hoc.
- Uses all pairwise continuous Jaccard similarities when computing the similarity matrix, but remains memory efficient: the number of nearest neighbors needed for the density adaptation (perplexity*3 + 1) is far smaller than the total number of nodes in the graph, and only the similarities for that many nearest neighbors are kept in memory.
- Added support for saving and loading the subclustering from file
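
The memory-saving idea in the second bullet can be sketched as follows. This is an illustrative sketch only, not TF-MoDISco's implementation: the helper names are hypothetical, and `continuous_jaccard` here is a simplified version that assumes non-negative vectors (sum of element-wise minima over sum of element-wise maxima).

```python
import heapq
import random


def n_neighbors_for(perplexity):
    # Neighbor count stated in the release note: perplexity*3 + 1.
    return perplexity * 3 + 1


def continuous_jaccard(x, y):
    # Simplified continuous Jaccard for non-negative vectors (an assumption).
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def top_k_neighbors(rows, k):
    # Every pairwise similarity is computed, but only the k largest per row
    # are retained, so memory is O(n * k) rather than O(n * n).
    result = []
    for x in rows:
        sims = ((continuous_jaccard(x, y), j) for j, y in enumerate(rows))
        result.append(heapq.nlargest(k, sims))
    return result


rng = random.Random(0)
rows = [[rng.random() for _ in range(8)] for _ in range(200)]
k = n_neighbors_for(30)  # 91 with the default perplexity of 30
neighbors = top_k_neighbors(rows, k)
```

Each entry of `neighbors` is a list of `(similarity, index)` pairs sorted from most to least similar; the first pair is the row itself, with similarity 1.0.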

Notebook demonstrating the change on the example tal-gata notebook: https://github.com/kundajelab/tfmodisco/blob/c2c6001b8a2608ee5224ac7faeb69d5fd72f78f5/examples/simulated_TAL_GATA_deeplearning/TF_MoDISco_TAL_GATA.ipynb

Notebook demonstrating how to compute the subclustering post-hoc:
http://mitra.stanford.edu/kundaje/avanti/tfmodisco_bio_experiments/bpnet/trial1/TryBpNet_v0.5.12.0_add_in_subclustering.html
(github permalink: https://github.com/kundajelab/tfmodisco_bio_experiments/blob/54df6faa20773d91107e7b645649e2145e3fb0de/bpnet/trial1/TryBpNet_v0.5.12.0_add_in_subclustering.ipynb)

0.5.11.0

Corresponds to PR https://github.com/kundajelab/tfmodisco/pull/83. Results should remain identical compared to v0.5.10.2.

0.5.10.2

Corresponds to PR https://github.com/kundajelab/tfmodisco/pull/82
- The default settings for variable-length seqlet identification did not perform well, so this release reverts to the previous default of fixed-length seqlet identification, which has been battle-tested more.
- Putting in the fix caught by 81
- Another minor fix for cases where there are lots of ties in the scores during seqlet identification (which could occur with the variable-length seqlet identification)

0.5.10.1

Corresponds to PR https://github.com/kundajelab/tfmodisco/pull/80

This release works around an error that typically occurs when the user's sequences are not perfectly one-hot encoded, i.e. some columns are all zeros (usually because the one-hot encoding procedure mapped Ns to all zeros). This can produce a PPM whose rows do not all sum to 1, and the error is thrown when computing the information content for visualization purposes. The information content can still be calculated by renormalizing the rows to sum to 1, so that is what the workaround does, after printing a warning. Users should still make sure that they are OK with this behavior.
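
The renormalize-then-compute step can be sketched as follows. This is a minimal illustration rather than the library's code: the function names are hypothetical, and the information content is assumed to be measured in bits against a uniform background.

```python
import math
import warnings


def renormalize_rows(ppm, tol=1e-6):
    # Rescale each row so its probabilities sum to 1, warning when needed.
    fixed = []
    for row in ppm:
        total = sum(row)
        if total <= 0:
            raise ValueError("cannot renormalize an all-zero row")
        if abs(total - 1.0) > tol:
            warnings.warn("PPM row does not sum to 1; renormalizing")
            row = [p / total for p in row]
        fixed.append(list(row))
    return fixed


def information_content(ppm):
    # Per-position information content in bits, assuming a uniform
    # background over the alphabet (an assumption of this sketch).
    ics = []
    for row in ppm:
        entropy = -sum(p * math.log2(p) for p in row if p > 0)
        ics.append(math.log2(len(row)) - entropy)
    return ics


# The second row sums to 0.8, as can happen when some inputs were all zeros.
ppm = [[1.0, 0.0, 0.0, 0.0], [0.4, 0.4, 0.0, 0.0]]
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    ic = information_content(renormalize_rows(ppm))
```

After renormalization the second row becomes `[0.5, 0.5, 0, 0]`, so its information content is computed from a valid probability distribution.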

0.5.10.0

Corresponds to PR https://github.com/kundajelab/tfmodisco/pull/78

- AGKM embeddings are now the default (this also avoids importing tensorflow unless the user wants to use the previous gapped k-mer embeddings)
- Added a cap on the AGKM embedding size, which reduces memory use
- Implementation of variable-length seqlet identification
- Initial implementation of exemplar-based hit-scoring
