Modisco

Latest version: v0.5.16.4.1

0.5.7.0

Corresponds to PR https://github.com/kundajelab/tfmodisco/pull/61. Instead of deriving the embedding for the coarse-grained similarity from gapped k-mers, the embedding can now be derived from a neural network model (e.g. by averaging the conv filter activations). Example notebook: https://github.com/kundajelab/tfmodisco/blob/36972870853e6631b2d32f1e489676a8241b385c/examples/simulated_TAL_GATA_deeplearning/TF_MoDISco_TAL_GATA_With_Filter_Embeddings.ipynb.
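A minimal sketch of the idea, assuming the embedding is the position-averaged ReLU activation of first-layer conv filters and that coarse-grained similarity is cosine similarity between embeddings. `conv_filter_embedding` and `cosine_similarity_matrix` are hypothetical helpers, not the PR's implementation; which layer is used and whether a nonlinearity is applied are assumptions.

```python
import numpy as np

def conv_filter_embedding(onehot_seqlet, filters):
    """Embed one seqlet as the mean ReLU activation of each conv filter over positions.

    onehot_seqlet: (length, 4) one-hot array; length must be >= filter width.
    filters: (num_filters, width, 4) weights (assumed: a trained model's first conv layer).
    Returns a (num_filters,) embedding vector.
    """
    width = filters.shape[1]
    length = onehot_seqlet.shape[0]
    # All valid windows of the seqlet: shape (num_positions, width, 4)
    windows = np.stack([onehot_seqlet[i:i + width] for i in range(length - width + 1)])
    # activations[p, f] = sum over the window of filter f times the input at position p
    activations = np.einsum("pwc,fwc->pf", windows, filters)
    return np.maximum(activations, 0.0).mean(axis=0)

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between the rows of (num_seqlets, dim) embeddings."""
    norms = np.maximum(np.linalg.norm(embeddings, axis=1, keepdims=True), 1e-12)
    normed = embeddings / norms
    return normed @ normed.T

# Usage with placeholder weights and random seqlets
rng = np.random.default_rng(0)
filters = rng.random((8, 5, 4))  # placeholder; in practice taken from a trained model
seqlets = np.eye(4)[rng.integers(0, 4, size=(3, 20))]  # 3 one-hot seqlets of length 20
embeddings = np.stack([conv_filter_embedding(s, filters) for s in seqlets])
sim = cosine_similarity_matrix(embeddings)  # (3, 3) coarse-grained similarities
```

Averaging over positions makes the embedding independent of where in the seqlet a filter fires, which is the property a coarse-grained similarity needs.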

0.5.6.5

Changes:
- Fixed `.travis.yml` so that continuous integration works (including tests that invoke MEME)
- Cleaned up obsolete tests
- Added a fix for the case where reverse-complement tracks aren't present: https://github.com/kundajelab/tfmodisco/commit/fc8370ea582c3dd16527db767731e996d595c158
- Added Python 2 fixes: https://github.com/kundajelab/tfmodisco/commit/c285e7f0834a773377e95cbc66f91a6504714530 and https://github.com/kundajelab/tfmodisco/commit/a267cbe5cc569870099502ff7ea52ce3a81dd05b

0.5.6.4

- Incorporates changes from PR https://github.com/kundajelab/tfmodisco/pull/60, which adds the `-revcomp` flag to MEME if `revcomp=True` is specified in TfModiscoWorkflow (it is true by default), and switches `-mod` to `zoops` ("zero or one occurrence per sequence"). This matches the default on the MEME web server and also seems more appropriate for seqlets than the `anr` mode ("any number of repetitions").
- Updated the Leiden clustering to take the best result over both the singleton initialization (i.e. what is done without MEME preclustering) and the MEME initialization, getting the "best of both worlds".
- Updated the dependency list in setup.py to be more complete
- Updated the test suite. Attempted to add a Travis build, but installing MEME on Travis appears to be nontrivial.
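For reference, a MEME invocation with these settings might be assembled as below. `build_meme_command` is a hypothetical helper, not TF-MoDISco's code. `-dna`, `-oc`, `-mod`, `-nmotifs`, `-revcomp` and `-p` are standard MEME options, but the exact flags and their order in TF-MoDISco may differ, and passing `-p` only when more than one job is requested is an assumption.

```python
def build_meme_command(fasta_path, outdir, nmotifs, n_jobs,
                       revcomp=True, meme_command="meme"):
    """Return an argv list for running MEME on a FASTA file of seqlets."""
    cmd = [meme_command, fasta_path, "-dna", "-oc", outdir,
           "-mod", "zoops",           # zero or one occurrence per sequence
           "-nmotifs", str(nmotifs)]
    if n_jobs > 1:
        cmd += ["-p", str(n_jobs)]    # parallel MEME (assumed: only when n_jobs > 1)
    if revcomp:
        cmd.append("-revcomp")        # allow sites on either DNA strand
    return cmd

cmd = build_meme_command("seqlets.fa", "meme_out/metacluster_0", nmotifs=10, n_jobs=4)
# The list can then be run with subprocess.run(cmd, check=True)
```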

0.5.6.2

Corresponds to PR https://github.com/kundajelab/tfmodisco/pull/59. Updated setup.py to include leidenalg and tqdm.

0.5.6.0

Corresponds to PR https://github.com/kundajelab/tfmodisco/pull/57, notes duplicated below:

An initial clustering can be specified using the `initclusterer_factory` argument of `TfModiscoSeqletsToPatternsFactory`. See [this notebook](https://github.com/kundajelab/tfmodisco/blob/e1db274fbd0a42daf2983421304184f20175d66f/examples/H1ESC_Nanog_gkmsvm/TF%20MoDISco%20Nanog.ipynb) for an example. Here's an example for MEME-based initialization (the only initialization method supported at the time of writing):

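```python
import modisco.clusterinit.memeinit

# Pass this as the initclusterer_factory argument of TfModiscoSeqletsToPatternsFactory
initclusterer_factory=modisco.clusterinit.memeinit.MemeInitClustererFactory(
    meme_command="meme", base_outdir="meme_out",
    max_num_seqlets_to_use=10000,
    nmotifs=10,
    n_jobs=4)
```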

Explanation of the arguments:
- `meme_command`: this is just `meme` if the MEME executable is in the PATH; otherwise, `meme_command` should give the full path to the executable, e.g. `/software/meme/5.0.1/bin/meme` on the kundajelab servers.
- `base_outdir`: output directory for writing the MEME results (relative to the current working directory unless an absolute path is provided). Within this directory, a subdirectory will be created for each metacluster.
- `max_num_seqlets_to_use`: to prevent MEME from taking too long, the number of seqlets used for running MEME is capped at this value.
- `nmotifs`: the number of motifs for MEME to find. Only significant motifs (E-value < 0.05) are used for the clustering.
- `n_jobs`: specifies the value of MEME's `-p` argument, and also the number of parallel jobs to launch when doing motif scanning with the MEME PWMs.
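The effect of `max_num_seqlets_to_use` and of the E-value cutoff can be sketched as two small filters. Both helpers are hypothetical: random subsampling (rather than, say, keeping the top-scoring seqlets) and the dict-based motif format are assumptions, not TF-MoDISco's internals.

```python
import random

def cap_seqlets_for_meme(seqlets, max_num_seqlets_to_use, seed=1234):
    """Return at most max_num_seqlets_to_use seqlets (assumed: chosen uniformly at random)."""
    seqlets = list(seqlets)
    if len(seqlets) <= max_num_seqlets_to_use:
        return seqlets
    return random.Random(seed).sample(seqlets, max_num_seqlets_to_use)

def significant_motifs(meme_motifs, evalue_threshold=0.05):
    """Keep motifs whose E-value is below the threshold.

    meme_motifs: list of dicts with an "evalue" key (a hypothetical parsed-MEME format).
    """
    return [motif for motif in meme_motifs if motif["evalue"] < evalue_threshold]

seqlets = [f"seqlet_{i}" for i in range(25000)]
subset = cap_seqlets_for_meme(seqlets, max_num_seqlets_to_use=10000)
motifs = [{"name": "MEME-1", "evalue": 1e-30}, {"name": "MEME-2", "evalue": 0.3}]
kept = significant_motifs(motifs)  # only MEME-1 passes the 0.05 cutoff
```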

The cluster initialization with MEME is achieved as follows: the PWMs produced by MEME are used to scan all the seqlets, and only PWM matches that exceed the Bayes optimal threshold specified by MEME are considered. Seqlets that contain no PWM matches are assigned to their own cluster. The remaining seqlets are each assigned to a cluster corresponding to the PWM for which they had the strongest match by log-odds score.
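The assignment rule above can be sketched in NumPy as follows. This is a simplified illustration, not TF-MoDISco's implementation: the PWMs are assumed to be log-odds matrices whose thresholds have already been read from MEME's output, both strands are scanned (an assumption, in the spirit of `revcomp=True`), and "their own cluster" is read as one singleton cluster per unmatched seqlet. All function names are hypothetical.

```python
import numpy as np

def onehot(seq):
    """One-hot encode a DNA string with channel order A, C, G, T."""
    return np.eye(4)[["ACGT".index(base) for base in seq]]

def best_match_score(onehot_seqlet, pwm_log_odds):
    """Highest log-odds score of the PWM over all offsets on both strands."""
    width = pwm_log_odds.shape[0]
    best = -np.inf
    # With ACGT channel order, reversing both axes gives the reverse complement
    for strand in (onehot_seqlet, onehot_seqlet[::-1, ::-1]):
        for start in range(strand.shape[0] - width + 1):
            best = max(best, float(np.sum(strand[start:start + width] * pwm_log_odds)))
    return best

def meme_init_labels(onehot_seqlets, pwms, thresholds):
    """Label k = strongest above-threshold match is PWM k; unmatched seqlets get singleton ids."""
    labels = []
    next_singleton = len(pwms)  # singleton ids start after the PWM cluster ids
    for seqlet in onehot_seqlets:
        best_k, best_score = None, -np.inf
        for k, (pwm, threshold) in enumerate(zip(pwms, thresholds)):
            score = best_match_score(seqlet, pwm)
            if score > threshold and score > best_score:
                best_k, best_score = k, score
        if best_k is None:
            labels.append(next_singleton)
            next_singleton += 1
        else:
            labels.append(best_k)
    return np.array(labels)

# Toy PWMs: one favoring "ACG", one favoring "TTT"
pwm_acg = np.full((3, 4), -1.0)
pwm_acg[[0, 1, 2], [0, 1, 2]] = 1.0
pwm_ttt = np.full((3, 4), -1.0)
pwm_ttt[:, 3] = 1.0
seqlets = [onehot(s) for s in ["AACGT", "GTTTG", "CCCCC", "CCCCC"]]
labels = meme_init_labels(seqlets, [pwm_acg, pwm_ttt], thresholds=[2.5, 2.5])
# labels: [0, 1, 2, 3] (the two "CCCCC" seqlets match nothing and become singletons)
```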

The cluster initialization affects the TF-MoDISco workflow in two places:
- First, the fine-grained similarity is computed not just on the set of nearest neighbors that have the highest coarse-grained similarity across all seqlets, but also on the set of nearest neighbors that have the highest coarse-grained similarity *within* each initialized cluster.
- Second, it is used to initialize Leiden community detection.
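To make the first point concrete, here is a sketch of how the candidate set for the fine-grained similarity could be formed. It assumes a precomputed coarse-grained similarity matrix, the same k for both neighbor lists, and that "within each initialized cluster" means within the seqlet's own initial cluster. `candidate_neighbors` is hypothetical, and the actual code may differ (for example, in how self-matches or ties are handled).

```python
import numpy as np

def candidate_neighbors(coarse_sim, init_labels, k):
    """Union of each seqlet's top-k neighbors overall and its top-k within its own initial cluster."""
    n = coarse_sim.shape[0]
    idx = np.arange(n)
    result = []
    for i in range(n):
        others = idx[idx != i]  # exclude the seqlet itself
        global_top = others[np.argsort(-coarse_sim[i, others])[:k]]
        same = others[init_labels[others] == init_labels[i]]
        within_top = same[np.argsort(-coarse_sim[i, same])[:k]]
        result.append(set(global_top.tolist()) | set(within_top.tolist()))
    return result

coarse_sim = np.array([[1.0, 0.9, 0.1, 0.2],
                       [0.9, 1.0, 0.3, 0.1],
                       [0.1, 0.3, 1.0, 0.8],
                       [0.2, 0.1, 0.8, 1.0]])
init_labels = np.array([0, 1, 0, 1])
neighbors = candidate_neighbors(coarse_sim, init_labels, k=1)
# Seqlet 0: global nearest is 1, nearest in its own cluster is 2, so {1, 2}
```

Without the within-cluster term, seqlet 0 would only be compared with seqlet 1; the initialization guarantees it is also compared with the seqlet MEME grouped it with.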

Empirically, this seems to result in TF-MoDISco clusters that get the "best of both worlds" from MEME and TF-MoDISco.

Other changes:
- Moved from Louvain to Leiden for the main community detection step. Note that I am no longer doing consensus clustering with Leiden because it didn't appear to work well (consistent with [this discussion](https://twitter.com/avshrikumar/status/1235242045947531264?s=20) on Twitter); instead, I am just taking the best modularity over 50 runs of Leiden with different random seeds. To go back to using Louvain for the main community detection step, set the `use_louvain` argument to `True` in `TfModiscoSeqletsToPatternsFactory`, but note that the cluster initialization functionality isn't supported with Louvain.*
- Updated the Nanog notebook to showcase the MEME initialization functionality
- Updated the Nanog notebook to use better normalization (I'm now just doing mean normalization across ACGT at each position, which I think is more intuitive and has a similar effect to the normalization I described in the GkmExplain paper). Also updated the notebook to normalize the importance scores of the dinuc-shuffled null (previously, the scores for the null distribution weren't normalized)
- Added [tests](https://github.com/kundajelab/tfmodisco/blob/e1db274fbd0a42daf2983421304184f20175d66f/test/test_tfmodisco_workflow.py) for the MEME-based initialization
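Assuming the mean normalization is applied to the hypothetical importance scores (subtracting, at each position, the mean over A, C, G and T) and that the actual-base scores are then obtained by masking with the one-hot sequence, it amounts to the sketch below; the same function would be applied to the dinuc-shuffled null scores. `mean_normalize` is a hypothetical name, not the notebook's code.

```python
import numpy as np

def mean_normalize(hyp_scores, onehot_seqs):
    """Subtract, at each position, the mean hypothetical score over A, C, G, T.

    hyp_scores, onehot_seqs: arrays of shape (num_seqs, length, 4).
    Returns (normalized hypothetical scores, actual-base scores).
    """
    normed_hyp = hyp_scores - hyp_scores.mean(axis=-1, keepdims=True)
    actual = normed_hyp * onehot_seqs  # keep only the score of the base present
    return normed_hyp, actual

hyp = np.array([[[1.0, 2.0, 3.0, 6.0]]])
seq = np.array([[[0.0, 0.0, 0.0, 1.0]]])  # the base at this position is T
normed_hyp, actual = mean_normalize(hyp, seq)
# normed_hyp is [-2, -1, 0, 3] at this position; actual is [0, 0, 0, 3]
```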

*I don't support cluster initialization with Louvain because, with Louvain, the number of clusters can only decrease from one iteration to the next. If initialization were used with Louvain, the number of discovered clusters would therefore be capped at the number of clusters present during initialization, which is undesirable. (With Leiden, the number of clusters can go up because there's a cluster-splitting step.) By the way, Louvain is still used in the "spurious merging detection" step of the post-processing: in this step I attempt to split each cluster into two subclusters, and with Louvain this cap on the number of subclusters can be enforced by initializing Louvain with only 2 clusters (since the number of clusters in Louvain only decreases with each iteration).
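The selection rule described above (best modularity over 50 Leiden runs with different random seeds), combined with the 0.5.6.4 change of taking the best result over the singleton and MEME initializations, can be sketched as follows, assuming "best" is judged by modularity in both cases. The `local_moving` function is a simplified greedy stand-in for Leiden (single-node moves only, no refinement or aggregation), used here only so the sketch is self-contained; in practice the clustering itself would come from the `leidenalg` package. All names are hypothetical.

```python
import numpy as np

def modularity(adj, labels):
    """Newman modularity Q = (1/2m) * sum_ij [A_ij - k_i k_j / 2m] * delta(c_i, c_j)."""
    two_m = adj.sum()  # for a symmetric adjacency matrix this equals 2m
    degrees = adj.sum(axis=1)
    same_cluster = labels[:, None] == labels[None, :]
    return float(((adj - np.outer(degrees, degrees) / two_m) * same_cluster).sum() / two_m)

def local_moving(adj, init_labels, seed, max_sweeps=20):
    """Greedy stand-in for Leiden: move one node to a neighbor's cluster when Q improves."""
    rng = np.random.default_rng(seed)
    labels = np.array(init_labels).copy()
    best_q = modularity(adj, labels)
    for _ in range(max_sweeps):
        improved = False
        for i in rng.permutation(len(labels)):  # the seed controls the visiting order
            for c in np.unique(labels[adj[i] > 0]):
                if c == labels[i]:
                    continue
                old = labels[i]
                labels[i] = c
                q = modularity(adj, labels)
                if q > best_q + 1e-12:
                    best_q, improved = q, True  # keep the move
                else:
                    labels[i] = old  # undo the move
        if not improved:
            break
    return labels, best_q

def best_over_runs(adj, initializations, seeds):
    """Run every (initialization, seed) pair and keep the highest-modularity partition."""
    best_labels, best_q = None, -np.inf
    for init in initializations:
        for seed in seeds:
            labels, q = local_moving(adj, init, seed)
            if q > best_q:
                best_labels, best_q = labels, q
    return best_labels, best_q

# Two triangles (nodes 0-2 and 3-5) joined by the edge 2-3
adj = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[a, b] = adj[b, a] = 1.0
singleton_init = np.arange(6)
meme_like_init = np.array([0, 0, 0, 1, 1, 1])
labels, q = best_over_runs(adj, [singleton_init, meme_like_init], seeds=range(50))
```

Because every run only accepts modularity-improving moves, the result can never be worse than the best starting partition, which is what makes taking the maximum over both initializations a "best of both worlds" choice.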

0.5.5.6

PR here: https://github.com/kundajelab/tfmodisco/pull/56; example usage here: https://github.com/kundajelab/tfmodisco/blob/682faf6ef2dc40bbf4f3b0fdd57a23000e8737a1/test/test_tfmodisco_workflow.py#L117. Helps avoid seqlet score distribution plots being overwritten when multiple TF-MoDISco jobs are launched from the same directory.