Modisco

Latest version: v0.5.16.4.1

Page 2 of 9

0.5.15.0

Corresponds to PR https://github.com/kundajelab/tfmodisco/pull/94

Emphasis is now given to the core seqlet region when determining which motif a seqlet aligns to, so that alternative motifs in the flanks can't change the motif assignment. Also revamped how the fine-grained affinities are calculated: there is no coarse-grained calculation step; I just align the core seqlet to the aggregate pattern first, and then use that alignment to compute the fine-grained similarities to the constituent seqlets in the pattern. It seems substantially faster.
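A minimal sketch of the two-step idea described above (align the core to the aggregate pattern, then reuse that offset for every constituent seqlet). The function names, the dot-product similarity, and the assumption that constituent seqlets share the aggregate pattern's coordinate frame are all illustrative and are not taken from the tfmodisco code:

```python
def dot(a, b):
    """Dot product of two equal-length score rows (e.g. A, C, G, T)."""
    return sum(x * y for x, y in zip(a, b))


def align_core(core, pattern):
    """Slide `core` along `pattern` and return (best_offset, best_score).

    Both arguments are lists of per-position score rows, with
    len(core) <= len(pattern). Only the core is scored, so anything in
    the seqlet's flanks cannot influence the chosen offset.
    """
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(pattern) - len(core) + 1):
        window = pattern[offset:offset + len(core)]
        score = sum(dot(c, p) for c, p in zip(core, window))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


def fine_grained_similarities(core, aggregate, constituents):
    """Align once to the aggregate, then score every constituent at that offset.

    Assumes (hypothetically) that each constituent seqlet has the same
    length and coordinate frame as the aggregate pattern.
    """
    offset, _ = align_core(core, aggregate)
    n = len(core)
    sims = [
        sum(dot(c, s) for c, s in zip(core, seqlet[offset:offset + n]))
        for seqlet in constituents
    ]
    return offset, sims
```

Aligning once against the aggregate costs one sliding-window scan; each constituent then needs only a single fixed-offset comparison, which is one way the speedup noted above could arise.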

Also improved the seqlet identification method:
- I switched to the FixedWindow seqlet identification method, which is the same one used during the main modisco run, because the VariableWindow method was resulting in a lot of windows that were "tied" at an FDR of 0, even though some windows scored much higher than others.
- I made a small tweak to improve the overlap exclusion (need to exclude core_window_size-0.5 on either side; the reason is just a really involved detail to do with indexing math)
- Added an option to return only positive-scoring seqlets; it is on by default.
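The three points above can be sketched as a greedy fixed-window scan with overlap exclusion and an optional positive-only filter. This is an illustration under stated assumptions, not the tfmodisco implementation: the function name and arguments are hypothetical, and the exclusion rule here simply forbids any overlap between picked windows rather than reproducing the core_window_size-0.5 indexing detail mentioned above.

```python
def fixed_window_seqlets(scores, window_size, max_seqlets, positive_only=True):
    """Greedily pick the highest-scoring fixed-width windows without overlap.

    Returns a list of (start, end, window_sum) tuples in pick order.
    """
    n_windows = len(scores) - window_size + 1
    window_sums = [sum(scores[i:i + window_size]) for i in range(n_windows)]
    available = [True] * max(n_windows, 0)
    picked = []
    while len(picked) < max_seqlets:
        candidates = [i for i in range(n_windows) if available[i]]
        if not candidates:
            break
        if positive_only:
            # Rank by signed score and stop once nothing positive remains.
            best = max(candidates, key=lambda i: window_sums[i])
            if window_sums[best] <= 0:
                break
        else:
            # Rank by magnitude so strongly negative windows are also found.
            best = max(candidates, key=lambda i: abs(window_sums[i]))
        picked.append((best, best + window_size, window_sums[best]))
        # Exclude every window start that would overlap the picked window.
        lo = max(0, best - window_size + 1)
        hi = min(n_windows, best + window_size)
        for j in range(lo, hi):
            available[j] = False
    return picked
```

With `positive_only=True` the strongly negative region in a score track is skipped entirely; switching it off ranks windows by absolute score instead.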

Also, with the Leiden bugfix, I'm back to using movenodes in the refine partition

Also, the hits now return trim_start and trim_end, which are the start/end of the trimmed pattern (the user can specify the IC threshold; the default is 0.3)
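One plausible reading of IC-based trimming is sketched below: compute the per-position information content against a uniform background and keep the span between the first and last positions that reach the threshold. The helper names, the pseudocount, the uniform background, and the choice to report offsets relative to the pattern are assumptions for illustration; the actual trimming rule in tfmodisco may differ.

```python
import math


def per_position_ic(ppm, background=(0.25, 0.25, 0.25, 0.25), pseudocount=0.001):
    """Information content (bits) per position of a position-probability matrix."""
    ic = []
    for row in ppm:
        total = sum(row) + pseudocount * len(row)
        probs = [(x + pseudocount) / total for x in row]
        ic.append(sum(p * math.log2(p / b) for p, b in zip(probs, background)))
    return ic


def trim_by_ic(ppm, ic_threshold=0.3):
    """Return (trim_start, trim_end) with an exclusive end, or None if no position passes."""
    ic = per_position_ic(ppm)
    passing = [i for i, value in enumerate(ic) if value >= ic_threshold]
    if not passing:
        return None
    return passing[0], passing[-1] + 1
```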

0.5.14.2

Corresponds to PR https://github.com/kundajelab/tfmodisco/pull/93. Version [0.5.14.0](https://github.com/kundajelab/tfmodisco/releases/tag/v0.5.14.0) accidentally removed the final motif flank expansion that was controlled by the parameter "final_flank_to_add", such that the flank expansion was effectively 0. (Note: this only affected the flank expansion that was done at the very *end* of the tfmodisco pipeline; there is still flank expansion controlled by the parameter "initial_flank_to_add".) I added the functionality for final flank expansion back in, and for backwards compatibility with version 0.5.14.0 I have set the default value of final_flank_to_add to 0. Previously it was 10; a default of 0 is probably better from a user perspective, because users sometimes run tf-modisco on very short sequences, and a large final_flank_to_add can cause many seqlets to be discarded when the expansion extends beyond the end of the sequence. I also cleaned up some of the notebooks.
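To illustrate why a large final flank can discard seqlets on short sequences, here is a small sketch. The function `expand_seqlets` and its (start, end) tuple representation are hypothetical stand-ins, not the tfmodisco API; only the discard-on-overrun behaviour is taken from the note above.

```python
def expand_seqlets(seqlets, flank, seq_len):
    """Expand each (start, end) seqlet by `flank` on both sides.

    Seqlets whose expanded coordinates fall outside [0, seq_len] are dropped.
    """
    kept = []
    for start, end in seqlets:
        new_start, new_end = start - flank, end + flank
        if new_start >= 0 and new_end <= seq_len:
            kept.append((new_start, new_end))
    return kept


seqlets = [(5, 15), (20, 30), (38, 48)]
# On a 50 bp sequence, a flank of 10 keeps only the middle seqlet,
# while a flank of 0 keeps all three.
kept_with_flank_10 = expand_seqlets(seqlets, flank=10, seq_len=50)
kept_with_flank_0 = expand_seqlets(seqlets, flank=0, seq_len=50)
```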

0.5.14.1

Described in more detail in https://github.com/kundajelab/tfmodisco/pull/91

The API is demonstrated in the TAL-GATA notebook here: https://github.com/kundajelab/tfmodisco/blob/e9b6db741ddfe73d0d7b86db50fb213e0d399612/examples/simulated_TAL_GATA_deeplearning/TF_MoDISco_TAL_GATA.ipynb
And also on Alex's E2F6 data (profile task) here: https://nbviewer.jupyter.org/github/kundajelab/tfmodisco_bio_experiments/blob/master/alex_keytfs/tfmodisco_ac25874/E2F6_profile_v0.5.14.1_hitscoring.ipynb

This version also includes some Leiden-related fixes pertaining to vtraag/leidenalg#68 and vtraag/leidenalg#66

0.5.14.0

Corresponds to PR https://github.com/kundajelab/tfmodisco/pull/90

v0.5.14.0-devfixed
Corresponds to PR https://github.com/kundajelab/tfmodisco/pull/88. The previous edition of this tag did not specify the correct target (i.e. the dev branch) so I'm re-doing it.

Three main changes:
- "DynamicDistanceSimilarPatternsCollapser" had a component that compared within-cluster similarities to between-cluster similarities, using a metric derived from the auROC for differentiating within-cluster motifs from between-cluster motifs. Previously, for a pair of motifs, the average of this auROC-derived similarity was taken. However, that can cause motifs with low within-cluster similarity to "swallow" motifs with high within-cluster similarity (evidence in this notebook: https://nbviewer.jupyter.org/github/kundajelab/tfmodisco_bio_experiments/blob/1f633b03c22860e5822e9ca5b013e68d0286332c/bpnet/trial1/TryBpNet_v0.5.14.0_studymergecriteria.ipynb - look at the pairwise similarities in the "Cross-contamination matrix" and notice that the similarities of the first motif to everything else are very high even though the reciprocal similarities are very low). Thus, I decided to take the min rather than the average.
- Previously, when seqlets were merged together in a greedy fashion, the on-the-fly aggregated motif only averaged the importance scores over the seqlets that aligned to a particular position, and only at the end would "expand" all the seqlets to fill out the full motif. The caveat is that for motifs like Nanog, which can have two instances in the same seqlet (due to periodic binding), this can create two "modes" in the on-the-fly aggregated seqlet. Each mode looks very strong as it's being computed on-the-fly (because the aggregation is only done over the seqlets that align to the position), but when the flank expansion is conducted at the very end, both modes end up with degraded information content (because the seqlets that were aligning to one mode don't always turn out to have a motif instance at the other mode after expansion). The solution is to perform the flank expansion on-the-fly. Compare the IC of the motifs, especially the Nanog motif, in https://nbviewer.jupyter.org/github/kundajelab/tfmodisco_bio_experiments/blob/1f633b03c22860e5822e9ca5b013e68d0286332c/bpnet/trial1/TryBpNet_v0.5.14.0_ontheflyflankfill.ipynb, which has on-the-fly flank filling, vs. https://nbviewer.jupyter.org/github/kundajelab/tfmodisco_bio_experiments/blob/1f633b03c22860e5822e9ca5b013e68d0286332c/bpnet/trial1/TryBpNet_v0.5.14.0_noontheflyflankfill.ipynb, which didn't have the flank filling.
- "Spurious merging detection" is now performed by running the subclustering procedure for each motif, immediately followed by the "DynamicDistanceSimilarPatternsCollapser" procedure on the submotifs. Thus, DynamicDistanceSimilarPatternsCollapser is run in two places: during the spurious merging detection and during the overall redundancy reduction step. This substantially increases the diversity of motifs returned (contrast the above results with previous 0.5.13.0 results at: https://nbviewer.jupyter.org/github/kundajelab/tfmodisco_bio_experiments/blob/b3b4d7b240b8e398597100581ae791eec0a13b61/bpnet/trial1/TryBpNet_v0.5.13.0.ipynb)
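The effect of taking the min rather than the average of an asymmetric similarity (the first change above) can be shown with a toy example. The function names and the merge threshold are hypothetical, and the similarity values are made up purely to show the asymmetric case; they are not results from the linked notebooks.

```python
def combined_similarity(sim_a_to_b, sim_b_to_a, mode="min"):
    """Combine the two directions of an asymmetric similarity into one score."""
    if mode == "min":
        return min(sim_a_to_b, sim_b_to_a)
    if mode == "average":
        return (sim_a_to_b + sim_b_to_a) / 2
    raise ValueError(f"unknown mode: {mode!r}")


def should_merge(sim_a_to_b, sim_b_to_a, threshold, mode="min"):
    """Decide whether two motifs are similar enough to be collapsed."""
    return combined_similarity(sim_a_to_b, sim_b_to_a, mode) >= threshold


# Illustrative values: a diffuse motif A looks similar to everything (0.95),
# but the reciprocal similarity from B to A is low (0.10).
merge_with_average = should_merge(0.95, 0.10, threshold=0.5, mode="average")
merge_with_min = should_merge(0.95, 0.10, threshold=0.5, mode="min")
```

Under averaging the one-sided high score is enough to trigger a merge; under min, both directions must agree, which prevents the "swallowing" behaviour described above.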

However, I'm not fully satisfied with the motif merging criteria, which is why this will be an "in dev" release.

0.5.13.2

Corresponds to PR https://github.com/kundajelab/tfmodisco/pull/87

I had changed the API a bit so that the user didn't have to specify the number of threads in two different places (once for the main tfmodisco workflow and once for the advanced gapped kmer embedders); however, the API change that I made caused an error with the old gapped kmer embedder.

I also updated the gkmexplain example notebooks to use the new "advanced" gapped kmer embedders. To recap: the old "gapped kmer" embedding method was looking at all gapped kmers of a fixed "word length", and became very memory inefficient when this word length was increased; the new gapped kmer method (which I refer to as "advanced gapped kmer embeddings") can look at much wider gapped kmer lengths while remaining memory efficient.
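As a rough illustration of why enumerating all gapped kmers of a fixed word length gets expensive, the count of distinct patterns can be computed directly. This assumes (hypothetically) a feature space with one entry per gapped kmer pattern, where a pattern is a choice of informative positions within the word plus a base at each; it is not a description of how either tfmodisco embedder is actually implemented.

```python
from math import comb


def num_gapped_kmer_patterns(word_length, num_informative, alphabet_size=4):
    """Count patterns with `num_informative` fixed bases in a window of `word_length`.

    The remaining word_length - num_informative positions are gaps (wildcards).
    """
    return comb(word_length, num_informative) * alphabet_size ** num_informative


# Keeping 6 informative bases and widening the word from 8 to 16 positions.
counts = {length: num_gapped_kmer_patterns(length, 6) for length in (8, 12, 16)}
```

Under this assumption the number of patterns grows combinatorially with word length even when the number of informative bases is held fixed.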

0.5.13.1

Corresponds to PR https://github.com/kundajelab/tfmodisco/pull/86

- Package installation was broken because run_leiden wasn't listed under scripts
- Number of threads used for Agkm embedding had to be specified separately from the number of threads used elsewhere, leading to the default number of threads being different than the user-specified number. This has been fixed.
