This is a major release in which we introduce the CREsted model repository and refactor the `tl.Crested` class into a functional API.
Features
- New `crested.get_model` function to fetch models from the CREsted model repository.
- New `enformer` and `borzoi` model options in the CREsted zoo, as well as scripts for converting weights to Keras format.
- New cut-site bigwig option for the mouse cortex dataset.
- New CREsted model repository page on Read the Docs, with model descriptions.
- New pattern clustering plot `pl.patterns.clustermap_with_pwm_logos` that shows PWM logo plots below the heatmap.
- Extra parameter options in modisco calculations and plotting for the number of allowed seqlets per cluster and the `top_n_regions` selection.
- Option to color lines in gene locus scoring plots.
- Extra `ylim` option in `crested.pl.bar.prediction`.
- Gene locus scoring plot improvements.
Bugfixes
- Importing bigwigs now correctly accounts for regions that were removed based on the chromsizes file.
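The underlying concern is keeping imported bigwig values aligned with the filtered set of regions. The sketch below illustrates the general idea in plain Python, assuming region names of the form `chrom:start-end` and a chromsizes mapping from chromosome name to length. `parse_region`, `filter_regions_by_chromsizes`, `read_targets`, and `read_signal` are hypothetical names, not CREsted's API, and this is not the library's actual implementation.

```python
from typing import Callable, Dict, List, Tuple


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split a 'chrom:start-end' region name into (chrom, start, end)."""
    chrom, coords = region.rsplit(":", 1)
    start, end = coords.split("-")
    return chrom, int(start), int(end)


def filter_regions_by_chromsizes(regions: List[str], chromsizes: Dict[str, int]) -> List[str]:
    """Drop regions on unknown chromosomes or extending past the chromosome end."""
    kept = []
    for region in regions:
        chrom, start, end = parse_region(region)
        if chrom in chromsizes and 0 <= start < end <= chromsizes[chrom]:
            kept.append(region)
    return kept


def read_targets(regions, chromsizes, read_signal: Callable[[str], float]):
    """Read a value only for regions that survived filtering, so that
    values[i] always belongs to kept[i]."""
    kept = filter_regions_by_chromsizes(regions, chromsizes)
    values = [read_signal(region) for region in kept]
    return kept, values
```

Deriving both the region list and the values from the same filtered set avoids the kind of misalignment that occurs when values are read for the unfiltered list and then paired with the filtered one: every entry after a dropped region would shift onto the wrong region.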
Notebooks
- Rewrote the tutorials to use the new functional API (WIP).
- Expanded the enhancer design section.
Functional refactor of the `crested.tl.Crested(...)` class
In this large refactor, we move everything in the `Crested` class that does not require both a model and an `AnnDataModule` into a new `_tools.py` module, where everything is implemented as plain functions.
All the old methods remain in the `Crested` class for backward compatibility (for now), but they now raise a deprecation warning.
We trade a bit of explicitness for ease of use by merging functions that perform the same operation on different input types into a single function.
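Keeping old methods while steering users to the new functions is usually done with a thin shim: the old method emits a `DeprecationWarning` and delegates to the new function, so both paths share one implementation. A minimal sketch of that pattern follows; `Crested` and `predict` here are hypothetical stand-ins, not the real CREsted code.

```python
import warnings


def predict(inputs, model):
    """Hypothetical functional entry point, standing in for a new tl.* function."""
    return [model(x) for x in inputs]


class Crested:
    """Hypothetical stand-in for the old class-based API."""

    def __init__(self, model):
        self.model = model

    def predict_sequence(self, sequence):
        # Old method kept for backward compatibility: warn, then delegate
        # to the functional version so there is only one implementation.
        warnings.warn(
            "Crested.predict_sequence is deprecated; use predict(...) instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this shim
        )
        return predict([sequence], self.model)[0]
```

Python hides `DeprecationWarning` by default outside `__main__` and test runners, so users may need `warnings.simplefilter("default")` to see it.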
Equivalent new functions
- `tl.Crested.get_embeddings(...)` ---> `tl.extract_layer_embeddings(...)`
- `tl.Crested.predict(...)` ---> `tl.predict(...)`
- `tl.Crested.predict_regions(...)` ---> `tl.predict(...)`
- `tl.Crested.predict_sequence(...)` ---> `tl.predict(...)`
- `tl.Crested.score_gene_locus(...)` ---> `tl.score_gene_locus(...)`
- `tl.Crested.calculate_contribution_scores(...)` ---> `tl.contribution_scores(...)`
- `tl.Crested.calculate_contribution_scores_regions(...)` ---> `tl.contribution_scores(...)`
- `tl.Crested.calculate_contribution_scores_sequence(...)` ---> `tl.contribution_scores(...)`
- `tl.Crested.calculate_contribution_scores_enhancer_design(...)` ---> `tl.contribution_scores(...)`
- `tl.Crested.tfmodisco_calculate_and_save_contribution_scores_sequences(...)` ---> `tl.contribution_scores_specific(...)`
- `tl.Crested.tfmodisco_calculate_and_save_contribution_scores(...)` ---> `tl.contribution_scores(...)`
- `tl.Crested.enhancer_design_motif_implementation(...)` ---> `tl.enhancer_design_motif_insertion(...)`
- `tl.Crested.enhancer_design_in_silico_evolution(...)` ---> `tl.enhancer_design_in_silico_evolution(...)`
New functions
Some utility functions that were previously hidden inside the `Crested` class are now exposed as standalone functions, since users need to call them directly.
- `utils.calculate_nucleotide_distribution` (advised for enhancer design)
- `utils.derive_intermediate_sequences` (required for inspecting intermediate results from enhancer design)
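`utils.calculate_nucleotide_distribution` is listed as advised for enhancer design. The sketch below shows what a nucleotide distribution computation generally involves, assuming equal-length DNA strings and an A/C/G/T column order. `nucleotide_distribution` is a hypothetical helper; the real function's signature, inputs, and return shape may differ.

```python
from typing import List, Sequence

NUCLEOTIDES = "ACGT"


def _normalize(row: List[int]) -> List[float]:
    """Turn counts into frequencies; an all-zero row stays all zeros."""
    total = sum(row)
    return [c / total if total else 0.0 for c in row]


def nucleotide_distribution(sequences: Sequence[str], per_position: bool = True):
    """Hypothetical helper: A/C/G/T frequencies over equal-length sequences.

    Returns one [fA, fC, fG, fT] row per position when per_position is True,
    otherwise a single [fA, fC, fG, fT] list. Other characters (e.g. N) are ignored.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    counts = [[0, 0, 0, 0] for _ in range(length)]
    for seq in sequences:
        for pos, nt in enumerate(seq.upper()):
            idx = NUCLEOTIDES.find(nt)  # -1 for characters outside ACGT
            if idx >= 0:
                counts[pos][idx] += 1
    if not per_position:
        # Collapse positions into one genome-wide count vector.
        counts = [[sum(row[i] for row in counts) for i in range(4)]]
    freqs = [_normalize(row) for row in counts]
    return freqs if per_position else freqs[0]
```

A per-position distribution preserves positional composition biases (for example around region centers), while the collapsed version gives a single background composition.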
New behaviour
- All functions that accept a model now also accept a list of models, in which case the results are averaged across models.
- All functions share a similar API: they expect an input that can be converted to a one-hot encoding (sequences, region names, or AnnData objects with region names). The conversion now happens behind the scenes, so users no longer need to handle it themselves, and there is no longer a separate function per input format.
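Both behaviours above, model lists and automatic input conversion, can be illustrated with a small pure-Python sketch. Everything in it is hypothetical: `predict`, `to_one_hot`, the `chrom:start-end` region format, a dict standing in for a genome, and plain callables standing in for Keras models. AnnData inputs are left out to keep it self-contained, and this is not CREsted's implementation.

```python
from typing import Callable, Dict, List, Sequence, Union

NUCLEOTIDES = "ACGT"
OneHot = List[List[float]]
Model = Callable[[List[OneHot]], List[List[float]]]


def one_hot(seq: str) -> OneHot:
    """One-hot encode a DNA string as (length, 4) nested lists; unknown bases -> zeros."""
    return [[1.0 if nt == b else 0.0 for b in NUCLEOTIDES] for nt in seq.upper()]


def to_one_hot(inputs: Union[str, Sequence[str]], genome: Dict[str, str]) -> List[OneHot]:
    """Accept a DNA string, a region name, or a list of either; return a one-hot batch."""
    if isinstance(inputs, str):
        inputs = [inputs]
    batch = []
    for item in inputs:
        if ":" in item:  # treat as a region name and look up its sequence
            chrom, coords = item.rsplit(":", 1)
            start, end = (int(x) for x in coords.split("-"))
            item = genome[chrom][start:end]
        batch.append(one_hot(item))
    return batch


def predict(inputs, model: Union[Model, Sequence[Model]], genome: Dict[str, str]):
    """Run one or more models on the converted inputs and average their outputs."""
    models = list(model) if isinstance(model, (list, tuple)) else [model]
    batch = to_one_hot(inputs, genome)
    outputs = [m(batch) for m in models]  # outputs[k][i][j]: model k, input i, output j
    n = len(outputs)
    return [
        [sum(out[i][j] for out in outputs) / n for j in range(len(outputs[0][i]))]
        for i in range(len(batch))
    ]
```

Averaging element-wise keeps the output shape identical to a single-model call, so downstream code does not need to special-case ensembles.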