Esmecata

Latest version: v0.5.4

Page 1 of 5

0.5.4

Fix

* Issue with proteomes from UniParc during clustering (incorrect split on '|').
* Issue in test with updated taxonomic group.
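The first fix above concerns identifier parsing. The sketch below is an illustration only, not EsMeCaTa's actual code: it assumes UniProtKB-style FASTA headers of the form `>sp|P12345|NAME_HUMAN ...` and UniParc-style headers of the form `>UPI0000000001 status=active`, which contain no `|`. The function name `sequence_id_from_header` is hypothetical.

```python
def sequence_id_from_header(header: str) -> str:
    """Return the sequence identifier from a FASTA header line.

    Assumed formats (illustrative, not taken from EsMeCaTa's code):
    - UniProtKB: '>sp|P12345|NAME_HUMAN ...' (accession is the 2nd field)
    - UniParc:   '>UPI0000000001 status=active' (no '|' at all)
    """
    tokens = header.lstrip(">").split()
    if not tokens:
        raise ValueError("empty FASTA header")
    first_token = tokens[0]
    fields = first_token.split("|")
    # UniProtKB-style header: db|accession|entry_name
    if len(fields) >= 3:
        return fields[1]
    # No '|' separator (e.g. UniParc): the whole first token is the ID,
    # so indexing fields[1] here would fail.
    return first_token
```

Blindly taking the second `|`-separated field would raise an `IndexError` on a UniParc-style header, which is the kind of failure a guarded split avoids.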

0.5.3

Fix

* Handle an issue with requests.exceptions.ChunkedEncodingError.
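One common way to handle a transient error such as `requests.exceptions.ChunkedEncodingError` is to retry the call a few times. The helper below is a minimal sketch of that pattern and is not necessarily how EsMeCaTa handles it; `call_with_retries` and its parameters are hypothetical names. It takes the exception classes as an argument so that it stays independent of any HTTP library.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    wait_seconds: float = 1.0,
) -> T:
    """Call func(), retrying when one of the retry_on exceptions is raised."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            # Re-raise the original error once the last attempt has failed.
            if attempt == max_attempts:
                raise
            time.sleep(wait_seconds)
    # Only reached when the loop body never ran.
    raise ValueError("max_attempts must be at least 1")
```

With `requests` installed, a call could look like `call_with_retries(lambda: requests.get(url, timeout=30), (requests.exceptions.ChunkedEncodingError,))`, where `url` stands for whatever resource is being downloaded.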

Modify

* Remove unused header in output file of gseapy.

0.5.2

Add

* New way to search proteomes by using UniParc. Some proteomes are empty when downloaded directly from UniProt. A solution is to search for them in UniParc and retrieve the associated protein sequences.
* New plot in report showing proteomes according to tax_rank.
* Database version number when creating precomputed database.
* Possibility to provide a file containing manually selected groups of observation names for `esmecata_gseapy gseapy_enrichr`.
* Tests for `esmecata_gseapy gseapy_enrichr`.

Fix

* Issue in creating heatmap of proteomes (missing taxon rank) in report creation.
* Issue when creating database: there was a possibility that a taxon without consensus proteomes and associated annotations was kept.

Modify

* Update parameter description for SPARQL option to indicate the value to query SPARQL UniProt Endpoint.
* Rename `esmecata_gseapy gseapy_taxon` into `esmecata_gseapy gseapy_enrichr` to reflect the changes in the command.
* Modify how `esmecata_gseapy gseapy_enrichr` works by adding a grouping parameter that allows choosing groups either according to taxon_rank or from a user-created file containing manually selected groups of observation names.
* Update readme according to the different changes made in this release.

TODO

* Investigate and solve memory leak when mapping UniParc IDs to UniProt with bioservices.
* Add handling of UniParc IDs with SPARQL queries.

0.5.1

Add

* Metadata file for `esmecata_gseapy gseapy_taxon`.

Fix

* Several issues in `esmecata_gseapy gseapy_taxon`.
* Issues in tests due to UniProt updates.

Modify

* Update metadata files for annotation and eggnog by adding missing dependencies.

0.5.0

**WARNING**: Changes in the structure of the Python package of EsMeCaTa.
If you have been importing the package in Python, you will need to modify your import.

Add

* New command `esmecata_report` to create a report from the output folder of EsMeCaTa. The scripts of `esmecata_report` create `html`, `pdf` and `tsv` reports from EsMeCaTa (work of Mataivic and PaulineGHG). This command has several subcommands:
    * (1) `create_report` to create a report from the output folder of the `esmecata workflow` subcommand (only way to have the complete HTML report).
    * (2) `create_report_proteomes` to create report files from output of `esmecata proteomes` subcommand.
    * (3) `create_report_clustering` to create report files from output of `esmecata clustering` subcommand.
    * (4) `create_report_annotation` to create report files from output of `esmecata annotation` subcommand.
* New command `esmecata_gseapy` to perform enrichment analysis of functions predicted by EsMeCaTa according to taxon rank.
* New optional dependencies required for `esmecata_report`: [datapane](https://github.com/datapane/datapane), [plotly](https://github.com/plotly/plotly.py), [kaleido](https://github.com/plotly/Kaleido), [ontosunburst](https://github.com/AuReMe/Ontology_sunburst). As `datapane` is no longer maintained, an alternative with [panel](https://github.com/holoviz/panel) is currently being developed.
* New optional dependencies required for `esmecata_gseapy`: [gseapy](https://github.com/zqfang/GSEApy) and [orsum](https://github.com/ozanozisik/orsum).
* New file indicating the EC numbers and GO Terms for the different observation names of the dataset (file `function_table.tsv`).
* New subcommand `esmecata precomputed`. This subcommand uses a precomputed database to make predictions from the input file (using EsMeCaTa default parameters). It was added to avoid recomputing the same predictions on every run and to provide a fast way to make predictions with EsMeCaTa. The precomputed database must be downloaded before use. As of this release, the database is not yet available; these scripts are present for testing purposes.
* Prototype for precomputed database creation: several scripts are added in `esmecata/precomputed` folder to create the input and the precomputed database.
* Check that the proteome files are not completely empty, which could cause problems with mmseqs2.
* Tests for precomputed database, report creation, database creation and eggnog annotation. Add mocks on several functions to perform the tests. Requires `pytest-mock`.
* Add readme in test folder.
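The check for completely empty proteome files mentioned in the list above could be sketched as follows. This is an illustrative guess at such a check, not EsMeCaTa's implementation; the function name `fasta_has_sequences` is hypothetical, and the sketch assumes proteomes are stored as FASTA files, optionally gzip-compressed.

```python
import gzip
from pathlib import Path


def fasta_has_sequences(path) -> bool:
    """Return True if the FASTA file contains at least one record header."""
    path = Path(path)
    # Assumption: compressed proteomes use a '.gz' suffix.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                return True
    return False
```

Files for which this returns `False` could then be skipped before they are passed to a downstream tool such as mmseqs2.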

Fix

* Issue in proteomes SPARQL query (missing `PREFIX`).
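A SPARQL query fails when it uses a prefixed name whose `PREFIX` declaration is missing. The sketch below shows a rough way to spot that kind of mistake before sending a query. The query is an illustrative example written for this note, not the one used by EsMeCaTa, and `undeclared_prefixes` is a hypothetical helper based on a simple regular-expression heuristic rather than a real SPARQL parser.

```python
import re

# Illustrative query: 'taxon:' is used but never declared.
QUERY = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?proteome
WHERE {
    ?proteome a up:Proteome .
    ?proteome up:organism taxon:83333 .
}
"""


def undeclared_prefixes(query: str) -> set:
    """Return prefixes used in the query body without a PREFIX declaration."""
    declared = set(
        re.findall(r"PREFIX\s+([A-Za-z][\w-]*):", query, flags=re.IGNORECASE)
    )
    # Drop IRIs (<...>) and string literals so 'http:' is not seen as a prefix.
    stripped = re.sub(r"<[^>]*>|\"[^\"]*\"|'[^']*'", " ", query)
    # A prefixed name looks like 'prefix:local'; skip ?variables and $variables.
    used = set(re.findall(r"(?<![\w?$])([A-Za-z][\w-]*):\w", stripped))
    return used - declared
```

For the query above, the helper reports `taxon`; adding `PREFIX taxon: <http://purl.uniprot.org/taxonomy/>` at the top resolves it. Because the IRI stripping is naive, queries that use `<` and `>` as comparison operators may confuse this heuristic.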

Modify

* Modification of the structure of the EsMeCaTa package, now divided into 4 main folders: (1) `esmecata/core` (for the scripts previously contained in the EsMeCaTa folder, used for the workflow), (2) `esmecata/report` to generate a report from the esmecata output folder, (3) `esmecata/gseapy` to perform enrichment analysis on the esmecata output, and (4) `esmecata/precomputed` to create precomputed database (in development).
* Change the name of intermediary files in `clustering` and `annotation` to avoid issues with ambiguous taxon names.
* Modify test according to changes of packaging structure.
* Modify the behaviour of annotation by eggnog-mapper. Now it merges protein sequences from clustering into bigger files (associated with superkingdom). This increases the performance of eggnog-mapper. Modification made with megyl. Use `--tax_scope` with eggnog-mapper.
* Update article_data folder.
* Update CI tests of github workflow according to the new tests and the new dependencies.

Remove

* Remove `esmecata analysis` subcommand as it was not used and not very useful.

0.4.2

Fix

* Issue in `pyproject.toml` and how the package is installed with `pip`.
