Cazy-webscraper

Latest version: v2.3.0.3

2.3.0.3

Patch for incomplete NCBI reads

As flagged in issue 120, if the connection to NCBI is interrupted or terminated early, an incomplete or corrupted read error is raised. The `try/except` blocks were updated to catch these incomplete read errors, and `cazy_webscraper` now retries the connection until either a connection succeeds or the maximum number of retries is reached, whichever comes first.
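The retry behaviour described above can be sketched in Python. This is a minimal illustration, not `cazy_webscraper`'s own code: `fetch_with_retries`, `max_retries` and `wait` are hypothetical names, and the sketch assumes the incomplete read surfaces as the standard library's `http.client.IncompleteRead`, which is what an HTTP response body that ends early typically raises.

```python
import time
from http.client import IncompleteRead


def fetch_with_retries(fetch, max_retries=3, wait=1.0):
    """Call fetch() until it succeeds or max_retries attempts have failed.

    Hypothetical helper: `fetch` is any zero-argument callable that performs
    one request (e.g. a wrapped Entrez query) and returns its data.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return fetch()
        except IncompleteRead as err:
            # The connection was cut short; report how much arrived and retry
            print(f"Attempt {attempt}/{max_retries}: incomplete read "
                  f"({len(err.partial)} bytes received)")
            if attempt < max_retries:
                time.sleep(wait)  # pause before reconnecting
    return None  # every attempt failed
```

Returning `None` instead of raising leaves it to the caller to decide whether a failed request is fatal or can be logged and skipped.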

What's Changed
* Issue 120 ncbi by HobnobMancer in https://github.com/HobnobMancer/cazy_webscraper/pull/126


**Full Changelog**: https://github.com/HobnobMancer/cazy_webscraper/compare/v2.3.0.2...v2.3.0.3

2.3.0.2

Minor patch

**Bug Fix**
Fixes a crash when retrieving the latest taxonomy data from NCBI for CAZyme records that are associated with multiple taxa in CAZy.
* Catches and handles `RuntimeError`, `NotXMLError` and `IncompleteRead` errors
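A hypothetical sketch of the "catch and handle" pattern: a failure in one batch is recorded instead of crashing the whole run. `query_batches` and the `query` callable are illustrative names. `RuntimeError` and `http.client.IncompleteRead` are standard Python exceptions; Biopython's `Bio.Entrez.Parser.NotXMLError` would belong in the same tuple but is left out so the sketch needs only the standard library.

```python
from http.client import IncompleteRead

# Errors treated as recoverable. A Biopython-based version would also add
# Bio.Entrez.Parser.NotXMLError here (omitted to stay stdlib-only).
RECOVERABLE_ERRORS = (RuntimeError, IncompleteRead)


def query_batches(batches, query):
    """Run query(batch) for each batch without letting one failure stop the run.

    `query` is a hypothetical callable returning {accession: taxonomy}.
    Returns (results, failed_batches) so failed batches can be retried later.
    """
    results, failed = {}, []
    for batch in batches:
        try:
            results.update(query(batch))
        except RECOVERABLE_ERRORS as err:
            print(f"Skipping batch {batch}: {type(err).__name__}")
            failed.append(batch)
    return results, failed
```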

What's Changed
* Doc update by HobnobMancer in https://github.com/HobnobMancer/cazy_webscraper/pull/116
* Update config.yml by HobnobMancer in https://github.com/HobnobMancer/cazy_webscraper/pull/117
* Catch incomplete read error by HobnobMancer in https://github.com/HobnobMancer/cazy_webscraper/pull/121


**Full Changelog**: https://github.com/HobnobMancer/cazy_webscraper/compare/v2.3.0...v2.3.0.2

2.3.0

* Downloading protein data from UniProt is several orders of magnitude faster than before, and should have fewer issues when using older versions of `bioservices`
  - Uses `bioservices` ID mapping to map NCBI protein version accessions directly to UniProt
  - `cw_get_uniprot_data` no longer calls NCBI and thus no longer requires an email address as a positional argument
* Updated database schema: changed `Genbanks 1--* Uniprots` to `Genbanks *--1 Uniprots`. `Uniprots.uniprot_id` is now listed in the `Genbanks` table, instead of `Genbanks.genbank_id` being listed in the `Uniprots` table.
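To illustrate the new many-to-one relationship, here is a minimal SQLite sketch using Python's built-in `sqlite3` module. Only the `Genbanks`/`Uniprots` table names and the `genbank_id`/`uniprot_id` columns come from the changelog; the accession columns and placeholder values are assumptions, and the real tables contain more columns.

```python
import sqlite3

# Hypothetical minimal tables: the *_accession columns are illustrative.
SCHEMA = """
CREATE TABLE Uniprots (
    uniprot_id INTEGER PRIMARY KEY,
    uniprot_accession TEXT UNIQUE
);
CREATE TABLE Genbanks (
    genbank_id INTEGER PRIMARY KEY,
    genbank_accession TEXT UNIQUE,
    uniprot_id INTEGER REFERENCES Uniprots(uniprot_id)  -- many Genbanks -> one Uniprot
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Uniprots VALUES (1, 'UNIPROT_ACC_1')")
# Two GenBank protein records can now point at the same UniProt entry
conn.executemany(
    "INSERT INTO Genbanks VALUES (?, ?, ?)",
    [(1, "GENBANK_ACC_1.1", 1), (2, "GENBANK_ACC_2.1", 1)],
)
rows = conn.execute(
    "SELECT g.genbank_accession, u.uniprot_accession "
    "FROM Genbanks g JOIN Uniprots u ON g.uniprot_id = u.uniprot_id "
    "ORDER BY g.genbank_id"
).fetchall()
print(rows)  # [('GENBANK_ACC_1.1', 'UNIPROT_ACC_1'), ('GENBANK_ACC_2.1', 'UNIPROT_ACC_1')]
```

Storing the foreign key on the "many" side (`Genbanks`) is the usual way to model a `*--1` relationship in a relational schema.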

* Retrieve taxonomic classifications from UniProt
  - Use the `--taxonomy`/`-t` flag to retrieve the scientific name (genus and species) of proteins of interest
  - Adds the downloaded taxonomic information to the `UniprotsTaxs` table

* Clarified how old records are deleted when using `cw_get_uniprot_data`
  - Separate arguments delete Genbanks-EC number and Genbanks-PDB accession relationships that are no longer listed in UniProt, for proteins in the local CAZyme database whose data is downloaded from UniProt
  - New args:
    - `--delete_old_ec_relationships` = deletes Genbank(protein)-EC number relationships no longer in UniProt
    - `--delete_old_ecs` = deletes EC numbers in the local db not linked to any proteins
    - `--delete_old_pdb_relationships` = deletes Genbank(protein)-PDB relationships no longer in UniProt
    - `--delete_old_pdbs` = deletes PDB accessions in the local db not linked to any proteins
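The two-step clean-up these flags describe (first unlink stale relationships, then remove orphaned records) can be sketched with `sqlite3`. This is an assumption-laden illustration: the `Ecs` and `Genbanks_Ecs` table names, their columns and both function names are hypothetical, and only the EC-number case is shown; the PDB flags would follow the same pattern.

```python
import sqlite3

# Hypothetical tables, not necessarily those used by cazy_webscraper
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ecs (ec_id INTEGER PRIMARY KEY, ec_number TEXT UNIQUE);
CREATE TABLE Genbanks_Ecs (genbank_id INTEGER, ec_id INTEGER,
                           PRIMARY KEY (genbank_id, ec_id));
INSERT INTO Ecs VALUES (1, '3.2.1.4'), (2, '3.2.1.8');
INSERT INTO Genbanks_Ecs VALUES (10, 1), (10, 2);
""")


def delete_old_ec_relationships(conn, genbank_id, uniprot_ecs):
    """Unlink EC numbers that UniProt no longer lists for this protein."""
    current = conn.execute(
        "SELECT e.ec_id, e.ec_number FROM Genbanks_Ecs ge "
        "JOIN Ecs e ON ge.ec_id = e.ec_id WHERE ge.genbank_id = ?",
        (genbank_id,),
    ).fetchall()
    for ec_id, ec_number in current:
        if ec_number not in uniprot_ecs:
            conn.execute(
                "DELETE FROM Genbanks_Ecs WHERE genbank_id = ? AND ec_id = ?",
                (genbank_id, ec_id),
            )


def delete_old_ecs(conn):
    """Remove EC numbers that are no longer linked to any protein."""
    conn.execute(
        "DELETE FROM Ecs WHERE ec_id NOT IN (SELECT ec_id FROM Genbanks_Ecs)"
    )


# Suppose UniProt now lists only 3.2.1.4 for protein 10
delete_old_ec_relationships(conn, 10, {"3.2.1.4"})
delete_old_ecs(conn)
print(conn.execute("SELECT ec_number FROM Ecs").fetchall())  # [('3.2.1.4',)]
```

Keeping the two steps separate mirrors the separate flags: a user can drop stale links while still keeping the EC numbers themselves in the database.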

* Retrieve the local db schema
  - New command `cw_get_db_schema` added
  - Retrieves the SQLite schema of a local CAZyme database and prints it to the terminal
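SQLite keeps each object's `CREATE` statement in its built-in `sqlite_master` table, so a schema dump like the one `cw_get_db_schema` prints can be approximated in a few lines. This is a hypothetical sketch (`get_db_schema` is not the package's function), not the command's actual implementation.

```python
import os
import sqlite3
import tempfile


def get_db_schema(db_path):
    """Return the CREATE statements stored in an SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            # sql is NULL for SQLite's automatic indexes, so skip those
            "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [sql for (sql,) in rows]


# Usage: build a throwaway database file and print its schema
db_path = os.path.join(tempfile.mkdtemp(), "example.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE Genbanks (genbank_id INTEGER PRIMARY KEY)")
conn.commit()
conn.close()
for statement in get_db_schema(db_path):
    print(statement)
```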

* Added an option to skip retrieving the latest taxonomic classifications from NCBI Taxonomy
  - By default, when retrieving data from CAZy, `cazy_webscraper` retrieves the latest taxonomic classifications from NCBI for proteins listed under multiple taxa
  - To reduce scraping time and the burden on the NCBI Entrez server, this step can be skipped with the new `--skip_ncbi_tax` flag if the data is not needed (e.g. if GTDB taxonomies will be used).
  - When skipping retrieval of the latest taxonomic classifications from NCBI, `cazy_webscraper` adds the first taxon retrieved from CAZy for proteins listed under multiple taxa
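The fallback rule can be expressed as a small function. This is a hypothetical sketch: the `resolve_taxa` name, the dictionary layout and the `ncbi_lookup` callable are assumptions, not `cazy_webscraper`'s internals.

```python
def resolve_taxa(cazy_taxa, skip_ncbi_tax, ncbi_lookup=None):
    """Choose one taxon per protein.

    cazy_taxa maps each protein accession to the list of taxa CAZy gives for
    it, in the order they were retrieved. ncbi_lookup is a hypothetical
    callable returning the latest NCBI classification for an accession.
    """
    resolved = {}
    for accession, taxa in cazy_taxa.items():
        if len(taxa) == 1 or skip_ncbi_tax or ncbi_lookup is None:
            # Single taxon, or NCBI lookup skipped: keep the first CAZy taxon
            resolved[accession] = taxa[0]
        else:
            # Multiple taxa: ask NCBI for the current classification
            resolved[accession] = ncbi_lookup(accession)
    return resolved
```

Only proteins with more than one CAZy taxon ever trigger an NCBI call, which is why skipping the lookup saves requests without affecting proteins that have a single taxon.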

2.2.8

Bugs and improvements
* Addresses the incomplete retrieval of taxonomy data from NCBI
* Retrieving taxonomy data is faster
* PR 113
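PR 113 is titled "Add batch ncbi tax", which suggests the speed-up comes from querying NCBI for many proteins per request rather than one at a time. A minimal sketch of splitting IDs into batches follows; `batched_ids` and the batch size are hypothetical, and the package's own batching may differ.

```python
from itertools import islice


def batched_ids(ids, batch_size):
    """Yield successive lists of at most batch_size IDs (Python 3.8+)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(ids)
    while batch := list(islice(it, batch_size)):
        yield batch


# Each batch could then be sent as one comma-separated ID list
for batch in batched_ids(["ID1", "ID2", "ID3", "ID4", "ID5"], 2):
    print(",".join(batch))
```

On Python 3.12 and later, `itertools.batched` provides the same grouping (yielding tuples instead of lists).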

What's Changed
* add not on cw_get_uniprot before cw_get_pdb by HobnobMancer in https://github.com/HobnobMancer/cazy_webscraper/pull/112
* Add batch ncbi tax by HobnobMancer in https://github.com/HobnobMancer/cazy_webscraper/pull/113


**Full Changelog**: https://github.com/HobnobMancer/cazy_webscraper/compare/v2.2.7...v2.2.8

2.2.7

Fixes bugs when downloading sequences from NCBI:

* Issue 109
* Adds missing arguments to function calls
* Accepts UniProt-style accessions and the non-standard accession formats used by NCBI
* Combines cached sequences with newly downloaded sequences, so multiple caches no longer need to be combined manually if the download is interrupted more than once
* Removes unused args from function returns
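Combining an earlier sequence cache with a newer download amounts to merging two accession-to-sequence mappings. The sketch below assumes caches are FASTA text keyed by the first word of each header; the function names and that format are assumptions, not a description of `cazy_webscraper`'s cache files.

```python
def read_fasta(text):
    """Parse FASTA text into {accession: sequence} (accession = first header word)."""
    seqs, acc, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if acc is not None:
                seqs[acc] = "".join(chunks)  # store the previous record
            acc = line[1:].split()[0] if len(line) > 1 else ""
            chunks = []
        elif line:
            chunks.append(line)
    if acc is not None:
        seqs[acc] = "".join(chunks)  # store the last record
    return seqs


def combine_caches(*fasta_texts):
    """Merge several caches; a later cache wins if an accession repeats."""
    combined = {}
    for text in fasta_texts:
        combined.update(read_fasta(text))
    return combined
```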

What's Changed
* Issue 109 by HobnobMancer in https://github.com/HobnobMancer/cazy_webscraper/pull/110


**Full Changelog**: https://github.com/HobnobMancer/cazy_webscraper/compare/v2.2.6...v2.2.7

2.2.6

Fixes `cazy_webscraper` crashing, due to missing arguments in function calls, when retrieving the latest taxonomic classifications for proteins and a batch of protein IDs contains an invalid ID:

```
Traceback (most recent call last):
  File "...bin/cazy_webscraper", line 33, in <module>
    sys.exit(load_entry_point('cazy-webscraper', 'console_scripts', 'cazy_webscraper')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../cazy_webscraper/cazy_scraper.py", line 246, in main
    get_cazy_data(
  File "...//cazy_webscraper/cazy_scraper.py", line 355, in get_cazy_data
    cazy_data, successful_replacement = replace_multiple_tax(
                                        ^^^^^^^^^^^^^^^^^^^^^
  File ".../cazy_webscraper/ncbi/taxonomy/multiple_taxa.py", line 135, in replace_multiple_tax
    cazy_data, success = replace_multiple_tax_with_invalid_ids(cazy_data, args)
```

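The traceback shows batches containing an invalid ID being routed to `replace_multiple_tax_with_invalid_ids`. One common way to handle such a batch, shown here as a hypothetical sketch rather than the package's actual logic, is to fall back to querying each ID on its own so that only the invalid IDs are dropped. The `query` callable and the use of `ValueError` to signal a rejected ID are assumptions.

```python
def query_with_invalid_ids(batch, query):
    """Query a batch of IDs; if the batch fails, isolate the invalid IDs.

    Returns (results, invalid_ids). `query` is a hypothetical callable that
    takes a list of IDs and returns {id: result}, raising ValueError when
    the server rejects an ID.
    """
    try:
        return query(batch), []
    except ValueError:
        # At least one ID is bad: retry the IDs one at a time
        results, invalid = {}, []
        for acc in batch:
            try:
                results.update(query([acc]))
            except ValueError:
                invalid.append(acc)
        return results, invalid
```

The per-ID fallback costs extra requests, but only for batches that actually fail.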

What's Changed
* Fix tax invalid ids by HobnobMancer in https://github.com/HobnobMancer/cazy_webscraper/pull/108


**Full Changelog**: https://github.com/HobnobMancer/cazy_webscraper/compare/v2.2.5...v2.2.6