Pharokka

Latest version: v1.7.5

1.6.1

------------------

* Fixes a bug that removed tRNAs from the `.tbl` output format (#323).
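
For context, a `.tbl` file follows NCBI's five-column feature table layout: a `>Feature <seq id>` line, then one `start<TAB>stop<TAB>key` line per feature, each followed by qualifier lines whose first three columns are empty; minus-strand features are written with the larger coordinate first. The sketch below is not pharokka's code. The `Feature` tuple layout and the `write_feature_table` helper are hypothetical, and it only illustrates how a tRNA record sits alongside a CDS in that layout.

```python
from typing import List, Tuple

# Hypothetical record layout: (start, end, strand, feature_key, qualifiers),
# with 1-based coordinates and start <= end regardless of strand.
Feature = Tuple[int, int, str, str, List[Tuple[str, str]]]


def write_feature_table(seq_id: str, features: List[Feature]) -> str:
    """Render features in NCBI's five-column feature table (.tbl) layout."""
    lines = [f">Feature {seq_id}"]
    for start, end, strand, key, qualifiers in features:
        if strand == "-":
            # Minus-strand features are written with the larger coordinate first.
            start, end = end, start
        lines.append(f"{start}\t{end}\t{key}")
        for name, value in qualifiers:
            # Qualifier lines leave the first three columns empty.
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


features: List[Feature] = [
    (1, 300, "+", "CDS", [("product", "terminase large subunit")]),
    (500, 575, "-", "tRNA", [("product", "tRNA-Met")]),
]
print(write_feature_table("contig_1", features), end="")
```

A tRNA is emitted exactly like a CDS apart from its feature key, so any filtering step that only keeps `CDS` records would silently drop it.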

1.6.0

------------------

* Fixes a variety of bugs: #300 (`pharokka_proteins.py` crashing if it found VFDB hits), #303 (errors in the `.tbl` format), #316 (type errors, and errors where custom HMM databases had identically scored hits), #317 (type errors) and #320 (use of a deprecated GC function).
* Adds `--mash_distance` and `--minced_args` as parameters (#299, thanks iferres).
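
Pharokka uses mash to find the closest INPHARED match for each contig. Mash converts a Jaccard estimate J between k-mer sketches into a distance D = -(1/k) ln(2J / (1 + J)). The following is a minimal sketch, assuming `--mash_distance` acts as a maximum distance for reporting that closest match (check `pharokka.py --help` for the actual semantics and default). The helper names `mash_distance` and `closest_hit`, the reference names, and the Jaccard values are hypothetical; k = 21 is Mash's default k-mer size.

```python
import math
from typing import Dict, Optional, Tuple


def mash_distance(jaccard: float, k: int = 21) -> float:
    """Convert a Jaccard estimate J into a Mash distance: -(1/k) * ln(2J / (1 + J))."""
    if jaccard <= 0:
        # No shared k-mers: treat the pair as maximally distant.
        return 1.0
    return -math.log(2 * jaccard / (1 + jaccard)) / k


def closest_hit(
    distances: Dict[str, float], max_distance: float
) -> Optional[Tuple[str, float]]:
    """Return (reference, distance) for the closest reference within max_distance, else None."""
    if not distances:
        return None
    name, dist = min(distances.items(), key=lambda item: item[1])
    return (name, dist) if dist <= max_distance else None


# Hypothetical references and Jaccard estimates, for illustration only.
hits = {"phage_A": mash_distance(0.9), "phage_B": mash_distance(0.3)}
print(closest_hit(hits, max_distance=0.2))
```

Raising the threshold lets more distant references count as the "closest match"; lowering it reports no match rather than a weak one.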

1.5.1

------------------

* Requires `dnaapler` `>=0.4.0` to accommodate recent changes to dnaapler.
* Adds `.svg` output format to `pharokka_plotter.py`.

1.5.0

------------------

* Adds support for `pyrodigal-gv` implementing `prodigal-gv` as a gene predictor ([pyrodigal-gv](https://github.com/althonos/pyrodigal-gv) and [prodigal-gv](https://github.com/apcamargo/prodigal-gv)). This can be specified with `-g prodigal-gv`.
* Thanks to [althonos](https://github.com/althonos) and [apcamargo](https://github.com/apcamargo) for making this possible, and to [asierFernandezP](https://github.com/asierFernandezP) for originally raising this in issue [#290](https://github.com/gbouras13/pharokka/issues/290).
* Adds a check for duplicated contig headers in the input FASTA ([#293](https://github.com/gbouras13/pharokka/issues/293)). Thanks to [thauptfeld](https://github.com/thauptfeld) for raising this.
* `-g prodigal` and `-g prodigal-gv` should be much faster thanks to multithreading support added by the inimitable althonos.
* GenBank format output now uses the PHG division code rather than VRL (following [this issue](https://github.com/RyanCook94/inphared/issues/22)).
* The `_length_gc_cds_density.tsv` and `_cds_final_merged_output.tsv` files now contain the translation table (genetic code) for each contig. This is usually 11, but not always if you use `pyrodigal-gv`.
* `--skip_mash` flag added to skip finding the closest match for each contig in INPHARED using mash.
* `--skip_extra_annotations` flag added to skip running tRNAscan-SE, MinCED and Aragorn, in case you only want CDS predictions and functional annotations.
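
The duplicated-header check (#293) amounts to counting FASTA record IDs. Below is a minimal standard-library sketch, assuming the contig ID is the first whitespace-delimited token after `>`; pharokka may compare headers differently, and `duplicated_headers` is a hypothetical helper rather than part of its API.

```python
from collections import Counter
from typing import Iterable, List


def duplicated_headers(fasta_lines: Iterable[str]) -> List[str]:
    """Return contig IDs that appear more than once in FASTA-formatted lines."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].strip()
            # Assumption: the contig ID is the first whitespace-delimited token.
            ids.append(header.split()[0] if header else "")
    counts = Counter(ids)
    return sorted(name for name, n in counts.items() if n > 1)


fasta = """>contig_1 circular
ACGTACGT
>contig_2
GGCCGGCC
>contig_1 second copy
TTAATTAA
""".splitlines()
dups = duplicated_headers(fasta)
if dups:
    print(f"Duplicated contig headers found: {', '.join(dups)}")
```

Checking IDs up front is cheap and avoids confusing downstream outputs where features from two different contigs would share one name.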

1.4.1

------------------

1.4.0

------------------

* More sensitive PHROG searches using Hidden Markov Models (HMMs) via the amazing [PyHMMER](https://github.com/althonos/pyhmmer). Thanks to althonos for some of the best-written and best-documented software I have ever used.
* By default, `pharokka` will now run searches using both MMseqs2 (PHROGs, CARD and VFDB) and HMMs (PHROGs). MMseqs2 was kept for PHROGs as it provides more information than the HMM results (e.g. sequence alignment identities & top hit PHROG protein) if it finds a hit.
* `--fast` (or `--hmm_only`) parameter, which runs only PyHMMER on PHROGs and does not run MMseqs2 at all on PHROGs, CARD or VFDB. For phage isolates this will be much faster than v1.3.2, but you will not get CARD or VFDB annotations. For metagenomes, it will be (much) slower!
* Updated databases as of 23 August 2023. You will need to download the new `pharokka v1.4.0` databases because these now contain PHROG HMM profiles. The VFDB database is now clustered at 50% sequence identity (which speeds up runtime).
* Other changes in the codebase should make `pharokka v1.4.0` run somewhat faster than v1.3.2, even when PyHMMER is not used (i.e. when `--mmseqs2_only` is specified).
* Terminal output and log files are neater and more informative thanks to loguru. There is also a new `logs` directory containing a separate log file for each tool in the pipeline. This is thanks to taking and modifying some code from mbhall88's [tbpore](https://github.com/mbhall88/tbpore).
* `install_databases.py` has been modified to be more robust and somewhat faster, thanks to ideas and modified code taken from oschwengers' [bakta](https://github.com/oschwengers/bakta).
* `--mmseqs2_only` parameter, which essentially runs `pharokka` as v1.3.2 did. This is the default in meta mode (`-m` or `--meta`).
* `pharokka_proteins.py`, which takes a FASTA file of amino acid sequences as input and runs MMseqs2 (PHROGs, CARD, VFDB) and PyHMMER (PHROGs). See the [proteins documentation](docs/proteins.md) for more details. Thanks to Brady Cress for the idea.
* `--custom_hmm` parameter, which allows for custom HMM profile databases to be used with `pharokka`. Thanks to pck00 for the idea.
* `create_custom_hmm.py`, which facilitates the creation of an HMM profile database from multiple sequence alignments. See the [documentation](docs/custom.md) for more details about how to create a compatible HMM profile database.
* `--dnaapler` flag, which automatically detects and reorients your phage to start with the large terminase subunit. For more information, see [dnaapler](https://github.com/gbouras13/dnaapler).
* `--genbank` flag, which allows GenBank format input with `-i`. All (customised) CDS calls in the GenBank file are used, and PHANOTATE/pyrodigal will not be run. So if you have done manual gene curation and want to functionally annotate your customised CDS, this option is recommended. Thanks to pck00 for the idea.
* Fixes to `-c`, which should now work properly with `-g prodigal` (thanks alegione for the fixes).
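
`--dnaapler` relies on dnaapler to locate the large terminase subunit. Once that gene's start coordinate and strand are known, reorienting a circular genome is a rotation, plus a reverse complement for a minus-strand gene. The sketch below covers only that final step under those assumptions: `reorient` and `reverse_complement` are hypothetical helpers, not dnaapler's implementation, and nothing here searches for the gene.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A/C/G/T/N, either case)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def reorient(seq: str, gene_start: int, strand: str = "+") -> str:
    """Rotate a circular genome so it begins at a gene's start codon.

    gene_start is the 1-based forward-strand coordinate of the first base of the
    start codon. For a minus-strand gene this is the gene's highest coordinate.
    """
    if strand == "+":
        # Move everything before the start codon to the end.
        i = gene_start - 1
        return seq[i:] + seq[:i]
    # Minus strand: rotate so the start codon's first base becomes the last base,
    # then reverse-complement so the gene reads forward from position 1.
    rotated = seq[gene_start:] + seq[:gene_start]
    return reverse_complement(rotated)


print(reorient("CCCATGAAA", 4))       # ATGAAACCC (plus-strand ATG at positions 4-6)
print(reorient("GGGCATTTT", 6, "-"))  # ATGCCCAAA (minus-strand ATG, CAT on the forward strand)
```

Because the genome is circular, rotation loses no sequence; only the coordinate origin (and, for a minus-strand gene, the strand) changes.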
