- Group-Splitter:
- Replaced the previous similarity calculation with Levenshtein distance.
- The distance is calculated with the Levenshtein library when it is installed; otherwise the tool falls back to a much slower pure-Python implementation.
- Fixed handling of AA/DNA clustering output of CD-HIT.
- Calculating the representative sequence for each new subgroup now normalises length and pident.
- If reclustering of a Group results in multiple CD-HIT clusters, each cluster is now processed separately. It is therefore important to understand the reclustering options.
- Added option to process only user-defined Groups or 'auto' to detect which groups to subgroup.
- Added required user-parameter to state the number of genomes (or genera) in analysis.
- Fixed some minor bugs and cleaned up output handling.
- Fixed a bug where total_genomes was naively calculated on a per-cluster basis. The user must now provide the number of genomes in the analysis.
- Added option to not delete temp files.
- Cleaned up some user-parameters to match those used in CD-HIT.
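
The library-or-fallback behaviour noted above for the Levenshtein calculation can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function names `levenshtein` and `levenshtein_py` are hypothetical, and it assumes the optional dependency is importable as `Levenshtein` and exposes `distance(a, b)`.

```python
# Hypothetical sketch: prefer the C-backed Levenshtein package if present,
# otherwise use a slower pure-Python dynamic-programming implementation.
try:
    from Levenshtein import distance as _fast_distance  # optional dependency (assumed import name)
except ImportError:
    _fast_distance = None


def levenshtein_py(a: str, b: str) -> int:
    """Pure-Python edit distance using two rows of the DP table."""
    # Keep the shorter string as `b` so each row is as small as possible.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (char_a != char_b),   # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein(a: str, b: str) -> int:
    """Use the fast library when available, else the pure-Python fallback."""
    if _fast_distance is not None:
        return _fast_distance(a, b)
    return levenshtein_py(a, b)
```

The fallback runs in time proportional to the product of the two sequence lengths, which is why installing the library is worthwhile for large groups.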
- General:
- General bug fixes, user-menu improvements, additional output in 'verbose' mode, and code clean-up.
- Cluster-Summary:
- A new sub-tool that summarises CD-HIT .clstr files.
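
As a rough illustration of what summarising a CD-HIT .clstr file involves, the sketch below counts members per cluster and records each cluster's representative (the member marked with `*`). It is not the Cluster-Summary implementation: the function name `summarise_clstr` and the returned dictionary layout are hypothetical, and the parser assumes the standard .clstr member lines in which AA entries end in `at 80.00%` and DNA entries carry a strand prefix such as `at +/98.50%`.

```python
import re

# One member line, e.g. "1<TAB>2214aa, >PROTEIN_B... at 80.00%"
# or, for DNA output, "1<TAB>1180nt, >seq2... at +/98.50%".
MEMBER_RE = re.compile(
    r"^\d+\s+(\d+)(aa|nt),\s+>(.+?)\.\.\.\s+(\*|at\s+(?:[+-]/)?([\d.]+)%)"
)


def summarise_clstr(lines):
    """Return {cluster name: {"size", "representative", "seq_type"}}."""
    clusters = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # Header line such as ">Cluster 0" starts a new cluster.
            current = line[1:]
            clusters[current] = {"size": 0, "representative": None, "seq_type": None}
            continue
        match = MEMBER_RE.match(line)
        if match is None or current is None:
            continue  # skip lines this simplified parser does not recognise
        _length, seq_type, name, marker, _identity = match.groups()
        entry = clusters[current]
        entry["size"] += 1
        entry["seq_type"] = seq_type  # "aa" for protein, "nt" for DNA
        if marker == "*":
            entry["representative"] = name
    return clusters
```

Passing an open file handle works as well as a list of lines, for example `summarise_clstr(open("clusters.clstr"))`, where the file name is a placeholder.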