PyamilySeq

Latest version: v1.0.1


0.9.0

- Group-Splitter:
  - Replaced the similarity calculation with Levenshtein distance.
  - The distance can be calculated using the Levenshtein library, with a fallback to a much slower pure-Python implementation.
  - Fixed handling of AA/DNA clustering output of CD-HIT.
  - Calculating the representative sequence for each new subgroup now normalises length and pident.
  - If reclustering of a Group results in multiple CD-HIT clusters, each cluster is processed separately. It is therefore important to understand the reclustering options.
  - Added an option to process only user-defined Groups, or 'auto' to detect which Groups to subgroup.
  - Added a required user parameter stating the number of genomes (or genera) in the analysis.
  - Fixed some minor bugs and cleaned up output handling.
  - Fixed a bug where total_genomes was naively calculated on a per-cluster basis. The user must now provide the number of genomes in the analysis.
  - Added an option to not delete temp files.
  - Cleaned up some user parameters to match those used in CD-HIT.

- General:
  - A number of general bug fixes, user-menu improvements, additional output in 'verbose' mode and code clean-up.

- Cluster-Summary:
  - A new sub-tool that summarises CD-HIT .clstr files.
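The "library or slower fallback" pattern mentioned for Group-Splitter can be sketched as follows. This is only an illustration, not PyamilySeq's actual code: the helper name `normalised_similarity` and the choice to normalise by the longer sequence are assumptions. `Levenshtein.distance` is the function provided by the Levenshtein package on PyPI.

```python
try:
    # Optional C-accelerated implementation from the "Levenshtein" package.
    from Levenshtein import distance as levenshtein_distance
except ImportError:
    def levenshtein_distance(a: str, b: str) -> int:
        """Pure-Python edit distance using two rolling rows (much slower)."""
        if len(a) < len(b):
            a, b = b, a  # keep the inner row as short as possible
        previous = list(range(len(b) + 1))
        for i, ch_a in enumerate(a, start=1):
            current = [i]
            for j, ch_b in enumerate(b, start=1):
                cost = 0 if ch_a == ch_b else 1
                current.append(min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                ))
            previous = current
        return previous[-1]


def normalised_similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1.0 for identical sequences, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


print(levenshtein_distance("kitten", "sitting"))
print(normalised_similarity("ACGT", "ACGA"))
```

Either branch exposes the same `levenshtein_distance(a, b)` call, so the rest of the code does not need to know which implementation was loaded.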

0.8.1

Version v0.8.1 includes small but important fixes to the core-gene alignment system and a few user option improvements.

- Group-Splitter now handles CD-HIT .clstr files correctly, as cd-hit and cd-hit-est (AA/DNA) differ slightly in how pident values are reported.
- Group-Splitter now requires the user to state whether the input is DNA or AA.
- Fixed a bug where 'all' FASTA files were sometimes included in the core-gene alignment, resulting in very odd alignments.
- User options in PyamilySeq now include the -t THREADS parameter, which is passed to CD-HIT and MAFFT.
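The difference in pident reporting can be illustrated with a small parser. This is a sketch under assumptions, not Group-Splitter's implementation: it assumes the default .clstr layout, in which cd-hit member lines end with `at 80.65%` while cd-hit-est lines carry a strand prefix such as `at +/98.00%`. The names `MEMBER_RE` and `parse_clstr` and the example sequence IDs are hypothetical.

```python
import re

# Member lines look like (tab after the index):
#   "1\t2214aa, >seq_B... at 80.65%"     cd-hit (protein)
#   "1\t295nt, >seq_D... at +/98.00%"    cd-hit-est (nucleotide, strand prefix)
#   "0\t2799aa, >seq_A... *"             representative, no pident
MEMBER_RE = re.compile(
    r"^\d+\s+(?P<length>\d+)(?P<unit>aa|nt),\s+>(?P<name>.+?)\.\.\.\s+"
    r"(?:(?P<rep>\*)|at\s+(?:(?P<strand>[+-])/)?(?P<pident>\d+(?:\.\d+)?)%)\s*$"
)


def parse_clstr(lines):
    """Return {cluster_id: [member dicts]} from the lines of a .clstr file."""
    clusters = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = []
            continue
        match = MEMBER_RE.match(line)
        if match is None or current is None:
            continue  # skip lines this simplified pattern does not cover
        pident = match.group("pident")
        clusters[current].append({
            "name": match.group("name"),
            "length": int(match.group("length")),
            "unit": match.group("unit"),
            "representative": match.group("rep") is not None,
            "strand": match.group("strand"),  # None for protein output
            "pident": float(pident) if pident is not None else None,
        })
    return clusters


example = [
    ">Cluster 0",
    "0\t2799aa, >seq_A... *",
    "1\t2214aa, >seq_B... at 80.65%",
    ">Cluster 1",
    "0\t300nt, >seq_C... *",
    "1\t295nt, >seq_D... at +/98.00%",
]
clusters = parse_clstr(example)
print(clusters[0][1]["pident"], clusters[1][1]["pident"])
```

Making the strand prefix optional in a single pattern lets one parser read both output types. Output produced with extra alignment-reporting options is not covered by this pattern.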

0.8.0

This major release has two main additions, with the focus on a new 'subtool' that helps process and investigate potential paralogs that have been collapsed into single gene families/groups. More details, with examples, will be provided in the future.

- Gene_Presence_Absence.csv output by default: The Gene_Presence_Absence file is now generated automatically, making it easier to assess gene distribution across genomes. This was a suggestion from ecampbell50.

- Group-Splitter Addition: New 'subtool' to split "paralogous" groups from clustering results, improving handling of gene families with multiple paralogs.
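To show the kind of table a gene presence/absence file encodes, here is a minimal sketch. The column layout, the `write_presence_absence` helper and the input mapping are assumptions for illustration only. The actual layout of Gene_Presence_Absence.csv is not described in these notes.

```python
import csv
import io


def write_presence_absence(groups, genomes, handle):
    """Write one row per gene group and one 0/1 column per genome.

    groups  -- hypothetical mapping: group name -> set of genomes containing it
    genomes -- ordered list of genome names (the column order)
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Group"] + list(genomes))
    for group_name in sorted(groups):
        present_in = groups[group_name]
        writer.writerow([group_name] + [int(g in present_in) for g in genomes])


groups = {"group_1": {"genome_A", "genome_B"}, "group_2": {"genome_B"}}
buffer = io.StringIO()
write_presence_absence(groups, ["genome_A", "genome_B"], buffer)
print(buffer.getvalue())
```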

0.7.0

Most changes are reflected in the user menu. A number of bugs were also caught, and fixing them should now result in the correct recording of Seconds when more than one First was clustered in the reclustering stage.

0.6.0

This is a major release that has undergone another large rewrite to reduce code redundancy, fix a number of bugs and re-add the Genus mode.

- PyamilySeq now outputs a 'summary_statistics.txt' file by default, in a format similar to the output of Panaroo and Roary.


-group_mode {Species,Genus}
Group Mode: Should PyamilySeq be run in "Species" or "Genus" mode?

0.5.2

Clustering Runtime Arguments - Optional when "-run_mode Full" is used:
-mem CLUSTERING_MEMORY
Default 4000: Memory to be allocated for clustering (in MBs).
-t CLUSTERING_THREADS
Default 4: Threads to be allocated for clustering.
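The sketch below shows one way the memory and thread arguments above could map onto a CD-HIT command line. It is not PyamilySeq's actual code: the function name `build_cdhit_command`, the file names and the 0.90 identity threshold are assumptions. The CD-HIT flags used are `-i` (input), `-o` (output), `-c` (identity threshold), `-M` (memory limit in MB) and `-T` (threads), and the defaults of 4000 MB and 4 threads are taken from the help text above.

```python
def build_cdhit_command(input_fasta, output_prefix, sequence_type="AA",
                        identity=0.90, mem=4000, threads=4):
    """Return the argument list for a CD-HIT run (nothing is executed here)."""
    if sequence_type not in ("AA", "DNA"):
        raise ValueError("sequence_type must be 'AA' or 'DNA'")
    # cd-hit clusters proteins; cd-hit-est clusters nucleotide sequences.
    program = "cd-hit" if sequence_type == "AA" else "cd-hit-est"
    return [
        program,
        "-i", str(input_fasta),
        "-o", str(output_prefix),
        "-c", str(identity),
        "-M", str(mem),      # memory limit in MB
        "-T", str(threads),  # number of threads
    ]


command = build_cdhit_command("proteins.faa", "clusters", threads=8)
print(" ".join(command))
```

Returning a list rather than a single string means it can be passed to `subprocess.run(command, check=True)` without shell quoting issues.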
