Cogent3

Latest version: v2024.12.19a2

2024.2.5a1

Contributors

- fredjaya, documentation, bug fixes and tests. Thanks fredjaya 🚀!!
- KatherineCaley and khiron, maintenance 🛠️. Thanks KatherineCaley and khiron 🏎️!
- GavinHuttley, miscellaneous 🍥

ENH

- Now support python3.12 🚀
- Pin numpy version to < 2 until we can test against released version.
- AnnotationDb.subset() method. Returns a new instance matching provided
conditions.
- SeqView.parent_start, SeqView.parent_stop properties report position
on the original sequence. These help keep track of the segment on the
original after slicing operations. They always **return** plus strand orientation.
- SeqView offset argument added to constructor
- SeqView.copy() method. Supports slicing the original data (and
recording the offset).
- Sequence.parent_coordinates() returns parent_start, parent_stop, strand
values from underlying SeqView. strand is +/- 1, which is relative to
the original sequence.
- AnnotationDb methods now expect the strand argument to have value "+"/"-"
or None.
- make_seq() returns a seq as is if it's already the correct molecular
type. This preserves the AnnotationDb attribute.
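
The coordinate bookkeeping described above for SeqView can be illustrated with a small, self-contained sketch. The `ParentView` class below is hypothetical and is not the cogent3 implementation. It only shows how a view that records an offset can keep reporting plus strand positions on the original sequence after repeated slicing.

```python
class ParentView:
    """Hypothetical stand-in for a sequence view (NOT cogent3's SeqView)."""

    def __init__(self, seq, start=0, stop=None):
        self.seq = seq  # the original, unsliced sequence
        self.parent_start = start  # plus strand start on the original
        self.parent_stop = len(seq) if stop is None else stop

    def __len__(self):
        return self.parent_stop - self.parent_start

    def __getitem__(self, index):
        # translate a slice of the view into coordinates on the original
        lo, hi, step = index.indices(len(self))
        if step != 1:
            raise NotImplementedError("sketch handles forward, step-1 slices only")
        hi = max(hi, lo)
        return ParentView(self.seq, self.parent_start + lo, self.parent_start + hi)

    def __str__(self):
        return self.seq[self.parent_start : self.parent_stop]


view = ParentView("ACGTACGTAC")
sub = view[2:8][1:4]  # two successive slices
print(sub.parent_start, sub.parent_stop, str(sub))
```

Because each slice only adjusts the stored start and stop, the underlying data is never copied and the position on the original is always recoverable.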

BUG

- Fixed ArrayAlignment.get_degapped_relative_to(). It was returning the
transpose of the alignment.

DOC

- Improvements to docstrings for some cogent3.apps. We have added code
snippets that can be copied and pasted into a Python session.
- Major updates to the developer docs to guide new contributors 🎉! Check
them out [at the c3dev wiki](https://github.com/cogent3/c3dev/wiki).

Deprecations

- We drop support for python3.8
- Assorted arguments marked for deprecation

<a id='changelog-2023.12.15a1'></a>

2023.12.15a1

Contributors

- wjjmjh (aka Stephen Ma) fixed an error in the seq features gallery drawing example. Turns out, if you want a thumbnail, you actually have to write it out 🤦‍♂️!!
- Multiple PRs from first-time contributor Fred Jaya. Thanks fredjaya 🚀!!
- Kath Caley for the sweet new cogent3 logo 🥳!
- Kath Caley for numerous other commits (bug fixes, enhancements, etc.). Thanks KatherineCaley 🚀!
- First time contribution from cosmastech 🎉
- Important new tree metrics from Robert McArthur. Thanks rmcar17 🚀!

ENH

- Added the defined_types function (from cogent3.app.typing import defined_types).
This displays the cogent3 defined type hints and the objects they represent.

- added available_apps(name_filter) which allows filtering apps by
their name (thanks to cosmastech)

- The new AnnotationDb.subset() method creates a subset of the instance
with records matching the provided conditions.

- app.as_completed() and app.apply_to() no longer raise a
ValueError if there's no work to be done.

- Added the ever popular Robinson-Foulds tree topology measure, but
you shouldn't use it except for comparison to the far superior
(statistically at least) matching distance measures. These are all
available on the tree objects via a new PhyloNode.tree_distance()
method (thanks to rmcar17).

- Added methods `min_pair()` and `max_pair()` to `DistanceMatrix`, to return the names of the sequences with minimum and maximum pairwise distance, respectively (thanks to KatherineCaley).
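
For readers unfamiliar with the Robinson-Foulds measure mentioned above, the following is a minimal sketch of its rooted variant, assuming trees are written as nested tuples of tip names. The functions `clades` and `rooted_rf` are hypothetical helpers for illustration. They are not part of cogent3 and do not reflect how `PhyloNode.tree_distance()` is implemented.

```python
def clades(tree):
    """Return the clades (frozensets of tip names) of a rooted tree.

    The tree is given as nested tuples, e.g. (("a", "b"), ("c", "d")).
    """
    found = set()

    def tips(node):
        if isinstance(node, str):  # a tip
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)  # record the clade below this internal node
        return members

    tips(tree)
    return found


def rooted_rf(tree1, tree2):
    """Rooted Robinson-Foulds: the number of clades found in only one tree."""
    return len(clades(tree1) ^ clades(tree2))


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(rooted_rf(t1, t2))
```

The measure only counts clades that differ, so a single misplaced tip can change many clades at once. That sensitivity is one reason the matching distance is described above as the better behaved statistic.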

BUG

- The sequence returned from Alignment.get_seq() now reflects
slice operations applied to the Alignment.

DOC

- The sequence features drawing example now has a gallery thumbnail.

- Doc showing how the dotplot can be used to highlight alignment errors.

- Added acknowledgement that this project has received funding support from
the Australian National University. We are grateful!

Deprecations

- Table.tolist() is being replaced by Table.to_list()

- Reverse slicing of Alignment and ArrayAlignment is now consistent
with reverse slicing of a string. Previously reverse slicing an
Alignment instance, e.g. `aln[5:1]` would reverse complement a
nucleic acid object, but fail if it was any other molecular type.
This behaviour was different to ArrayAlignment. For both objects, use
a normal slice and reverse complement, e.g. `aln[1:5].rc()`.
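
The string behaviour that reverse slicing now matches can be seen with plain Python strings. This is only an illustration: `reverse_complement` is a hypothetical helper standing in for the `.rc()` method, and a string stands in for an alignment.

```python
# complement table for unambiguous DNA characters only
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the result."""
    return dna.translate(COMPLEMENT)[::-1]


seq = "AACCGGTT"
empty = seq[5:1]  # start > stop on a string gives an empty result
segment = seq[1:5]  # a normal slice
rc_segment = reverse_complement(segment)  # analogous to aln[1:5].rc()
print(repr(empty), segment, rc_segment)
```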

Discontinued

- Began a major refactor of the sequence collection classes. The major change
is they no longer accept a fasta formatted string as input. As we have simplified
the data conversion functions, all previously used public functions have now been
marked for discontinuation in 2024.3.

<a id='changelog-2023.9.22a1'></a>

2023.9.22a1

Contributors

- YapengLang for reporting the bug in computing the ENS for
non-stationary processes.

ENH

- MinimalFastaParser now removes internal spaces
in sequence blocks

- Added app.evo.model(tree_func: Optional[Callable]) argument. A callback
function assigned to tree_func is passed an alignment and returns a tree.
This could be done via loading a tree from disk for that alignment source,
or estimation of the tree using the alignment. The argument overrides the
other tree related arguments (tree, unique_trees).

BUG

- The calculation of the expected number of substitutions on a tree branch
was incorrect. It was not using the motif probs from the parent node.

- Degapped sequence and collections now retain features

- Fixed issue 993. We provide a new default_length argument to
LF.make_likelihood_function() to be applied when a provided tree
has zero (or no) lengths. This is set to be 1.

<a id='changelog-2023.7.18a1'></a>

2023.7.18a1

Contributors

- Richard Morris
- AoboHill made their first contribution!
- pithirat-horvichien made their first contribution!
- First contributions from Robert McArthur and Vijini Mallawaarachchi!
- Kath Caley, massive contributions to the sequence annotation refactor and the documentation and ...!
- Thanks to Dr Minh Bui, author of IQ-tree, for a sample extended newick output
from IQ-tree!

ENH

- Added automated changelog management to the project
- serializable deprecation of function and method arguments using decorator
- Added `SequenceCollection.distance_matrix()` method. This method provides
a mechanism for k-mer based approximation of genetic distances
between sequence pairs. It is applicable only to DNA or RNA moltypes.
Sequences are converted into 10-mers and the Jaccard distance is computed
between them. This distance is converted into an estimate of a proportion
distance using a 10-th degree polynomial. (That polynomial was derived from
regression to distances from 116 mammal alignments.) The final step is applying
JC69 to these approximated proportion distances.

- Robert and Vijini added a function for computing the matching distance
statistic between tree topologies.

`from cogent3.phylo.tree_distance.lin_rajan_moret`

This is a better behaved statistic than Robinson-Foulds. The original
author was Dr Yu Lin who tragically passed away in 2022. He was our
dear friend, colleague and mentor.

Lin et al. 2012 "A Metric for Phylogenetic Trees Based on Matching"
IEEE/ACM Transactions on Computational Biology and Bioinformatics
vol. 9, no. 4, pp. 1014-1022, July-Aug. 2012

- Major rewrite of annotation handling! In short, we now use an in-memory SQLite
database to store all annotations from data sources such as GFF or GenBank. New
classes provide an interface to this database to support adding and querying records
that match certain conditions. The new database is added to `Sequence` or `Alignment`
instances on a `.annotation_db` attribute. When sequences are part of a collection
(`SequenceCollection` or `Alignment`) they share the same database. Features are now
created on demand via the `Sequence` or `Alignment` instances and behave much as the
original `_Annotatable` subclasses did. There are notable exceptions to this, as
outlined in the deprecated and discontinued sections.
This approach brings a massive performance boost in terms of both speed and memory.
A microbial genome sequence and all its annotations can be loaded in less than a second.
- A new `cogent3.load_annotations()` function allows loading an annotation
db instance from one, or more, flatfiles. If you provide an existing annotation
db instance, the records are added to that db.

- Capture extended newick formatted node data. This is stored in
`TreeNode.params["other"]` as a raw string.

- The `tree_align()` function now uses a new approximation method for faster
estimation of the distances used to obtain a guide tree. This is controlled by
the `approx_dists` argument. The additional argument `iters` can be used to
do multiple iterations, using genetic distances computed from the alignment
produced by the preceding iteration to generate the new guide tree.

If `approx_dists` is `False`, or the moltype of the chosen model is not a nucleic acid
compatible type, distances are computed by the slower method of performing
all pairwise alignments first to estimate the distances.

- Added new alignment quality measures as apps, and the ability to invoke them
from the `Alignment.alignment_quality()` method. The new apps are the
information content measure of Hertz and Stormo (denoted `ic_score`), a
variant on the sum of pairs measure of Carrillo and Lipman
(denoted `sp_score`), and the log-likelihood produced by the cogent3
progressive-HMM aligner (denoted `cogent3_score`). If these apps cannot
compute a score (e.g. the alignment has only 1 sequence), they return a
`NotCompleted` instance. Instances of that class evaluate to `False`.

- Added optional argument `lower` to `app.model()`. This provides a global
mechanism for setting the lower bound on rate and length parameters.

- `load_unaligned_seqs()` now handles glob patterns. If the filename is a glob
pattern, it is assumed to match files that each contain a single sequence. The `load_seq()`
function is applied to each file and a single `SequenceCollection` is returned. To
see progress, set `show_progress=True`.

- `Table.joined(col_prefix)` argument allows users to specify the prefix of
columns corresponding to the second table, i.e.
`result = table.inner_join(table2, col_prefix="")`
ensures result.header is the sum of table.header and table2.header
(minus the index column).

- Added `trim_stop` argument to `get_translation()` methods. This means
translating DNA to protein can be done with one method call, instead of
two.
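
The k-mer based approximation described above for `SequenceCollection.distance_matrix()` can be sketched in plain Python. This is an illustration under stated assumptions, not the cogent3 code: the function names are hypothetical, and the 10-th degree polynomial that converts a Jaccard distance into a proportion distance is omitted because its coefficients are not given here. The sketch shows only the two steps that are fully specified, the Jaccard distance between k-mer sets and the JC69 correction of a proportion distance p, which is d = -3/4 ln(1 - 4p/3).

```python
import math


def kmers(seq: str, k: int = 10) -> set:
    """All overlapping k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(seq1: str, seq2: str, k: int = 10) -> float:
    """1 - |intersection| / |union| of the two k-mer sets."""
    a, b = kmers(seq1, k), kmers(seq2, k)
    union = a | b
    if not union:
        raise ValueError("both sequences are shorter than k")
    return 1 - len(a & b) / len(union)


def jc69(p: float) -> float:
    """JC69 distance from a proportion distance p (defined for p < 0.75)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# small k so the toy sequences share some k-mers
jd = jaccard_distance("ACGT", "ACGA", k=2)
# NOTE: the real method maps jd to a proportion distance with a fitted
# polynomial before applying JC69; that mapping is not reproduced here.
print(jd, jc69(0.3))
```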

DOC

- Thanks to Katherine Caley for awesome new docs describing the
new feature and annotation DB capabilities!

Deprecations

- Removed the original Composable app classes and related decorators for
user based apps. `user_function` and `appify` are replaced by the
`define_app` decorator.

- The function TreeAlign is to be deprecated in 2023.9 and replaced with tree_align

- Every method that has "annotation" in it is now deprecated with a replacement
indicated by their deprecation warnings. Typically, there's a new method with the
name "feature" in it.

- `<collection>.has_terminal_stops()` is being deprecated for
`<collection>.has_terminal_stop()`, because it returns True if a single
sequence has a terminal stop.

Discontinued

- Removed methods on `TreeNode` that are recursive variants of
existing methods. `TreeNode.copy_recursive()`, `TreeNode.traverse_recursive()`,
`TreeNode.get_newick_recursive()` all have standard implementations that can
be used instead.
- `PhyloNode` inherits from `TreeNode` and is distinguished from it only by
having a length attribute on nodes. All methods that rely on length
have now been moved to `PhyloNode`. These methods are `PhyloNode.get_distances()`,
`PhyloNode.set_max_tip_tip_distance()`, `PhyloNode.get_max_tip_tip_distance()`,
`PhyloNode.max_tip_tip_distance()`, `PhyloNode.compare_by_tip_distances()`.
One exception is `TreeNode.get_newick()`. When `with_distance=True`, this
method grabs the "length" attribute from nodes.
- All methods that do not depend on the length attribute are moved to `TreeNode`.
These methods are `TreeNode.balanced()`, `TreeNode.same_topology()`,
`TreeNode.unrooted_deepcopy()`, `TreeNode.unrooted()`, `TreeNode.rooted_at()`,
`TreeNode.rooted_with_tip()`.

- The `SequenceCollection.annotate_from_gff()` method now accepts file
paths only.

- Renaming a sequence in a sequence collection is not applied
to annotations. Users need to modify names prior to binding
annotations.

- Dropping support for attaching / detaching individual annotation
instances from an alignment.

- Backwards incompatible change! `Sequence` and `Alignment` no longer inherit from
`_Annotatable`, so the methods and attributes from that mixin class are no longer
available. (As there was no migration strategy, please let us know if it broke
your code and you need help updating it.)

Major differences include: the `.annotations` attribute is gone; individual
annotations can no longer be copied; annotations are not updated on sequence
operations (you need to re-query).

<a id='changelog-2023.2.12a1'></a>

2023.2.12a1

Contributors

- Gavin Huttley
- Katherine Caley
- Nick Shahmaras
- Richard Morris

Thanks to dgslos who raised the issue regarding IUPAC consensus. Thanks to users active on the GitHub Discussions!

Enhancements

- get_object_provenance() now allows builtins
- jaccard() distance measure added and older approach deprecated

Composable apps

- app_help() and get_app() available as top-level imports.
- app_help() takes the name of the app as a string and displays its summary, parameters and how to create one using get_app().
- get_app() creates an app instance given its name as a string and constructor arguments.
- added skip_not_completed keyword parameter to define_app decorator.
- Some apps need to process NotCompleted instances. The current `app.__call__` method returns these instances immediately without passing them through to the app's `.main()` method. The change introduces a semi-private attribute `_skip_not_completed` to the class. If it's False, the instance will be passed to `main()`.
- composable data validation now allows NotCompleted
- if <app>.input returned a NotCompleted, it was being treated as an invalid data type rather than preserving the original cause for failure. The data validation method now immediately returns a provided NotCompleted instance
- add argument id_from_source to all writer apps for a naming callback
- It should be a callable that generates a unique ID from input data
- Defaults to new get_unique_id() function, which extracts base name by calling get_data_source() and processing the result, removing file suffixes identified by get_format_suffixes().
- this means filename suffixes are dropped more cleanly
- new app to_primitive(). This uses a to_rich_dict() method if available otherwise it just returns the original object.
- new app from_primitive(). This takes a dict and deserialises the object using the standard cogent3 functions.
- new app pickle_it(). Does as the name implies.
- new app unpickle_it(). Does as the name implies.
- new app compress(). Compresses using a provided compress function. Defaults to gzip compress.
- new app decompress(). Decompresses using a provided decompress function. Defaults to gzip decompress.
- new app to_json(). Converts result of to_primitive() to json string.
- new app from_json(). Converts json string to python primitives suitable for from_primitive().
- added DEFAULT_SERIALISER and a corresponding DEFAULT_DESERIALISER app instances
- these are to_primitive() + pickle_it() (and the reverse)
- app.typing.get_constraint_names() now supports all standard python Sequence built-ins (list, tuple, set).
- add type resolver for nested types
- function resolves the type tree of nested types and also returns the depth of that type tree
- ensure custom apps don't have excessive nested types. The motivation for this check is it is difficult to efficiently resolve, so we advise the developer (via a TypeError message) to define a custom class for such complex types. They can then choose to validate construction of those class attributes themselves.
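
The serialisation apps listed above chain naturally. The sketch below imitates that chain with the standard library only, assuming a hypothetical `Record` class that offers a `to_rich_dict()` method. The functions here are stand-ins written for illustration, not the cogent3 apps themselves, and the step that rebuilds the original object (the role of from_primitive()) is left out.

```python
import gzip
import json
import pickle


def to_primitive(obj):
    """Use the object's to_rich_dict() if it has one, else return it as is."""
    return obj.to_rich_dict() if hasattr(obj, "to_rich_dict") else obj


def serialise(obj) -> bytes:
    # primitive -> pickle -> gzip compress
    return gzip.compress(pickle.dumps(to_primitive(obj)))


def deserialise(data: bytes):
    # gzip decompress -> unpickle, giving back the primitive form
    return pickle.loads(gzip.decompress(data))


class Record:
    """Hypothetical data class used only for this example."""

    def __init__(self, name, seq):
        self.name = name
        self.seq = seq

    def to_rich_dict(self):
        return {"type": "Record", "name": self.name, "seq": self.seq}


rec = Record("s1", "ACGT")
restored = deserialise(serialise(rec))
as_json = json.dumps(to_primitive(rec))  # the JSON route works on the same primitive
print(restored, as_json)
```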

DataStores

These have been completely rewritten and have different behaviour from the original versions. Deprecation warnings are raised when the old ones are employed.

- Loading and creating data stores should now be done using open_data_store(), a top-level import.
- It replaces the (now deprecated) get_data_store() function.
- It adds a mode argument, "r" is read only, "w" write, and "a" append. This function should now be used for all creation of new data store instances.
- Supports opening in-memory sqlitedb for writing, just use ":memory:" as the data_path. If mode is read only, raises a NotImplementedError.
- added new DataStoreSqlite for a more flexible data store backend
- supports all python types via pickling, including compression of that data
- sqlite3 is part of the Python standard library
- uses the new DEFAULT_SERIALISER for serialisation. The corresponding DEFAULT_DESERIALISER can be used for reversing that.
- specified using the suffix ".sqlitedb" or using ":memory:" for an in memory sqlitedb
- record_type property is the type of completed records
- DataStores have completed and not_completed properties
- Iteration on data stores is across *both* of those
- Iterate over the completed property for subsequent analyses
- DataStores have a drop_not_completed method
- All data stores record NotCompleted and md5 data
- DataStores have a .validate() method, which checks all records match their recorded md5.
- DataStores provide separate methods for writing different types
- write, write_not_completed, write_log
- all require keyword style arguments
- all return a DataMember
- DataStoreABC.validate() now records missing md5
- DataStores now have a summary_not_completed property
- repr(DataStore) now displays the construction statement. The str(DataStore) returns the output previously displayed by repr().
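
The idea of recording an md5 for every record and validating against it later can be sketched with the standard library. Everything below (the table layout, `write`, `validate`) is a hypothetical illustration built on an in-memory SQLite database. It is not the DataStoreSqlite schema or API.

```python
import hashlib
import sqlite3


def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
conn.execute(
    "CREATE TABLE records "
    "(name TEXT PRIMARY KEY, data BLOB, md5 TEXT, completed INTEGER)"
)


def write(*, unique_id: str, data: bytes, completed: bool = True) -> None:
    """Store a record with its md5 (keyword style arguments only)."""
    conn.execute(
        "INSERT INTO records VALUES (?, ?, ?, ?)",
        (unique_id, data, md5_of(data), int(completed)),
    )


def validate() -> dict:
    """Map each record name to whether its data still matches its md5."""
    rows = conn.execute("SELECT name, data, md5 FROM records")
    return {name: md5_of(data) == md5 for name, data, md5 in rows}


write(unique_id="a.fasta", data=b">s1\nACGT\n")
write(unique_id="b.fasta", data=b"failed to load", completed=False)

# iterate over completed records only, as advised for subsequent analyses
completed = [
    row[0] for row in conn.execute("SELECT name FROM records WHERE completed = 1")
]
print(completed, validate())
```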

Alignments

- iupac_consensus() method now allows ignoring gaps using the allow_gaps argument.

Deprecations

- get_data_store -> open_data_store
- all previous data store, data member, writer, loader classes
- Data store summary_incomplete property is renamed summary_not_completed

Discontinued

- We are discontinuing support for tinydb.
- added convert_tinydb_to_sqlite() function for converting old tinydb to sqlitedb
- adds a log recording the conversion
- All previous data store types are discontinued, use open_data_store() function for getting a data store instead of a direct import path.

BUG

- progress display in notebooks now works again

2022.8.24a1

Contributors

Thanks to our contributors!

Accepted PRs from

- Gavin Huttley
- KatherineCaley
- Nick Shahmaras
- Xingjian Leng

Identified a Bug

- StephenRogers1

ENH

Significant refactor of composable apps. This is the backwards incompatible change we warned of in the last release! We now use a decorator `define_app` (on classes or functions) instead of class inheritance. Please see [the c3dev wiki](https://github.com/cogent3/cogent3/wiki/composable-functions) for examples on how to port from old-style to new-style composable apps.

We updated to the latest NCBI versions of genetic codes. Note, the name of genetic code 1 has changed from "Standard Nuclear" to "Standard".

BUG

- Fix progressive alignment bug when a guide-tree with zero edge lengths was encountered.
- Non-stationary independent tuple models can now be serialised.

DEP

- We have removed support for python 3.7.
- We have made scipy a dependency and begun deprecating statistical functions that are available in scipy. All deprecated functions have a warning that indicates the scipy replacement. The deprecated functions are: combinations, chi_high, chdtri, z_high, z_low, chi_low, binomial_high, binomial_low, f_high, f_low, t_low and t_high.
