This is the initial release. Interesting features include:
- Object representations for short genetic variants (SNP and Indel classes). The variants in a given genomic region can easily be initialized using the utilities module.
- Region objects support basic set-like operations. They make it easy to find physical overlaps.
- Sequence objects provide some basic sequence analysis functionality. They can be created from the human reference genome through the "Reference" object. The reference can be either local (FASTA format) or remote (Ensembl API).
- Gene and Transcript objects facilitate information gathering from the web. They allow the user to use HGNC gene symbols for object initialization. They make it easy to find orthologs or cross-references.
- The formats module includes a fairly efficient Impute2 file parser, which is probably its most important part.
- The db module facilitates access to Ensembl (REST API), UCSC (MySQL database) and APPRIS.
- There is also a set of features for efficient file indexing. Indexing is supported for any delimited text file with chromosome and position columns.
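
To illustrate what set-like operations on regions and overlap detection can look like, here is a minimal standalone sketch. It is not this package's Region class: the `Interval` name, its fields, and its methods are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for a genomic region (1-based, inclusive)."""

    chrom: str
    start: int
    end: int

    def overlaps(self, other: "Interval") -> bool:
        # Two segments overlap when they are on the same chromosome and
        # neither one ends before the other starts.
        return (
            self.chrom == other.chrom
            and self.start <= other.end
            and other.start <= self.end
        )

    def intersection(self, other: "Interval") -> Optional["Interval"]:
        # The shared segment, or None when the two do not overlap.
        if not self.overlaps(other):
            return None
        return Interval(
            self.chrom, max(self.start, other.start), min(self.end, other.end)
        )

    def union(self, other: "Interval") -> "Interval":
        # Only defined for overlapping segments in this sketch, so that
        # the result stays a single contiguous interval.
        if not self.overlaps(other):
            raise ValueError("union of disjoint intervals is not contiguous")
        return Interval(
            self.chrom, min(self.start, other.start), max(self.end, other.end)
        )


a = Interval("chr1", 100, 200)
b = Interval("chr1", 150, 300)
shared = a.intersection(b)  # the segment common to a and b
merged = a.union(b)         # the segment spanning a and b
```

A real region type would also need to handle non-contiguous regions, which this sketch deliberately leaves out.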
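
The idea behind indexing a delimited text file by chromosome and position can be sketched as follows. This is one possible approach under stated assumptions, not the package's implementation: the `build_index` and `find` functions are hypothetical, the file is assumed to have no header, and rows are assumed to be grouped by chromosome and sorted by position within each chromosome.

```python
import bisect
import io


def build_index(handle, chrom_col=0, pos_col=1, delimiter="\t", step=2):
    """Record the byte offset of every `step`-th row of each chromosome.

    `handle` must be opened in binary mode so that offsets are exact.
    """
    index = {}   # chrom -> ([positions], [byte offsets])
    counts = {}  # chrom -> number of rows seen so far
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        fields = line.decode().rstrip("\n").split(delimiter)
        chrom, pos = fields[chrom_col], int(fields[pos_col])
        n = counts.get(chrom, 0)
        if n % step == 0:
            positions, offsets = index.setdefault(chrom, ([], []))
            positions.append(pos)
            offsets.append(offset)
        counts[chrom] = n + 1
    return index


def find(handle, index, chrom, pos, chrom_col=0, pos_col=1, delimiter="\t"):
    """Return the fields of the row at (chrom, pos), or None if absent."""
    if chrom not in index:
        return None
    positions, offsets = index[chrom]
    # Jump to the last indexed row at or before the requested position,
    # then scan forward for at most a few rows.
    i = max(bisect.bisect_right(positions, pos) - 1, 0)
    handle.seek(offsets[i])
    for raw in iter(handle.readline, b""):
        fields = raw.decode().rstrip("\n").split(delimiter)
        if fields[chrom_col] != chrom or int(fields[pos_col]) > pos:
            return None
        if int(fields[pos_col]) == pos:
            return fields
    return None


# Small in-memory example standing in for a file on disk.
data = b"1\t100\tA\n1\t250\tC\n1\t900\tG\n2\t50\tT\n2\t75\tA\n"
handle = io.BytesIO(data)
index = build_index(handle)
hit = find(handle, index, "1", 900)
```

The `step` parameter trades index size for scan length: a larger step stores fewer offsets but reads more rows after each seek.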