Agalma

Latest version: v1.0.0

1.0.0
=====

* Major redesign of how permanent data is stored. Removed 'outdir' parameter
from all pipelines, and permanent data is stored directly in the Agalma
sqlite database file (163). Merged the separate Agalma and BioLite database
files into a single file that can be specified with the environment variable
AGALMA_DB (162). This allows an entire Agalma analysis to be stored in a
single sqlite file that can be relocated during analysis, or copied to a new
sqlite file as the starting point of alternative analyses.
* New import and annotation methods to support amino acid sequences without
accompanying nucleotide sequences, and CDS sequences. (98, 174, 197)
* Removed deprecated phylogeny pipelines `orthologize` and `multalignx` (172)
* Implemented phylogenetically guided assembly with the `treeinform` pipeline.
Briefly, treeinform analyzes genetrees to infer a reassignment of assembled
sequences to gene names in the existing Trinity assemblies. (199, 203)
* Added validity tests for sequence reassignments in treeinform. (223)
* Updated dendropy to latest version 4. (207)
* Updated Trinity to latest version 2.3.2. (188)
* Restored and improved transcriptome reports. (220)
* Replaced notung with phyldog for inferring duplication/speciation events in
'export_expression'. (225)
* Replaced BioBrew installation instructions with prebuilt binary releases for
all dependencies managed with Anaconda Python. Release includes a Docker
image. (227)
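
As an illustration of the single-file design described above, the sketch below resolves the database path from `AGALMA_DB` and copies it to a new file as the starting point of an alternative analysis. Only the `AGALMA_DB` variable comes from these notes. The fallback file name, the helper names, and the use of the `sqlite3` backup API are assumptions made for illustration and are not Agalma's own code.

```python
import os
import sqlite3


def resolve_db_path(default="agalma.sqlite"):
    """Return the path named by AGALMA_DB, or a hypothetical default."""
    return os.environ.get("AGALMA_DB", default)


def fork_database(src_path, dst_path):
    """Copy one sqlite file into another with sqlite3's online backup API."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # copies every page of src into dst
    finally:
        src.close()
        dst.close()
    return dst_path
```

Pointing `AGALMA_DB` at the copied file would then let a second analysis proceed without touching the original.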

0.5.0
=====

* Added a tool 'agalma-export-expression' for exporting a single JSON file with the
gene trees, species tree, and expression counts for a phylogenetic analysis of
gene expression. The JSON file can be imported into R for downstream analyses.
(103, 119, 136)
* Updated 'assemble' to use Trinity version r20140413p1. (127, 77, 91)
* Fixed a bug with incorrect arguments in a GNU parallel call in 'postassemble'. (135)
* Added a '--min_nodes' argument to 'homologize' for adjusting the minimum number
of nodes retained as a homologous gene cluster. (139, contributed by Warren Francis)
* Fixed a bug in 'sanitize' where FastQC failed for gzipped FASTQ files. (129)
* The rRNA exemplars are now annotated with the closest BLAST hit from the set
of curated rRNA. The curated rRNA now require an "OG" field in the header for
mitochondrial or plastid rRNA sequences. Fixed a bug in the 'load' pipeline with
identifying mitochondrial and plastid sequences. (134)
* The links to the final assembly files are now much more conspicuous in the
'postassemble' report. (137)
* Added a '--timeout' argument to the 'genetree' pipeline that sets the maximum
amount of time to spend on estimating an individual gene tree with RAxML. Trees
that can't be estimated within this timeframe are dropped from further analysis,
and the number of failed trees is reported in the diagnostics. (143)
* Updated the TUTORIAL with a section on the `expression` pipeline. (140)
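
The behaviour of the '--timeout' argument can be sketched generically: run each job with a time limit, drop the ones that exceed it, and report how many were dropped. The function name and the job format below are hypothetical, and the commands are passed in by the caller, so nothing here reflects the actual RAxML invocation used by 'genetree'.

```python
import subprocess


def run_with_timeout(jobs, timeout):
    """Run each (name, argv) job, dropping any that exceed `timeout` seconds.

    Returns (finished, failed), two lists of job names.
    """
    finished, failed = [], []
    for name, argv in jobs:
        try:
            # subprocess.run kills the child process when the timeout expires.
            subprocess.run(argv, check=True, timeout=timeout)
            finished.append(name)
        except subprocess.TimeoutExpired:
            failed.append(name)  # excluded from further analysis
    return finished, failed
```

The length of the `failed` list corresponds to the count of failed trees that the notes say is reported in the diagnostics.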

0.4.0
=====

* Switched the MAFFT algorithm in 'multalign' from L-INS-i to E-INS-i.
* 'remove_rrna' is now more robust to failed subassemblies and datasets that
contain no ribosomal RNA.
* 'remove_rrna' now excludes only the reads that map to the exemplar rRNA
sequences, instead of to all sequences in the rRNA subassembly. The exemplar
sequences and any rRNA transcripts identified in 'postassemble' are concatenated
into a single rRNA file in the 'postassemble' report. (112)
* Standardized all calls to GNU parallel so that they write out a progress log,
halt on any errors, and rerun failed commands when the pipeline is restarted.
(111)
* The 'supermatrix' pipeline now runs on a single multiple alignment of either
nucleotides or amino acids (instead of both at once) and provides a more detailed
report. (54)
* Switched the build system from GNU autotools to Python distutils. Agalma can
now be installed with pip, the Python package manager. (97)
* The 'randomize' stage was removed from 'sanitize' because it requires too much
memory for large datasets. (118)
* A new pipeline 'speciestree' replaces the second call to the 'genetree'
pipeline to build a maximum-likelihood tree from the final supermatrix. It
supports running RAxML with MPI. The Newick tree is included in a textbox in
the report. (104, 115)
* 'postassemble' now annotates assemblies by blasting the translations against
swissprot with blastp. (114)
* A new regression test is autogenerated from the tutorial. (93)
* Sequences loaded into the Agalma database are classified by genome type
(nuclear, mitochondrial, plastid) and molecule type (protein-coding, large and
small ribosomal). (106, 117)
* Agalma now includes its own build of the swissprot blast database, with the
organelle field (OG) in the description for identifying mitochondrial and
plastid sequences. (3, 106)
* Added a sample batch script that shows how to perform a self-contained
transcriptome assembly on the Oscar compute cluster at Brown.
* Fixed bugs in 'remove_rrna' that caused it to fail when no rRNA is present in
the sample.
* Transitioned over to a new organization of wrappers and workflows in BioLite.
Removed a shim library that was used for resource reporting on older versions
of Linux. (107)
* New 'expression' pipeline maps reads to an assembly and estimates counts.
Additional functionality is planned for the 0.5 release. (69)
* Fixed a problem when relative paths were passed as arguments to Agalma
pipelines. (96)
* All stages of the assemblies are now written to the data directory instead of
scratch. (95)
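
The genome-type classification mentioned above can be sketched as a lookup on the organelle (OG) field. The header layout is an assumption: the sketch expects an `OG=<organelle>` token in the description line, which may not match the format Agalma actually writes, and the function name is hypothetical.

```python
import re

# Assumed header convention: an "OG=<organelle>" token in the description.
# The real Agalma/SwissProt header layout may differ.
_OG_PATTERN = re.compile(r"\bOG=([^;|]+)")


def genome_type(header):
    """Classify a FASTA header as 'mitochondrial', 'plastid', or 'nuclear'."""
    match = _OG_PATTERN.search(header)
    if match is None:
        return "nuclear"  # no organelle annotation present
    organelle = match.group(1).strip().lower()
    if "mitochondri" in organelle:
        return "mitochondrial"
    if "plastid" in organelle or "chloroplast" in organelle:
        return "plastid"
    # Simplification: any other organelle value is treated as nuclear.
    return "nuclear"
```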

0.3.5
=====

* Hotfixes to correct errors in TUTORIAL and append the supermatrix FASTA file
to the 'multalign' report.
* New 'supermatrix' pipeline can construct supermatrices by occupancy
proportion. (75)
* New 'multalign' pipeline uses MAFFT instead of MACSE for multiple alignment
of translated protein sequences. The simultaneous alignment and translation
approach originally implemented in Agalma can improve translations by
accommodating frameshifts; however, mistakenly including fairly distant
homologs or erroneous transcripts within clusters can result in overall poor
translations and alignment of clusters. The old multalign pipeline was
renamed 'multalignx' where the 'x' stands for translated multiple alignment,
since MACSE uses nucleotide alignments to infer translations. (79)
* Improved the linkage between the phylogeny pipelines, so that the most recent
previous run of the correct type is identified by default. A previous run
can be explicitly chosen with the --previous argument (now consistent across
all the pipelines). (85)
* Rewrote the 'assemble' pipeline to subsume the Trinity.pl wrapper script, and
run the various components of Trinity as separate stages within the pipeline.
This provides finer grained resource usage and fixes some problems with
robustness and memory use we were experiencing on our compute cluster. GNU
parallel replaces ParaFly for both the quantify_graph and butterfly stages.
Oases is no longer supported in 'assemble', but additional assemblers could
be added in the future as variants on the 'assemble' pipeline, e.g.
'assemble_oases'. (87)
* The 'supermatrix' report now includes a table of the
percentage of genes present for each taxon. (82)
* The regression tests are taking longer to run (30-40 minutes) and have been
divided up into different levels. The default level (1) now runs in about
(16-cores). Higher levels (2 or 3) provide more complete tests and are
selected with 'agalma test X'. (92)
* Added a histogram of mean quality scores to the 'sanitize' report. (90)
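
One plausible reading of building a supermatrix "by occupancy proportion" is to add genes in order of taxon sampling until the overall matrix occupancy would fall below a chosen threshold. The sketch below implements that reading only. The function name, its inputs, and the greedy strategy are assumptions and may differ from what the 'supermatrix' pipeline does.

```python
def select_by_occupancy(gene_taxa, all_taxa, min_occupancy):
    """Pick genes, best-sampled first, while occupancy stays >= min_occupancy.

    gene_taxa: dict mapping gene name -> set of taxa with a sequence.
    all_taxa: collection of every taxon in the analysis.
    """
    n_taxa = len(all_taxa)
    # Sort by taxon count (descending), then by name for a deterministic order.
    ranked = sorted(gene_taxa, key=lambda g: (-len(gene_taxa[g]), g))
    selected, filled = [], 0
    for gene in ranked:
        new_filled = filled + len(gene_taxa[gene])
        occupancy = new_filled / ((len(selected) + 1) * n_taxa)
        if occupancy < min_occupancy:
            break  # later genes are no better sampled, so stop here
        selected.append(gene)
        filled = new_filled
    return selected
```

Here occupancy is the fraction of filled cells in the gene-by-taxon matrix, so a threshold of 1.0 keeps only genes sampled in every taxon.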

0.3.4
=====

* Improved parallelization of the blastx annotation in 'postassemble'. (53)
* 'homologize' has a new mode for seeding the homology search with an existing
set of genes, such as CEGMA or a previously computed supermatrix. Instead of
performing an all-by-all homology search, transcripts are only aligned
against the seed genes. (56, 59)
* New parameter in 'genetree' to disable bootstrapping or change the threshold
for filtering by mean bootstrap support. (60)
* Added multi-node parallelism to 'multalign' and 'genetree' using GNU
parallel. (58, 61)
* 'postassemble' now performs protein translation (largest open reading frame
with Transdecoder) and transcript quantification (with RSEM). The schema for
the 'sequences' table was updated so that exemplars are now selected as the
transcript with highest abundance in a locus, rather than by the earlier
ad-hoc selection of the longest transcript in the locus. Exemplars are now
chosen in 'homologize' (via 'database.load_seqs') and not in 'postassemble'.
(57, 63)
* New 'orthologize' pipeline provides an alternative phylogeny pipeline that
directly infers orthologs using OMA. (64)
* Sequence reduction plot in the phylogeny report has more detail: added
sequence counts before and after 'homologize.mcl_cluster' and for each filter
applied in 'multalign.refine_clusters'. (70, 71)
* Fixed a mis-calculation in the overlap threshold applied in
'homologize.parse_edges'. (72)
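
The exemplar rule described above, choosing the transcript with the highest abundance in each locus, can be sketched as follows. The tuple layout and function name are hypothetical and do not reflect the schema of the 'sequences' table or the signature of 'database.load_seqs'.

```python
def pick_exemplars(transcripts):
    """Return {locus: transcript_id}, choosing the most abundant transcript.

    transcripts: iterable of (transcript_id, locus, abundance) tuples.
    Ties keep the first transcript seen for that locus.
    """
    best = {}
    for transcript_id, locus, abundance in transcripts:
        current = best.get(locus)
        if current is None or abundance > current[1]:
            best[locus] = (transcript_id, abundance)
    return {locus: tid for locus, (tid, _) in best.items()}
```

Under the earlier rule, the comparison would have been made on transcript length instead of abundance.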

0.3.3
=====

* Added bootstrapping to RAxML calls in the 'genetree.genetrees' stage, and a
subsequent filtering stage that removes trees with low mean bootstrap
support. (43)
* Removed the auto-generated report at the end of 'transcriptome' and put the
appropriate report commands in the TUTORIAL. (51)
* Added report commands to the phylogeny section of the TUTORIAL. (50)
* Fixed problems with 'tabular_report' that caused unnecessary rows and empty
table cells. (52)
* Added a new option '--nreads' for reducing the number of reads that 'sanitize'
outputs. (49)
* Modified 'load' to correctly validate external assemblies with IUPAC
ambiguity codes. (41)
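
Filtering trees by mean bootstrap support can be sketched as below. The sketch assumes support values are written as internal node labels directly after a closing parenthesis in the Newick string (for example `)95:0.01`) and that taxon names contain no parentheses. The function names and the threshold are hypothetical, and this is not the code used in 'genetree'.

```python
import re

# Assumption: support values appear as labels right after ")" in the Newick text.
_SUPPORT = re.compile(r"\)(\d+(?:\.\d+)?)")


def mean_support(newick):
    """Return the mean of the internal node support labels, or 0.0 if none."""
    values = [float(v) for v in _SUPPORT.findall(newick)]
    return sum(values) / len(values) if values else 0.0


def filter_trees(trees, threshold):
    """Keep only the trees whose mean support is at least `threshold`.

    trees: dict mapping tree name -> Newick string.
    """
    return {
        name: newick
        for name, newick in trees.items()
        if mean_support(newick) >= threshold
    }
```

Note that a tree with no support labels scores 0.0 in this sketch and is therefore removed by any positive threshold.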
