Performance upgrades
- Mutation data is now processed and saved in a sparse format that tracks only the mutated positions. Because mutations make up only 1% to 5% of most datasets, the sparse format is more storage-, memory-, and time-efficient.
- Batches are now saved in Brotli-compressed pickle ("Brickle") files, which require less storage and allow more types of data to be saved than the previous Parquet and gzip-compressed CSV formats.
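
The two changes above can be illustrated with a minimal sketch. It is not SEISMIC-RNA's actual code or file layout: the mutation codes, the dictionary layout, and the helper names (`to_sparse`, `to_dense`, `dumps_brickle`, `loads_brickle`) are all hypothetical. It assumes the third-party `brotli` package is available and falls back to the standard library's `zlib` only so that the example still runs without it.

```python
import pickle

try:
    import brotli  # third-party package; assumed to be installed
    compress, decompress = brotli.compress, brotli.decompress
except ImportError:
    import zlib  # stand-in so the sketch still runs; this is NOT Brotli
    compress, decompress = zlib.compress, zlib.decompress

# Hypothetical dense batch: one row per read, one value per position.
# 0 means "matches the reference"; other values are made-up mutation codes.
dense_batch = [
    [0, 0, 0, 2, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 3, 0, 0],
]


def to_sparse(dense):
    """Keep only mutated positions: {position: {read index: code}}."""
    sparse = {}
    for read, row in enumerate(dense):
        for pos, code in enumerate(row):
            if code != 0:
                sparse.setdefault(pos, {})[read] = code
    return sparse


def to_dense(sparse, n_reads, n_pos):
    """Rebuild the dense matrix from the sparse mapping."""
    dense = [[0] * n_pos for _ in range(n_reads)]
    for pos, reads in sparse.items():
        for read, code in reads.items():
            dense[read][pos] = code
    return dense


def dumps_brickle(obj):
    """Pickle an object, then compress the resulting bytes."""
    return compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def loads_brickle(blob):
    """Decompress the bytes, then unpickle the object."""
    return pickle.loads(decompress(blob))


sparse_batch = to_sparse(dense_batch)
n_dense = sum(len(row) for row in dense_batch)                   # values stored densely
n_stored = sum(len(reads) for reads in sparse_batch.values())    # values stored sparsely

blob = dumps_brickle(sparse_batch)
restored = to_dense(loads_brickle(blob), len(dense_batch), len(dense_batch[0]))
```

Because pickle can serialize nested Python objects such as the dictionary above, a pickled batch is not limited to the tabular shape that Parquet and CSV files expect. As with any pickle-based format, such files should only be loaded from trusted sources.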
New features
- The `table` step has been sped up via the sparse data format, so computing all fields is now nearly as fast as computing one field. Thus, `table` now computes all fields automatically (the option to compute only selected fields has been removed).
Bug fixes
- When running `align` on demultiplexed FASTQ files, one report file is now generated for each FASTQ file. Previously, all FASTQ files for a sample wrote to the same report file, each overwriting the last.
- When running `relate` on multiple samples that are aligned to the same set of references, the BAM/CRAM files from every sample are now processed. Previously, only one sample's BAM/CRAM file was processed for each reference.
- When running `fold`, the RNAstructure `Fold` command is now formatted correctly.
Internals
- The `core` modules have been refactored into a group of subpackages, each with its own modules.
- The `all` subcommand has been moved from the `main.py` module to its own subpackage.
- The mutation calling and counting routines, formerly in the modules `seismicrna.core.bitcall` and `seismicrna.core.bitvect`, respectively, have been rewritten and replaced by `seismicrna.core.rel.pattern` and `seismicrna.core.batch.accum`.
- The unique read finding algorithm has likewise been rewritten and moved to `seismicrna.cluster.uniq`.
**Full Changelog**: https://github.com/rouskinlab/seismic-rna/compare/v0.8.0...v0.9.0