Seismic-rna

Latest version: v0.21.1

0.19.0

New Features

Strand-aware alignment

- Each BAM file can be split into separate files of reads originating from the plus- versus the minus-strand of the RNA using `--sep-strands`.
- The option `--f1r2-plus / --f1r2-minus` controls whether paired-end reads whose mate 1 aligns in the forward orientation and mate 2 in the reverse orientation are considered to originate from the plus or the minus strand (for the Illumina library prep kits that our lab uses, they come from the minus strand, so `--f1r2-minus` is the default). Reads where mates 1 and 2 align in the reverse and forward orientations, respectively, are considered to originate from the other strand.
- For single-end reads, the behavior is the same as for read 1: in `--f1r2-minus` mode, single-end reads that align in the forward orientation are considered to have come from the minus strand.
- The option `--minus-label` controls the label appended to the minus strand of each reference (by default, it is the name of the reference followed by `-minus`).
- In strand-aware mode, `seismic align` also writes a FASTA file of all reference sequences (including their minus strands) whose BAM files received a sufficient number of reads (controlled by `--min-reads`) into the same directory as the BAM files and align report, with the same name as the original FASTA file.
- Currently, strand-aware alignment is only available through `seismic align`, not `seismic wf`. This limitation arises because separating strands actually generates new reference sequences (namely, the minus strands); if those sequences are missing from the FASTA given to the `relate` step, then any BAM files aligned to the minus strands cannot be processed. It is straightforward to run `seismic align` in strand-aware mode, then use the FASTA file it generates as input for `seismic relate` or `seismic wf`. However, switching the FASTA file automatically within `seismic wf` will require non-trivial re-engineering of how the pipeline works (or some other hacks).
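
The strand rules above can be summarized in a short sketch. This is an illustration only, not SEISMIC-RNA's code: the function `infer_strand` and its parameters are hypothetical names, and it assumes that single-end reads are treated like mate 1, as described above.

```python
def infer_strand(is_forward: bool, mate: int = 1, f1r2_plus: bool = False) -> str:
    """Return "plus" or "minus" for one aligned read (hypothetical helper).

    is_forward: True if this read aligned in the forward orientation.
    mate: 1 for mate 1 or a single-end read, 2 for mate 2.
    f1r2_plus: True mimics --f1r2-plus; False mimics the default, --f1r2-minus.
    """
    if mate not in (1, 2):
        raise ValueError(f"mate must be 1 or 2, but got {mate}")
    # A read belongs to an "F1R2" pair if mate 1 is forward,
    # or equivalently if mate 2 is reverse.
    is_f1r2 = is_forward if mate == 1 else not is_forward
    # F1R2 reads get the strand chosen by the option;
    # all other reads get the opposite strand.
    return "plus" if is_f1r2 == f1r2_plus else "minus"


# Default (--f1r2-minus): a forward single-end read comes from the minus strand.
print(infer_strand(is_forward=True))  # minus
# With --f1r2-plus, the same read is assigned to the plus strand.
print(infer_strand(is_forward=True, f1r2_plus=True))  # plus
```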

Bug Fixes

- The mechanism to release files to the output directory now keeps a backup of any existing output files until it is sure that the new files have been written. This setup avoids potentially deleting existing output files but then failing to write the new files, causing data loss.
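
A minimal sketch of the backup idea, assuming the output is a directory and that the new files have already been written to a separate directory on the same filesystem. The function name `release_with_backup` and the `.backup` suffix are hypothetical, chosen for illustration; the actual implementation may differ.

```python
import os
import shutil
from pathlib import Path


def release_with_backup(new_dir: Path, final_dir: Path) -> None:
    """Move new_dir to final_dir, restoring any previous final_dir on failure."""
    backup_dir = None
    if final_dir.exists():
        # Set the existing output aside instead of deleting it.
        backup_dir = final_dir.with_name(final_dir.name + ".backup")
        os.rename(final_dir, backup_dir)
    try:
        os.rename(new_dir, final_dir)
    except OSError:
        # Releasing the new output failed: put the old output back.
        if backup_dir is not None:
            os.rename(backup_dir, final_dir)
        raise
    # The new output is in place, so the backup is no longer needed.
    if backup_dir is not None:
        shutil.rmtree(backup_dir)
```

The key ordering is that the old output is deleted only after the new output has reached its final location.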

**Full Changelog**: https://github.com/rouskinlab/seismic-rna/compare/v0.18.2...v0.19.0

0.18.2

Performance Enhancements

- The `samtools sort` by query name step has been replaced by `samtools collate` to increase speed (for details, see https://www.htslib.org/doc/samtools-collate.html).

Bug Fixes

- Name-sorting/collation now occurs at the beginning of `relate` instead of the end of `align` in order to fix a bug wherein passing an unsorted/uncollated SAM/BAM/CRAM file (i.e. created without or modified outside of `seismic align`) to `relate` would cause it to treat paired-end reads as single-end.
- The algorithm for finding ambiguous indels has been updated to keep indels from entering soft-clipped regions of the read when both the indel and the soft-clipped region lie within a contiguous stretch of low-quality bases (although this situation is very rare). Before this fix, a read with such a long stretch of low-quality bases including an indel and a soft-clip could have been erroneously assigned indels at positions outside of the region of the reference to which it had aligned, which would have resulted in incorrect (occasionally negative) counts of matches in the `table` step.
- The `mask` and `table` steps have been updated to raise errors upon finding negative counts, so that any other such bugs do not go unnoticed.
- In `demult`, reports are now named after samples, and an issue with obtaining barcodes from sequences with different barcode coordinates has also been fixed.
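
To see why collation matters for paired-end reads, consider a toy model in which mates are recognized only when their records are adjacent. This sketch is not how `samtools collate` or `relate` is implemented; the functions `group_adjacent` and `collate` are hypothetical and operate on plain read names.

```python
from itertools import groupby


def group_adjacent(names):
    """Group consecutive records that share a read name."""
    return [tuple(group) for _, group in groupby(names)]


def collate(names):
    """Bring records with the same name together, in order of first appearance."""
    groups = {}
    for name in names:
        groups.setdefault(name, []).append(name)
    return [name for group in groups.values() for name in group]


# In a coordinate-sorted file, the two mates of a read need not be adjacent.
coordinate_sorted = ["read1", "read2", "read1", "read2"]

# Without collation, every record looks like a single-end read.
print(group_adjacent(coordinate_sorted))
# [('read1',), ('read2',), ('read1',), ('read2',)]

# After collation, mates are adjacent and can be paired.
print(group_adjacent(collate(coordinate_sorted)))
# [('read1', 'read1'), ('read2', 'read2')]
```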

What's Changed
* 0.18.2 by matthewfallan in https://github.com/rouskinlab/seismic-rna/pull/16


**Full Changelog**: https://github.com/rouskinlab/seismic-rna/compare/v0.18.1...v0.18.2

0.18.1

New Features

- SEISMIC-RNA now natively supports parsing and writing RNA structures in dot-bracket (DB) format using functions in the module `seismicrna.core.rna.db`.
- Convert CT to DB files, and vice versa, using the new command-line utilities `seismic +ct2db` and `seismic +db2ct`.
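
For readers unfamiliar with dot-bracket notation, the following standalone sketch shows how base pairs can be read from such a string. It does not use or reproduce the API of `seismicrna.core.rna.db`; the function `parse_dot_bracket` is a hypothetical example that handles only `(`, `)`, and `.` (no pseudoknot brackets).

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the 1-indexed base pairs in a dot-bracket string."""
    stack = []  # positions of "(" that have not yet been closed
    pairs = []
    for pos, mark in enumerate(structure, start=1):
        if mark == "(":
            stack.append(pos)
        elif mark == ")":
            if not stack:
                raise ValueError(f"Unmatched ')' at position {pos}")
            # Each ")" pairs with the most recent unclosed "(".
            pairs.append((stack.pop(), pos))
        elif mark != ".":
            raise ValueError(f"Unsupported character {mark!r} at position {pos}")
    if stack:
        raise ValueError(f"Unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(parse_dot_bracket("((..)).()"))
# [(1, 6), (2, 5), (8, 9)]
```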

Performance Enhancements

- During the `table` step, SEISMIC-RNA reduces memory usage when bias correction is disabled (`--min-mut-gap=0`) by avoiding computing the full end coordinate matrix. This is especially important for very long (more than about 30 kb) sections. If the section is very long and bias correction is enabled, a warning is now issued about excessive memory usage.

Bug Fixes

- Because SEISMIC-RNA now uses its own RNA structure format converters, they are not limited by the length restriction in RNAstructure. This prevents crashes during format conversion for very long reference sequences in the `fold` step.
- Fixed a bug where if an EM clustering dataset had 0 reads or positions, its attributes (mutation rates, end coordinate distribution, cluster proportions, etc.) would be allocated with `np.empty` but never initialized. Consequently, the number of non-zero elements could vary from run to run, leading the BIC score (which depends on the number of non-zero end coordinates) to behave unpredictably, sometimes causing more than one cluster to be found even with zero reads or positions. The attributes are now initialized with `np.zeros` so that they always behave the same way when there are 0 reads or positions.
- In the API `run` function for `mask`, missing defaults for `mask_sections_file` and `mask_pos_file` have been added.
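
The difference between `np.empty` and `np.zeros` described above can be illustrated with a small sketch. The array shape and the variable names are hypothetical stand-ins for a clustering attribute, not the actual SEISMIC-RNA data structures.

```python
import numpy as np

n_positions = 4  # hypothetical number of positions

# np.empty allocates memory without initializing it, so the array holds
# whatever values happened to be in that memory: the number of non-zero
# elements is arbitrary and can change from run to run.
uninitialized = np.empty((n_positions, n_positions))
print(np.count_nonzero(uninitialized))  # arbitrary; not reproducible

# np.zeros guarantees that every element is 0, so any quantity computed
# from the number of non-zero elements is the same on every run.
initialized = np.zeros((n_positions, n_positions))
print(np.count_nonzero(initialized))  # always 0
```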

**Full Changelog**: https://github.com/rouskinlab/seismic-rna/compare/v0.18.0...v0.18.1

0.18.0

New Features

- Give each temporary directory a unique, 8-character label (e.g. `tmp-abc123XY`) to avoid errors caused by the temporary directory existing beforehand.
- In the `align`, `relate`, `mask`, and `cluster` steps, as well as `+sim relate` (i.e. the steps that generate a report with one or more other output files), first write all output files to a temporary "release" directory; then, once all files have been written, move that directory to its final destination in one atomic operation if possible. This mechanism prevents problems caused by one run partially overwriting the output files of a previous run (e.g. because the second run crashed in the middle) and leaving the output files in an inconsistent, unusable state.
- In `fold` and `+sim fold`, if the `DATAPATH` environment variable has not been set (or has been set improperly), then the correct value can now be guessed as long as the `rnastructure` package has been installed with Conda. Future releases may be able to guess the value with other methods of installation.
- Make it easier to use the `run` functions through the Python API by automatically filling in default values for all arguments where possible, including defaulting to `None` for optional files.
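
The temporary-label and release mechanisms can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual implementation: `make_label` and `write_then_release` are hypothetical names, the label alphabet is assumed to be ASCII letters and digits, and the rename is atomic only when the temporary and final directories are on the same filesystem (hence "if possible" above).

```python
import os
import secrets
import string
from pathlib import Path

ALPHANUMERIC = string.ascii_letters + string.digits


def make_label(length: int = 8) -> str:
    """Return a random alphanumeric label, such as "abc123XY"."""
    return "".join(secrets.choice(ALPHANUMERIC) for _ in range(length))


def write_then_release(files: dict[str, str], tmp_root: Path, final_dir: Path) -> None:
    """Write all files to a temporary release directory, then move it into place."""
    release_dir = tmp_root / f"tmp-{make_label()}" / "release"
    # Without exist_ok=True, this raises FileExistsError if the directory
    # already exists, rather than silently reusing it.
    release_dir.mkdir(parents=True)
    for name, text in files.items():
        (release_dir / name).write_text(text)
    final_dir.parent.mkdir(parents=True, exist_ok=True)
    # Move every file at once; nothing appears in final_dir until this call.
    os.rename(release_dir, final_dir)
```

Because the output files appear in their final location all at once, a crash while writing leaves the previous output untouched rather than half overwritten.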

Removed Features

- Writing alignment maps in CRAM format is no longer supported because this feature was already brittle (moving the output directory to a different location would break the CRAM file) and became too complicated with the transition to the temporary directory system. However, the `relate` step still supports reading CRAM files. So, if needed, the BAM files can be compressed into CRAM files after alignment using `samtools view`.

Compatibility

- Constrain the NumPy version to <1.27 to ensure compatibility with Numba. (The recent release of NumPy 2.0 is compatible with Numba 0.60.0 at the binary level only, but `pip` still seems to be okay with installing it; some unit tests fail with NumPy 2.0, so clearly it's not yet sufficiently compatible with either Numba or SEISMIC-RNA.)

**Full Changelog**: https://github.com/rouskinlab/seismic-rna/compare/v0.17.6...v0.18.0

0.17.6

Bug Fixes

- Searching a directory for paths now avoids returning duplicate paths, which improves efficiency and avoids bugs caused by processing the same input path more than once. This is especially important if using `seismic wf` with an input path that contains many files (e.g. `out`).
- Graphs that are made from one table (e.g. `profile`, `histread`, `roc`, `aucroll`) now use the same output directory as the table file, which fixes a bug where `roc` and `aucroll` would crash if the output directory of the table was non-default (not `out`) and neither this directory nor an explicit structure file were passed to the graph commands.
- Steps that generate report files now always write the reports as long as the report file does not exist when the step starts running. This fixes a bug where, if multiple processes handled the same input file simultaneously, the second process could wind up overwriting the batches but not the report file of the first process, causing inconsistent checksums and making the report unusable.
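
One common way to avoid returning duplicate paths, as in the first fix above, is to normalize each path and then drop repeats while preserving order. The sketch below shows that general technique; the function `unique_paths` is hypothetical and is not taken from the SEISMIC-RNA source.

```python
from pathlib import Path


def unique_paths(paths):
    """Return the absolute, normalized paths without duplicates, in order."""
    resolved = (Path(path).resolve() for path in paths)
    # A dict keeps only the first occurrence of each key and preserves
    # insertion order, so this deduplicates without reordering.
    return list(dict.fromkeys(resolved))


# The same file given twice (once with a redundant "./") is kept only once.
print(len(unique_paths(["out/sample/report.json",
                        "out/./sample/report.json",
                        "out/other/report.json"])))  # 2
```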

**Full Changelog**: https://github.com/rouskinlab/seismic-rna/compare/v0.17.5...v0.17.6

0.17.5

New Features

- When simulating mutation rates of multiple clusters, v0.17.0 used an entirely different set of mutation rates for each cluster (even where the structures were similar), which led to clusters' mutation rates differing more than they would with real data. Since v0.17.1, the same set of mutation rates for paired and unpaired bases has been used for all clusters (so that mutation rates among clusters would differ only at positions where the paired/unpaired status differed), which led to clusters' mutation rates being more similar than they would be with real data (and thus making it harder for the algorithm to distinguish clusters). This version tries to be more realistic by using, for each position, a mutation rate that depends on the deepest base pair (if any) that encloses the position. For example, if position 90 were enclosed by pair (84, 91) in both structure 1 and structure 2, then it would have the same mutation rate in both clusters; while if position 174 were enclosed by pair (172, 178) in structure 1 but pair (170, 175) in structure 2, then the mutation rates would differ between the clusters. This method should make the simulated mutation rates dependent on their local structures and improve the realism of the simulations, as well as make it easier to distinguish true clusters.
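
The idea of the "deepest base pair that encloses a position" can be sketched as below. This is one interpretation, assuming nested (non-pseudoknotted) structures, that "deepest" means the innermost enclosing pair, and that a position is enclosed only if it lies strictly between the two paired bases. The function name and the outer pair (160, 185) are hypothetical additions for illustration.

```python
def deepest_enclosing_pair(position, pairs):
    """Return the innermost pair (i, j) with i < position < j, or None."""
    enclosing = [(i, j) for i, j in pairs if i < position < j]
    if not enclosing:
        return None
    # Among nested pairs that enclose the position, the innermost
    # one spans the fewest bases.
    return min(enclosing, key=lambda pair: pair[1] - pair[0])


# Pairs from the example above, plus a hypothetical outer pair (160, 185).
structure1 = [(84, 91), (160, 185), (172, 178)]
structure2 = [(84, 91), (160, 185), (170, 175)]

# Position 90 has the same deepest enclosing pair in both structures,
# so it would get the same mutation rate in both clusters.
print(deepest_enclosing_pair(90, structure1))   # (84, 91)
print(deepest_enclosing_pair(90, structure2))   # (84, 91)

# Position 174 has different deepest enclosing pairs,
# so its mutation rate would differ between the clusters.
print(deepest_enclosing_pair(174, structure1))  # (172, 178)
print(deepest_enclosing_pair(174, structure2))  # (170, 175)
```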

**Full Changelog**: https://github.com/rouskinlab/seismic-rna/compare/v0.17.4...v0.17.5
