Seismic-rna

Latest version: v0.24.2

0.20.0

New Features

- SEISMIC-RNA can now be installed with Conda (Bioconda channel): ``conda install -c bioconda -c conda-forge seismic-rna``
- In ``seismic wf``, clustering is now enabled using ``--cluster`` rather than setting the maximum number of clusters (``--max-clusters/-k``) to a positive integer.
- Throughout SEISMIC-RNA, the number of clusters is now called "K" (a general term for the number of clusters, e.g. as in K-means clustering) instead of the more confusing "Order" of the clustering.
- Clustering can now find an unlimited number of clusters, specified by setting ``--max-clusters/-k`` to 0 (the default). In this case, clustering will continue until either the BIC fails to decrease or the number of clusters does not pass filters (see below).
- If you specify a maximum number of clusters (with ``-k``), then you can now choose to force clustering using every number of clusters up to that maximum (with ``--try-all-ks``) or stop when the latest number of clusters is not better than the previous number (the only option in former versions).
- You can now set a minimum number of clusters with ``--min-clusters``, which makes clustering _start_ at that number of clusters (e.g. if you set it to 3, then SEISMIC-RNA will start with 3 clusters and never try 1 or 2).
- You can also choose whether to keep all numbers of clusters you tried (``--keep-all-ks``) or only the number of clusters that gave the best BIC among those that passed filters (see below). Note that with the latter option, if no clusters pass the filters (which can happen if you use ``--min-clusters`` greater than 1), then no clusters will be output, which will cause an error in the table step.
- Clustering now includes filters to make sure the clusters are valid, rather than simply sorting by BIC.
  - One set of filters removes individual EM runs:
    - ``--max-pearson-run``: upper limit on the Pearson correlation between any two clusters
    - ``--min-nrmsd-run``: lower limit on the normalized RMSD between any two clusters
  - Another set of filters removes each number of clusters (K) for which the runs are not sufficiently consistent:
    - ``--min-pearson-vs-best``: requires at least one suboptimal run to have at least this Pearson correlation vs. the best run for that K.
    - ``--max-nrmsd-vs-best``: requires at least one suboptimal run to have at most this normalized RMSD vs. the best run for that K.
    - ``--max-loglike-vs-best``: requires the best suboptimal run to have at most this difference in log likelihood vs. the best run for that K.
- Scatter plots now print the correlation on the plot; choose the correlation metric using ``--metric``.
- ROC plots now print the area under each curve.
- SEISMIC-RNA can now guess the ``DATAPATH`` environment variable when RNAstructure is installed either manually from the Mathews Lab website or with Conda.
- In the Python API, the Header class now accepts arbitrary numbers of clusters, rather than requiring an unbroken range between a minimum and a maximum number of clusters.
- Accordingly, table files with clusters can now contain arbitrary numbers of clusters, rather than needing to start with 1 cluster and count up to the maximum number of clusters.
- New unit tests have been added to verify that the new Header class functions properly, that batch counting and accumulation functions work with averages and clusters, and that the entire workflow runs on simulated data.
- The GitHub Actions workflow now requires all unit tests to pass; if any test fails, the workflow checks are marked as failing. Previously, it would run the unit tests but do nothing with the test results.
- The GitHub Actions workflow now builds and deploys the documentation automatically each time the source code is updated, saving the need to build the documentation manually and push it to GitHub with every update (or even to keep the built documentation in the GitHub repo).
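
The interaction of ``--min-clusters``, ``--max-clusters/-k`` and ``--try-all-ks`` described above can be sketched as a simple search loop. This is only an illustration of the stated rules, not SEISMIC-RNA's implementation: `search_ks`, `bic_of` and `passes_filters` are hypothetical names, and the EM runs are stood in for by whatever callables you pass.

```python
from typing import Callable, List, Optional, Tuple


def search_ks(bic_of: Callable[[int], float],
              passes_filters: Callable[[int], bool],
              min_clusters: int = 1,
              max_clusters: int = 0,
              try_all_ks: bool = False) -> Tuple[List[int], Optional[int]]:
    """Return (every K tried, best K that passed filters or None).

    Hypothetical sketch: max_clusters=0 means "no upper limit".
    """
    # Forcing every K is only meaningful with a finite maximum (assumption).
    force_all = try_all_ks and max_clusters > 0
    tried: List[int] = []
    best_k: Optional[int] = None
    best_bic = float("inf")
    k = max(min_clusters, 1)  # start at the minimum, never below 1
    while max_clusters == 0 or k <= max_clusters:
        tried.append(k)
        # A K is an improvement only if it passes filters AND lowers the BIC.
        if passes_filters(k) and bic_of(k) < best_bic:
            best_k, best_bic = k, bic_of(k)
        elif not force_all:
            # BIC failed to decrease, or K did not pass filters: stop.
            break
        k += 1
    return tried, best_k
```

With made-up BIC values `{1: 100, 2: 90, 3: 95, 4: 80}` and every K passing the filters, the default settings stop after trying K = 3 and keep K = 2, whereas `max_clusters=4, try_all_ks=True` goes on to try K = 4. If the first K tried fails the filters, the function returns `None` as the best K, which corresponds to the "no clusters will be output" case noted above.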

Removed Features

- ``seismic table`` no longer generates per-read tables from clustered datasets (i.e. ``clust-per-read.csv``). This is because these tables had been of little to no value and were easy to misinterpret: in fact, generating a histogram of number of mutations per read produced the wrong results, with no straightforward fix.
- ``+addclust`` and ``+delclust`` have been removed because they are less useful with the new cluster filter features, and maintaining both commands alongside the new features would be substantially more complicated.
- The built documentation (``seismic-rna/docs``) has been removed from the GitHub repository, to reduce its size and to remove the need to manually rebuild the docs each time the documentation source files are updated. Only the documentation source files (``seismic-rna/src/userdocs``) remain.

**Full Changelog**: https://github.com/rouskinlab/seismic-rna/compare/v0.19.2...v0.20.0

0.19.2

Bug Fixes

- Fixed a bug where single-end reads would be discarded using `--sep-strands`.
- Fixed bugs causing the relate step to fail with a mixture of two-mate and one-mate paired-end reads (possible with Bowtie2 mixed mode, `--bt2-mixed`).
- Updated the read counter and the SAM parser to handle mixed mode properly; added unit tests.
- Implemented a more robust method of creating temporary directories.
- Changed the FASTA parser and writer so that they raise errors if any name/sequence is invalid, rather than skipping invalid sequences and returning the rest. This avoids a potential bug where, if two references had the same name, it could be unclear which sequence would actually be used (which has become more of a concern with `--sep-strands` because minus-strand references can be created automatically).
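
To illustrate the fail-fast behavior described in the last item, here is a minimal sketch of a strict FASTA parser that raises on duplicate names or invalid characters instead of skipping records. It is not the SEISMIC-RNA parser: the function name `parse_fasta_strict` and the allowed alphabet `ACGTUN` are assumptions made for this example.

```python
from typing import Dict, Iterable, Optional

VALID_BASES = frozenset("ACGTUN")  # assumed alphabet, for illustration only


def parse_fasta_strict(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines, raising ValueError on any invalid record."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            if not name:
                raise ValueError("Reference with an empty name")
            if name in records:
                # Duplicate names make it ambiguous which sequence is used.
                raise ValueError(f"Duplicate reference name: {name!r}")
            records[name] = ""
        else:
            if name is None:
                raise ValueError("Sequence data before the first header")
            bad = set(line.upper()) - VALID_BASES
            if bad:
                raise ValueError(
                    f"Invalid characters in {name!r}: {sorted(bad)}")
            records[name] += line.upper()
    empty = [ref for ref, seq in records.items() if not seq]
    if empty:
        raise ValueError(f"References with no sequence: {empty}")
    return records
```

Raising on the first problem means a file with two references of the same name is rejected outright rather than silently resolved in favor of one of them.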

**Full Changelog**: https://github.com/rouskinlab/seismic-rna/compare/v0.19.1...v0.19.2

0.19.1

New Features

- `seismic wf` now accepts `--sep-strands`. This change is possible because `seismic relate` now also accepts `--sep-strands` and, if that option is given, will auto-generate a FASTA file of both strands. Thus, you can also run `seismic align` and `seismic relate` separately, as long as you pass the same FASTA file and the same `--sep-strands` and `--minus-label` options to both commands.
- `seismic +splitbam` is a new command that accepts one or more BAM (or CRAM) files and splits each file into one BAM file for each reference, just like the last phase of `seismic align`. It also accepts the `--sep-strands` option if you want to split each reference further into plus and minus strands. This feature is useful when you generate or obtain a BAM file outside of SEISMIC-RNA that contains multiple references, and you need to split it into one file for each reference so that you can pass those files into `seismic relate` or `seismic wf`.

Removed Features

- `seismic align` with `--sep-strands` no longer generates a FASTA file of both strands because it is no longer necessary due to the first new feature (above).

Bug Fixes

- Fixed bug where an empty directory called `align-xxxxxxxx` would appear in a sample's output directory if the `align` output directory did not yet exist.

**Full Changelog**: https://github.com/rouskinlab/seismic-rna/compare/v0.19.0...v0.19.1

0.19.0

New Features

Strand-aware alignment

- Each BAM file can be split into separate files of reads originating from the plus- versus the minus-strand of the RNA using `--sep-strands`.
- The option `--f1r2-plus / --f1r2-minus` controls whether paired-end reads whose mate 1 aligns in the forward orientation and mate 2 in the reverse orientation are considered to originate from the plus or the minus strand (for the Illumina library prep kits that our lab uses, they come from the minus strand, so `--f1r2-minus` is the default). Reads where mates 1 and 2 align in the reverse and forward orientations, respectively, are considered to originate from the other strand.
- For single-end reads, the behavior is the same as for read 1: in `--f1r2-minus` mode, single-end reads that align in the forward orientation are considered to have come from the minus strand.
- The option `--minus-label` controls the label appended to the minus strand of each reference (by default, it is the name of the reference followed by `-minus`).
- In strand-aware mode, `seismic align` also writes a FASTA file of all reference sequences (including their minus strands) whose BAM files received a sufficient number of reads (controlled by `--min-reads`) into the same directory as the BAM files and align report, with the same name as the original FASTA file.
- Currently, strand-aware alignment is only available through `seismic align`, not `seismic wf`. This limitation arises because separating strands actually generates new reference sequences (namely, the minus strands); if those sequences are missing from the FASTA given to the `relate` step, then any BAM files aligned to the minus strands cannot be processed. It is straightforward to run `seismic align` in strand-aware mode, then use the FASTA file it generates as input for `seismic relate` or `seismic wf`. However, switching the FASTA file automatically within `seismic wf` will require non-trivial re-engineering of how the pipeline works (or some other hacks).
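
The strand-assignment rule for `--f1r2-plus / --f1r2-minus` and the naming rule for `--minus-label` can be written out as a small sketch. The function names `read_strand` and `minus_ref_name` are hypothetical and are not part of SEISMIC-RNA; the logic simply restates the rules listed above, with single-end reads treated like mate 1.

```python
def read_strand(is_reverse: bool,
                is_mate1: bool = True,
                f1r2_plus: bool = False) -> str:
    """Return "plus" or "minus" for one aligned read (hypothetical helper).

    is_reverse: the read aligned in the reverse orientation.
    is_mate1:   the read is mate 1 (use True for single-end reads).
    f1r2_plus:  True for --f1r2-plus, False for --f1r2-minus (the default).
    """
    # F1R2 orientation: mate 1 forward, or equivalently mate 2 reverse.
    f1r2 = is_mate1 != is_reverse
    # F1R2 reads go to the strand named by the option; F2R1 reads go to
    # the other strand.
    return "plus" if f1r2 == f1r2_plus else "minus"


def minus_ref_name(ref: str, minus_label: str = "-minus") -> str:
    """Name of the minus strand of a reference: name followed by the label."""
    return ref + minus_label
```

For example, in the default `--f1r2-minus` mode, `read_strand(is_reverse=False)` for a single-end read gives `"minus"`, and `minus_ref_name("ref1")` gives `"ref1-minus"`.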

Bug Fixes

- The mechanism to release files to the output directory now keeps a backup of any existing output files until it is sure that the new files have been written. This setup avoids potentially deleting existing output files but then failing to write the new files, causing data loss.
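
The backup-then-replace idea can be sketched as follows. This is a generic illustration under stated assumptions, not the SEISMIC-RNA code: `release_file` and the `.bak` suffix are made up for the example, only single files are handled, and the new file is assumed to be on the same filesystem as the destination so that `os.replace` can move it.

```python
import os
from pathlib import Path


def release_file(new_file: Path, dest: Path) -> None:
    """Move new_file to dest, keeping a backup of dest until that succeeds."""
    backup = None
    if dest.exists():
        # Set the existing output aside instead of deleting it.
        backup = dest.with_name(dest.name + ".bak")  # hypothetical suffix
        os.replace(dest, backup)
    try:
        os.replace(new_file, dest)
    except Exception:
        # Writing failed: restore the previous output, then re-raise.
        if backup is not None:
            os.replace(backup, dest)
        raise
    # The new file is in place, so the backup can be discarded.
    if backup is not None:
        backup.unlink()
```

If the move fails for any reason, the original file is back at `dest` before the error propagates, so an interrupted release does not lose existing output.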

**Full Changelog**: https://github.com/rouskinlab/seismic-rna/compare/v0.18.2...v0.19.0

0.18.2

Performance Enhancements

- The `samtools sort` by query name step has been replaced by `samtools collate` to increase speed (for details, see https://www.htslib.org/doc/samtools-collate.html).

Bug Fixes

- Name-sorting/collation now occurs at the beginning of `relate` instead of the end of `align` in order to fix a bug wherein passing an unsorted/uncollated SAM/BAM/CRAM file (i.e. created without or modified outside of `seismic align`) to `relate` would cause it to treat paired-end reads as single-end.
- The algorithm for finding ambiguous indels has been updated to keep indels from entering soft-clipped regions of the read when both the indel and the soft-clipped region lie within a contiguous stretch of low-quality bases (although this situation is very rare). Before this fix, a read with such a long stretch of low-quality bases including an indel and a soft-clip could have been erroneously assigned indels at positions outside of the region to which it had aligned to the reference, which would have resulted in incorrect (occasionally negative) counts of matches in the `table` step.
- The `mask` and `table` steps have been updated to raise errors upon finding negative counts, so that any other such bugs do not go unnoticed.
- In `demult`, reports are now named after samples, and an issue with obtaining barcodes from sequences with different barcode coordinates has also been fixed.

What's Changed
* 0.18.2 by matthewfallan in https://github.com/rouskinlab/seismic-rna/pull/16


**Full Changelog**: https://github.com/rouskinlab/seismic-rna/compare/v0.18.1...v0.18.2

0.18.1

New Features

- SEISMIC-RNA now natively supports parsing and writing RNA structures in dot-bracket (DB) format using functions in the module `seismicrna.core.rna.db`.
- Convert CT to DB files, and vice versa, using the new command-line utilities `seismic +ct2db` and `seismic +db2ct`.
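
For readers unfamiliar with dot-bracket notation, the sketch below shows the basic idea of converting between a dot-bracket string and a list of base pairs. It does not use or reproduce the API of `seismicrna.core.rna.db`; the functions `db_to_pairs` and `pairs_to_db` are hypothetical, use 1-based positions, and handle only `(`, `)` and `.` (no pseudoknot brackets).

```python
from typing import Iterable, List, Tuple


def db_to_pairs(db: str) -> List[Tuple[int, int]]:
    """Return sorted 1-based (5' position, 3' position) base pairs."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for pos, char in enumerate(db, start=1):
        if char == "(":
            stack.append(pos)  # remember the opening position
        elif char == ")":
            if not stack:
                raise ValueError(f"Unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif char != ".":
            raise ValueError(f"Unsupported character {char!r} at {pos}")
    if stack:
        raise ValueError(f"Unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def pairs_to_db(length: int, pairs: Iterable[Tuple[int, int]]) -> str:
    """Build a dot-bracket string of the given length from base pairs."""
    chars = ["."] * length
    for pos5, pos3 in pairs:
        chars[pos5 - 1] = "("
        chars[pos3 - 1] = ")"
    return "".join(chars)
```

Round-tripping a simple hairpin such as `((..))` through `db_to_pairs` and `pairs_to_db` returns the original string.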

Performance Enhancements

- During the `table` step, SEISMIC-RNA reduces memory usage when bias correction is disabled (`--min-mut-gap=0`) by avoiding computing the full end coordinate matrix. This is especially important for very long (more than about 30 kb) sections. If the section is very long and bias correction is enabled, a warning is now issued about excessive memory usage.
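
A rough calculation shows why a full end coordinate matrix becomes a problem at this scale. The sketch assumes a dense square matrix with one 8-byte value per (5' end, 3' end) pair of positions; that layout is an assumption made for illustration, not a statement about SEISMIC-RNA's internal data structures.

```python
def dense_matrix_gib(n_positions: int, bytes_per_item: int = 8) -> float:
    """Memory in GiB of a dense n-by-n matrix (assumed layout)."""
    return n_positions ** 2 * bytes_per_item / 2 ** 30


# A 30 kb section would need roughly 6.7 GiB under these assumptions,
# and the requirement quadruples each time the section length doubles.
size_30kb = dense_matrix_gib(30_000)
```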

Bug Fixes

- Because SEISMIC-RNA now uses its own RNA structure format converters, they are not limited by the length restriction in RNAstructure. This prevents crashes during format conversion for very long reference sequences in the `fold` step.
- Fixed a bug where if an EM clustering dataset had 0 reads or positions, its attributes (mutation rates, end coordinate distribution, cluster proportions, etc.) would be allocated with `np.empty` but never initialized. Consequently, the number of non-zero elements could vary from run to run, leading the BIC score (which depends on the number of non-zero end coordinates) to behave unpredictably, sometimes causing it to find more than one cluster even with zero reads or positions. The attributes are now initialized with `np.zeros` so that they always behave the same way when there are 0 reads or positions.
- In the API `run` function for `mask`, missing defaults for `mask_sections_file` and `mask_pos_file` have been added.
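
The `np.empty` versus `np.zeros` fix can be illustrated with a short sketch. The array names and shapes below are hypothetical stand-ins for the clustering attributes mentioned above, not the actual SEISMIC-RNA attributes; the point is only that `np.zeros` yields a deterministic count of non-zero elements, whereas `np.empty` returns uninitialized memory whose contents are arbitrary.

```python
import numpy as np


def init_params(n_positions: int, n_clusters: int):
    """Allocate parameter arrays with defined (zero) contents.

    Hypothetical stand-ins: mutation rates, end coordinate distribution,
    and cluster proportions.
    """
    mus = np.zeros((n_positions, n_clusters))
    ends = np.zeros((n_positions, n_positions))
    props = np.zeros(n_clusters)
    return mus, ends, props


def n_nonzero_ends(ends: np.ndarray) -> int:
    """Number of non-zero end coordinates (a quantity the BIC depends on)."""
    return int(np.count_nonzero(ends))


# With np.zeros the count is always 0 if nothing is ever written to the
# array; with np.empty it could differ from run to run.
mus, ends, props = init_params(n_positions=5, n_clusters=1)
```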

**Full Changelog**: https://github.com/rouskinlab/seismic-rna/compare/v0.18.0...v0.18.1
