cryoDRGN


3.4.2

In this **patch release** we have drastically improved the runtimes of several existing features, as well as addressed some known issues and bugs:

Improving Runtimes
- extended the use of mixed precision training (as implemented in [torch.cuda.amp](https://pytorch.org/docs/stable/amp.html)), already the default for `train_nn` and `train_vae`, to the ab initio reconstruction commands `abinit_homo` and `abinit_het`, resulting in observed speedups of 2-4x (a generic sketch of the AMP pattern follows this list)
- vectorized the rotation matrix computation in `parse_pose_star` for a ~100x speedup of this step and a 2x speedup of the command as a whole (#143); a sketch of the idea also follows below
- returned volume evaluation in `analyze_landscape_full` to the GPU, resulting in a 10x speedup (#405)
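
As a rough illustration of the first item, here is the generic `torch.cuda.amp` recipe (the `model`, `optimizer`, and data here are placeholders; cryoDRGN's actual training loops differ):

```python
import torch
import torch.nn as nn

# placeholder objects -- cryoDRGN's real networks and data pipeline differ
model = nn.Linear(64, 64).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
batches = [torch.randn(8, 64) for _ in range(4)]  # dummy "particle" batches

scaler = torch.cuda.amp.GradScaler()
for imgs in batches:
    imgs = imgs.cuda()
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # forward pass runs in a float16/32 mix
        loss = loss_fn(model(imgs), imgs)
    scaler.scale(loss).backward()     # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)            # unscales grads, then optimizer.step()
    scaler.update()                   # adapt the loss scale for the next step
```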
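
And a sketch of the vectorization idea behind the pose-parsing speedup: building all rotation matrices from RELION-style ZYZ Euler angles at once with numpy broadcasting instead of looping over particles (illustrative only; not `parse_pose_star`'s exact code):

```python
import numpy as np

def rot_matrices_zyz(rot, tilt, psi):
    """rot, tilt, psi: arrays of shape (N,) in radians -> (N, 3, 3) matrices."""
    ca, sa = np.cos(rot), np.sin(rot)
    cb, sb = np.cos(tilt), np.sin(tilt)
    cg, sg = np.cos(psi), np.sin(psi)
    R = np.empty((len(rot), 3, 3))
    # entries follow RELION's Euler_angles2matrix (ZYZ convention)
    R[:, 0, 0] = cg * cb * ca - sg * sa
    R[:, 0, 1] = cg * cb * sa + sg * ca
    R[:, 0, 2] = -cg * sb
    R[:, 1, 0] = -sg * cb * ca - cg * sa
    R[:, 1, 1] = -sg * cb * sa + cg * ca
    R[:, 1, 2] = sg * sb
    R[:, 2, 0] = sb * ca
    R[:, 2, 1] = sb * sa
    R[:, 2, 2] = cb
    return R

rot, tilt, psi = np.random.rand(3, 1000) * np.pi
R = rot_matrices_zyz(rot, tilt, psi)   # shape (1000, 3, 3), no Python loop
```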

Fixing Known Issues
- fixed incorrect batch processing that caused out-of-memory issues when using chunked output in `downsample` (#412)
- fixed an error when using `--flip` in `analyze_landscape_full` (#409)
- fixed a `parse_mrc` bug in the landscape analysis notebook (#413)


Please let us know if you have any feedback or comments!

3.4.1

This is a **patch release** to address some minor issues and improve compatibility of cryoDRGN with the default output number format used by the most recent versions of RELION:

- adding support for `np.float16` format input .mrcs files, which are now cast to `np.float32` as necessary for Fourier transform operations (#404); see the short sketch at the end of this section
- `models.PositionalDecoder.eval_volume()` now keeps volumes on the GPU
- better progress log messages in `backproject_voxel`, with improved control over logging via `--log-interval` to match other reconstruction commands:

```
(INFO) (lattice.py) (03-Oct-24 10:52:54) Using circular lattice with radius=150
(INFO) (backproject_voxel.py) (03-Oct-24 10:52:55) fimage 0 — 0.0% done
(INFO) (backproject_voxel.py) (03-Oct-24 10:54:02) fimage 200 — 4.0% done
(INFO) (backproject_voxel.py) (03-Oct-24 10:55:10) fimage 400 — 8.0% done
(INFO) (backproject_voxel.py) (03-Oct-24 10:56:18) fimage 600 — 12.0% done
(INFO) (backproject_voxel.py) (03-Oct-24 10:57:26) fimage 800 — 16.0% done
```

- `filter_cs` replaces `write_cs`, which is now deprecated with a suitable warning message; this also fixes issues with filtering .cs files produced by the most recent cryoSPARC versions (#150)
- phase-randomization correction of the “Tight Mask” FSC curve now starts at 0.5 * the 0.143-threshold of the “No Mask” FSC curve, rather than 0.75 * the 0.143-threshold of the “Tight Mask” curve; this matters when the tight-mask curve never crosses the 0.143 threshold, in which case the start point previously defaulted to the Nyquist limit
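
Regarding the `np.float16` support above, a minimal sketch of the promotion step (illustrative; not the exact cryoDRGN code):

```python
import numpy as np

imgs = np.random.rand(8, 64, 64).astype(np.float16)  # stand-in particle stack
if imgs.dtype == np.float16:
    # promote before FFT work; many Fourier ops don't support half precision
    imgs = imgs.astype(np.float32)
```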

3.4.0

In this minor release we are adding several new features and commands, as well as expanding a few existing ones and introducing some key refactorings to the codebase to make these changes easier to implement.

New features
- full support for RELION 3.1 `.star` files with optics values stored in a separate grouped table before or after the main table (#241, #40, #10)
- refactored `Starfile` class now has properties `.apix` and `.resolution` that return particle-wise optics values for commonly used parameters, as well as methods `.get_optics_values()` and `.set_optics_values()` for any parameter
- these methods automatically use the optics table if available
- `cryodrgn parse_ctf_star` can now load all particle-wise optics values from the .star file itself, instead of relying upon user input for parameters such as A/px, resolution, voltage, and spherical aberration, or just taking the first value found in the file
- `backproject_voxel` now computes FSC threshold values corrected for mask overfitting using [high resolution phase randomization](https://guide.cryosparc.com/processing-data/tutorials-and-case-studies/tutorial-common-cryosparc-plots#fsc-fourier-shell-correlation-plots) as done in cryoSPARC, as well as showing FSC curves and threshold values for various types of masks (a sketch of the correction formula follows the figure):
![tight-mask](https://github.com/user-attachments/assets/e7a87891-7206-4f5d-861a-2412f0bac1b9)
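
For reference, the mask-overfitting correction used in this style of analysis follows the standard high-resolution noise-substitution formula, FSC_true = (FSC_masked - FSC_rand) / (1 - FSC_rand). A minimal sketch, assuming `fsc_masked` and `fsc_rand` are per-shell FSC curves computed with the tight mask, the latter from half-maps whose phases were randomized past `cutoff_shell`:

```python
import numpy as np

def corrected_fsc(fsc_masked: np.ndarray, fsc_rand: np.ndarray,
                  cutoff_shell: int) -> np.ndarray:
    """Noise-substitution correction, applied past the randomization cutoff."""
    out = fsc_masked.copy()
    hi = slice(cutoff_shell, None)
    # below the cutoff, the masked curve is trusted as-is; above it, subtract
    # the correlation attributable to the mask itself
    out[hi] = (fsc_masked[hi] - fsc_rand[hi]) / (1.0 - fsc_rand[hi])
    return out
```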

- `cryodrgn_utils plot_classes` for creating plots of cryoDRGN results colored by a given set of particle class labels
- for now, this only creates 2D kernel density plots of the latent space embeddings as reduced by UMAP and PCA, but more plots will be added in the future:

```bash
$ cryodrgn_utils plot_classes 002_train-vae_dim.256 9 --labels published_labels_major.pkl --palette viridis --svg
```


`analyze.9/umap_kde_classes.png`

![umap_kde_classes](https://github.com/user-attachments/assets/5fb5b054-5a74-48c5-aa2a-0a177c2bc8cc)

Improvements to existing features
- `backproject_voxel` now creates a new directory given by `-o/--outdir` into which it places its output files, instead of naming all files after the output reconstructed volume given by `-o/--outfile`
- files within this directory will always have the same names across runs:
- `backproject.mrc`: the full reconstructed volume
- `half_map_a.mrc`, `half_map_b.mrc`: reconstructed half-maps using an odd/even particle split
- `fsc-vals.txt`: all five FSC curves in space-delimited format
- `fsc-plot.png`: a plot of these five FSC curves, as shown above
- `downsample` can now downsample each of the individual files in a stack referenced by a .star or .txt file, returning a new .star file or .txt file referencing the new downsampled stack
- used by specifying a .star or .txt file as `-o/--outfile` when using a .star or .txt file as input:
```bash
$ cryodrgn downsample my_particle_stack.star -D 128 -o particles.128.star --datadir folder_with_subtilts/ --outdir my_new_datadir/
```


- `cryodrgn_utils fsc` can now take three volumes as input, in which case the first volume will be used to generate masks to produce cryoSPARC-style FSC curve plots including phase randomization for the “tight” mask (see **New features** above)
- `cryodrgn_utils plot_fsc` is now more flexible with the types of input files it can accept for plotting, including `.txt` files with the new type of cryoSPARC-style FSC curve output from `backproject_voxel`
- added `--force` to `cryodrgn filter` for less interactivity once a selection has been made
- `filter_mrcs` now prints both the original and new number of particles, and generates an output file name automatically if one is not given
- `cryodrgn abinit_het` saves `configs` alongside model weights in `weights.pkl` for easier access and output checkpoint identification
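
A minimal sketch of the idea behind the last item above; the key names here are illustrative rather than cryoDRGN's exact checkpoint format:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)                      # stand-in for the real network
configs = {"cmd": "abinit_het", "zdim": 8}   # illustrative config payload

# bundle configs with the weights so each checkpoint is self-describing
torch.save(
    {"model_state_dict": model.state_dict(), "configs": configs},
    "weights.pkl",
)
checkpoint = torch.load("weights.pkl")       # configs travel with the weights
print(checkpoint["configs"])
```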

Addressing bugs and other issues
- better axis labels for FSC plotting, passing Apix values from `backproject_voxel` (#385)
- `cryodrgn filter` no longer shows particle indices in hover text, as this proved visually distracting; we now show these indices in a text box in the corner of the plot
- `cryodrgn filter` saves chosen indices as a `np.array` instead of a standard Python `list` to prevent type issues in downstream analyses
- `commands_utils.translate_mrcs` was not working (it assumed `particles.images()` returned a numpy array instead of a torch Tensor); this has been fixed and tests added for translations of image stacks
- going back to listing the modules included in the `cryodrgn` and `cryodrgn_utils` command line interfaces explicitly, as Python will sometimes install older modules into the corresponding folders, which confuses automated scanning for command modules
- fixing parsing of 8-bit and 16-bit .mrc files produced using e.g. `--outmode=int8` in EMAN2 (#113); see the mode-to-dtype sketch after this list
- adding support and continuous integration testing for Python 3.11
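
For context on the #113 fix above: .mrc files declare their data type via a header “mode” field. A subset of the MRC2014 mode-to-dtype mapping, mirroring the kind of lookup the `DTYPE_FOR_MODE` table mentioned in the refactoring notes below provides:

```python
import numpy as np

# subset of the MRC2014 mode -> dtype mapping (illustrative; cryoDRGN keeps
# a fuller table in its header-parsing code)
DTYPE_FOR_MODE = {
    0: np.int8,      # 8-bit signed integer, e.g. EMAN2 --outmode=int8 output
    1: np.int16,     # 16-bit signed integer
    2: np.float32,   # 32-bit float, the most common case
    6: np.uint16,    # 16-bit unsigned integer
    12: np.float16,  # 16-bit float
}
```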

Refactoring classes that parse input files

There were some updates we wanted to make to the `ImageSource` class and its children, which were introduced in v3.0.0 as part of a refactoring of how input datasets are loaded and parsed. We also sought to simplify and clean up the code in the methods used to parse .star and .mrcs file data in `cryodrgn.starfile` and `cryodrgn.mrc`, respectively.

- the code for the `ImageSource` base class and its children classes in `cryodrgn.source` have been cleaned up to improve code style, remove redundancies, and support the `Starfile` and `mrcfile` refactorings described below
- more consistent and sensible parsing of filenames with `datadir` for `_MRCDataFrameSource` classes such as `TxtFileSource` and `StarfileSource` (#386)
- all of this logic is now contained in a new method, `_MRCDataFrameSource.parse_filename`, which is applied in `__init__` (a sketch follows this list):
1. If `filename` by itself points to a file that exists, use `filename`.
2. Otherwise, if `os.path.join(datadir, filename)` exists, use that.
3. Finally, try `os.path.join(datadir, os.path.basename(filename))`.
4. If that doesn’t exist either, throw an error!
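
A minimal sketch of that resolution order (illustrative; not the exact method body):

```python
import os

def resolve_filename(filename: str, datadir: str = None) -> str:
    # 1. the filename as given
    if os.path.exists(filename):
        return filename
    if datadir:
        # 2. the filename relative to datadir
        joined = os.path.join(datadir, filename)
        if os.path.exists(joined):
            return joined
        # 3. just the basename, relative to datadir
        base = os.path.join(datadir, os.path.basename(filename))
        if os.path.exists(base):
            return base
    # 4. give up with an informative error
    raise FileNotFoundError(f"Could not resolve {filename} (datadir={datadir})")
```
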
- adding `ImageSource.orig_n` attribute which is often useful for accessing the original number of particles in the stack before filtering was applied
- adding `ImageSource.write_mrc()`, to avoid having to use `MRCFile.write()` for `ImageSource` objects; `MRCFile.write()` use case for arrays has been replaced by `mrcfile.write_mrc` (see below)
- see use in a refactored `cryodrgn downsample` for batch writing to `.mrc` output
- adding `MRCFileSource.write()`, a wrapper for `mrcfile.write_mrc()`
- adding `MRCFileSource.apix` property for convenient access to header metadata
- getting rid of `ArraySource`, whose behavior is subsumed by `ImageSource` with `lazy=False`
- improving error messages in `ImageSource.from_file()`, `._convert_to_ndarray()`, `images()`
- `ImageSource.lazy` is now a property, not an attribute, and is dynamically dependent on whether `self.data` has actually been loaded or not
- adding `_MRCDataFrameSource.sources` convenience iterator property
- `StarfileSource` now inherits directly from the `Starfile` class (as well as `_MRCDataFrameSource`) for better access to .star utilities than using a `Starfile` object as an attribute (`.df` in the old v3.3.3 class)
- .star file methods have been refactored to establish three clear ways of accessing and manipulating .star data for different levels of features, with RELION3.1 operations now implemented in `Starfile` class methods:
- `cryodrgn.starfile.parse_star` and `write_star` to get and perform simple operations on the main data table and/or the optics table
e.g. in `filter_star`:

```python
stardf, data_optics = parse_star(args.input)
...
write_star(args.o, data=filtered_df, data_optics=new_optics)
```


- `cryodrgn.starfile.Starfile` for access to .star file utilities like generating optics values for each particle in the main data table using parameters saved in the optics table
e.g. in `parse_ctf_star`:

```python
stardata = Starfile(args.star)
logger.info(f"{len(stardata)} particles")
apix = stardata.apix
resolution = stardata.resolution
...
ctf_params[:, i + 2] = (
    stardata.get_optics_values(header)
    if header not in overrides
    else overrides[header]
)
```


- `cryodrgn.source.StarfileSource` for access to .star file utilities along with access to the images themselves using `ImageSource` methods like `.images()`
- see our more detailed write-up for more information:
[Starfile Refactor](https://www.notion.so/Starfile-Refactor-32192f647afe4d1e9bb7dd1b5c0a7565?pvs=21)

- for .mrc files, we removed `MRCFile`, as there are presently no analogues for the kinds of methods supported by `Starfile`; the operations on the image array that require data from the image header are contained within `MRCFileSource`, reflecting the fact that .mrcs files are the image data themselves and not pointers to other files containing the data
- `MRCFile`, which consisted solely of static `parse` and `write` methods, has been replaced by standalone functions bearing the old names of those methods (`parse_mrc` and `write_mrc`)
- `MRCFile.write(out_mrc, vol)` → `write_mrc(out_mrc, vol)`
- when `vol` is an `ImageSource` object, we now use `ImageSource.write_mrc()` instead
- in general, `parse_mrc` and `write_mrc` are for using the entire image stack as an array, while `MRCFileSource` is for accessing batches of images as tensors
- the `mrc` module has been renamed `mrcfile` for clarity and to match the `starfile` module, its parallel for processing input files
- examples from across the codebase:
- `commands_utils.add_psize`

old:

```python
from cryodrgn.mrc import MRCFile, MRCHeader
from cryodrgn.source import ImageSource

header = MRCHeader.parse(args.input)
header.update_apix(args.Apix)

src = ImageSource.from_file(args.input)
MRCFile.write(args.o, src, header=header)
```


new:

```python
from cryodrgn.mrcfile import parse_mrc, write_mrc

vol, header = parse_mrc(args.input)
header.apix = args.Apix
write_mrc(args.o, vol, header=header)
```


- `commands_utils.flip_hand`
old:

```python
src = ImageSource.from_file(args.input)

# Note: proper flipping (compatible with the legacy implementation) only
# happens when chunksize is equal to src.n
MRCFile.write(
    outmrc,
    src,
    transform_fn=lambda data, indices: np.array(data.cpu())[::-1],
    chunksize=src.n,
)
```


*Note that the awkward combination of `MRCFileSource` and `MRCFile` above meant having to cast the images from tensors to arrays after they were loaded!*

new:

```python
vol, header = parse_mrc(args.input)
vol = vol[::-1]
write_mrc(outmrc, vol, header=header)
```


- also made some updates to `MRCHeader` for ease of use (see the sketch below):
- making `mrc` module variables like `DTYPE_FOR_MODE` into header class attributes
- creating `apix` and `origin` properties with getter and setter methods, simplifying retrieval and assignment of these values
- e.g. `header.origin = (0, -1, 0)` instead of `header.update_origin(0, -1, 0)`, and `header.origin` instead of `header.get_origin()` to get values
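
An illustrative sketch of the property pattern, using the MRC convention that pixel size equals cell length divided by sample count (this is not the exact `MRCHeader` code):

```python
class Header:
    """Illustrative only -- not the real MRCHeader class."""

    def __init__(self, nx: int, xlen: float):
        self.fields = {"nx": nx, "xlen": xlen}

    @property
    def apix(self) -> float:
        # pixel size (A/px) = cell length divided by number of samples
        return self.fields["xlen"] / self.fields["nx"]

    @apix.setter
    def apix(self, value: float) -> None:
        self.fields["xlen"] = value * self.fields["nx"]

header = Header(nx=128, xlen=128.0)
header.apix = 1.7      # assignment replaces header.update_apix(1.7)
print(header.apix)     # 1.7
```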

Code Quality Control

- improving module-level docstrings with more info and usage examples
- better parsing of multi-line example usage commands split up by `\` in `cryodrgn.command_line` when producing help messages for `-h`
- see e.g. `cryodrgn.dataset`, `cryodrgn.starfile`, `cryodrgn.source`, `cryodrgn filter`, `cryodrgn filter_mrcs`
- in automated CI testing, we now test `3.9 + 1.12`, `3.10 + 2.1`, and `3.11 + 2.4` in terms of Python version + PyTorch version, instead of doing all pairs of `{3.9, 3.10}` and `{1.12, 2.1, 2.3}`, allowing for CI testing to be expanded into Python 3.11 without running too many test jobs
- better error messages for `cryodrgn.pose` and `cryodrgn.ctf` when inputs don’t match in dimension or have an unexpected format
- creating new module `cryodrgn.masking`, moving e.g. `utils.window_mask()` to `masking.spherical_window_mask()`
- bringing back `unittest.sh`, a set of smoke tests for reconstruction commands that can be run outside of `pytest` and regular automated CI testing, by replacing outdated commands (#267)
- first release with regression pipeline testing, confirming that the outputs of key reconstruction commands have remained unchanged: see the summary [here](https://fluff-hero-b57.notion.site/cryoDRGN-v3-4-0-Replication-Testing-0fe31153834580d39387d9d74d3c4180?pvs=4)

3.3.3

This **patch release** fixes several outstanding issues:

- the `--ntilts` argument to `backproject_voxel` did not do anything, and all tilts were always used; this flag now behaves as expected (#379)
- `cryodrgn_utils filter_star` now includes the (filtered) input optics table in the output if present in the input (#370)
- `cryodrgn filter` now accepts experiment outputs using tilt series particles (#335)
- fixing a numerical rounding bug showing up in transformations to poses used by `backproject_voxel` (#380)

We have also done more work to consolidate and expand our CI testing suite, with all of the `pytest` tests under `tests/` now using new data-loading fixtures that allow tests to be run in parallel using `pytest-xdist`. Datasets used in testing have also been moved from `testing/data/` to `tests/data/` to reflect that the old command-line tests under the former folder are deprecated and are being replaced and rewritten as `pytest` tests in the latter.

Finally, we removed some remaining vestiges of the old way of handling datasets too large to fit into memory via `cryodrgn preprocess` (#348), and improved the docstrings for several modules.

3.3.2

This **patch release** makes some improvements to tools used in writing and parsing .star files, as well as addressing a few bugs that have recently come to our attention:

- the filtering notebook `cryoDRGN_filtering` was very slow when applied to experiments using `--ind`; we tracked this down to an incorrect approach to loading the dataset (#374)
- nicer FSC plots in `backproject_voxel` using code refactored to apply the methods used in `fsc` and `plot_fsc`
- fixing an issue when the total particle count was equal to one modulo the batch size, which caused dimensionality errors in the final singleton batch due to how some `torch` and `numpy` operations handle singleton dimensions (#351)
- creating a stopgap for #346 while we figure out what upstream problems could be causing these issues with `analyze`
- adding `endpoint=True` to the `np.linspace` call in `pc_traversal` for completeness (see the short example after this list)
- properly supporting `.txt` files for `write_star`, with the correct file names now being written to the output, as well as `--ind` working correctly
- adding support for RELION 3.1 input files with multiple optics groups in `parse_pose_star`
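
For reference, `endpoint` controls whether `np.linspace` includes the stop value, so a traversal sampled with `endpoint=True` reaches the far end of the range:

```python
import numpy as np

np.linspace(-2, 2, 5, endpoint=True)   # array([-2., -1.,  0.,  1.,  2.])
np.linspace(-2, 2, 5, endpoint=False)  # array([-2. , -1.2, -0.4,  0.4,  1.2])
```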

We have also consolidated and improved upon several aspects of our continuous integration testing setup, including new tests covering the cases described above, refactoring the data fixtures used in existing tests, and testing across multiple `torch` versions after finding issues specific to `v1.8` in `analyze_landscape_full`.

3.3.1

This is **a patch release to address several bugs and issues** that have come to our attention:

- adding a `--micrograph-files` argument to `filter_star` to create separate output files for each `_rlnMicrographName` encountered in the file
- `--ind` with `--encode-mode=tilt` wasn’t working in the case where all particles had the same number of tilts, due to a `dtype=object` patch introduced earlier
- fixed by storing the particle-to-tilt index produced by `TiltSeriesData.parse_particle_tilt()` as a list instead of an array; this is more robust in general, and all downstream uses are agnostic to the container type (see tests below, and the short illustration that follows)
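
A short illustration of the underlying numpy behavior that caused this (the fix stores a plain list, sidestepping it):

```python
import numpy as np

# ragged sublists -> a 1-D object array of lists, what downstream code assumed
print(np.array([[0, 1], [2, 3, 4]], dtype=object).shape)  # (2,)

# equal-length sublists -> numpy silently builds a 2-D object array instead
print(np.array([[0, 1], [2, 3]], dtype=object).shape)     # (2, 2)
```
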
- `backproject_voxel` was producing errors when trying to calculate threshold FSC values due to deprecated code used to parse the FSC matrix (#371)
- fixed by copying over code already used in `commands/fsc`
- `train_nn` and `train_vae` would error out if input dimensions were not divisible by 8 when using AMP optimization (e.g. #353)
- this is now a warning rather than an error, since AMP optimization is the default and the hard failure was frustrating for many users
- better error message when CTF file is missing from `write_star` inputs
- better error message when `backproject_voxel` output is not `.mrc`
- bug in `ET_viz` notebook when `--ind` not specified caused by inconsistent definition of `ind0`
- bug in the filtering notebook caused by using `ind=ind_orig` when loading the dataset and then trying to filter again (#363)
- `ZeroDivisionError` bugs in all notebooks when using small training datasets
- updating template analysis notebooks to use the given `kmeans` value in the copied-over notebook, similarly to our auto-updating of notebook epoch numbers

In addition to making the required fixes, we have **expanded and improved our deployment tests** to cover these cases and close some gaps in our testing coverage:

- adding a stand-alone test of backprojection under `test_reconstruct` applying both `.mrcs` and `.star` inputs
- more testing of `train_nn` cases with different `--amp`, `--batch-size`, `--poses` values
- fixing `check=True` issue in `utils.run_command()` that was allowing tests of backprojection to fail silently
- new deployment task schedule
- the `main` deployment task has been split into `tests` and `style` for tests of code integrity and code linting respectively
- run `tests` and `style` along with `beta-release` any time a patch version tag `[0-9]+\.[0-9]+\.[0-9]+-*` is pushed to any branch to trigger a verified upload to TestPyPI
- also run `tests` and `style` for any push to `develop` branch to allow for testing before beta release
- update `release` to only run when a stable version tag (`^[0-9]+\.[0-9]+\.[0-9]+$`) is pushed to `main`
- `tests` and `style` run on any push to `main` to allow for testing prior to release

**Other changes include:**

- applying `tmpdir_factory` to improve the `train_dir` and `AbinitioDir` fixtures used in tests with more robust setup and teardowns
- CodeFactor badge and nicer TestPyPI installation command in `README`
- dynamic update of plotted point sizes in `cryoDRGN_filtering.ipynb` interactive filtering widget, useful for smaller datasets for which the default is too small for points to be seen
- using `plt.close()` after `analyze` plotting for better memory management
