Gemmi

Latest version: v0.7.1

0.7.0

C++14 (or later) is required to build the library, C++17 (or later) to build Python bindings.
Expect breaking changes, especially in Python bindings.
The lists below are not complete, but should cover most of the changes.

Library

* Added unified logging of warnings/errors from various gemmi functions (class Logger)
* replaced string `Model::name` with int `Model::num`
* mmcif: better handling of null auth_comp_id
* fixes for mmJSON
* Removed deprecated functions:
- UnitCell.fractionalization_matrix and orthogonalization_matrix – use frac.mat and orth.mat
- count_hydrogen_sites() – use has_hydrogen() or count_atom_sites(gemmi.Selection('[H,D]'))
- Grid::resample_to() – use interpolate_grid()
* unified API of Grid interpolation functions. They now have parameter `order` that can be 0 (nearest value), 1 (linear interpolation), or 3 (cubic). In C++ there are also functions such as trilinear_interpolation() to ensure no overhead.
* to_pdb: write HET records
* Extended selection syntax with: `[metals]` and `[nonmetals]`.
* Added function set_is_metal() intended for debatable metalloids
* improved interoperability with MMDB (a CCP4 library)
* MonLib: removed `read_cif` args
* mtz: fixed writing BATCH records
* hydrogen placement: fixes needed for new files with metals in CCP4 Monomer Library
* pdb: fixed reading TLS S tensor
* Structure metadata: expanded RefinementInfo
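
The meaning of the `order` parameter in the unified interpolation API can be illustrated on a one-dimensional periodic grid. The following is a minimal pure-Python sketch, not gemmi code: the function name `interpolate_1d` is hypothetical, only orders 0 and 1 are implemented, and the cubic case (order 3) is left out.

```python
import math

def interpolate_1d(values, x, order=1):
    """Interpolate a periodic 1-D grid at fractional index x.

    order=0 returns the nearest grid value, order=1 interpolates linearly.
    Hypothetical helper for illustration only; it is not part of gemmi.
    """
    n = len(values)
    if order == 0:
        # round to the nearest grid point, wrapping around the period
        return values[math.floor(x + 0.5) % n]
    if order == 1:
        i = math.floor(x)
        t = x - i  # fractional distance from grid point i
        return (1 - t) * values[i % n] + t * values[(i + 1) % n]
    raise ValueError("this sketch supports only order 0 and 1")

grid = [0.0, 10.0, 20.0, 30.0]
nearest = interpolate_1d(grid, 1.25, order=0)  # value at grid point 1
linear = interpolate_1d(grid, 1.25, order=1)   # between points 1 and 2
wrapped = interpolate_1d(grid, 3.5, order=1)   # between point 3 and point 0
```

The modulo on the index mimics the periodicity of a crystallographic map; a real three-dimensional grid applies the same idea along each axis.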

Python

* Python bindings **migrated from pybind11 to nanobind**.
- Much lower runtime overhead, faster build times, better error diagnostics.
- Built-in typing stubs.
- Only Python 3.8+.
- Sadly, no support for Buffer Protocol. It was replaced with NumPy `__array__` methods.
For NumPy, you can also use the `.array` properties, which were already available in previous releases.
- No implicit conversions from list to ndarray, and from bytes to string (let me know where it causes problems)
- gemmi.ValueSigmaAsuData.value_array now has shape (N,2)
* Added pickling support for Structure, Model, Chain, Residue, Atom, cif.Document, cif.Block.
* Added function interpolate_position_array (#323).
* Python extension module is now installed into `site-packages/gemmi/` (this change should be invisible to the user)

Program
* gemmi convert --sifts-num is now more customizable
* gemmi sf2map: added option --check (see docs)
* gemmi cif2mtz: add a rule to spec to convert `pdbx_F_calc_with_solvent` to `F-model` (+phase)
* gemmi xds2mtz: handles merged files from XSCALE
* gemmi mtz2cif and merge: recognize extension .ahkl as XDS file

0.6.7

This is primarily a bug-fix release. New Python bindings are not included yet.

Enhancements:

* New subcommand `gemmi set` for changing coordinates, B-factors and occupancies in coordinate files (mmCIF and PDB). Unlike other tools, it replaces numbers while leaving the rest of the file intact. An alternative to CCP4 PDBSET keywords: BFACTOR, OCCUPANCY, SHIFT, NOISE. Note that `gemmi convert` offers overlapping capabilities. For instance, `gemmi convert --apply-symop=x+0.123,y,z` shifts the coordinates similarly to `gemmi set --shift='9.3 0 0'` (the latter takes the shift in Angstroms).

* Improved anisotropic scaling of structure factors. More work is planned in this area.
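
The equivalence between the fractional shift `x+0.123` and the 9.3 Å shift quoted for `gemmi set` is simple arithmetic once the cell length is known. The sketch below assumes the usual PDB orthogonalization convention (the a axis lies along x) and a cell edge of 75.6 Å; that value is an assumption chosen so that the two commands agree, since the notes do not state the cell. The function name is hypothetical.

```python
def fractional_shift_along_a_to_angstroms(frac_shift, a_length):
    """Convert a fractional shift along the a axis to Angstroms.

    Assumes the a axis is parallel to the Cartesian x axis, so the shift
    has no y or z component. Hypothetical helper, not part of gemmi.
    """
    return frac_shift * a_length

a_length = 75.6  # assumed cell edge in Angstroms, not taken from the notes
shift_x = fractional_shift_along_a_to_angstroms(0.123, a_length)
shift_rounded = round(shift_x, 1)
```

With a different cell edge the same symmetry operation corresponds to a different shift in Angstroms, which is why the two options are described as similar rather than identical.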

Fixes:

* fixed reading of mmCIF files without `_atom_site.auth_seq_id`
* in Topology preparation: fixed a couple of bugs, peptide links are now assumed to be CIS for ω=0±60° (previously, ω=0±30°)
* fixed re-assignment of ATOM/HETATM record types (`gemmi convert --assign-records`)
* fixed `gemmi convert --sifts-num` for UniProt sequence numbers >5000

And various minor changes that are hard to describe concisely.
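
The widened cis criterion for peptide links (ω=0±60° instead of ω=0±30°) can be written as a short check. This is a sketch, not the code used in gemmi's Topology preparation: the function name `is_cis` is hypothetical, and treating the boundary values as cis is an assumption.

```python
def is_cis(omega_deg, tolerance=60.0):
    """Return True if the omega torsion angle is within tolerance of 0 degrees.

    The angle is first wrapped into the range [-180, 180).
    Hypothetical helper; inclusive boundaries are an assumption.
    """
    wrapped = (omega_deg + 180.0) % 360.0 - 180.0
    return abs(wrapped) <= tolerance

new_rule = is_cis(45.0)                  # within 0 +/- 60
old_rule = is_cis(45.0, tolerance=30.0)  # outside 0 +/- 30
trans = is_cis(180.0)                    # a typical trans peptide
```

A link with ω=45° is therefore classified differently by the two rules, which is the kind of case the change affects.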

0.6.6

Library:
* SmallStructure: changed how the space group is [read and accessed](https://gemmi.readthedocs.io/en/latest/mol.html#smallstructure-spacegroup).
Relying on H-M space group names alone was not always sufficient. The new mechanism uses the list of operations and Hall symbol in preference to the H-M symbol – the order is configurable.
* symmetry triplets: parse decimal fractions (small molecule files may use notation such as x+0.25 instead of x+1/4)
* tabulated space groups: a few more settings: B 1 2 1, B 1 21 1, F 1 m 1, F 1 d 1, F 1 2 1
* X-ray scattering coefficients: changed the default value of `IT92::ignore_charge` to true (i.e. charges are now ignored by default; before version 0.6.3 they were always ignored)
* cif::Table: added method `ensure_loop()` that converts tag-value pairs into a loop; might be needed before calling `append_row()`
* place_hydrogens(): fix for NH3-like configurations
* improved gemmi->mmdb conversion
* Grid: tweaked good_grid_size() to ensure that when creating a grid up to a certain d_min, all reflections up to d_min are in the grid (it matters when no oversampling is applied)
* DensityCalculator: deprecated function `set_grid_cell_and_spacegroup()`, use `grid.setup_from()`
* fixed TNT-compatible reciprocal space ASU calculation for non-standard settings
* infer_polymer_end(): complicated the heuristic even more, to detect files that have HETATM incorrectly used for standard residues in a polymer (such files were reported; they are either a result of mutating from non-standard residues, or the output of a buggy program)
* added function assign_het_flags() to re-set ATOM/HETATM flags
* Model: added functions `calculate_b_iso_range()` and `calculate_b_aniso_range()`; the first one can be used to detect if pLDDT is in the range 0-100 (like from AlphaFold) or 0-1 (like from ESMFold)
* writing mmCIF: write _entity_poly_seq.hetero
* added flag `Entity::reflects_microhetero` that shows if sequences were read from SEQRES (and don't account for point mutations) or from _entity_poly_seq; new function `add_microhetero_to_sequences()` changes the former to the latter
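
The pLDDT use case mentioned for `calculate_b_iso_range()` comes down to a comparison of the minimum and maximum isotropic B values. The sketch below takes the two numbers directly rather than calling gemmi, so it makes no claim about the function's return type; the name `guess_plddt_scale` and the returned labels are hypothetical.

```python
def guess_plddt_scale(b_min, b_max):
    """Guess which pLDDT scale a B-factor column uses.

    b_min and b_max are the smallest and largest isotropic B values
    in a model. Hypothetical helper for illustration, not part of gemmi.
    """
    if b_min < 0 or b_max > 100:
        return "not pLDDT"  # outside both possible ranges
    if b_max <= 1.0:
        return "0-1"        # fraction-like values, as from ESMFold
    return "0-100"          # percent-like values, as from AlphaFold

alphafold_like = guess_plddt_scale(23.5, 98.7)
esmfold_like = guess_plddt_scale(0.31, 0.97)
ordinary_b = guess_plddt_scale(8.2, 143.0)
```

The check is only a heuristic: a model whose values all happen to lie between 0 and 1 would be labeled "0-1" regardless of where it came from.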

Program:
* gemmi sfcalc: added a few more options
* gemmi convert: added options `--assign-records[=A|H]`, improved `--sifts-num`, adding microheterogeneities to _entity_poly_seq when converting from PDB
* gemmi cifdiff: added option `-t` for basic comparison of values for a single tag

Other:
* minimal WebAssembly port (C++ code compiled with emscripten) of Structure,
as a proof-of-concept and for reading mmCIF files in UglyMol
* examples/to_rdkit.py: example of conversion of gemmi ChemComp to RDKit Mol

and a number of less important changes

0.6.5

Library:
* gemmi can now be built with zlib-ng, a faster fork of zlib (good for working with large, compressed files)
* experimental: binary serialization of Structure (contained objects, such as Model, Chain or UnitCell, can also be serialized separately)
* finalized handling of 5-character monomer names; uses the tilde-hetnam extension (`ABCDE` ↔ `~DE`) for PDB files
* when atom names in the coordinate file match previous names (`_chem_comp_atom.alt_atom_id`) from the monomer library (the names in the CCD and therefore also in the ML change occasionally), print better diagnostic; added function `MonLib::update_old_atom_names()` to update the names in a Structure
* topology: fixed handling of two bonds between the same two residues
* options for handling mmCIF files with incorrect entities (modified `add_entity_ids()` when called with `overwrite=true`)
* added function `Intensities::prepare_merged_mtz()`
* a few bug fixes (for instance, in handling of negative residue numbers in the selection syntax)
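
The `ABCDE` ↔ `~DE` example for 5-character monomer names suggests the following mapping. This is a sketch based only on that example: the function names are hypothetical, and the notes do not describe how gemmi stores the reverse mapping or what it does when two long names shorten to the same string, so the dictionary and the collision error are assumptions.

```python
def shorten_monomer_name(name):
    """Return '~' plus the last two characters for names longer than 3.

    Follows the ABCDE -> ~DE example; hypothetical helper, not gemmi code.
    """
    if len(name) <= 3:
        return name
    return "~" + name[-2:]

def build_restore_table(long_names):
    """Map shortened names back to full names (assumed bookkeeping)."""
    table = {}
    for name in long_names:
        short = shorten_monomer_name(name)
        if short in table and table[short] != name:
            raise ValueError(f"{name} and {table[short]} both map to {short}")
        table[short] = name
    return table

short = shorten_monomer_name("ABCDE")
unchanged = shorten_monomer_name("ALA")
restore = build_restore_table(["ABCDE"])
```

The shortening alone loses information, so going back from `~DE` to `ABCDE` always needs a record of the full name, as the lookup table here shows.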

Python bindings:
* generating type stubs - see #293
* python: `cif.Loop.val()` has been replaced with `__getitem__`/`__setitem__`
* fixed `Mtz.Batch.ints` and `Mtz.Batch.floats`

Program
* subcommand diff has been renamed to cifdiff
* subcommand prep has been renamed to crd
* validate: more options for checking monomer files
* gemmi-grep: added option --extended-regexp
* mtz2cif: added column names Iplus/Iminus (used by ccp4i2) to the default conversion spec

Note: this list is meant to show important changes only.

0.6.4

Library
* completely changed build system for Python module, from setuptools to scikit-build-core
* optimized electron density calculation: single-precision version is now about 2x faster and slightly less exact; some other grid-based calculations also got optimized in the process
* as part of the above optimizations, some of the grid computations require that the model is in the standard orientation (conventional axis directions); in other cases (which are very rare after the [remediation of non-standard coordinate frames](https://www.wwpdb.org/news/news?year=2023#6525a09ad78e004e766a96af) in the PDB) call standardize_crystal_frame()
* CIF output: more flexible formatting
* mmCIF writing: category _entity_poly is included by default, with pdbx_strand_id and pdbx_seq_one_letter_code
* minor changes in reading mmCIF coordinate files
* cif: added functions Loop::add_columns(), Loop::remove_column(), Column::erase()
* MRC map format: ORIGIN record is ignored (previously, if ORIGIN was non-zero, Ccp4::full_cell() returned false and some map properties were not set)
* new function Grid::symmetrize_avg()
* fixed bug in ReciprocalGrid::prepare_asu_data()
* added function read_pir_or_fasta() for reading sequences (previously it was undocumented and more limited)
* added function pdbx_one_letter_code() which returns a string like AA(MSE)H…, for _entity_poly.pdbx_seq_one_letter_code
* new functions expand_one_letter() and expand_one_letter_sequence(), which take ResidueKind.AA/RNA/DNA as an argument, replaced expand_protein_one_letter*()
* adjusted weights in align_sequence_to_polymer()
* added function assign_best_sequences()
* PDB reading: added Structure::ter_status flag to indicate if TER records were: absent, present, clearly in wrong places
* experimental (not documented yet) new functions: Model::get_cra(), Model::get_parent_of()
* Topo::Bond stores a flag for bonds between different symmetry images
* ChemComp::Atom: store _chem_comp_atom.alt_atom_id as old_id, use it in new function update_old_atom_names()
* riding hydrogens: fixed wrong occupancy of added H in special, rare cases
* added Vec3f – Vec3 with single-precision numbers
* minor API changes: Binner::setup() doesn't return anything, changed argument types of Scaling::scale_data(), align_sequences()
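
The string format produced for `_entity_poly.pdbx_seq_one_letter_code`, such as AA(MSE)H…, can be sketched as follows. This is an illustration of the format only, not the implementation of `pdbx_one_letter_code()`: the function name and the deliberately incomplete lookup table are hypothetical.

```python
# Partial table of standard amino acids; a real table would cover all of them.
ONE_LETTER = {"ALA": "A", "GLY": "G", "HIS": "H", "SER": "S"}

def one_letter_code(residue_names):
    """Join residues into a one-letter string.

    Residues without a one-letter code in the table are written as the
    full name in parentheses. Hypothetical helper, not part of gemmi.
    """
    parts = []
    for name in residue_names:
        letter = ONE_LETTER.get(name)
        parts.append(letter if letter is not None else f"({name})")
    return "".join(parts)

code = one_letter_code(["ALA", "ALA", "MSE", "HIS"])
```

Here `code` is the string `AA(MSE)H`, matching the beginning of the example in the list above.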

Program
* new tool gemmi-diff that compares categories and tags in two (mm)CIF files
* gemmi-align prints vertical list with option --verbose
* gemmi-residues has new options: -e, -sss, --chains
* gemmi-rmsz: added option --missing to print missing atoms
* gemmi-validate: more options for validating monomer files
* gemmi-h: more options
* gemmi-mtz: prints info about SYMM records

0.6.3

* new: normalization of amplitudes using the so-called "Karle" approach, similar to the one in the CCP4 program ECALC
* added X-ray scattering coefficients for ions (previously, the charge of atom was ignored)
* pdb: reading CONECT records, and an option to also write them
* when reading pdb, if any chain has 2+ TER records, all TER records are ignored
* more configuration options for writing pdb files
* added functions Mtz::expand_to_p1() and Mtz::read_file_gz()
* cif::Block::find_value(tag) now also returns the value from the corresponding loop if that loop has only one row
* changes in gemmi-validate related to validation with DDL2
* gemmi-sfcalc: added option --sigma-cutoff
* gemmi sf2map --mapmask: if the unit cell in the coordinate file differs from the one in the SF file, use only the latter
* improved transform_to_assembly(), expand_ncs() and rename_chain()
* cif2mtz: the Mtz column for pdbx_DELPHWT now has the label PHDELWT (#272)
* fixed ensure_asu(): phase-shift (for phases and H-L coefficients) was wrong
* fixed UnitCell::find_nearest_image() for non-crystals with NCS
* fixed DensityCalculator::requested_grid_spacing()
* changes and enhancements in add_chemcomp_to_block(), in solvent masking, in mtz2cif, and in several other places
* added python bindings to MtzToCif, cif::Ddl, PdbWriteOptions, changed how options for PDB writing are passed, more bindings for Mtz::Batch
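
The rule for TER records in this release (if any chain has 2+ TER records, all TER records are ignored) can be expressed as a small function. This is a sketch of the stated rule, not gemmi's PDB reader: the function name is hypothetical, and representing each TER record by its chain ID is an assumption made for the example.

```python
from collections import Counter

def usable_ter_records(ter_chain_ids):
    """Return the TER records to trust, given one chain ID per TER record.

    If any chain has two or more TER records, none are trusted.
    Hypothetical helper illustrating the rule; not part of gemmi.
    """
    counts = Counter(ter_chain_ids)
    if any(n >= 2 for n in counts.values()):
        return []
    return list(ter_chain_ids)

consistent = usable_ter_records(["A", "B", "C"])
suspicious = usable_ter_records(["A", "A", "B"])
```

In the second call chain A has two TER records, so even the single record for chain B is discarded.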
