- Removed support for Python 3.7; as we use the walrus operator in some of the re-worked functions, Python 3.8+ is now required to use `glycowork`
- Added optional installs for specialized `glycowork` usage (‘all’, ‘ml’, and ‘draw’; for now), which install additional dependencies for these usages; more details in docs
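
For readers unfamiliar with the walrus operator (`:=`, an assignment expression introduced in Python 3.8), the following generic illustration shows the kind of syntax that fails to parse on Python 3.7. It is not taken from the `glycowork` code base; the function name `first_linkage` and the regular expression are hypothetical.

```python
import re

def first_linkage(glycan):
    """Return the first linkage (e.g. 'b1-4') in an IUPAC-condensed string, or None."""
    # `:=` binds the match object and tests it in a single expression (Python 3.8+)
    if (match := re.search(r"\(([ab?]\d-[\d?])\)", glycan)) is not None:
        return match.group(1)
    return None

print(first_linkage("Gal(b1-4)GlcNAc"))
```

On Python 3.7 this file raises a `SyntaxError` at import time, which is why the minimum version had to be raised.
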
glycan_data
- Updated datasets, models, lib to be bigger & better; removed many sequence duplicates with differently written branch orderings
**loader**
- Added `multireplace` helper function, to map a dictionary of changes to a string
- Made `build_custom_df` faster
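
As a rough idea of what mapping a dictionary of changes to a string can look like, here is a minimal sketch. The name `multireplace_sketch`, its signature, and the sequential-replacement strategy are assumptions for illustration and may differ from the actual `multireplace` helper.

```python
def multireplace_sketch(string, replacements):
    """Apply every old -> new pair in `replacements` to `string`, in dict order."""
    for old, new in replacements.items():
        # str.replace substitutes all occurrences of `old`
        string = string.replace(old, new)
    return string

print(multireplace_sketch("Galβ1-4GlcNAcα", {"α": "a", "β": "b"}))
```

Because the pairs are applied one after another, the order of the dictionary matters whenever one replacement produces text that a later one would also match.
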
motif
**draw**
- Added `draw` as a new submodule of `.motif`
- Added `GlycoDraw` to draw glycans in SNFG style and save them as .svg/.pdf
- Added `annotate_figure` to replace glycan text with glycan images in .svg figures (heatmaps, volcano plots, etc.)
- Added `text_to_glycan`, which replaces glycan strings in figures with glycan images
- Added `scale_in_range` to normalize a list of numbers within a range
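
Normalizing a list of numbers within a range is typically a min-max rescaling. The sketch below assumes that interpretation; the name `scale_in_range_sketch`, the parameter names, and the handling of constant input are hypothetical and not necessarily how `scale_in_range` behaves.

```python
def scale_in_range_sketch(values, low, high):
    """Linearly rescale `values` so the minimum maps to `low` and the maximum to `high`."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:
        # All values identical: nothing to spread out, so fall back to the lower bound
        return [float(low) for _ in values]
    return [low + (v - vmin) * (high - low) / span for v in values]

print(scale_in_range_sketch([1, 2, 3], 0, 1))
```
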
**tokenization**
- Sped up `glycan_to_composition` by 1000x (avoiding explicit stemification and just doing stemification of the building blocks); also speeds up all functions using `glycan_to_composition`
- Sped up `composition_to_mass` (independent of the above)
- `glycan_to_composition` (and downstream functions) now can handle more post-biosynthetic modifications: Ac, PCho, PEtN
- Renamed `calculate_theoretical_mass` to `glycan_to_mass`
- Sped up `mz_to_composition2` by (i) filtering out duplicate compositions and (ii) selecting compositions from a chosen taxonomic kingdom
- Reprioritized `mz_to_composition2` to search for native compositions first, then for compositions + adducts, and only then for doubly-charged compositions
- `canonicalize_iupac` now also handles floating substituents and can handle many more typos / inconsistencies / IUPAC dialects (such as CFG-coded glycans), including improvements made by Kathryn Klarich
- Moved `canonicalize_iupac` into `motif.processing`
- Expanded `get_core` (and downstream functions) with HexA, HexNAc, dHex
- Expanded `map_to_basic` to (some) post-biosynthetic modifications
- `mz_to_structures` no longer outright fails if no m/z value can be matched
- Deprecated `structures_to_motifs`; `annotate_dataset` can do the same
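
To make the idea of a composition concrete, the sketch below counts the building blocks of an IUPAC-condensed string and maps them to basic classes such as Hex, HexNAc, dHex, and HexA. It is a simplified stand-in, not the optimized `glycan_to_composition`; the name `composition_sketch` and the deliberately small `BASIC_CLASS` table are assumptions for illustration.

```python
import re
from collections import Counter

# Assumed, deliberately tiny mapping from monosaccharides to basic classes
BASIC_CLASS = {
    "Glc": "Hex", "Gal": "Hex", "Man": "Hex",
    "GlcNAc": "HexNAc", "GalNAc": "HexNAc",
    "Fuc": "dHex",
    "GlcA": "HexA",
}

def composition_sketch(glycan):
    """Count building blocks per basic class; unknown tokens are kept as-is."""
    # Drop linkages such as "(b1-4)" and branch brackets, leaving only monosaccharide tokens
    tokens = re.sub(r"\([^)]*\)|[\[\]]", " ", glycan).split()
    return dict(Counter(BASIC_CLASS.get(tok, tok) for tok in tokens))

print(composition_sketch("Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc"))
```

Working on the individual building blocks, rather than transforming the whole sequence first, is the same general idea as the speed-up described above, although the real implementation details are not shown here.
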
**processing**
- Fixed bug in processing glycans with floating substituents in `small_motif_find`
- Deprecated `seed_wildcard`
- `choose_correct_isoform` has been updated to keep up with the improved `find_isomorphs`
- Added more informative error message to `IUPAC_to_SMILES`
- `get_lib` is now slightly faster
**graph**
- Sped up `compare_glycans` with string inputs, by avoiding graph operations when the two glycans do not have the same composition
- Added support for enabling modification wildcards in `compare_glycans` and `subgraph_isomorphism` (for instance matching GalOS and Gal6S) by setting `wildcards_ptm = True`
- Sped up `glycan_to_nxGraph_int` by optimizing node label/attribute assignments
- Refactored `graph_to_string` to be a lot more robust, streamlined, and faster. Its new integration with `canonicalize_iupac` may also result in string improvement upon back-translation (e.g., branch order canonicalization)
- `ensure_graph` now has `**kwargs` that get passed to `glycan_to_nxGraph`
- `get_possible_topologies` now supports internal additions as well, with the keyword argument ‘exhaustive’
- `possible_topology_check` now supports wildcard matching via `**kwargs` passed on to `compare_glycans`
- Made changes to make `glycowork` compatible with NetworkX 3.0
- Moved `bracket_removal` to `motif.processing`
- Fixed a small inconsistency in handling floating substituents in `glycan_to_nxGraph_int` that could have caused issues with custom libs
- `override_reducing_end` is no longer needed in `glycan_to_nxGraph` to delineate linkage-ending glycans (e.g., Fuc(a1-2)); this is auto-inferred within `glycan_to_nxGraph` now
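
The modification wildcard mentioned above (GalOS matching Gal6S) can be pictured at the level of single monosaccharide tokens. The sketch below assumes that `O` in front of a terminal `S` or `P` marks an unspecified position; the helper names are hypothetical, and the real `wildcards_ptm` logic operates on graphs rather than on isolated strings.

```python
import re

def generic_ptm(token):
    """Replace the position digit in front of a terminal S or P with the placeholder 'O'."""
    return re.sub(r"\d(?=[SP]$)", "O", token)

def ptm_wildcard_match(a, b):
    """True if the tokens are identical or differ only by a wildcarded modification position."""
    if a == b:
        return True
    # At least one side must already be position-free, so Gal3S and Gal6S stay distinct
    one_side_generic = a == generic_ptm(a) or b == generic_ptm(b)
    return one_side_generic and generic_ptm(a) == generic_ptm(b)

print(ptm_wildcard_match("GalOS", "Gal6S"))
```
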
**annotate**
- Deprecated `convert_to_counts_glycoletter` and `glycoletter_count_matrix`; `motif_matrix` can do both
- Refactored `motif_matrix` to be substantially faster and more condensed in its output (also speeds up `annotate_dataset` with the ‘exhaustive’ option in the feature_set argument)
- Expanded `motif_matrix` to implicitly test for subsumption enrichment (e.g., previously we only explicitly looked for “Gal(b1-?)GlcNAc”; now we also count “Gal(b1-4)GlcNAc” toward the former)
- `annotate_glycan` is now dual-compatible with string and networkx graph input
- Expanded `feature_set` in `annotate_dataset` by the option ‘terminal’, which calls `get_terminal_structures`
- This usage of `get_terminal_structures` in `annotate_dataset` now also does the same implicit test for subsumption enrichment as described for `motif_matrix` above
- `annotate_dataset` now creates its own lib, based on the motif list and the provided glycans
- Expanded `find_isomorphs` to also be able to re-shuffle (some) branched branches
- Moved `find_isomorphs` into `motif.processing`
- Linkages on their own are no longer considered by `motif_matrix` / `annotate_dataset`
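
The subsumption counting described above can be illustrated with plain string matching: a motif containing `?` also counts every more specific occurrence. This is only a sketch under that simplification; the function names are hypothetical, and matching on strings ignores branching, which a real implementation would need to handle.

```python
import re

def wildcard_motif_regex(motif):
    """Turn a motif with '?' wildcards into a regex that also accepts specific positions."""
    # Escape the motif literally, then let each '?' match a digit or a literal '?'
    return re.compile(re.escape(motif).replace(r"\?", r"[\d?]"))

def count_with_subsumption(motif, glycan):
    """Count non-overlapping occurrences of `motif` in `glycan`, wildcards included."""
    return len(wildcard_motif_regex(motif).findall(glycan))

print(count_with_subsumption("Gal(b1-?)GlcNAc", "Gal(b1-4)GlcNAc(b1-3)Gal(b1-4)GlcNAc"))
```
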
**analysis**
- All functions with the feature_set keyword argument now can also use the ‘terminal’ keyword for analyzing non-reducing end motifs exclusively
- Added `get_differential_expression` to compare glycomics data, including data cleaning and imputation
- `get_pvals_motifs` and `make_heatmap` no longer have the lib keyword argument, as `annotate_dataset` will generate a suitable lib internally
- Fixed relative abundance summation in motif-mode for `make_heatmap`
- Added the `clean_up_heatmap` helper function to remove redundant (i.e., identical) rows in heatmaps, with a prioritization of named motifs and longer motifs containing redundant shorter motifs
- Added `make_volcano`, to generate a volcano plot from internally calculated differential expression using the `get_differential_expression` function
- Moved `cohen_d` into `motif.processing`
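
The row clean-up described above for `clean_up_heatmap` can be sketched as follows: rows with identical values are grouped, and one representative per group is kept, preferring named motifs and then longer motifs. The function name, the plain-dict input, the explicit `named` set, and the use of string length as a proxy for motif length are all assumptions made for this illustration.

```python
def clean_up_rows_sketch(rows, named):
    """Keep one motif per set of identical rows, preferring named, then longer motifs."""
    best = {}
    for motif, values in rows.items():
        key = tuple(values)
        # Rank tuple: named motifs beat unnamed ones, ties are broken by string length
        rank = (motif in named, len(motif))
        if key not in best or rank > best[key][0]:
            best[key] = (rank, motif)
    keep = {motif for _, motif in best.values()}
    # Preserve the original row order for the surviving motifs
    return {motif: values for motif, values in rows.items() if motif in keep}

rows = {
    "Gal(b1-4)GlcNAc": [1, 2],
    "LacNAc": [1, 2],
    "Fuc": [0, 1],
    "Fuc(a1-2)Gal": [0, 1],
    "Man": [3, 3],
}
print(list(clean_up_rows_sketch(rows, named={"LacNAc"})))
```
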
ml
**model_training**
- `train_ml_model` no longer has the `lib` keyword argument, as `annotate_dataset` will generate a suitable lib internally
network
**biosynthesis**
- Refactored `construct_network` pipeline to be faster and more memory-efficient
- `reducing_end` has been deprecated and is now handled internally
- Added `infer_roots` to auto-infer `permitted_roots` (which no longer needs to be specified in `construct_network`)
- Implemented distance limit, to prevent combinatorial explosion when outlier glycans are present
- Deprecated `subgraph_to_string` and `make_network_from_edges`
- Deprecated `fill_with_virtuals` and `make_network_directed`
- Minor speed-up of `process_ptm`, by pre-calculating stem_lib once instead of for every glycan in network