Glycowork

Latest version: v1.2.0


1.2.0

- Added `glycoworkGUI.py` to build the .exe based GUI for important glycowork endpoint functions: `GlycoDraw`, `plot_glycans_excel`, and `get_differential_expression`
- Removed `python-louvain` as a required dependency for `glycowork`
glycan_data
loader
- Switched from `pkg_resources` to `importlib` for loading tabular data into the package
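
As a rough illustration of the `pkg_resources` to `importlib` switch noted above, the standard-library `importlib.resources` API can read a file shipped inside a package. The helper name `read_packaged_text` is hypothetical, and the demonstration uses the standard-library `json` package rather than glycowork's own data files, whose names are not given here.

```python
from importlib import resources


def read_packaged_text(package: str, filename: str) -> str:
    """Return the text of a file that ships inside an importable package."""
    # files() returns a Traversable rooted at the package directory
    # (available in the standard library since Python 3.9).
    return resources.files(package).joinpath(filename).read_text(encoding="utf-8")


# Demonstration with a standard-library package so the snippet runs anywhere;
# a data package would pass its own name and, e.g., a .csv filename instead.
text = read_packaged_text("json", "__init__.py")
print(len(text) > 0)
```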
stats
- Fixed an issue in `TST_grouped_benjamini_hochberg` that caused errors if nothing was significantly different in the entire dataset or in any group
- `test_inter_vs_intra_group` is now robust to non-paired data and data with differing sample sizes per condition
- Added `replace_outliers_with_IQR_bounds` to support outlier treatment in `motif.analysis`
- Added `sequence_richness`, `shannon_diversity_index`, and `simpson_diversity_index` to calculate diversity indices of glycomics data
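
For readers unfamiliar with the diversity indices listed above, here is a minimal sketch of the textbook definitions. The function names are hypothetical and this is not glycowork's implementation. The Simpson index in particular has several conventions; the Gini-Simpson form is assumed here.

```python
import math


def richness(abundances):
    """Number of entries with a non-zero abundance."""
    return sum(1 for a in abundances if a > 0)


def shannon_index(abundances):
    """Shannon entropy H = -sum(p_i * ln p_i) over non-zero proportions."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)


def simpson_index(abundances):
    """Gini-Simpson form 1 - sum(p_i^2); other conventions exist."""
    total = sum(abundances)
    return 1.0 - sum((a / total) ** 2 for a in abundances)


# Four equally abundant glycans and one absent glycan (made-up values).
sample = [25, 25, 25, 25, 0]
print(richness(sample), shannon_index(sample), simpson_index(sample))
```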
motif
processing
- WURCS handling for universal input now encompasses more monosaccharides
- GlycoCT handling for universal input now is robust to the declaration of substituents not immediately following their monosaccharide in the GlycoCT string
- Added `equal_repeats` to check whether two repeating units of a polysaccharide are the same, just shifted
- Modified glycan nomenclature detection in `canonicalize_iupac` to be less prone to overidentifying Oxford when it’s just numbers etc.
- Added “ß” to the typo detection in `canonicalize_iupac` and “(-)” as a variation of linkage uncertainty detection
- Made `canonicalize_iupac` robust to the variation of using {} instead of () for linkages
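
The idea behind checking whether two repeating units are "the same, just shifted" can be sketched as a cyclic-rotation test. This is only an illustration under simplifying assumptions: the repeat is linear (no branches), every residue is followed by a linkage in parentheses, and the names `split_units` and `is_shifted_repeat` are hypothetical rather than part of glycowork.

```python
import re


def split_units(repeat: str) -> list[str]:
    """Split a linear repeat such as 'Gal(b1-4)Glc(b1-3)' into residue+linkage units."""
    return re.findall(r"[^()]+\([^()]*\)", repeat)


def is_shifted_repeat(a: str, b: str) -> bool:
    """True if the units of b are a cyclic rotation of the units of a."""
    ua, ub = split_units(a), split_units(b)
    if len(ua) != len(ub):
        return False
    if not ua:
        return a == b
    # Try every rotation of a and compare it with b.
    return any(ua[i:] + ua[:i] == ub for i in range(len(ua)))


print(is_shifted_repeat("Gal(b1-4)Glc(b1-3)", "Glc(b1-3)Gal(b1-4)"))  # True
print(is_shifted_repeat("Gal(b1-4)Glc(b1-3)", "Gal(b1-3)Glc(b1-4)"))  # False
```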
graph
- Removed the required usage of lib in `glycan_to_nxGraph`, `compare_glycans`, `subgraph_isomorphism`, and all downstream functions (lib only remains for stemification and deep learning model training/inference)
- The keyword argument “wildcards_ptm” now also works as intended when providing pre-calculated graphs as input to `compare_glycans` or `subgraph_isomorphism`
- Fixed a rare issue in which `subgraph_isomorphism`, when “count = False”, would sometimes erroneously output “False” because of a greedy approach to evaluating potential matches
tokenization
- Added `get_unique_topologies` to retrieve all base topologies for a given composition that have been observed for a given taxonomic subset
- Added the “obfuscate_ptm” keyword argument to `map_to_basic`, to allow for mapping Gal6S to Hex6S rather than the default HexOS, if that is required/advantageous
- Support mapping of phosphorylated glycans in `map_to_basic`
draw
- Fixed an issue where cross-ring fragments were not correctly rendered in `GlycoDraw`
- `plot_glycans_excel` can now also be used with filepaths to .xlsx files (in addition to .csv files)
- `plot_glycans_excel` now also supports compact glycan drawing with the “compact” keyword argument
- Improved drawing resolution in `plot_glycans_excel`
- `GlycoDraw` will now more strongly make use of nomenclature canonicalization in case of IUPAC dialects (coverage is still not 100%; if you suspect you use a dialect of IUPAC, pass your sequences through `canonicalize_iupac` first)
- If no filepath is specified, `GlycoDraw` will now also display drawn glycan structures in a non-Jupyter environment (as the classic matplotlib pop-up). Note that this functionality requires the cairosvg dependency (head to https://bojarlab.github.io/glycowork/examples.html#glycodraw-code-snippets if you’re unsure about that)
analysis
- Functions able to use .csv paths as input can now also deal with .xlsx paths as input
- The new “annotate_volcano” keyword argument now allows for the direct insertion of SNFG images within plots from `get_volcano` without having to subsequently run `draw.annotate_figure`
- `get_pvals_motifs`, `get_differential_expression`, `get_glycanova`, `get_time_series`, and `get_jtk` now use `glycan_data.stats.replace_outliers_with_IQR_bounds` to auto-smooth outliers
- Moved `hotellings_t2` to `glycan_data.stats`
- All functions compatible with motif-level analysis now accept the “custom_motifs” keyword argument to be passed to `annotate_dataset` or `quantify_motifs` if “custom” is included in “feature_set”
- Changed the “mode” keyword argument in `get_heatmap` to “motifs” as a Boolean argument, like in all other `motif.analysis` functions
- Added a call to `clean_up_heatmap` to `get_jtk` to avoid redundant motifs
- Added `get_biodiversity` to compare two groups of glycomics datasets with regard to the sequence diversity that is present (similar to comparable analyses for microbiome data)
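
The IQR-based outlier smoothing mentioned above can be illustrated with a small standard-library sketch. It assumes the common Tukey convention of clipping to [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]; whether glycowork uses the same factor or quantile method is not stated here, and `clip_to_iqr_bounds` is a hypothetical name.

```python
import statistics


def clip_to_iqr_bounds(values, k=1.5):
    """Clip values lying outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds."""
    # quantiles(n=4) returns the three quartile cut points (needs >= 2 values).
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


# Made-up abundances with one obvious outlier.
print(clip_to_iqr_bounds([1, 2, 3, 4, 5, 100]))
```

Clipping (rather than dropping) keeps the sample size unchanged, which matters for paired or per-sample downstream tests.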
regex
- Added `filter_dealbreakers` to allow for the exclusion of identified matches if they have illegal components beyond the identified match (e.g., the forbidden Fuc in "Fuc-([Gal|GalNAc])?-Gal-([!Fuc]){,1}-GlcNAc"). Before this, the sequence context *except* the Fuc was extracted and returned.
- Fixed an edge case in `filter_matches_by_location` in which internal locations sometimes had to handle triple-nested lists which led to errors
- `get_match` can now also use glycan graphs, such as derived from `glycan_to_nxGraph`, as input
- Added `get_match_batch` to process a whole list of glycans at once, with some performance improvements via first pre-compiling the pattern
- Fixed an edge case in `get_match` in which pattern components consisting of a single monosaccharide with a specified linkage (e.g., “Fuca3”) could sometimes erroneously output no matches
- Added `motif_to_regex` to convert glycan motifs (e.g., in IUPAC-condensed) into a regular expression suitable for `get_match`. Limited to simple queries for now.
annotate
- `get_terminal_structures` now has a “size” keyword argument with which users can control the size of the extracted terminal motifs
- `get_k_saccharides` now has a “terminal” keyword argument with which users can filter to only count motifs at non-reducing ends
- `annotate_dataset` and functions using it now accept the “terminal2” and “terminal3” options in “feature_set” to also annotate & analyze terminal motifs of size 2 (e.g., Neu5Ac(a2-3)Gal(b1-4)) or size 3 (e.g., Neu5Ac(a2-3)Gal(b1-4)GlcNAc)
network
biosynthesis
- Added the possibility of providing abundances to `construct_network` that are then stored as node attributes in the network
- Added `add_high_man_removal` as a post-processing step in `construct_network` to allow for the addition of reactions removing mannoses from high-Man N-glycans occurring during maturation
- Added `estimate_weights` and `get_edge_weight_by_abundance` to estimate reaction capacities from abundances + estimate missing abundances
- Added `get_maximum_flow`, `get_max_flow_path`, and `get_reaction_flow` to calculate maximum flow paths between network root and endpoints as well as aggregate the flow by reaction type
- Added `get_differential_biosynthesis` as a wrapper function to compare two groups of glycomes/networks with regard to their biosynthesis (differential flow paths or differential reaction flows)
- Fixed an issue in `construct_network` in which sometimes nodes with outgoing but no incoming connections were not detected as unconnected nodes, leading to incomplete networks
- Added the `rescue_glycans` decorator to `construct_network`, to allow for auto-fixing nomenclature variations
- Improved performance of `construct_network` by reducing wasteful computation
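
To make the maximum-flow idea above concrete, here is a generic Edmonds-Karp sketch on a small directed network. It is not glycowork's implementation; the node names and capacities are invented, and `max_flow` is a hypothetical helper.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    if source == sink:
        return 0
    # Residual capacities, including zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    residual.setdefault(source, {})
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for the shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Bottleneck capacity along the path that was found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow: reduce forward edges, increase reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical network: edge values stand in for estimated reaction capacities.
network = {
    "root": {"g1": 3, "g2": 2},
    "g1": {"g2": 1, "end": 2},
    "g2": {"end": 3},
}
print(max_flow(network, "root", "end"))  # 5
```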
evolution
- Switched `get_communities` from using `python-louvain` to the Louvain implementation in `networkx`

1.1.0

Change Log

glycan_data
- Updated sugarbase database and all models
stats
- Newly added module to glycowork
- Moved all the statistics functions from `motif.processing` into this module: `cohen_d`, `mahalanobis_distance`, `mahalanobis_variance`, `variance_stabilization`, `MissForest`, `impute_and_normalize`, and `variance_based_filtering`
- Added `fast_two_sum`, `two_sum`, `expansion_sum`, `hlm`, `update_cf_for_m_n`, `jtkdist`, `jtkinit`, `jtkstat`, and `jtkx` helper functions for JTK test
- Added `get_BF` to calculate Jeffreys' approximate Bayes factor based on sample size and p-value
- Added `get_alphaN` to calculate sample size-appropriate significance cut-offs informed by Bayesian statistics
- Added `pi0_tst` and `TST_grouped_benjamini_hochberg` to perform a Two-Stage adaptive Benjamini-Hochberg procedure based on groups (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3175141/ or https://www.biorxiv.org/content/10.1101/2024.01.13.575531v1)
- Added `test_inter_vs_intra_group` to estimate intra- versus inter-group correlation with a mixed-effects model for groupings of glycans based on domain expertise
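
As background for the grouped two-stage procedure above, the plain single-stage Benjamini-Hochberg adjustment can be sketched as follows. This is the basic building block only, not the grouped or two-stage variant and not glycowork's code; `benjamini_hochberg` is a hypothetical name.

```python
def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```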
motif
regex
- Newly added module to glycowork
- Added the `get_match` function and associated functions to implement a regular expression system for glycans. This allows for powerful queries to detect and extract motifs of arbitrary complexity.
processing
- Moved `cohen_d`, `mahalanobis_distance`, `mahalanobis_variance`, `variance_stabilization`, `MissForest`, `impute_and_normalize`, and `variance_based_filtering` into `glycan_data.stats` to re-focus `processing` on processing glycan sequences
- Extended `canonicalize_composition` to cases like ‘5_4_2_1’, ‘5421’, and ‘(Hex)2 (HexNAc)2 (Deoxyhexose)1 (NeuAc)2 + (Man)3(GlcNAc)2’
- GlycoCT and WURCS handling for universal input now encompass more monosaccharides and more modifications
- Expanded `oxford_to_iupac` to handle more complex sequences, including sulfation, LacdiNAc, hybrid structures, extended Neu5Ac, complex fucosylation, and more custom linkage specifications
- `enforce_class` can now deal with free glycans regardless of whether they end in ‘-ol’ or not
annotate
- `annotate_dataset` and downstream functions now accept a new keyword in “feature_set”, called “custom”. If “custom” is added to “feature_set”, a list of custom motifs can and must be added via the “custom_motifs” keyword argument. “custom” can be mixed and matched with all other keywords in “feature_set”
- `annotate_dataset` now also accepts glyco-regular expressions via the “custom” keyword in “feature_set”. These expressions need to be added within the “custom_motifs” keyword argument and have to start with an “r”, such as "rHex-HexNAc-([Hex|Fuc]){1,2}-HexNAc". Normal motifs and glyco-regular expressions can be freely mixed within “custom_motifs”
- Added `group_glycans_core`, `group_glycans_sia_fuc`, and `group_glycans_N_glycan_type` to group glycans by core structure (for O-glycans), Sia/Fuc/FucSia/Rest, or complex/hybrid/high-man/rest (for N-glycans)
- Fixed a bug in `get_k_saccharides`, in which redundant columns were not always correctly removed
analysis
- Added `get_jtk` to analyze circadian expression of glycans in temporal glycomics datasets using the Jonckheere–Terpstra–Kendall (JTK) algorithm, with the typical interface for motifs, imputation, etc., analogous to differential expression.
- `get_differential_expression`, `get_glycanova`, and `get_jtk` now use `get_alphaN` to calculate a sample size-appropriate significance cut-off (see https://journals.sagepub.com/doi/10.1177/14761270231214429) and add a ‘significant’ column to the output to display whether the corrected p-values lie below this threshold
- Added the “zscores” keyword argument to `get_pvals_motifs` to perform z-score transformation if used data are not yet z-score transformed, by setting “zscores” to False
- For statistical calculations, `get_pvals_motifs` will now weigh the motif occurrences by z-score magnitude, rather than only using a cut-off for enrichment calculations
- Added effect size calculations to `get_pvals_motifs`, which are also included in the output as Cohen’s d
- Changed `get_pvals_motifs` such that now both enrichments and depletions will be tested (with depletions resulting in negative effect sizes)
- Added `select_grouping` to find out which grouping of glycans has the highest intra- versus inter-group correlation, as estimated by `glycan_data.stats.test_inter_vs_intra_group`
- When “motifs = False” and “grouped_BH = True”, `get_differential_expression` now tries to use the Two-Stage adaptive Benjamini-Hochberg procedure based on groups for multiple testing correction, if meaningful groups can be found in the glycans [note this makes everything at least one order of magnitude slower, though most datasets should still finish in a few seconds]
draw
- In `GlycoDraw`, the “highlight_motif” keyword argument can now use glyco-regular expressions in addition to regular motifs (just add a single ‘r’ before your glyco-regular expression to indicate that it is indeed a regular expression)
- Added `plot_glycans_excel` to allow for the automated insertion of `GlycoDraw` SNFG pictures into an Excel file containing glycan sequences
graph
- `categorical_node_match_wildcard` now uses string ID for matching, instead of integer ID, which means even two graphs, generated with two different libs, can now be successfully compared via `compare_glycans` or `subgraph_isomorphism`
- `compare_glycans` or `subgraph_isomorphism` (and all functions using these functions) now support negation, by prepending “!”. For instance, “!Fuc(a1-?)Gal(b1-4)GlcNAc” will match subsequences that have a monosaccharide that is NOT Fuc before the Gal. It is *highly* recommended to generate your own lib via `get_lib` if you use negation, as monosaccharides such as !Fuc are *not* within lib and will cause indexing errors.
- Added “?1-?” as another ultimate wildcard (promoting it from a strong narrow wildcard)
- Fixed some cases where “Monosaccharide” was not treated as an ultimate wildcard in graph operations
- Fixed an issue in `graph_to_string` in which glycans of size 1 (e.g., “GalNAc”) sometimes were missing their first character
network
- Updated pre-calculated biosynthetic networks for milk oligosaccharides
biosynthesis
- Refactored `find_diff` to make networks compatible with the automated, dynamic wildcards (i.e., ? behave as they should and don’t necessarily cause over-branching of the network)
- In `highlight_network`, the “motif” keyword argument can now use glyco-regular expressions in addition to regular motifs (just add a single ‘r’ before your glyco-regular expression to indicate that it is indeed a regular expression)
ml
model_training
- In `training_setup`, upgraded the loss functions for all classification problems to PolyLoss with label smoothing (see https://arxiv.org/abs/2204.12511 for details).
- In `training_setup`, number of classes (for multiclass or multilabel classification) can now be specified via the new “num_classes” keyword argument

1.0.1

Change Log
motif
processing
- Slightly extended WURCS parsing in `wurcs_to_iupac`
- Fixed an issue in `choose_correct_isoform` in which errors would be caused if the input list contained only duplicate glycans
- Fixed an issue in `choose_correct_isoform` in which errors would be caused if the input list contained only glycans without branching
draw
- Adapted cairosvg imports so that, even without cairosvg dependencies, users can plot glycans inline and export as .svg files (only export as .pdf and export of `annotate_figure` are still restricted to cairosvg)
network
biosynthesis
- Fixed handling of empty outputs of `choose_correct_isoform` in `construct_network`
evolution
- Fixed dictionary handling in `get_communities`

1.0.0

Change Log
- Added a Zenodo badge, to have a release-specific doi for glycowork
glycan_data
- Updated sugarbase database; sugarbase is now pickled, so literal evaluations are necessary
- Harmonized glycan column names across generated dataframes; all use ‘glycan’ now, ‘target’ has been deprecated
loader
- Updated `motif_list` to be compatible with new position encoding
- Added Internal_LewisX and Internal_LewisA to `motif_list` (renamed LewisX and LewisA to Terminal_LewisX and Terminal_LewisA, correspondingly)
- Made `df_species` static again to speed up package import
- Added `find_nth_reverse` helper function that finds the starting index of the nth occurrence of a substring from the end of the string
- Added `remove_unmatched_brackets` helper function to strip unmatched opening or closing brackets from glycan strings
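
The two string helpers described above can be approximated with short standard-library sketches. The names `find_nth_from_end` and `strip_unmatched_brackets` are hypothetical stand-ins; the assumptions that occurrences are counted without overlap and that only square brackets are treated are illustrative choices, not documented behavior.

```python
def find_nth_from_end(text: str, sub: str, n: int) -> int:
    """Start index of the n-th (non-overlapping) occurrence of sub from the end, or -1."""
    end = len(text)
    idx = -1
    for _ in range(n):
        idx = text.rfind(sub, 0, end)
        if idx == -1:
            return -1
        # The next search only looks to the left of this match.
        end = idx
    return idx


def strip_unmatched_brackets(text: str, open_ch: str = "[", close_ch: str = "]") -> str:
    """Remove opening or closing brackets that have no partner."""
    stack = []    # indices of currently open brackets
    drop = set()  # indices of brackets without a partner
    for i, ch in enumerate(text):
        if ch == open_ch:
            stack.append(i)
        elif ch == close_ch:
            if stack:
                stack.pop()
            else:
                drop.add(i)
    drop.update(stack)  # anything still open was never closed
    return "".join(ch for i, ch in enumerate(text) if i not in drop)


print(find_nth_from_end("a(b)c(d)e(f)", "(", 2))            # 5
print(strip_unmatched_brackets("Gal(b1-4)]GlcNAc"))         # Gal(b1-4)GlcNAc
```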
motif
- Added more masses to mz_to_composition.csv / `mass_dict`: Acetonitrile, Formate, Cl-, HCO3-, and NH4+
processing
- Extended `canonicalize_iupac` to cases like "NeuGcα3Galβ3(NeuAcα6)GalNAcol" and even more modification formulations, e.g., “6S-GlcNAc”
- Added `canonicalize_composition` to convert compositions formatted either in the style of HexNAc2Hex1Fuc3Neu5Ac1 or N2H1F3A1 into dictionaries used by glycowork
- Added GalNAc4S to permitted reducing end monosaccharides for O-linked glycans in `enforce_class`
- `MissForest` now has a maximum number of iterations and will check for convergence each iteration (immediately finishing upon converging), yielding some speed-ups in most cases
- The output of `min_process_glycans` no longer contains empty strings for glycans ending in a linkage
- Updated `choose_correct_isoform` to be compatible with change in `min_process_glycans`
- Added `get_possible_linkages` to retrieve linkages matching a wildcarded linkage
- Added `get_possible_monosaccharides` to retrieve monosaccharides matching a monosaccharide type (HexNAc, etc.)
- Added decorators, `rescue_glycans` and `rescue_compositions`, to canonicalize them in case a decorated function errors out
- Added `linearcode_to_iupac` to support LinearCode as input format for glycowork (this will be called within `canonicalize_iupac` and the decorators); note that for now coverage may not be perfect yet
- Added `iupac_extended_to_condensed` to support IUPAC-extended as input format for glycowork (this will be called within `canonicalize_iupac` and the decorators); note that for now coverage may not be perfect yet
- Added `glycoct_to_iupac` to support GlycoCT as input format for glycowork (this will be called within `canonicalize_iupac` and the decorators); note that for now coverage may not be perfect yet
- Added `wurcs_to_iupac` to support WURCS as input format for glycowork (this will be called within `canonicalize_iupac` and the decorators); note that for now coverage may not be perfect yet
- Added `oxford_to_iupac` to support Oxford as input format for glycowork (this will be called within `canonicalize_iupac` and the decorators); note that for now coverage is limited
- `check_nomenclature` (formerly in `motif.tokenization`) now handles outputting warning messages for trying to use non-string, non-graph nomenclatures or SMILES with glycowork functions
- Expanded `find_isomorphs` to generate more isomorphic sequence variants, thereby increasing the chances that `choose_correct_isoform` will have access to the canonical sequence
- Fixed a rare issue with `canonicalize_iupac` where sequences coming from `structure_to_basic` would sometimes be formatted incorrectly if they contained dHex
- Fixed an issue in `find_isomorphs` in which double branches were not always correctly swapped
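
To illustrate the kind of conversion `canonicalize_composition` is described as performing, here is a minimal sketch that parses the two composition styles quoted above into a dictionary. The one-letter mapping is inferred from the paired examples in these notes, the output key names are an assumption, and `parse_composition` is a hypothetical function, not the library's.

```python
import re

# Assumed one-letter shorthand, inferred from the paired examples in the notes
# (N2H1F3A1 vs. HexNAc2Hex1Fuc3Neu5Ac1); not taken from the library.
SHORTHAND = {"H": "Hex", "N": "HexNAc", "F": "Fuc", "A": "Neu5Ac"}


def parse_composition(comp: str) -> dict:
    """Parse 'HexNAc2Hex1Fuc3Neu5Ac1' or 'N2H1F3A1' into {name: count}."""
    # Longest names first, so 'HexNAc' is not read as 'Hex'.
    names = sorted(SHORTHAND.values(), key=len, reverse=True)
    long_unit = "(" + "|".join(map(re.escape, names)) + r")(\d+)"
    short_unit = "([" + "".join(SHORTHAND) + r"])(\d+)"
    if re.fullmatch(f"(?:{long_unit})+", comp):
        return {name: int(n) for name, n in re.findall(long_unit, comp)}
    if re.fullmatch(f"(?:{short_unit})+", comp):
        return {SHORTHAND[k]: int(n) for k, n in re.findall(short_unit, comp)}
    raise ValueError(f"unrecognised composition: {comp!r}")


print(parse_composition("HexNAc2Hex1Fuc3Neu5Ac1"))
print(parse_composition("N2H1F3A1"))
```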
analysis
- `get_heatmap` now no longer tries to convert data to relative abundances if negative values are detected in the input
- All functions using dataframes as inputs in `analysis` can now also be used by providing full filepaths to the .csv file instead
- Optimized some of the code for readability and speed (everything should be at least a bit faster now)
annotate
- `get_k_saccharides` is now allowed to generate new dynamic motifs with tokens outside of lib (via `expand_lib`)
- `annotate_glycan` and `annotate_dataset` now also support narrow wildcards
- Fixed an issue in `count_unique_subgraphs_of_size_k` in which branched motifs were not always correctly formatted (i.e., opening/closing brackets)
- `get_k_saccharides` now outputs dataframes with counts as default and can yield the old nested lists of motifs by setting the new keyword `just_motifs` to True
- Fixed an edge case in which `get_k_saccharides` sometimes overcounted individual monosaccharides if their strings overlapped
graph
- `subgraph_isomorphism` and `compare_glycans` now support using wildcards and position encoding at the same time. The `extra` keyword argument is now deprecated and the functions auto-detect whether anything has been specified in wildcards and/or termini_list
- `subgraph_isomorphism` and `compare_glycans` now support automatically inferred narrow wildcards to allow for (i) matching linkages like a1-? to only specified linkages within that group (e.g., a1-3 but not b1-3 etc.) and (ii) matching monosaccharide types like HexNAc to only specified monosaccharides of that type (e.g., GlcNAc but not Glc, etc.)
- The `wildcard_list` keyword argument in all graph & annotation functions is now deprecated as wildcards are inferred automatically via narrow wildcards and native full wildcards (?1-? and Monosaccharide)
- `subgraph_isomorphism` now behaves as expected for testing motifs ending in linkages on glycans ending in linkages
- `subgraph_isomorphism` can now return the matched subgraphs in the input glycan with the new `return_matches` keyword argument
- `glycan_to_nxGraph` is now decorated with the `rescue_glycans` decorator, which auto-canonicalizes IUPAC strings if they are not in the format preferred by glycowork
- Fixed mismatch of labels and string_labels in `categorical_node_match_wildcard`
- Fixed an issue in `subgraph_isomorphism` in which, when using positional encoding, sometimes the mirror image of a motif was incorrectly captured if the termini aligned
- `termini_list` within `subgraph_isomorphism` now only requires the specification of monosaccharide positions
- Added `expand_termini_list` helper function to facilitate the expansion of monosaccharide-only `termini_list` into full `termini_list` behind the scenes
- Added support for shorthand notation of position encoding, now either ‘terminal’ or ‘t’ will work
- Improved handling of complex branching in `graph_to_string`; should be fewer unexpected translations now
- Fixed an issue in `graph_to_string` in which induced subgraphs could cause errors due to unexpected or weirdly sorted node indices
- Fixed an edge case in which the reducing end could be sometimes calculated as ‘internal’ when termini=’calc’ in `glycan_to_nxGraph`
- Deprecated a duplicate `character_to_label` and `string_to_labels`
- Deprecated `categorical_termini_match`; the functionality is now handled within `categorical_node_match_wildcard`
- Deprecated the `wildcards` keyword argument from `compare_glycans` as this will now be detected internally, if wildcards are provided via `wildcard_list`
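
The narrow-wildcard behavior for linkages described above (a1-? matching a1-3 but not b1-3) can be sketched as a position-wise comparison in which “?” matches any single character. The helper name `matching_linkages` and the example linkage set are assumptions for illustration only.

```python
def matching_linkages(pattern, linkages):
    """Return linkages that agree with pattern wherever pattern is not '?'."""
    return sorted(
        link
        for link in linkages
        if len(link) == len(pattern)
        and all(p == "?" or p == c for p, c in zip(pattern, link))
    )


# Made-up set of observed linkages.
observed = {"a1-3", "a1-4", "b1-3", "b1-4", "a2-3", "a2-6"}
print(matching_linkages("a1-?", observed))  # ['a1-3', 'a1-4']
print(matching_linkages("?1-?", observed))  # ['a1-3', 'a1-4', 'b1-3', 'b1-4']
```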
tokenization
- Composition functions (e.g., `composition_to_mass`) are now decorated with `rescue_compositions`, which means that they can be used with compositions like “H3N2” (basically anything that `canonicalize_composition` can handle)
- Deprecated `character_to_label` as it’s now handled within `string_to_labels`
- Moved `check_nomenclature` into motif.processing
- Optimized some of the code for readability and speed (most things should be at least a bit faster now)
draw
- Support motif highlighting in `GlycoDraw`: by providing the `highlight_motif` keyword argument, motifs can be highlighted (everything else will be set to low opacity). Works with IUPAC-condensed motifs and named motifs from `known`
- Support wildcards in motif highlighting with the `highlight_wildcard_list` keyword argument, for instance highlighting all `Gal(?1-?)GlcNAc` subunits (for Gal(b1-?)GlcNAc you don’t need `highlight_wildcard_list`, as narrow wildcards are handled automatically)
- Support positional encoding in motif highlighting with the `highlight_termini_list` keyword argument, for instance highlighting all terminal, non-reducing end `Gal(b1-?)GlcNAc` subunits (yes, you can use both wildcards and positional encoding at the same time😊)
- Support drawing of repeat structures (indicated by brackets and the number of repeats) via the new `repeat` keyword argument. Internal repeats can also be specified with the additional `repeat_range` keyword argument.
- Optimized some of the code for readability and speed (most things should be at least a bit faster now)

network
biosynthesis
- Optimized some of the code for readability and speed (everything should be up to 2x faster now)
evolution
- Optimized some of the code for readability and speed (everything should be at least a bit faster now)

ml
- Optimized some of the code for readability and speed (most things should be at least a bit faster now)



v0.8.1-zenodo
Literally no code changes at this point (0.9 is expected to come in December) but Zenodo requires a new release to mint a doi

0.8.1

motif
tokenization
- Converted `chars` into a dict to match `libr` formatting
- Updated `constrain_prot` to work with the change above

ml
models
- Changed `prep_model` to load trained models onto the CPU if no GPU is available

0.8.0

- Linted the package with flake8
- Increased code coverage
- Added another optional extras install, [chem], including glyles, requests, and pubchempy

glycan_data
- Changed `lib` to be a dict of type glycoletters:index, as it’s faster to index a dict vs. a long list; also adapted all functions using `lib` to reflect this change

**loader**
- Added `replace_every_second` helper function
- Updated `linkages` list
- Changed `linkages` and `Hex` etc to be sets instead of lists
motif
**processing**
- Added `variance_stabilization` for variance stabilization normalization, both globally and group-specific
- Added `in_lib` helper function to check whether all glycoletters of glycan are in lib
- Deprecated `small_motif_find`
- `cohen_d` now also returns the variance of the effect size and supports paired samples as well (calculating Cohen’s dz in this case)
- Added `mahalanobis_distance` to calculate Mahalanobis distance as an effect size for multivariate comparisons
- Added `mahalanobis_variance` to estimate variance of Mahalanobis distance via bootstrapping
- Added `MissForest` for random forest based data imputation
- Cleaned up `canonicalize_iupac` and made it slightly faster
- Added `variance_based_filtering`
- Added `impute_and_normalize` and underlying helper functions
- Fixed numpy random seed for reproducibility
- Sped up `presence_to_matrix`
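
For reference, the two effect sizes mentioned above (Cohen’s d for independent groups and Cohen’s dz for paired samples) follow standard textbook formulas, sketched here with the standard library. The function names are hypothetical, and the variance of the effect size that `cohen_d` is said to return is omitted from this sketch.

```python
import math
import statistics


def cohen_d_unpaired(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def cohen_dz_paired(a, b):
    """Cohen's dz: mean of the paired differences over their standard deviation."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


# Made-up abundances for two conditions.
print(cohen_d_unpaired([2, 4, 6], [1, 3, 5]))  # 0.5
print(cohen_dz_paired([2, 4, 7], [1, 2, 4]))   # 2.0
```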

**tokenization**
- Deprecated `mz_to_composition`
- `mz_to_composition2` is now the new `mz_to_composition`
- Adapted `mz_to_structures`, `compositions_to_structures`, and `match_composition_relaxed` to work with this change

**annotate**
- Added `create_correlation_network` to identify clusters of highly correlated glycans/motifs
- Added `count_unique_subgraphs_of_size_k` as a helper function within `get_k_saccharides`
- Refactored `get_k_saccharides` to be faster and more complete (and be, effectively, a replacement of `motif_matrix`)
- `annotate_dataset` now uses `get_k_saccharides` for mono- and disaccharides, instead of `motif_matrix`
- Deprecated `motif_matrix`
- `annotate_dataset` now also creates relevant ?-containing motifs if ‘terminal’ in feature_set, even if they don’t explicitly occur in the glycan strings
- Big speed-up for `annotate_dataset` if known=True, as we now cache the precalculated motif graphs
- Added `quantify_motifs` as a wrapper around `annotate_dataset` to adequately distribute relative abundances across extracted motifs
- Deprecated `estimate_lower_bound` as speed-ups make it no longer necessary

**analysis**
- Renamed `make_heatmap` to `get_heatmap`
- Renamed `make_volcano` to `get_volcano`
- Deprecated `replace_zero_with_random_gaussian` (this is now handled by `MissForest` in .processing within `impute_and_normalize`)
- Added `hotellings_t2` for multivariate comparisons
- Changed multiple-testing correction method from Holm-Sidak to Benjamini-Hochberg
- Added `variance_stabilization` in `get_differential_expression`
- Added the option to analyze highly correlated sets of glycans/motifs (via `create_correlation_network`) within `get_differential_expression`
- Implemented usage of `hotellings_t2` and the Mahalanobis distance (as effect size) for usage if sets are analyzed within `get_differential_expression`
- `get_heatmap` and `get_differential_expression` now scale abundances by the actual counts of motifs per glycan, not just absence/presence
- Added `get_meta_analysis` to estimate combined effect sizes from the results of multiple studies (both fixed-effects and random-effects models can be estimated)
- Added `variance_based_filtering` in `get_differential_expression`
- Effect size variances can now also be retrieved within `get_differential_expression` via the effect_size_variance keyword argument
- `get_differential_expression` now also can handle paired samples when paired=True
- `get_differential_expression` now also tests the homogeneity of variances using Levene’s test in all settings (also multiple-testing controlled)
- Added `get_glycanova` to use ANOVA-based analyses on glycomics datasets (uses basically all the improvements of `get_differential_expression`, including analysis on the motif level)
- Added `get_pca` to plot glycomics data (also has the motif interface)
- Added `get_pval_distribution` to plot the distribution of p-values
- Added `get_ma` to plot a Bland-Altman plot
- Added `get_glycan_change_over_time` to detect significant changes in time-course data via OLS fitting
- Added `get_time_series` as a wrapper around `get_glycan_change_over_time` to do time series analyses, with all the motif & normalization functionality
- Added `get_coverage` to visualize glycan expression across samples (ordered by average intensity) in a coverage plot
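
The fixed-effects model mentioned for `get_meta_analysis` above is conventionally an inverse-variance weighted average. A minimal sketch of that textbook formula follows; `fixed_effects_meta` is a hypothetical name, the inputs are invented, and the random-effects model is not covered here.

```python
def fixed_effects_meta(effects, variances):
    """Inverse-variance weighted combined effect size and its variance."""
    weights = [1.0 / v for v in variances]
    combined = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    combined_var = 1.0 / sum(weights)
    return combined, combined_var


# Two made-up studies with equal precision.
print(fixed_effects_meta([0.2, 0.6], [0.1, 0.1]))
```

Studies with smaller variance receive larger weights, so precise studies dominate the combined estimate.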

**draw**
- Added import warning if draw dependencies are not installed
- Removed `pycairo` from dependencies
- Modified `annotate_figure` to be compatible with .svg files from older Matplotlib versions
- Changed “output” to “filepath” in `GlycoDraw`
- If there are “?” in the provided filepath for `GlycoDraw`, they will now be automatically replaced with “_” to avoid saving errors

**graph**
- Sped up `glycan_to_graph`/`glycan_to_nxGraph` (and all downstream functions, which are *a lot*)
- Also improved the runtime of downstream functions, such as `subgraph_isomorphism` independent of these advances
- `subgraph_isomorphism` now also accepts precalculated motif graph as inputs (in addition to the already supported precalculated glycan graphs)
ml
- Rephrased import warnings to reflect optional install strategy for extra dependencies

**model_training**
- Sped up `train_ml_model`
network
**biosynthesis**
- `create_neighbors` no longer uses the libr keyword
