Glycowork

Latest version: v1.4.0




0.3.0

**ml**
_models_
- added LectinOracle as an option for prep_model and modified prep_model to allow loading trained models
_model_training_
- train_ml_model now allows for additional (optional) input features
- changed default optimizer from Adam to AdamW
- changed default learning rate scheduler from cosine decay to ReduceLROnPlateau
- modified train_model to allow for LectinOracle training
_processing_
- split_data_to_train now allows for additional (optional) input features
- label_type is now also an optional argument for split_data_to_train and all lower-level functions
_representation/inference_
- renamed the “representation” module to “inference”
- added get_lectin_preds to use LectinOracle for inferring binding specificity of lectins
- added get_esm1b_representation to retrieve ESM1b representations for new lectins, which can then be used as input for LectinOracle
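For readers unfamiliar with the new default scheduler, the rule behind ReduceLROnPlateau can be sketched in plain Python. This is a simplified, hypothetical re-implementation (minimizing a validation loss, without PyTorch's relative threshold or cooldown), not glycowork's or PyTorch's code. In PyTorch the actual classes are `torch.optim.AdamW` and `torch.optim.lr_scheduler.ReduceLROnPlateau`; the hyperparameters glycowork passes to them are not stated in this changelog.

```python
import math
from dataclasses import dataclass


@dataclass
class PlateauScheduler:
    """Simplified sketch of a reduce-on-plateau rule for a metric that should decrease.

    Hypothetical helper for illustration; it omits PyTorch's relative threshold and cooldown.
    """

    lr: float
    factor: float = 0.1  # multiply the learning rate by this when a plateau is detected
    patience: int = 10  # non-improving epochs tolerated before reducing
    min_lr: float = 0.0  # lower bound for the learning rate
    best: float = math.inf  # best (lowest) validation loss seen so far
    num_bad_epochs: int = 0  # consecutive epochs without a new best

    def step(self, val_loss: float) -> float:
        if val_loss < self.best:
            # new best loss: remember it and reset the plateau counter
            self.best = val_loss
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        if self.num_bad_epochs > self.patience:
            # plateau detected: shrink the learning rate and start counting again
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.num_bad_epochs = 0
        return self.lr


sched = PlateauScheduler(lr=1e-3, factor=0.5, patience=2)
for loss in [1.0, 0.9, 0.95, 0.93, 0.92]:
    sched.step(loss)
print(sched.lr)
```

With patience = 2, the learning rate is halved only after the third consecutive epoch without a new best loss, so training keeps a large step size as long as the validation loss keeps improving.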

**motif**
_query_
- added tissue expression and disease association to get_insight
- glytoucan_to_glycan is now more robust when dealing with missing GlyTouCan IDs
_tokenization_
- added condense_composition_matching to find the minimum number of glycans to characterize matching compositions
- added compositions_to_structures wrapper function that will take a list of compositions, find possible matches, condense them into the minimum number of structures, and match them with values, such as provided relative intensities
- added structures_to_motifs function to convert datasets of relative intensities of glycan structures to relative intensities of the corresponding glycan motifs
- changed default mode of match_composition_relaxed to “exact”
- modified match_composition_relaxed to allow for filtering possible matches based on reducing end monosaccharide (e.g., O-linked glycans)
- fixed an issue in match_composition_relaxed that prevented adding further monosaccharide types to the composition
- moved motif_matrix and dependencies over to motif.annotate
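The idea behind structures_to_motifs, turning structure-level relative intensities into motif-level ones, can be illustrated with a small sketch. It assumes that a motif's intensity is the summed intensity of every structure containing it, and it detects motifs by plain substring matching on IUPAC-condensed strings; the helper name, the toy motif definitions, and the matching rule are hypothetical simplifications, not glycowork's implementation.

```python
from __future__ import annotations

from collections import defaultdict


def structures_to_motif_intensities(
    intensities: dict[str, float], motifs: dict[str, str]
) -> dict[str, float]:
    """Hypothetical helper: aggregate structure-level relative intensities per motif.

    intensities: glycan (IUPAC-condensed) -> relative intensity (e.g., % of total signal)
    motifs: motif name -> substring used to detect it (a crude stand-in for motif annotation)
    """
    totals: dict[str, float] = defaultdict(float)
    for glycan, value in intensities.items():
        for name, pattern in motifs.items():
            if pattern in glycan:
                # every structure carrying the motif contributes its full intensity
                totals[name] += value
    return {name: totals[name] for name in motifs}


intensities = {
    "Neu5Ac(a2-3)Gal(b1-4)GlcNAc": 40.0,
    "Fuc(a1-2)Gal(b1-4)GlcNAc": 35.0,
    "Gal(b1-4)GlcNAc": 25.0,
}
motifs = {
    "alpha2,3-sialyl": "Neu5Ac(a2-3)Gal",
    "H type 2": "Fuc(a1-2)Gal(b1-4)GlcNAc",
    "LacNAc": "Gal(b1-4)GlcNAc",
}
print(structures_to_motif_intensities(intensities, motifs))
```

Because one structure can carry several motifs (here every structure contains LacNAc), motif-level intensities need not sum to 100.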

**glycan_data**
- replaced glyco_targets_species_seq_all_V4 (~23,000 species-specific glycans) and v4_sugarbase (~47,000 unique glycans) with glyco_targets_species_seq_all_V5 (~31,500 species-specific glycans) and v5_sugarbase (~50,500 unique glycans)
- added directed disease associations (currently 533 associations) and tissue expression (currently 2,815 associations) for glycans in v5_sugarbase
- changed nomenclature of glycolipids (most now receive a “1Cer” at their reducing end, for instance “Glc1Cer”) and free oligosaccharides (which receive an “-ol” at their reducing end, for instance “Glc-ol”)
- made Lewis motifs in motif_list more general
- correspondingly updated glycan ML models, representations, and substitution matrix

0.2.0

**motif**
_tokenization_
- added functions for stemifying glycans (by removing rare modifications)
- added match_composition & match_composition_relaxed for finding glycan structures in stored or provided databases that match a provided composition; the search can be narrowed down to, e.g., a species of interest
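Matching a composition against stored structures, as match_composition does, comes down to counting the monosaccharides in each candidate; the reducing-end filter later added to match_composition_relaxed (0.3.0) additionally restricts hits by the rightmost residue, e.g., GalNAc for O-linked glycans. The following is a hypothetical, simplified stand-in, not glycowork's code: its parser only strips linkages and branch brackets from IUPAC-condensed strings, treats modified residues such as GlcNAc6S as distinct names, and has no relaxed mode or species filter. The small database is a toy example.

```python
from __future__ import annotations

import re
from collections import Counter


def residues(glycan: str) -> list[str]:
    """Hypothetical parser: residue names of an IUPAC-condensed glycan, reducing end last."""
    # replace linkage annotations such as "(b1-4)" with spaces, then drop branch brackets
    stripped = re.sub(r"\([^)]*\)", " ", glycan)
    return stripped.replace("[", " ").replace("]", " ").split()


def match_composition_sketch(
    composition: dict[str, int],
    glycans: list[str],
    reducing_end: str | None = None,
) -> list[str]:
    """Return glycans whose residue counts equal `composition` (hypothetical helper)."""
    target = Counter(composition)
    hits = []
    for glycan in glycans:
        names = residues(glycan)
        # optional filter on the reducing-end residue, e.g. "GalNAc" for O-linked glycans
        if reducing_end is not None and names[-1] != reducing_end:
            continue
        if Counter(names) == target:
            hits.append(glycan)
    return hits


db = [
    "Gal(b1-3)GalNAc",
    "GalNAc(b1-4)Gal",
    "Gal(b1-4)GlcNAc",
    "Neu5Ac(a2-3)Gal(b1-3)GalNAc",
    "Neu5Ac(a2-6)[Gal(b1-3)]GalNAc",
]
print(match_composition_sketch({"Gal": 1, "GalNAc": 1}, db))
print(match_composition_sketch({"Gal": 1, "GalNAc": 1}, db, reducing_end="GalNAc"))
```

Without the filter, both Gal(b1-3)GalNAc and GalNAc(b1-4)Gal share the composition {Gal: 1, GalNAc: 1}; requiring a GalNAc reducing end keeps only the first.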

_graph_
- added function to translate a glycan graph back to an IUPAC-condensed string
- added try_string_conversion function to check whether a glycan graph describes a valid glycan
- modified generate_graph_features to also work with networks

_analysis_
- updated plot_embeddings to use representation dataframes as inputs in addition to dictionaries
- swapped subplots in characterize_monosaccharide and modified labeling to enhance clarity
- get_pvals_motifs now allows for a custom motif_list via the optional motifs argument
- plot_embeddings now allows for a custom color palette

_query_
- added glytoucan_to_glycan function to interconvert GlyTouCan IDs and glycans
- get_insight now also yields the GlyTouCan ID of a glycan (if available) as well as the predicted taxonomy if no taxonomy is recorded in our database

_annotate_
- added get_trisaccharides to retrieve a subset of the trisaccharides occurring in a glycan
- added estimate_lower_bound, which gives make_heatmap and get_pvals_motifs a speed-up option via estimate_speedup = True, typically resulting in a 3x speed-up (warning: estimate_lower_bound is an estimate and could, in theory, lead to missed motifs in the motif annotation)

**network**
- beta version of a completely new module that is still under active development

_biosynthesis_
- added functions to find neighbors in biosynthesis space (one reaction removed)
- added functions to plot biosynthetic network for a set of glycans
- added functions to combine/align biosynthetic networks
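"One reaction removed" can be read as one monosaccharide added or removed, assuming each glycosyltransferase step attaches a single residue. For an unbranched IUPAC-condensed glycan, the biosynthetic precursor is then the string with its non-reducing (leftmost) residue and linkage stripped off. The helpers below are hypothetical, deliberately reject branched glycans, and are not taken from the network module itself.

```python
from __future__ import annotations

import re

# non-reducing-end residue plus its linkage at the start of the string, e.g. "Neu5Ac(a2-3)"
_LEADING_RESIDUE = re.compile(r"^[^()\[\]]+\([^)]*\)")


def precursor_linear(glycan: str) -> str | None:
    """Hypothetical helper: precursor one reaction removed from an unbranched glycan."""
    if "[" in glycan:
        raise ValueError("branched glycans are not handled by this sketch")
    match = _LEADING_RESIDUE.match(glycan)
    # a single monosaccharide has no linkage and therefore no precursor
    return glycan[match.end():] if match else None


def biosynthetic_path(glycan: str) -> list[str]:
    """Walk from a glycan back to its reducing-end monosaccharide, one residue at a time."""
    path = [glycan]
    while (previous := precursor_linear(path[-1])) is not None:
        path.append(previous)
    return path


print(biosynthetic_path("Neu5Ac(a2-3)Gal(b1-4)GlcNAc"))
```

Consecutive entries of the returned path are the glycan pairs that a biosynthetic network would connect with an edge.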

**glycan_data**
- replaced glyco_targets_species_seq_all_V3 (~13,000 species-specific glycans) and v3_sugarbase (~20,000 unique glycans) with glyco_targets_species_seq_all_V4 (~23,000 species-specific glycans) and v4_sugarbase (~47,000 unique glycans)
- correspondingly updated glycan ML models, representations, and substitution matrix
- in addition to all the new glycans, many pre-existing glycans are now better specified (e.g., Gal3S instead of GalOS wherever the location of the modification is known)
- GlyTouCan IDs were added whenever possible
- motif_list was expanded by two new motifs (difucosylated N-glycan core & extended core fucose)

**ml**
_train_test_split_
- modified hierarchy_filter to ignore glycans with ‘undetermined’ taxonomy label
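Dropping glycans labeled ‘undetermined’ keeps that placeholder from being learned as if it were a genuine taxonomic class. A minimal sketch of this filtering step, using a hypothetical helper and toy labels rather than hierarchy_filter itself:

```python
from __future__ import annotations


def drop_undetermined(
    glycans: list[str], labels: list[str], placeholder: str = "undetermined"
) -> tuple[list[str], list[str]]:
    """Hypothetical helper: remove glycans whose taxonomy label is only a placeholder."""
    kept = [(glycan, label) for glycan, label in zip(glycans, labels) if label != placeholder]
    # unzip back into parallel lists so they can go straight into a train/test split
    return [g for g, _ in kept], [lab for _, lab in kept]


glycans, labels = drop_undetermined(
    ["Gal(b1-4)Glc-ol", "Man(a1-2)Man", "Fuc(a1-2)Gal"],
    ["Mammalia", "undetermined", "Fungi"],
)
print(glycans, labels)
```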

0.1.0

