**ml**
_models_
- added LectinOracle as an option for prep_model and modified prep_model to allow loading trained models
_model_training_
- train_ml_model now allows for additional (optional) input features
- changed default optimizer from Adam to AdamW
- changed default learning rate scheduler from cosine-decay to ReduceLROnPlateau
_processing_
- split_data_to_train now allows for additional (optional) input features
- label_type is now also an optional argument for split_data_to_train and all lower-level functions
_model_training_
- modified train_model to allow for LectinOracle training
_representation/inference_
- renamed “representation” module to “inference”
- added get_lectin_preds to use LectinOracle for inferring binding specificity of lectins
- added get_esm1b_representation to retrieve ESM1b representations for new lectins, to use them as input for LectinOracle
**motif**
_query_
- added tissue expression and disease association to get_insight
- glytoucan_to_glycan now more robust in dealing with missing GlyTouCan IDs
_tokenization_
- added condense_composition_matching to find the minimum number of glycans to characterize matching compositions
- added compositions_to_structures wrapper function that takes a list of compositions, finds possible matches, condenses them into the minimum number of structures, and matches them with values such as provided relative intensities
- added structures_to_motifs function to convert datasets of relative intensities of glycan structures to relative intensities of the corresponding glycan motifs
- changed default mode of match_composition_relaxed to “exact”
- modified match_composition_relaxed to allow for filtering possible matches based on reducing end monosaccharide (e.g., O-linked glycans)
- fixed issue in match_composition_relaxed that prevented adding further monosaccharide types to the composition
- moved motif_matrix and dependencies over to motif.annotate
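
The reducing-end filter mentioned above for match_composition_relaxed can be pictured with a small, self-contained sketch. This is not the library's implementation: `reducing_end` and `filter_by_reducing_end` are hypothetical helpers, and the sketch assumes IUPAC-condensed strings in which the reducing-end monosaccharide is the last token. Keeping only candidates that end in GalNAc is one way to narrow matches to typical O-linked glycans.

```python
def reducing_end(glycan: str) -> str:
    """Return the last monosaccharide of an IUPAC-condensed glycan string.

    Hypothetical helper: assumes the reducing end follows the final
    linkage, e.g. "Gal(b1-3)GalNAc" -> "GalNAc".
    """
    # Everything after the last ")" is the reducing-end token; a branch
    # that closes right before it leaves a leading "]" to strip.
    return glycan.rsplit(")", 1)[-1].lstrip("]")


def filter_by_reducing_end(glycans, allowed):
    """Keep only glycans whose reducing-end monosaccharide is in `allowed`."""
    return [g for g in glycans if reducing_end(g) in allowed]


candidates = [
    "Gal(b1-3)GalNAc",
    "Gal(b1-4)GlcNAc",
    "Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc",
    "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc",
]

# Assumption: O-linked candidates are recognized by a GalNAc reducing end.
o_linked = filter_by_reducing_end(candidates, {"GalNAc"})
print(o_linked)
```

Comparing the whole last token, rather than using a plain suffix check, avoids confusing monosaccharides whose names share an ending.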
**glycan_data**
- replaced glyco_targets_species_seq_all_V4 (~23,000 species-specific glycans) and v4_sugarbase (~47,000 unique glycans) with glyco_targets_species_seq_all_V5 (~31,500 species-specific glycans) and v5_sugarbase (~50,500 unique glycans)
- added directed disease associations (currently 533 associations) and tissue expression (currently 2,815 associations) for glycans in v5_sugarbase
- changed nomenclature of glycolipids (most now receive a “1Cer” at their reducing end, for instance “Glc1Cer”) and free oligosaccharides (now receive an “-ol” at their reducing end, for instance “Glc-ol”)
- made Lewis motifs in motif_list more general
- correspondingly updated glycan ML models, representations, and substitution matrix
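
As a rough illustration of the new reducing-end nomenclature for glycolipids and free oligosaccharides, the sketch below sorts glycan strings by their suffix. `classify_reducing_end` is a hypothetical helper, not part of the package, and the suffix rules are taken only from the examples given above. Since only most glycolipids carry “1Cer”, some will fall through to the default label.

```python
def classify_reducing_end(glycan: str) -> str:
    """Classify a glycan string by its reducing-end suffix.

    Hypothetical helper based on the nomenclature in these notes:
    "1Cer" marks (most) glycolipids, "-ol" marks free oligosaccharides.
    """
    if glycan.endswith("1Cer"):
        return "glycolipid"
    if glycan.endswith("-ol"):
        return "free oligosaccharide"
    # Anything else is left unclassified by this sketch.
    return "other"


examples = ["Glc1Cer", "Gal(b1-4)Glc1Cer", "Glc-ol", "Gal(b1-3)GalNAc"]
labels = {g: classify_reducing_end(g) for g in examples}
for glycan, label in labels.items():
    print(f"{glycan}: {label}")
```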