Higher-Order Message Passing for Glycan Representation Learning

Roman Joeres, Daniel Bojar

TL;DR

Glycans present intricate, branched structures that challenge predictive modeling. The authors propose GIFFLAR, a GNN that leverages combinatorial complexes to represent atoms, bonds, and monosaccharides and applies higher-order message passing to capture multi-scale topology. On an expanded GlycanML benchmark, GIFFLAR achieves state-of-the-art results across diverse tasks (taxonomy, glycosylation, immunogenicity) and outperforms both traditional and other GNN baselines, with robust ablation analyses guiding architectural choices. This work advances computational glycobiology by delivering a scalable, end-to-end learnable glycan encoder and points toward extensions to other complex biomolecules and pre-training strategies.

Abstract

Glycans are the most complex biological sequence, with monosaccharides forming extended, non-linear sequences. As post-translational modifications, they modulate protein structure, function, and interactions. Due to their diversity and complexity, predictive models of glycan properties and functions are still insufficient. Graph Neural Networks (GNNs) are deep learning models designed to process and analyze graph-structured data. These architectures leverage the connectivity and relational information in graphs to learn effective representations of nodes, edges, and entire graphs. Iteratively aggregating information from neighboring nodes, GNNs capture complex patterns within graph data, making them particularly well-suited for tasks such as link prediction or graph classification across domains. This work presents a new model architecture based on combinatorial complexes and higher-order message passing to extract features from glycan structures into a latent space representation. The architecture is evaluated on an improved GlycanML benchmark suite, establishing a new state-of-the-art performance. We envision that these improvements will spur further advances in computational glycosciences and reveal the roles of glycans in biology.
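
The idea of passing messages between cells of different ranks can be illustrated with a small sketch. This is not the GIFFLAR implementation: the toy complex, the cell names, the scalar features, and the mean-aggregation update are all assumptions chosen for illustration, and incidence is simplified to adjacent ranks only (atoms to bonds, bonds to monosaccharide-like groups).

```python
from collections import defaultdict

# Hypothetical toy complex (not a real glycan):
# rank 0 = atoms (a*), rank 1 = bonds (b*), rank 2 = monosaccharide-like groups (m*).
features = {
    "a0": [1.0], "a1": [2.0], "a2": [3.0], "a3": [4.0],
    "b01": [0.0], "b12": [0.0], "b23": [0.0],
    "m0": [0.0], "m1": [0.0],
}

# Assumed incidence: each higher-rank cell lists the lower-rank cells it contains.
boundary = {
    "b01": ["a0", "a1"],
    "b12": ["a1", "a2"],
    "b23": ["a2", "a3"],
    "m0": ["b01"],
    "m1": ["b23"],
}


def mean(vectors):
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def message_passing_step(features, boundary):
    """One round: every cell averages its own feature with those of its
    incident cells one rank below (boundary) and one rank above (coboundary)."""
    neighbors = defaultdict(list)
    for upper, lowers in boundary.items():
        for lower in lowers:
            neighbors[upper].append(lower)
            neighbors[lower].append(upper)
    return {
        cell: mean([feat] + [features[n] for n in neighbors[cell]])
        for cell, feat in features.items()
    }


one_step = message_passing_step(features, boundary)
two_steps = message_passing_step(one_step, boundary)
print(one_step["b01"], two_steps["m0"])
```

In this sketch, atom features reach the bonds after one round and the rank-2 cells only after two rounds, which is why stacking several layers matters when information has to travel across ranks. A learned model would replace the plain mean with trainable, rank-specific update functions.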

Paper Structure

This paper contains 19 sections, 7 equations, 5 figures, 4 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Lactose in four abstractions. Panel a shows the chemical formula, and panel b shows the graph-of-atoms (GOA). Panel c shows the combinatorial complex and cells of all ranks: 0-cells in black represent the atoms, 1-cells in orange are bonds between atoms, and 2-cells in yellow and blue denote whole monosaccharides, as defined by the GlycoDraw depiction in panel d (Lundstrøm et al., 2023). Panel e shows an inspirational and motivational Gifflar, a Swedish cinnamon roll.
  • Figure 2: (a) Averaged normalized performances (ANPs) comparing GIFFLAR (blue) to the eight baselines. (b) Performance of GIFFLAR at different depths. (c) Comparison of different combinations of positional encodings and of GIFFLAR with 128 feature dimensions (blue) to 1024 (red). (d) Comparison of different pooling mechanisms on GIFFLAR.
  • Figure 3: t-SNE plot of monosaccharide embeddings of GIFFLAR.
  • Figure 4: Mean Absolute Error of different glycan and protein encoders. In brackets, we denote the relative improvement compared to the LectinOracle baseline ($\text{MAE} = 0.4285$).
  • Figure A1: (a-d) Re-evaluation of Figure 2 without the normalization in Algorithm 1. Panel (b) contains additional results for testing positional encodings (PEs) for the RGCN model. (e) Comparison of three backbones for the GLAMOUR model.

Theorems & Definitions (2)

  • Definition 3.1
  • Definition 3.2