GROVER: Graph-guided Representation of Omics and Vision with Expert Regulation for Adaptive Spatial Multi-omics Fusion

Yongjun Xiao, Dian Meng, Xinlei Huang, Yanran Liu, Shiwei Ruan, Ziyue Qiao, Xubin Zheng

TL;DR

GROVER tackles the challenge of fusing spatial omics (RNA, ADT) with histology by introducing a graph-guided, adaptive framework that preserves modality-specific signals while aligning them across spatial context. It combines a dual-graph encoder (spatial and modality-specific) built on a multilayer KAN-GCN, a spot-feature-pair contrastive loss for cross-modal alignment, and a self-adaptive Mixture of Experts that gates modality contributions per spot, followed by a graph-based decoder for reconstruction. The approach achieves state-of-the-art performance across four public spatial multi-omics datasets and nine clustering metrics, with ablations confirming the importance of each component (KAN-GCN, contrastive learning, and MoE) for robustness in noisy and heterogeneous data. GROVER's adaptive fusion and robust cross-modal alignment enhance tissue organization understanding, enabling more reliable downstream analyses in spatial biology and pathology.

Abstract

Effectively modeling multimodal spatial omics data is critical for understanding tissue complexity and underlying biological mechanisms. While spatial transcriptomics, proteomics, and epigenomics capture molecular features, they lack pathological morphological context. Integrating these omics with histopathological images is therefore essential for comprehensive disease tissue analysis. However, substantial heterogeneity across omics, imaging, and spatial modalities poses significant challenges. Naive fusion of semantically distinct sources often leads to ambiguous representations. Additionally, the resolution mismatch between high-resolution histology images and lower-resolution sequencing spots complicates spatial alignment. Biological perturbations during sample preparation further distort modality-specific signals, hindering accurate integration. To address these challenges, we propose Graph-guided Representation of Omics and Vision with Expert Regulation for Adaptive Spatial Multi-omics Fusion (GROVER), a novel framework for adaptive integration of spatial multi-omics data. GROVER leverages a Graph Convolutional Network encoder based on Kolmogorov-Arnold Networks to capture the nonlinear dependencies between each modality and its associated spatial structure, thereby producing expressive, modality-specific embeddings. To align these representations, we introduce a spot-feature-pair contrastive learning strategy that explicitly optimizes the correspondence across modalities at each spot. Furthermore, we design a dynamic expert routing mechanism that adaptively selects informative modalities for each spot while suppressing noisy or low-quality inputs. Experiments on real-world spatial omics datasets demonstrate that GROVER outperforms state-of-the-art baselines, providing a robust and reliable solution for multimodal integration.
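The per-spot routing idea in the abstract (weighting informative modalities and suppressing noisy ones at each spot) can be illustrated with a small sketch. This is not the authors' implementation: it replaces the full Mixture-of-Experts with a single soft gate over modality embeddings, and the names `gated_fusion` and `gate_weights`, the random inputs, and all dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(embeddings, gate_weights):
    """Fuse modality embeddings with a per-spot soft gate.

    embeddings:   list of M arrays, each of shape (n_spots, d).
    gate_weights: hypothetical gating matrix of shape (M * d, M);
                  in a trained model this would be learned.
    Returns the fused (n_spots, d) embedding and the (n_spots, M) gates.
    """
    stacked = np.stack(embeddings, axis=1)            # (n_spots, M, d)
    n_spots, n_mod, d = stacked.shape
    # Each spot's gate sees the concatenation of all its modality embeddings.
    logits = stacked.reshape(n_spots, n_mod * d) @ gate_weights
    gates = softmax(logits, axis=1)                   # rows sum to 1
    # Weighted sum over modalities, one weight vector per spot.
    fused = (gates[:, :, None] * stacked).sum(axis=1)
    return fused, gates

# Toy inputs standing in for RNA, protein, and image embeddings (assumed).
rng = np.random.default_rng(0)
n_spots, d = 5, 8
rna, protein, image = (rng.normal(size=(n_spots, d)) for _ in range(3))
W = rng.normal(scale=0.1, size=(3 * d, 3))
fused, gates = gated_fusion([rna, protein, image], W)
```

Because the gate is computed separately for every spot, a modality that is unreliable in one tissue region can receive a low weight there without being down-weighted elsewhere.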

Paper Structure

This paper contains 22 sections, 27 equations, 3 figures, 2 tables, 1 algorithm.

Figures (3)

  • Figure 1: The framework of the proposed GROVER. GROVER encodes modality-specific feature graphs and spatial adjacency graphs using KAN-GCN, then applies attention-based weighted fusion to obtain integrated multimodal representations (RNA, protein, and image). A spot-feature-pair-based contrastive learning objective aligns semantic information across modalities before the embeddings are fed into a self-adaptive Mixture-of-Experts model for fusion. The entire model is trained with modality-specific reconstruction losses and the spot-feature-pair contrastive loss.
  • Figure 2: Visualization of clustering results by GROVER and baseline methods on four spatial multi-omics datasets. From top to bottom: (1) Human Tonsil, (2) Human Glioblastoma, (3) Human Breast Cancer, and (4) Human Tonsil with Add-on Antibodies.
  • Figure 3: Parameter sensitivity analysis of GROVER on the Human Glioblastoma dataset.
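
The cross-modal alignment step named in the Figure 1 caption can be pictured with a generic contrastive objective. The sketch below uses a standard symmetric InfoNCE loss, in which the two embeddings of the same spot form the positive pair and all other spots act as negatives. This is a stand-in chosen for illustration, not necessarily the paper's spot-feature-pair loss; the function name `cross_modal_info_nce`, the temperature value, and the toy data are assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def cross_modal_info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE between two modalities.

    z_a, z_b: (n_spots, d) embeddings; row i of each describes spot i.
    """
    a, b = l2_normalize(z_a), l2_normalize(z_b)
    sim = a @ b.T / temperature                            # (n_spots, n_spots)
    # Diagonal entries are the matching-spot (positive) pairs.
    loss_ab = -np.diag(log_softmax(sim, axis=1)).mean()    # a -> b direction
    loss_ba = -np.diag(log_softmax(sim, axis=0)).mean()    # b -> a direction
    return 0.5 * (loss_ab + loss_ba)

# Toy check: embeddings that agree per spot score better than misaligned ones.
rng = np.random.default_rng(0)
z_rna = rng.normal(size=(6, 4))
z_img = z_rna + 0.01 * rng.normal(size=(6, 4))             # nearly aligned
aligned_loss = cross_modal_info_nce(z_rna, z_img)
shuffled_loss = cross_modal_info_nce(z_rna, np.roll(z_img, 1, axis=0))
```

Minimizing a loss of this shape pulls the embeddings of the same spot together across modalities while pushing apart embeddings of different spots, which is the general behavior the caption attributes to the alignment stage.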