GROVER: Graph-guided Representation of Omics and Vision with Expert Regulation for Adaptive Spatial Multi-omics Fusion

Yongjun Xiao; Dian Meng; Xinlei Huang; Yanran Liu; Shiwei Ruan; Ziyue Qiao; Xubin Zheng

TL;DR

GROVER addresses the challenge of fusing spatial omics (RNA, ADT) with histology through a graph-guided, adaptive framework that preserves modality-specific signals while aligning them across spatial context. It combines a dual-graph encoder (spatial and modality-specific) built on a multilayer KAN-GCN, a spot-feature-pair contrastive loss for cross-modal alignment, and a self-adaptive Mixture of Experts (MoE) that gates modality contributions per spot, followed by a graph-based decoder for reconstruction. The approach achieves state-of-the-art performance across four public spatial multi-omics datasets on nine clustering metrics, and ablations confirm that each component (KAN-GCN, contrastive learning, and MoE) contributes to robustness on noisy and heterogeneous data. By fusing modalities adaptively and aligning them robustly, GROVER gives a clearer picture of tissue organization and supports more reliable downstream analyses in spatial biology and pathology.
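To make the per-spot gating idea concrete, the following is a minimal NumPy sketch of softmax-gated fusion over modality embeddings. It is an illustration only, not the authors' implementation: the function name `gated_fusion`, the single linear gate `gate_weights`, and the tensor shapes are all assumptions introduced here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gated_fusion(embeddings, gate_weights):
    """Fuse modality embeddings with one softmax gate per spot.

    embeddings:   array of shape (M, N, D): M modalities, N spots, D dims.
    gate_weights: array of shape (M * D, M): a hypothetical linear gate
                  (an assumption for this sketch, not taken from the paper).
    Returns the fused embeddings (N, D) and the gate values (N, M).
    """
    m, n, d = embeddings.shape
    # Concatenate all modality embeddings of each spot into one gate input.
    gate_input = embeddings.transpose(1, 0, 2).reshape(n, m * d)
    # One weight per modality and spot; each row sums to 1.
    gates = softmax(gate_input @ gate_weights, axis=1)
    # Weighted sum of modality embeddings, computed independently per spot.
    fused = np.einsum("nm,mnd->nd", gates, embeddings)
    return fused, gates


# Toy example: 3 modalities (e.g. RNA, ADT, image), 5 spots, 4 dimensions.
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 5, 4))
w = rng.normal(size=(12, 3))
fused, gates = gated_fusion(emb, w)
```

In this sketch a spot whose gate assigns a small weight to one modality effectively suppresses that modality in the fused embedding, which is the behaviour the gating is meant to provide for noisy inputs.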

Abstract

Effectively modeling multimodal spatial omics data is critical for understanding tissue complexity and underlying biological mechanisms. While spatial transcriptomics, proteomics, and epigenomics capture molecular features, they lack pathological morphological context. Integrating these omics with histopathological images is therefore essential for comprehensive disease tissue analysis. However, substantial heterogeneity across omics, imaging, and spatial modalities poses significant challenges. Naive fusion of semantically distinct sources often leads to ambiguous representations. Additionally, the resolution mismatch between high-resolution histology images and lower-resolution sequencing spots complicates spatial alignment. Biological perturbations during sample preparation further distort modality-specific signals, hindering accurate integration. To address these challenges, we propose Graph-guided Representation of Omics and Vision with Expert Regulation for Adaptive Spatial Multi-omics Fusion (GROVER), a novel framework for adaptive integration of spatial multi-omics data. GROVER leverages a Graph Convolutional Network encoder based on Kolmogorov-Arnold Networks to capture the nonlinear dependencies between each modality and its associated spatial structure, thereby producing expressive, modality-specific embeddings. To align these representations, we introduce a spot-feature-pair contrastive learning strategy that explicitly optimizes the correspondence across modalities at each spot. Furthermore, we design a dynamic expert routing mechanism that adaptively selects informative modalities for each spot while suppressing noisy or low-quality inputs. Experiments on real-world spatial omics datasets demonstrate that GROVER outperforms state-of-the-art baselines, providing a robust and reliable solution for multimodal integration.
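The abstract does not give the form of the spot-feature-pair contrastive objective. As a rough illustration of cross-modal alignment at matched spots, the sketch below uses a symmetric InfoNCE loss as a stand-in: embeddings of the same spot from two modalities are treated as the positive pair and all other spots as negatives. The function name `symmetric_info_nce` and the `temperature` value are assumptions, and the actual GROVER loss may differ.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def symmetric_info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE between two modalities over the same N spots.

    z_a, z_b: arrays of shape (N, D); row i of each belongs to spot i.
    The temperature is an illustrative default, not a value from the paper.
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature  # (N, N) cross-modal similarities
    idx = np.arange(len(z_a))
    # Modality A -> B: each row should put its mass on the diagonal entry.
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Modality B -> A: the same requirement applied column-wise.
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_ab + loss_ba)


# Toy check: matched spots should score a lower loss than mismatched ones.
rng = np.random.default_rng(0)
z_rna = rng.normal(size=(8, 16))
z_img = z_rna + 0.05 * rng.normal(size=(8, 16))
aligned_loss = symmetric_info_nce(z_rna, z_img)
shuffled_loss = symmetric_info_nce(z_rna, np.roll(z_img, 1, axis=0))
```

Minimizing a loss of this kind pulls the embeddings of one spot together across modalities while pushing apart embeddings of different spots, which is one common way to obtain the per-spot correspondence described above.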
