Learning Cell-Aware Hierarchical Multi-Modal Representations for Robust Molecular Modeling

Mengran Li, Zelin Zang, Wenbin Xing, Junzhou Chen, Ronghui Zhang, Jiebo Luo, Stan Z. Li

TL;DR

The paper addresses robust molecular representation learning when external cellular modalities are missing. It introduces CHMR, which jointly models molecular structure, cellular responses, and gene expression across hierarchical biological scales. CHMR combines structure-aware modality augmentation, semantic consistency alignment (SCA), a tree-structured vector quantization module (Tree-VQ), and context propagation reconstruction (CPR) to preserve cross-scale semantics and improve generalization. Across nine public benchmarks spanning 728 tasks, CHMR reports average improvements of 3.6% on classification and 17.2% on regression over state-of-the-art baselines, and it remains robust to missing modalities while offering interpretability through its hierarchical codes. The result is a biologically grounded framework for integrative molecular modeling with practical implications for drug discovery and safety assessment. Training is end-to-end under an objective that combines $\mathcal{L}_{\mathrm{CPR}}$, $\mathcal{L}_{\mathrm{SCA}}$, and $\mathcal{L}_{\mathrm{TreeVQ}}$ with weights $\lambda_1$ and $\lambda_2$, so that cross-modal alignment and hierarchical structure are learned jointly.
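
The summary names three loss terms and two weights but does not say which term each weight scales. One plausible reading, given here only as an assumption and not as the authors' stated formula, treats the reconstruction loss as the unweighted base term:

$$
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{CPR}} \;+\; \lambda_1\,\mathcal{L}_{\mathrm{SCA}} \;+\; \lambda_2\,\mathcal{L}_{\mathrm{TreeVQ}}
$$

Under this reading, $\lambda_1$ controls how strongly molecular and cellular embeddings are pulled into alignment, and $\lambda_2$ controls how strongly embeddings are tied to the hierarchical codebook. The paper itself should be consulted for the exact assignment.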

Abstract

Understanding how chemical perturbations propagate through biological systems is essential for robust molecular property prediction. While most existing methods focus on chemical structures alone, recent advances highlight the crucial role of cellular responses such as morphology and gene expression in shaping drug effects. However, current cell-aware approaches face two key limitations: (1) modality incompleteness in external biological data, and (2) insufficient modeling of hierarchical dependencies across molecular, cellular, and genomic levels. We propose CHMR (Cell-aware Hierarchical Multi-modal Representations), a robust framework that jointly models local-global dependencies between molecules and cellular responses and captures latent biological hierarchies via a novel tree-structured vector quantization module. Evaluated on nine public benchmarks spanning 728 tasks, CHMR outperforms state-of-the-art baselines, yielding average improvements of 3.6% on classification and 17.2% on regression tasks. These results demonstrate the advantage of hierarchy-aware, multimodal learning for reliable and biologically grounded molecular representations, offering a generalizable framework for integrative biomedical modeling. The code is available at https://github.com/limengran98/CHMR.
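
The abstract does not spell out how the tree-structured vector quantization module works. To give a rough intuition for the general idea, the sketch below shows a generic two-level tree-structured quantizer in plain Python: an embedding is first assigned to its nearest coarse centre, and then only the fine codes attached to that centre are searched. The function names, the two-level depth, and the toy codebook values are all hypothetical and are not taken from the CHMR code base.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: Sequence[Vector], z: Vector) -> int:
    """Index of the codebook entry closest to z."""
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i], z))


def tree_quantize(
    z: Vector,
    coarse: Sequence[Vector],
    fine: Sequence[Sequence[Vector]],
) -> Tuple[int, int, Vector]:
    """Quantize z with a two-level tree (illustrative assumption only).

    Step 1 picks the nearest coarse centre; step 2 searches only the
    fine codebook that hangs off that centre.
    """
    parent = nearest(coarse, z)
    child = nearest(fine[parent], z)
    return parent, child, fine[parent][child]


# Toy codebooks (hypothetical values): two coarse centres, two children each.
coarse = [(0.0, 0.0), (10.0, 10.0)]
fine = [
    [(-1.0, 0.0), (1.0, 0.0)],
    [(9.0, 10.0), (11.0, 10.0)],
]

parent, child, code = tree_quantize((10.8, 9.7), coarse, fine)
print(parent, child, code)
```

The pair `(parent, child)` acts as a hierarchical code: embeddings that share a parent are grouped at the coarse level even when their fine codes differ, which is the kind of structure the paper describes as interpretable. A trainable version would also need codebook updates and a gradient estimator through the discrete lookup, both of which this sketch omits.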

Paper Structure

This paper contains 54 sections, 32 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: The motivation of this paper. Molecular perturbations trigger cellular or genetic changes, but modality incompleteness is common. Through augmentation, alignment, and hierarchical modeling, multi-modal representations are progressively organized and structured.
  • Figure 2: Overview of the CHMR framework for robust molecular property prediction under missing biological modalities. CHMR performs modality augmentation via structure-aware propagation, followed by (1) semantic consistency alignment (SCA) to align molecular and cellular modalities, (2) tree-structured vector quantization (Tree-VQ) to capture hierarchical biological semantics, and (3) context propagation reconstruction (CPR) to enhance generalization through cross-modal context.
  • Figure 3:
  • Figure 4: Visualization of cross-modal alignment and hierarchy. (a–d) show embeddings colored by modality; (e–f) display Tree codes with color-coded assignments and red dots indicating active hierarchical centers.
  • Figure 5: A case study illustrating CHMR's multi-level representation learning. Molecular 1D/2D/3D structures and biological modalities jointly provide complementary cues for pharmacological prediction. CHMR outperforms InfoAlign in prediction accuracy on the majority of tasks.
  • ...and 1 more figure