Multimodal Fusion with Relational Learning for Molecular Property Prediction
Zhengyang Zhou, Yunrui Li, Pengyu Hong, Hao Xu
TL;DR
This paper tackles the limitations of graph-based molecular representations by introducing MMFRL, which combines a modified relational learning objective with multimodal pretraining and flexible fusion strategies. The approach leverages multiple modalities (e.g., SMILES, images, fingerprints, NMR data) to initialize and fine-tune molecular encoders, and systematically analyzes early, intermediate, and late fusion to understand their impact on predictive performance. MMFRL achieves state-of-the-art results on MoleculeNet benchmarks, provides explainability through case studies like ESOL and BACE, and demonstrates that continuous relational metrics can better capture inter-molecule relationships than binary contrastive signals. The work advances practical drug discovery workflows by enabling task-specific multimodal pretraining and by offering interpretable representations that reveal structure–activity relationships.
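To make the contrast between binary contrastive signals and continuous relational metrics concrete, the following is a minimal sketch of a generic soft-target contrastive loss. It is not the paper's modified relational learning objective. The function name `relational_loss`, the temperature value, and the example similarity matrix are all assumptions made for illustration. With an identity target matrix the loss reduces to the usual binary (InfoNCE-style) case; with graded similarities, each molecule is pulled toward its neighbours in proportion to how similar they are.

```python
import numpy as np

def softmax_rows(x):
    """Numerically stable softmax over each row."""
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def relational_loss(emb_a, emb_b, target_sim, temperature=0.1):
    """Cross-entropy between a row-normalized target similarity matrix
    and the softmax of cosine similarities between two embedding sets.

    Hypothetical illustration: target_sim = identity gives a binary
    contrastive loss; graded values in [0, 1] give a continuous one.
    """
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    pred = softmax_rows(a @ b.T / temperature)
    target = target_sim / target_sim.sum(axis=1, keepdims=True)
    return float(-(target * np.log(pred + 1e-12)).sum(axis=1).mean())

rng = np.random.default_rng(0)
emb_graph = rng.normal(size=(4, 8))   # stand-in for graph encoder outputs
emb_other = rng.normal(size=(4, 8))   # stand-in for another modality's embeddings

binary_target = np.eye(4)             # each molecule matches only itself
soft_target = np.array([              # made-up graded pairwise similarities
    [1.0, 0.6, 0.1, 0.0],
    [0.6, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])

loss_binary = relational_loss(emb_graph, emb_other, binary_target)
loss_soft = relational_loss(emb_graph, emb_other, soft_target)
```

The only difference between the two calls is the target matrix, which is the point of the comparison: the binary target discards the information that molecules 0 and 1 are fairly similar, while the soft target keeps it.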
Abstract
Graph-based molecular representation learning is essential for accurately predicting molecular properties in drug discovery and materials science; however, it faces significant challenges due to the intricate relationships among molecules and the limited chemical knowledge utilized during training. While contrastive learning is often employed to handle molecular relationships, its reliance on binary metrics is insufficient for capturing the complexity of these relationships. Multimodal fusion has gained attention for property reasoning, but previous work has explored only a limited range of modalities, and the optimal stages for fusing different modalities in molecular property tasks remain underexplored. In this paper, we introduce MMFRL (Multimodal Fusion with Relational Learning for Molecular Property Prediction), a novel framework designed to overcome these limitations. Our method enhances embedding initialization through multimodal pretraining using relational learning. We also conduct a systematic investigation into the impact of modality fusion at different stages (early, intermediate, and late), highlighting the advantages and shortcomings of each. Extensive experiments on MoleculeNet benchmarks demonstrate that MMFRL significantly outperforms existing methods. MMFRL also enables task-specific optimizations, and its explainability provides valuable chemical insights, emphasizing its potential to enhance real-world drug discovery applications.
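The three fusion stages named above can be illustrated with a minimal sketch. It follows the common generic meaning of the terms (fuse raw inputs, fuse learned embeddings, or fuse per-modality predictions) and is an assumption for illustration; the paper's own definitions and architectures may differ. The helper `linear`, the feature sizes, and the random features are all hypothetical stand-ins for trained encoders and real molecular data.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Return a random linear map standing in for a trained network."""
    w = rng.normal(scale=0.1, size=(in_dim, out_dim))
    return lambda x: x @ w

n_molecules = 5
# Hypothetical per-modality features (e.g. graph, image, fingerprint).
modalities = [
    rng.normal(size=(n_molecules, 16)),
    rng.normal(size=(n_molecules, 32)),
    rng.normal(size=(n_molecules, 64)),
]

# Early fusion: concatenate the inputs, then use a single model.
early_input = np.concatenate(modalities, axis=1)            # shape (5, 112)
early_pred = linear(early_input.shape[1], 1)(early_input)

# Intermediate fusion: encode each modality separately,
# concatenate the embeddings, then apply one prediction head.
encoders = [linear(m.shape[1], 8) for m in modalities]
embeddings = [np.tanh(enc(m)) for enc, m in zip(encoders, modalities)]
fused = np.concatenate(embeddings, axis=1)                  # shape (5, 24)
intermediate_pred = linear(fused.shape[1], 1)(fused)

# Late fusion: each modality gets its own prediction head,
# and only the predictions are combined (here by a plain average).
heads = [linear(8, 1) for _ in modalities]
per_modality_preds = [head(e) for head, e in zip(heads, embeddings)]
late_pred = np.mean(per_modality_preds, axis=0)
```

All three variants produce one prediction per molecule; they differ only in where information from the modalities is combined, which is the design axis the systematic comparison explores.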
