Learning on Multimodal Graphs: A Survey
Ciyuan Peng, Jiayuan He, Feng Xia
TL;DR
This survey addresses multimodal graph learning (MGL), i.e., learning on multimodal graphs (MGs), by clarifying MG definitions, a taxonomy of MGs, and representative learning paradigms. It groups existing work into multimodal graph convolutional networks (MGCN), multimodal graph attention networks (MGAT), and multimodal graph contrastive learning (MGCL), discussing the data fusion mechanisms, strengths, and limitations of each, and it also covers methods outside these three categories. The paper surveys major MG libraries and applications across multimodal knowledge graphs, biomedical graphs, and brain graphs, including datasets and resources. It then discusses key challenges (data imbalance, trustworthy alignment, temporal dynamics, and scalability) and offers guidance on future directions and practical considerations.
Abstract
Multimodal data pervades various domains, including healthcare, social media, and transportation, where multimodal graphs play a pivotal role. Machine learning on multimodal graphs, referred to as multimodal graph learning (MGL), is essential for successful artificial intelligence (AI) applications. The burgeoning research in this field encompasses diverse graph data types and modalities, learning techniques, and application scenarios. This survey paper conducts a comparative analysis of existing works in multimodal graph learning, elucidating how multimodal learning is achieved across different graph types and exploring the characteristics of prevalent learning techniques. Additionally, we delineate significant applications of multimodal graph learning and offer insights into future directions in this domain. Overall, this paper serves as a foundational resource for researchers seeking to understand existing MGL techniques and their applicability across diverse scenarios.
