
TMTE: Effective Multimodal Graph Learning with Task-aware Modality and Topology Co-evolution

Yinlin Zhu, Xunkai Li, Di Wu, Wang Luo, Miao Hu, Di Wu

Abstract

Multimodal-attributed graphs (MAGs) are a fundamental data structure for multimodal graph learning (MGL), enabling both graph-centric and modality-centric tasks. However, our empirical analysis reveals inherent topology quality limitations in real-world MAGs, including noisy interactions, missing connections, and task-agnostic relational structures. A single graph derived from generic relationships is therefore unlikely to be universally optimal for diverse downstream tasks. To address this challenge, we propose Task-aware Modality and Topology co-Evolution (TMTE), a novel MGL framework that jointly and iteratively optimizes graph topology and multimodal representations toward the target task. TMTE is motivated by the bidirectional coupling between modality and topology: multimodal attributes induce relational structures, while graph topology shapes modality representations. Concretely, TMTE casts topology evolution as multi-perspective metric learning over modality embeddings with an anchor-based approximation, and formulates modality evolution as smoothness-regularized fusion with cross-modal alignment, yielding a closed-loop task-aware co-evolution process. Extensive experiments on 9 MAG datasets and 1 non-graph multimodal dataset across 6 graph-centric and modality-centric tasks show that TMTE consistently achieves state-of-the-art performance. Our code is available at https://anonymous.4open.science/r/TMTE-1873.
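The abstract describes topology evolution as metric learning over modality embeddings with an anchor-based approximation. As a rough illustration of the anchor idea (not the paper's actual implementation; all function and variable names here are hypothetical), one can avoid the dense O(n^2) node-to-node similarity matrix by scoring every node against a small sampled anchor set and using the low-rank product of those scores as a similarity surrogate:

```python
import numpy as np

def anchor_similarity_graph(H, num_anchors=32, seed=0):
    """Sketch of anchor-based similarity approximation (hypothetical names).

    Instead of the dense pairwise similarity S = H H^T (O(n^2) memory),
    sample k anchor nodes and approximate S ~= Z Z^T, where Z holds
    node-to-anchor cosine similarities (O(n * k) memory).
    """
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    anchors = H[rng.choice(n, size=num_anchors, replace=False)]
    # Row-normalize so inner products become cosine similarities.
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    An = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    Z = Hn @ An.T        # (n, k) node-to-anchor similarity scores
    S_approx = Z @ Z.T   # low-rank surrogate for node-node similarity
    return Z, S_approx

Z, S = anchor_similarity_graph(np.random.rand(100, 16))
```

In a task-aware setting, the embeddings `H` would come from the learned modality representations, so the induced topology changes as training progresses.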

Paper Structure

This paper contains 25 sections, 4 theorems, 26 equations, 5 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

For a MAG with fused representation $\bar{\mathbf{H}}=\frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}}\mathbf{H}^{(m)}$ and a symmetrically normalized adjacency matrix of the evolved topology $\mathbf{Q}^{E_1}=\lambda\,\tilde{\mathbf{A}} + (1-\lambda)\mathbf{A}^{E_1}$, the smooth fused representations can b…
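The theorem's quantities can be illustrated with a minimal numeric sketch, assuming (since the full statement is truncated here) that smoothing means repeated propagation of the fused representation through the mixed topology $\mathbf{Q}^{E_1}$; the helper names are hypothetical, not the paper's API:

```python
import numpy as np

def sym_normalize(A):
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops.
    A = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def smooth_fused(H_list, A_tilde, A_evolved, lam=0.5, steps=2):
    """Sketch: average modality embeddings into H_bar, form the convex
    combination Q = lam * A_tilde + (1 - lam) * A_evolved of the
    normalized original and evolved topologies, then propagate."""
    H_bar = np.mean(H_list, axis=0)          # fused representation (mean over modalities)
    Q = lam * A_tilde + (1 - lam) * A_evolved
    for _ in range(steps):                   # truncated power expansion Q^t H_bar
        H_bar = Q @ H_bar
    return H_bar

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H_list = [np.eye(3), np.ones((3, 3))]        # two toy modality embeddings
H_out = smooth_fused(H_list, sym_normalize(A), sym_normalize(A), lam=0.7)
```

The `steps` loop mirrors the recursive power expansion referenced by Theorem 2: each multiplication by `Q` diffuses the fused features one hop further along the evolved topology.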

Figures (5)

  • Figure 1: Experimental results of our empirical study. We compare the proposed TMTE and MM-GCN (representative MGL baseline) on the Toys dataset. (a) Node classification performance under three topology settings. (b) G2Image performance under three topology settings. All results are presented as percentages.
  • Figure 2: Overview of TMTE, which jointly and evolutionarily optimizes the topology and modality toward the downstream task.
  • Figure 3: Experimental results of our robustness analysis. We investigate two types of topological noise: noisy interactions (i.e., randomly adding edges), shown in (a) and (b); and missing interactions (i.e., randomly removing edges), shown in (c) and (d).
  • Figure 4: Accuracy curves on the MVSA dataset.
  • Figure 5: Hyperparameter analysis for $\alpha$ and $T$ on four datasets and tasks.

Theorems & Definitions (4)

  • Theorem 1: Smooth Fused Representations of MAG
  • Theorem 2: Recursive Power Expansion of Smooth Fused Representations
  • Theorem 3: Stability under Inexact Large-scale Propagation
  • Theorem 4: Contraction, Uniqueness, and Convergence Rate on Large-scale Graphs