Decoupling and Damping: Structurally-Regularized Gradient Matching for Multimodal Graph Condensation

Lian Shen, Zhendan Chen, Yinhui Jiang, Meijia Song, Ziming Su, Juan Liu, Xiangrong Liu

TL;DR

This work addresses the cost of training GNNs on large multimodal graphs by introducing Structurally-Regularized Gradient Matching (SR-GM) for graph condensation. SR-GM combines two components: gradient decoupling, which uses orthogonal projection to resolve inter-modal conflicts, and a structural damping regularizer, based on the Dirichlet energy of the gradient field, which curbs noise amplification through graph propagation. On multiple multimodal datasets, the approach is reported to achieve higher accuracy, faster convergence, and strong cross-architecture generalization relative to baselines. It is also reported to be robust to multimodal feature misalignment, and it offers a scalable, data-centric path for multimodal graph learning in resource-constrained environments. By producing compact synthetic graphs that generalize across architectures, it may benefit applications such as Neural Architecture Search.

Abstract

In critical web applications such as e-commerce and recommendation systems, multimodal graphs integrating rich visual and textual attributes are increasingly central, yet their large scale introduces substantial computational burdens for training Graph Neural Networks (GNNs). While Graph Condensation (GC) offers a promising solution by synthesizing smaller datasets, existing methods falter in the multimodal setting. We identify a dual challenge causing this failure: (1) conflicting gradients arising from semantic misalignments between modalities, and (2) the GNN's message-passing architecture pathologically amplifying this gradient noise across the graph structure. To address this, we propose Structurally-Regularized Gradient Matching (SR-GM), a novel condensation framework tailored for multimodal graphs. SR-GM introduces two synergistic components: first, a gradient decoupling mechanism that resolves inter-modality conflicts at their source via orthogonal projection; and second, a structural damping regularizer that acts directly on the gradient field. By leveraging the graph's Dirichlet energy, this regularizer transforms the topology from a noise amplifier into a stabilizing force during optimization. Extensive experiments demonstrate that SR-GM significantly improves accuracy and accelerates convergence compared to baseline methods. Ablation studies confirm that addressing both gradient conflict and structural amplification in tandem is essential for achieving superior performance. Moreover, the condensed multimodal graphs exhibit strong cross-architecture generalization and promise to accelerate applications like Neural Architecture Search. This research provides a scalable methodology for multimodal graph-based learning in resource-constrained environments.
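The two components named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so two assumptions are made here: the decoupling step is a PCGrad-style rule (remove from one modality's gradient its component along the other modality's gradient when their dot product is negative), and the Dirichlet energy uses the unnormalized graph Laplacian, $\frac{1}{2}\sum_{i,j} A_{ij}\,\lVert G_i - G_j \rVert^2$, where row $G_i$ is the gradient at node $i$. The function names `project_out_conflict` and `dirichlet_energy` are hypothetical and not taken from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def project_out_conflict(g, h):
    """Assumed PCGrad-style decoupling: if g conflicts with h (g.h < 0),
    return g projected onto the orthogonal complement of h; otherwise
    return g unchanged."""
    gh = dot(g, h)
    if gh >= 0:
        return list(g)
    scale = gh / dot(h, h)  # h is nonzero here because g.h < 0
    return [a - scale * b for a, b in zip(g, h)]


def dirichlet_energy(adj, grads):
    """Dirichlet energy of a per-node gradient field on a graph, assuming
    the unnormalized Laplacian: 0.5 * sum_ij A_ij * ||G_i - G_j||^2."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                diff = [a - b for a, b in zip(grads[i], grads[j])]
                total += adj[i][j] * dot(diff, diff)
    return 0.5 * total


# Two conflicting modality gradients (illustrative values only).
g_text = [1.0, 0.0]
g_image = [-1.0, 1.0]
g_text_decoupled = project_out_conflict(g_text, g_image)
print(g_text_decoupled)                # [0.5, 0.5]
print(dot(g_text_decoupled, g_image))  # 0.0

# Path graph 0 - 1 - 2 with a one-dimensional gradient per node.
path_adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
rough_field = [[1.0], [0.0], [1.0]]
smooth_field = [[1.0], [1.0], [1.0]]
print(dirichlet_energy(path_adj, rough_field))   # 2.0
print(dirichlet_energy(path_adj, smooth_field))  # 0.0
```

After projection the text gradient no longer opposes the image gradient. The energy is zero for a gradient field that is constant across neighbors and grows as neighboring gradients disagree, which is why penalizing it acts as a damping term.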

Paper Structure

This paper contains 29 sections, 11 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overall framework of SR-GM. $\mathcal{T}$ and $\mathcal{S}$ denote the target and condensed multimodal graph datasets, respectively. The GNN is parameterized by $\theta$. $\ell$ and $\ell_{gm}$ denote the task loss and the gradient matching loss, respectively. $\nabla_{\theta} \ell_{S}(\theta_t)$ denotes the gradient with respect to the model parameters $\theta$, while $\nabla_{X_S} \ell_{S}(\theta_t)$ denotes the gradient with respect to the synthetic node features $X_S$. $g_{\Phi}$ is an edge generator implemented as an MLP; it takes $X_S$ as input and produces the synthetic adjacency matrix $A_S$. $Z$ and $\hat{Y}$ denote the output features and predicted labels of the network, respectively.
  • Figure 2: Comparison of condensation efficiency between SR-GM and the baseline GCond on the Ele-fashion and Goodreads-NC datasets. SR-GM reaches comparable or higher peak performance than GCond while converging faster.
  • Figure 3: Test accuracy (%) for SR-GM and GCond using different input features: text-only, image-only, and fused text-image features on Ele-fashion and Goodreads-NC.
  • Figure 4: Heatmap of test accuracy for SR-GM on the Ele-fashion dataset under different settings of condensation rate ($r$) and regularization weight ($\lambda$). Darker shades correspond to higher accuracy, indicating better performance.
  • Figure 5: Average performance comparison between feature-aligned and feature-misaligned
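
The Figure 1 caption describes an MLP edge generator that maps the synthetic node features to a synthetic adjacency matrix. One way this could look is sketched below. The caption does not specify the architecture, so the details are assumptions: a one-hidden-layer MLP scores each concatenated feature pair, the two orderings are averaged so the matrix is symmetric, a sigmoid maps scores to (0, 1), and the diagonal is left at zero. The names `init_params`, `mlp`, and `generate_adjacency`, and the random weights, are hypothetical.

```python
import math
import random


def init_params(in_dim, hidden_dim, seed=0):
    """Random weights for a one-hidden-layer MLP with a scalar output."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
    b2 = 0.0
    return w1, b1, w2, b2


def mlp(x, w1, b1, w2, b2):
    """ReLU hidden layer followed by a linear scalar output."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def generate_adjacency(features, params):
    """Build a symmetric adjacency matrix from node features by scoring
    each concatenated pair in both orders and averaging (an assumption)."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s_ij = mlp(features[i] + features[j], *params)  # list concatenation
            s_ji = mlp(features[j] + features[i], *params)
            adj[i][j] = adj[j][i] = sigmoid(0.5 * (s_ij + s_ji))
    return adj


# Three synthetic nodes with two features each (illustrative values only).
X_syn = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]
params = init_params(in_dim=4, hidden_dim=8)
A_syn = generate_adjacency(X_syn, params)
for row in A_syn:
    print([round(v, 3) for v in row])
```

Because the adjacency is a function of the features and the generator weights, only those quantities need to be learned during condensation rather than every entry of the matrix.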