Decoupling and Damping: Structurally-Regularized Gradient Matching for Multimodal Graph Condensation
Lian Shen, Zhendan Chen, Yinhui Jiang, Meijia Song, Ziming Su, Juan Liu, Xiangrong Liu
TL;DR
This work tackles the scalability challenge of training GNNs on large multimodal graphs by introducing Structurally-Regularized Gradient Matching (SR-GM) for graph condensation. SR-GM combines two components: gradient decoupling, which uses orthogonal projection to resolve inter-modal conflicts, and a structural damping regularizer based on the Dirichlet energy of the gradient field, which curbs noise amplification through graph propagation. Experiments on multiple multimodal datasets show higher accuracy and faster convergence than baselines, along with strong cross-architecture generalization. The method is also robust to misalignment between multimodal features, and it offers a scalable, data-centric path for multimodal graph learning in resource-constrained environments. Because the condensed graphs are compact and architecture-agnostic without sacrificing performance, they promise practical gains for applications such as Neural Architecture Search.
Abstract
In critical web applications such as e-commerce and recommendation systems, multimodal graphs integrating rich visual and textual attributes are increasingly central, yet their large scale introduces substantial computational burdens for training Graph Neural Networks (GNNs). While Graph Condensation (GC) offers a promising solution by synthesizing smaller datasets, existing methods falter in the multimodal setting. We identify a dual challenge causing this failure: (1) conflicting gradients arising from semantic misalignments between modalities, and (2) the GNN's message-passing architecture pathologically amplifying this gradient noise across the graph structure. To address this, we propose Structurally-Regularized Gradient Matching (SR-GM), a novel condensation framework tailored for multimodal graphs. SR-GM introduces two synergistic components: first, a gradient decoupling mechanism that resolves inter-modality conflicts at their source via orthogonal projection; and second, a structural damping regularizer that acts directly on the gradient field. By leveraging the graph's Dirichlet energy, this regularizer transforms the topology from a noise amplifier into a stabilizing force during optimization. Extensive experiments demonstrate that SR-GM significantly improves accuracy and accelerates convergence compared to baseline methods. Ablation studies confirm that addressing both gradient conflict and structural amplification in tandem is essential for achieving superior performance. Moreover, the condensed multimodal graphs exhibit strong cross-architecture generalization and promise to accelerate applications like Neural Architecture Search. This research provides a scalable methodology for multimodal graph-based learning in resource-constrained environments.
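The two components named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give SR-GM's exact formulas, so everything here is an assumption made for illustration: the function names `project_out_conflict` and `dirichlet_energy` are hypothetical, the projection rule follows the common "remove the conflicting component" form, and the Dirichlet energy uses the unnormalized Laplacian L = D - A. It should not be read as the authors' implementation.

```python
import numpy as np


def project_out_conflict(g_a, g_b, eps=1e-12):
    """Orthogonal projection for conflicting gradients (illustrative assumption).

    If g_a and g_b have a negative inner product, subtract from g_a its
    component along g_b, so the result is orthogonal to g_b.
    Otherwise g_a is returned unchanged.
    """
    dot = float(np.dot(g_a, g_b))
    if dot >= 0.0:
        return g_a.copy()
    return g_a - (dot / (float(np.dot(g_b, g_b)) + eps)) * g_b


def dirichlet_energy(G, A):
    """Dirichlet energy tr(G^T L G) of a node-wise field G on a graph.

    G has shape (num_nodes, dim); A is a symmetric adjacency matrix.
    With L = D - A, this equals the sum over edges of ||G_i - G_j||^2,
    so it is small when neighboring nodes carry similar vectors.
    """
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(G.T @ L @ G))


# Toy usage: two "modality" gradients that point in conflicting directions.
g_visual = np.array([1.0, 0.0])
g_text = np.array([-1.0, 1.0])
g_visual_decoupled = project_out_conflict(g_visual, g_text)

# Toy usage: a per-node gradient field on a 3-node path graph.
A_path = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])
G_smooth = np.ones((3, 1))                # identical at every node
G_rough = np.array([[0.0], [1.0], [3.0]])  # varies across edges
penalty_smooth = dirichlet_energy(G_smooth, A_path)
penalty_rough = dirichlet_energy(G_rough, A_path)
```

In this sketch, adding a multiple of `dirichlet_energy` to a gradient-matching loss would penalize gradient fields that vary sharply between neighboring nodes, which is one way to read the abstract's description of the topology acting as a stabilizing force; the weighting and the exact field being regularized are not specified in the abstract.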
