Cross-Contrastive Clustering for Multimodal Attributed Graphs with Dual Graph Filtering

Haoran Zheng, Renchi Yang, Hongtao Wang, Jianliang Xu

TL;DR

This work tackles clustering on multimodal attributed graphs (MMAGs), where attributes from different modalities are weakly correlated and noisy, which hinders existing multi-view approaches. The authors introduce Dual Graph Filtering (DGF), which combines node-domain and feature-domain denoising to learn robust node representations, and a tri-cross contrastive training scheme that combines cross-modality, cross-neighborhood, and cross-community signals. The theoretical analysis frames DGF as simultaneous low-pass filtering in both the node and feature domains, with a Neumann-series approximation for scalable computation. Empirically, DGF achieves state-of-the-art clustering performance on eight real MMAG benchmarks, including large graphs where competing methods fail due to memory constraints. The approach is relevant to MMAG analytics in domains such as social networks, medical data, and e-commerce, where heterogeneous and noisy modalities are common.

Abstract

Multimodal Attributed Graphs (MMAGs) are an expressive data model for representing the complex interconnections among entities associated with attributes from multiple data modalities (text, images, etc.). Clustering over such data has numerous practical applications, including social community detection and medical data analytics. However, as revealed by our empirical studies, existing multi-view clustering solutions largely rely on high correlation between attributes across views and overlook the unique characteristics (e.g., low modality-wise correlation and intense feature-wise noise) of the multimodal attributes output by large pre-trained language and vision models in MMAGs, leading to suboptimal clustering performance. Inspired by the foregoing empirical observations and our theoretical analyses based on graph signal processing, we propose the Dual Graph Filtering (DGF) scheme, which incorporates a feature-wise denoising component into node representation learning, thereby overcoming the limitations of the traditional graph filters adopted in existing multi-view graph clustering approaches. On top of that, DGF includes a tri-cross contrastive training strategy that employs instance-level contrastive learning across modalities, neighborhoods, and communities to learn robust and discriminative node representations. Our comprehensive experiments on eight benchmark MMAG datasets show that DGF consistently and significantly outperforms a wide range of state-of-the-art baselines in terms of clustering quality measured against ground-truth labels.

Paper Structure

This paper contains 30 sections, 5 theorems, 23 equations, 7 figures, 9 tables, 1 algorithm.

Key Result

theorem 1

Let $\mathbf{M}$ be a matrix whose dominant eigenvalue $\lambda(\mathbf{M})$ satisfies $\lambda(\mathbf{M})<1$. Then, the inverse $(\mathbf{I}-\mathbf{M})^{-1}$ can be expanded as a Neumann series: $(\mathbf{I}-\mathbf{M})^{-1}=\sum_{\ell=0}^\infty\mathbf{M}^\ell$.
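
To make the role of this result concrete, the sketch below approximates $(\mathbf{I}-\mathbf{M})^{-1}\mathbf{b}$ with a truncated Neumann series $\sum_{\ell=0}^{L}\mathbf{M}^\ell\mathbf{b}$, using only matrix-vector products and never forming the inverse. It is a minimal illustration, not the paper's implementation: the function names `matvec` and `neumann_solve`, the truncation length `num_terms`, and the small example matrix are all assumptions chosen for demonstration.

```python
def matvec(M, x):
    """Multiply a dense matrix (list of rows) by a vector (list)."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def neumann_solve(M, b, num_terms):
    """Approximate (I - M)^{-1} b by sum_{l=0}^{num_terms} M^l b.

    Hypothetical helper for illustration; the series converges when the
    spectral radius of M is below 1.
    """
    term = list(b)   # current term M^l b, starting with l = 0
    total = list(b)  # running partial sum
    for _ in range(num_terms):
        term = matvec(M, term)  # M^l b obtained from M^(l-1) b
        total = [t + s for t, s in zip(total, term)]
    return total


# Illustrative 2x2 matrix whose eigenvalues (about 0.36 and 0.14) are below 1.
M = [[0.2, 0.1],
     [0.1, 0.3]]
b = [1.0, 2.0]

x = neumann_solve(M, b, num_terms=50)

# Check the residual of (I - M) x = b instead of comparing to an explicit inverse.
Mx = matvec(M, x)
residual = [x_i - mx_i - b_i for x_i, mx_i, b_i in zip(x, Mx, b)]
print(x, residual)
```

Each extra term costs one matrix-vector product, so with a sparse $\mathbf{M}$ the cost scales with the number of nonzeros rather than with the cubic cost of a dense inversion; the truncation error shrinks geometrically at a rate set by the dominant eigenvalue.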

Figures (7)

  • Figure 1: Performance comparison of the state-of-the-art AGC, MVAGC, and MVC methods (i.e., S3GC [devvrit2022s3gc], LMGEC [fettal2023simultaneous], EMVGC-LG [wen2023efficient]) on the real MMAGs Movies [yan2024graph] and Toys [yan2024graph].
  • Figure 2: Overview of DGF.
  • Figure 3: Illustration of the Tri-Cross Contrastive Training.
  • Figure 4: Parameter analysis of $\alpha$ and $\beta$.
  • Figure 5: Parameter analysis of $T$.
  • ...and 2 more figures

Theorems & Definitions (8)

  • definition 1: Graph Signal Denoising (GSD) [ma2021unified, dong2016learning]
  • definition 2: Attribute Distance Correlation
  • definition 3: Outlierness via Z-score [heckert2003nist, iglewicz1993volume]
  • theorem 1 [horn2012matrix]
  • lemma 1
  • lemma 2
  • theorem 2
  • theorem 3
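
Definition 3 is listed above by name only, so its exact form in the paper is not reproduced here. As a rough illustration of the standard Z-score idea it refers to, the sketch below scores each value by its distance from the mean in units of standard deviation and reports the fraction of values beyond a cutoff. The function names, the use of the population standard deviation, and the threshold of 3.0 are assumptions (a common convention), not the paper's definition.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (v - mean) / std for each value; all zeros if std is 0."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed choice)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def outlier_fraction(values, threshold=3.0):
    """Fraction of values whose absolute Z-score exceeds the threshold."""
    scores = z_scores(values)
    return sum(abs(z) > threshold for z in scores) / len(values)


# Toy feature column: nineteen typical entries and one extreme entry.
feature = [0.0] * 19 + [10.0]
print(z_scores(feature)[-1], outlier_fraction(feature))
```

Applied per feature dimension of an embedding matrix, a score like this gives one simple way to quantify how much of a modality's attribute mass sits in extreme values.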