SiMilarity-Enhanced Homophily for Multi-View Heterophilous Graph Clustering

Jianpeng Chen, Yawen Ling, Yazhou Ren, Zichen Wen, Tianyi Wu, Shufei Zhang, Lifang He

TL;DR

A novel SiMilarity-enhanced Homophily for Multi-view Heterophilous Graph Clustering (SMHGC) approach is proposed that enhances graph homophily in a label-free manner by introducing three similarity terms: neighbor pattern similarity, node feature similarity, and multi-view global similarity.

Abstract

With the increasing prevalence of graph-structured data, multi-view graph clustering has been widely used in various downstream applications. Existing approaches primarily rely on a unified message passing mechanism, which significantly enhances clustering performance. Nevertheless, this mechanism limits their applicability to heterophilous situations, as it is fundamentally predicated on the assumption of homophily, i.e., that connected nodes often belong to the same class. In reality, this assumption does not always hold; moderately or even mildly homophilous graphs are more common than fully homophilous ones because graphs inevitably contain heterophilous information. To address this issue, we propose a novel SiMilarity-enhanced Homophily for Multi-view Heterophilous Graph Clustering (SMHGC) approach. By analyzing the relationship between similarity and graph homophily, we propose to enhance homophily in a label-free manner by introducing three similarity terms: neighbor pattern similarity, node feature similarity, and multi-view global similarity. Then, a consensus-based inter- and intra-view fusion paradigm is proposed to fuse the improved homophilous graphs from different views and use them for clustering. State-of-the-art experimental results on both multi-view heterophilous and homophilous datasets collectively demonstrate the strong capacity of similarity for unsupervised multi-view heterophilous graph learning. Additionally, the consistent performance across semi-synthetic datasets with varying levels of homophily provides further evidence of SMHGC's resilience to heterophily.
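The abstract's central claim is that a similarity graph can be more homophilous than the observed adjacency. The sketch below illustrates this on a toy graph under two assumptions that are not taken from the paper: homophily is measured by the edge homophily ratio (the fraction of edges joining same-label nodes), and the "feature similarity" graph is a symmetric cosine k-nearest-neighbor graph over node features. The function names `edge_homophily_ratio` and `feature_knn_graph` are hypothetical.

```python
import numpy as np


def edge_homophily_ratio(adj: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of undirected edges whose two endpoints share a label.

    Assumes `adj` is a symmetric 0/1 matrix; the diagonal (self-loops) is ignored.
    """
    rows, cols = np.nonzero(np.triu(adj, k=1))  # count each undirected edge once
    if rows.size == 0:
        return 0.0
    return float(np.mean(labels[rows] == labels[cols]))


def feature_knn_graph(x: np.ndarray, k: int) -> np.ndarray:
    """Symmetric k-NN graph from cosine similarity between node features.

    Hypothetical stand-in for a feature similarity matrix, not SMHGC's exact construction.
    """
    normed = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)           # a node is never its own neighbor
    nbrs = np.argsort(-sim, axis=1)[:, :k]   # indices of the k most similar nodes
    g = np.zeros_like(sim)
    g[np.repeat(np.arange(x.shape[0]), k), nbrs.ravel()] = 1.0
    return np.maximum(g, g.T)                # symmetrize


# Toy heterophilous graph: every edge joins a class-0 node to a class-1 node.
labels = np.array([0, 0, 1, 1])
adj = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [1, 1, 0, 0],
                [1, 1, 0, 0]], dtype=float)
x = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]])

print(edge_homophily_ratio(adj, labels))                      # 0.0 (original graph)
print(edge_homophily_ratio(feature_knn_graph(x, 1), labels))  # 1.0 (feature similarity graph)
```

On this input the features separate the classes even though the edges do not, so the similarity graph recovers a fully homophilous structure; on real data the gain will be partial and depends on feature quality and the choice of k.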

Paper Structure

This paper contains 40 sections, 3 theorems, 23 equations, 6 figures, and 3 tables.

Key Result

Proposition 1

In heterophilous graphs, if the neighborhood distributions of nodes with the same label are (approximately) sampled from a similar distribution and nodes with different labels have distinguishable distributions, then the heterophilous graph exhibits good heterophily.
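The condition in Proposition 1 can be checked on labeled data with a small diagnostic. The sketch below assumes that a node's "neighborhood distribution" is the label histogram of its neighbors, normalized to sum to one, and summarizes the proposition by the mean cosine similarity of these distributions over same-label pairs versus different-label pairs. The function names and this particular statistic are illustrative assumptions, not the paper's formulation.

```python
import numpy as np


def neighbor_label_distributions(adj: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Row i is the empirical label distribution of node i's neighbors."""
    onehot = np.eye(num_classes)[labels]
    counts = adj @ onehot                                         # neighbor label counts per node
    totals = np.maximum(counts.sum(axis=1, keepdims=True), 1.0)   # isolated nodes keep an all-zero row
    return counts / totals


def intra_inter_similarity(adj: np.ndarray, labels: np.ndarray, num_classes: int):
    """Mean cosine similarity of neighbor distributions for same-label and different-label pairs."""
    dist = neighbor_label_distributions(adj, labels, num_classes)
    normed = dist / np.maximum(np.linalg.norm(dist, axis=1, keepdims=True), 1e-12)
    sim = normed @ normed.T
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)                   # ignore self-pairs
    return float(sim[same & off_diag].mean()), float(sim[~same].mean())


# Fully heterophilous toy graph: edge homophily is 0, yet neighborhoods are class-consistent.
labels = np.array([0, 0, 1, 1])
adj = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [1, 1, 0, 0],
                [1, 1, 0, 0]], dtype=float)
intra, inter = intra_inter_similarity(adj, labels, num_classes=2)
print(intra, inter)  # 1.0 0.0
```

A large gap between the two numbers indicates "good heterophily" in the sense of Proposition 1; a graph whose neighbor distributions look alike across classes would drive the gap toward zero.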

Figures (6)

  • Figure 1: (a) Observation 1: Clustering performance decreases as the heterophilous ratio increases. (b) Observation 2: On heterophilous graphs (Texas and Chameleon), the homophilous ratio of the neighbor pattern similarity graph (Definition 4) and of the feature similarity graph can be higher than that of the original adjacency matrix.
  • Figure 2: The proposed framework of SMHGC. It takes $V$ feature matrices ($\{\mathbf{X}^v\}^V_{v=1}$) and the corresponding $V$ graphs ($\{\mathbf{A}^v\}^V_{v=1}$) as inputs. Homophilous information is then extracted and optimized under the regularization of the similarity terms ($\{\mathcal{L}^v_{sim}\}^V_{v=1}$). The homophilous graphs ($\{\mathbf{S}^v\}^V_{v=1}$) are generated by infusing homophilous information from both the neighbor similarity matrix ($\mathbf{A}^v_a$) and the feature similarity matrix ($\mathbf{A}^v_x$). Finally, the intra- and inter-view fusion module aggregates and fuses feature and homophilous information to output a comprehensive embedding $\overline{\mathbf{H}}$.
  • Figure 3: The left Y-axis shows clustering performance (NMI). The X-axis denotes synthetic datasets constructed from ACM with increasing 'good heterophily'. To construct each synthetic dataset, the homophilous ratio of the original graph is kept low (gray dotted line), while the 'good heterophily' information (the HR of neighbor patterns) is gradually increased. The original graph and the sim-enhanced graph are each fed into a parameter-free message passing layer to obtain aggregated node embeddings, and $K$-means is applied to obtain clusters, which are evaluated by NMI.
  • Figure 4: Clustering results (NMI and ACC) on six semi-synthetic ACM datasets with different heterophilous ratios, and parameter sensitivity analysis w.r.t. $order$ and $k$, and w.r.t. $\gamma_{sim}$ and $\gamma_r$ (heatmap). Full results can be found in the Appendix.
  • Figure 5: Clustering results on six semi-synthetic ACM datasets with different heterophilous ratios.
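The evaluation pipeline described for Figure 3 (parameter-free message passing, then $K$-means, then NMI) can be sketched as follows. Assumptions not stated in the source: the "parameter-free message passing layer" is taken to be symmetric-normalized propagation with self-loops, $\hat{\mathbf{A}} = \mathbf{D}^{-1/2}(\mathbf{A}+\mathbf{I})\mathbf{D}^{-1/2}$; $K$-means is plain Lloyd's algorithm with a deterministic farthest-point initialization; and NMI uses arithmetic-mean normalization. The helper names `propagate`, `kmeans` and `nmi` are hypothetical.

```python
import numpy as np


def propagate(adj: np.ndarray, x: np.ndarray, order: int = 1) -> np.ndarray:
    """Parameter-free message passing: H = (D^-1/2 (A + I) D^-1/2)^order X."""
    a = adj + np.eye(adj.shape[0])                      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_hat = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    h = x
    for _ in range(order):
        h = a_hat @ h
    return h


def kmeans(x: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """Lloyd's K-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])              # farthest point from chosen centers
    centers = np.array(centers)
    assign = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        assign = np.argmin(((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([x[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


def nmi(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Normalized mutual information, normalized by the mean of the two entropies."""
    _, t = np.unique(y_true, return_inverse=True)
    _, p = np.unique(y_pred, return_inverse=True)
    joint = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(joint, (t, p), 1.0)
    joint /= joint.sum()
    pt, pp = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pt, pp)[nz]))
    h_t, h_p = -np.sum(pt * np.log(pt)), -np.sum(pp * np.log(pp))
    denom = 0.5 * (h_t + h_p)
    return float(mi / denom) if denom > 0 else 1.0


labels = np.array([0, 0, 1, 1])
x = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
sim_graph = np.array([[0, 1, 0, 0],
                      [1, 0, 0, 0],
                      [0, 0, 0, 1],
                      [0, 0, 1, 0]], dtype=float)  # stand-in for a sim-enhanced graph
pred = kmeans(propagate(sim_graph, x), k=2)
print(round(nmi(labels, pred), 6))  # 1.0
```

Keeping the layer free of trainable weights, as the caption describes, means any change in NMI between the original and the sim-enhanced graph can be attributed to the graph structure alone.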

Theorems & Definitions (8)

  • Definition 1: Homophily and heterophily
  • Definition 2: Generalized edge
  • Definition 3: Homophilous information
  • Proposition 1: Good heterophily (Ma et al., 2022)
  • Definition 4: Neighbor pattern similarity
  • Theorem 1
  • Theorem 2
  • Proof
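Definition 4 is listed here without its formula. As a label-free illustration, the sketch below assumes that neighbor pattern similarity is the cosine similarity between rows of the adjacency matrix (two nodes score high when they link to the same nodes), keeps each node's top-k most similar nodes, and scores the result with the edge homophily ratio. The names `neighbor_pattern_similarity`, `top_k_graph` and `edge_homophily_ratio` are hypothetical, and the paper's Definition 4 may use a different normalization or higher-order neighborhoods.

```python
import numpy as np


def neighbor_pattern_similarity(adj: np.ndarray) -> np.ndarray:
    """Cosine similarity between adjacency rows; no labels are used."""
    normed = adj / np.maximum(np.linalg.norm(adj, axis=1, keepdims=True), 1e-12)
    return normed @ normed.T


def top_k_graph(sim: np.ndarray, k: int) -> np.ndarray:
    """Keep each node's k most similar other nodes, then symmetrize."""
    s = sim.astype(float)                 # astype copies, so the caller's matrix is untouched
    np.fill_diagonal(s, -np.inf)          # exclude self-similarity
    nbrs = np.argsort(-s, axis=1)[:, :k]
    g = np.zeros_like(s)
    g[np.repeat(np.arange(s.shape[0]), k), nbrs.ravel()] = 1.0
    return np.maximum(g, g.T)


def edge_homophily_ratio(adj: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of undirected edges joining same-label nodes (labels used only for evaluation)."""
    rows, cols = np.nonzero(np.triu(adj, k=1))
    return float(np.mean(labels[rows] == labels[cols])) if rows.size else 0.0


# Complete 3-partite toy graph: every edge joins nodes of different classes.
labels = np.array([0, 0, 1, 1, 2, 2])
adj = (labels[:, None] != labels[None, :]).astype(float)

sim_graph = top_k_graph(neighbor_pattern_similarity(adj), k=1)
print(edge_homophily_ratio(adj, labels))        # 0.0
print(edge_homophily_ratio(sim_graph, labels))  # 1.0
```

Every original edge here is heterophilous, but same-class nodes have identical neighbor sets (cosine 1.0, versus 0.5 across classes), so the top-1 similarity graph links only same-class pairs. This is the kind of "good heterophily" that Proposition 1 and Observation 2 in Figure 1 describe.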