HyReaL: Clustering Attributed Graph via Hyper-Complex Space Representation Learning

Junyang Chen, Yang Lu, Mengke Li, Cuie Yang, Yiqun Zhang, Yiu-ming Cheung

TL;DR

HyReaL introduces a hyper-complex quaternion space for attributed graph clustering, using Four-View Projection (FVP) to map arbitrary attributes into four quaternion views and Quaternion Graph Encoders (QGE) to fuse them with graph structure. A clustering-oriented loss combines graph reconstruction, regularization, and a spectral clustering term, yielding universal embeddings that work across varying cluster counts $k$ without retraining. Empirical results on ten real-world datasets show HyReaL achieving superior clustering accuracy and separability, and ablations confirm the efficacy of FVP and QGE in mitigating the Over-Smoothing and Over-Dominating effects. In practice, the same learned representations can be reused when exploring multiple clustering granularities, which improves efficiency.

Abstract

Clustering complex data in the form of attributed graphs has attracted increasing attention, and powerful graph representation is a critical prerequisite. However, the well-known Over-Smoothing (OS) effect makes Graph Convolutional Networks tend to homogenize the representations of graph nodes, while existing OS solutions focus on alleviating the homogeneity of node embeddings from the perspective of graph topology, which is inconsistent with the attributed graph clustering objective. We therefore introduce a hyper-complex space with powerful quaternion feature transformation to enhance the representation learning of the attributes. A generalized Hyper-complex space Representation Learning (HyReaL) model is designed to: 1) bridge attributes of arbitrary dimension to the well-developed quaternion algebra with four parts, and 2) connect the learned representations to a more generalized clustering objective that is not restricted to a given number of clusters $k$. Introducing quaternions benefits attributed graph clustering in two ways: 1) enhanced attribute coupling learning capability allows complex attribute information to be sufficiently exploited in clustering, and 2) stronger learning capability makes it unnecessary to stack many graph convolution layers, naturally alleviating the OS problem. As a result, the node representations learned by HyReaL are more discriminative and suit downstream clustering with different values of $k$. Extensive experiments, including significance tests, ablation studies, and qualitative results, show the superiority of HyReaL.
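The "quaternion algebra with four parts" mentioned above rests on the Hamilton product, which mixes all four components of one quaternion with all four of another. The following is a minimal sketch of that product for scalar quaternions, not the authors' implementation: the function name `hamilton_product` and the example values for `feature` and `weight` are hypothetical, and a real encoder would apply the same rule to matrices of features and learned weights.

```python
from typing import Tuple

# A quaternion r + x*i + y*j + z*k stored as (r, x, y, z).
Quaternion = Tuple[float, float, float, float]


def hamilton_product(p: Quaternion, q: Quaternion) -> Quaternion:
    """Multiply two quaternions using i^2 = j^2 = k^2 = ijk = -1."""
    pr, px, py, pz = p
    qr, qx, qy, qz = q
    return (
        pr * qr - px * qx - py * qy - pz * qz,  # real part
        pr * qx + px * qr + py * qz - pz * qy,  # i part
        pr * qy - px * qz + py * qr + pz * qx,  # j part
        pr * qz + px * qy - py * qx + pz * qr,  # k part
    )


# Hypothetical feature quaternion for one node (one value per view).
feature = (0.5, 1.0, -2.0, 0.25)
# Hypothetical weight quaternion; a trained model would learn this.
weight = (1.0, 0.5, 0.0, -1.0)

# Every output part depends on all four input parts, which is the
# cross-view coupling that a plain real-valued weight does not give.
mixed = hamilton_product(weight, feature)
print(mixed)
```

Note that the product is not commutative: swapping the two arguments generally changes the result, so the order of weight and feature matters.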

Paper Structure

This paper contains 46 sections, 42 equations, 13 figures, 4 tables, 1 algorithm.

Figures (13)

  • Figure 1: Vanilla graph encoders (upper) vs. quaternion graph encoders (lower). After the node information aggregation through several hops, nodes represented in real-value space $\mathbb{R}$ by vanilla graph encoders tend to be homogeneous due to the "over-smoothing" (OS) and "over-dominating" (OD) effects. By contrast, the four views of data are flexibly rotated in a hyper-complex space $\mathbb{H}$ by the quaternion graph encoders to facilitate representation learning with a higher degree of learning freedom.
  • Figure 2: Overview of HyReaL. Given attributed graph $G=\{\mathbf{A},\mathbf{X}\}$, the attributes $\mathbf{X}$ are first projected into four views to form a feature quaternion $\mathbf{F}=\mathbf{F}_r+\mathbf{F}_x\mathbf{i}+\mathbf{F}_y\mathbf{j}+\mathbf{F}_z\mathbf{k}$. Then $\mathbf{F}$ is encoded with the local graph structure by a quaternion graph convolutional module. The graph clustering-friendly embedding $\boldsymbol{\Gamma}$ learned according to the joint graph reconstruction Kullback-Leibler (KL) loss $\mathcal{L}_{kl}$ and the graph clustering loss $\mathcal{L}_{sc}$ is finally obtained, which is utilized for clustering.
  • Figure 3: The OS effect analysis across ten datasets via QGE with varying layer numbers. When the number of layers increases, the OS effect significantly impacts clustering performance, highlighting the effectiveness of our method in alleviating OS.
  • Figure 4: Clustering performance comparison using internal metrics under different values of $k$. For the SC and CHI metrics, higher is better. For the DBI metric, lower is better.
  • Figure 5: Average execution time for different values of $k$. Different colors indicate the average execution time on different datasets. Dark and light shades indicate the execution time of model training and clustering, respectively.
  • ...and 8 more figures
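
The internal metrics named in the Figure 4 caption are not expanded in this listing. Assuming SC, CHI, and DBI denote the silhouette coefficient, the Calinski-Harabasz index, and the Davies-Bouldin index, their standard definitions can be written as follows. The notation is introduced here for illustration only: $n$ embedded nodes $\mathbf{z}_i$ are partitioned into $k$ clusters $C_1,\dots,C_k$ with centroids $\mathbf{c}_j$ and global mean $\mathbf{c}$; $a_i$ is the mean distance from $\mathbf{z}_i$ to the other members of its own cluster, $b_i$ is the smallest mean distance from $\mathbf{z}_i$ to the members of any other cluster, and $s_j$ is the mean distance from the members of $C_j$ to $\mathbf{c}_j$.

```latex
\begin{aligned}
\mathrm{SC}  &= \frac{1}{n}\sum_{i=1}^{n}\frac{b_i-a_i}{\max(a_i,\,b_i)},\\[4pt]
\mathrm{CHI} &= \frac{\sum_{j=1}^{k}|C_j|\,\lVert\mathbf{c}_j-\mathbf{c}\rVert^{2}\,/\,(k-1)}
                     {\sum_{j=1}^{k}\sum_{\mathbf{z}_i\in C_j}\lVert\mathbf{z}_i-\mathbf{c}_j\rVert^{2}\,/\,(n-k)},\\[4pt]
\mathrm{DBI} &= \frac{1}{k}\sum_{j=1}^{k}\max_{l\neq j}\frac{s_j+s_l}{\lVert\mathbf{c}_j-\mathbf{c}_l\rVert}.
\end{aligned}
```

SC and CHI grow as clusters become more compact and better separated, whereas DBI shrinks, which matches the "higher is better" and "lower is better" directions stated in the caption.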

Theorems & Definitions (4)

  • Remark 1
  • Remark 2
  • Remark 1
  • Remark 2