A Versatile Framework for Attributed Network Clustering via K-Nearest Neighbor Augmentation

Yiran Li, Gongyao Guo, Jieming Shi, Renchi Yang, Shiqi Shen, Qing Li, Jun Luo

TL;DR

ANCKA is a versatile attributed network clustering framework capable of attributed graph clustering (AGC), attributed multiplex graph clustering (AMGC), and attributed hypergraph clustering (AHC). ANCKA-GPU adds algorithmic designs tailored for GPU acceleration to boost efficiency.

Abstract

Attributed networks, which carry entity-specific information in node attributes, are ubiquitous in modeling social networks, e-commerce, bioinformatics, etc. Their inherent network topology ranges from simple graphs to hypergraphs with high-order interactions and multiplex graphs with separate layers. An important graph mining task is node clustering, which aims to partition the nodes of an attributed network into k disjoint clusters such that intra-cluster nodes are closely connected and share similar attributes, while inter-cluster nodes are far apart and dissimilar. Capturing multi-hop connections via nodes or attributes for effective clustering across multiple types of attributed networks is highly challenging. In this paper, we first present AHCKA as an efficient approach to attributed hypergraph clustering (AHC). AHCKA includes a carefully crafted K-nearest neighbor augmentation strategy for the optimized exploitation of attribute information on hypergraphs, a joint hypergraph random walk model to devise an effective AHC objective, and an efficient solver with speedup techniques for the objective optimization. The proposed techniques are extensible to various types of attributed networks, and thus we develop ANCKA as a versatile attributed network clustering framework, capable of attributed graph clustering (AGC), attributed multiplex graph clustering (AMGC), and AHC. Moreover, we devise ANCKA-GPU with algorithmic designs tailored for GPU acceleration to boost efficiency. We have conducted extensive experiments to compare our methods with 19 competitors on 8 attributed hypergraphs, 16 competitors on 6 attributed graphs, and 16 competitors on 3 attributed multiplex graphs, all demonstrating the superb clustering quality and efficiency of our methods.
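
To make the idea of K-nearest neighbor augmentation more concrete, the sketch below builds a KNN neighbor list from node attributes alone. It is a minimal illustration, not the authors' implementation: the function name `knn_attribute_graph`, the use of cosine similarity, and the brute-force pairwise computation are all assumptions made for this example. How the resulting neighbors are combined with the network topology is not shown here.

```python
import math


def knn_attribute_graph(X, K):
    """Return, for each node i, the indices of its K most similar nodes.

    X is a list of attribute vectors (one per node). Similarity is cosine
    similarity (an assumption for this sketch); a node is never its own
    neighbor. Ties are broken by the smaller node index.
    """
    n = len(X)
    if not 0 < K < n:
        raise ValueError("K must satisfy 0 < K < number of nodes")

    # Precompute vector norms; treat all-zero vectors as having norm 1
    # so that their similarity to every node is 0 instead of undefined.
    norms = [math.sqrt(sum(v * v for v in row)) or 1.0 for row in X]

    neighbors = []
    for i in range(n):
        sims = []
        for j in range(n):
            if j == i:
                continue  # exclude self-loops
            dot = sum(a * b for a, b in zip(X[i], X[j]))
            sims.append((dot / (norms[i] * norms[j]), j))
        # Highest similarity first, then smallest index.
        sims.sort(key=lambda t: (-t[0], t[1]))
        neighbors.append([j for _, j in sims[:K]])
    return neighbors


# Hypothetical toy attributes: nodes 0 and 1 are alike, as are nodes 2 and 3.
toy_attributes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_neighbors = knn_attribute_graph(toy_attributes, K=1)
```

This brute-force version costs quadratic time in the number of nodes, so it is only suitable for small examples; a scalable method would need an approximate or indexed nearest-neighbor search.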

Paper Structure (28 sections, 2 theorems, 24 equations, 15 figures, 16 tables, 6 algorithms)

Key Result

Lemma

Let $\sigma_1\geq \sigma_2\geq\dots\geq\sigma_k$ be the $k$ largest singular values of matrix $\mathbf{S}$ in Eq. eq:approx-multi-hop. Given any matrix $\mathbf{W}\in \mathbb{R}^{n\times k}$ such that $h(\mathbf{W})$ satisfies ${h(\mathbf{W})}^\top \cdot h(\mathbf{W})$ ...

Figures (15)

  • Figure 1: An Example Attributed Hypergraph
  • Figure 2: AAS and RCC on Cora-CA (best viewed in color)
  • Figure 3: Overview of AHCKA
  • Figure 4: AAS and RCC on Citeseer-DG
  • Figure 5: AAS and RCC on ACM
  • ...and 10 more figures

Theorems & Definitions (4)

  • Definition
  • Lemma
  • Lemma
  • Definition