Attributed Graph Clustering in Collaborative Settings

Rui Zhang, Xiaoyang Hou, Zhihua Tian, Yan He, Enchao Gong, Jian Liu, Qingbiao Wu, Kui Ren

TL;DR

This work tackles unsupervised graph clustering when node attributes are partitioned across collaborators (vertical setting) and data privacy must be preserved. It introduces kCAGC, a graph-filtering–based framework that reduces communication by leveraging local clustering intersections to form a small set of virtual nodes, enabling secure aggregation to produce globally coherent clusters. The authors provide a theoretical proximity-based correctness guarantee under a restricted proximity condition and demonstrate that kCAGC can achieve accuracy comparable to centralized methods on four public datasets, while considerably reducing communication costs. Empirical results show favorable utility and practical efficiency both in LAN and WAN scenarios, with a comprehensive security analysis showing low leakage under honest-but-curious models. The approach offers a principled, privacy-preserving solution for collaborative graph clustering in vertically partitioned settings with scalable communication and robust performance.
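The virtual-node idea described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: plain k-means stands in for each participant's local clustering, the function names (`kmeans`, `virtual_nodes`, `collaborative_clusters`) are hypothetical, and the secure aggregation step is replaced by a plain concatenation of local centers, so the sketch offers no privacy. It only shows how intersecting $L$ local clusterings with $\hat{k}$ clusters each yields at most $\hat{k}^L$ weighted virtual nodes, which are then clustered in place of the original nodes.

```python
import numpy as np


def kmeans(X, k, weights=None, iters=50, seed=0):
    """Plain (optionally weighted) Lloyd's k-means; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    w = np.ones(len(X)) if weights is None else np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():  # keep the old center if a cluster goes empty
                centers[j] = np.average(X[mask], axis=0, weights=w[mask])
    # Final assignment against the last set of centers.
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers


def virtual_nodes(local_feats, k_hat):
    """Intersect per-participant clusterings into weighted virtual nodes."""
    local = [kmeans(X, k_hat, seed=i) for i, X in enumerate(local_feats)]
    # Each node is keyed by its tuple of local cluster ids, one per participant.
    keys = np.stack([labels for labels, _ in local], axis=1)  # shape (n, L)
    uniq, inverse, counts = np.unique(
        keys, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)
    # A virtual node concatenates the matching local centers.
    # ASSUMPTION: done in the clear here; the paper uses secure aggregation.
    virt = np.hstack([centers[uniq[:, l]] for l, (_, centers) in enumerate(local)])
    return virt, counts, inverse


def collaborative_clusters(local_feats, k, k_hat):
    """Cluster the virtual nodes, then map labels back to the original nodes."""
    virt, counts, inverse = virtual_nodes(local_feats, k_hat)
    k_eff = min(k, len(virt))  # cannot ask for more clusters than virtual nodes
    virt_labels, _ = kmeans(virt, k_eff, weights=counts)
    return virt_labels[inverse]


# Toy data: two participants hold different feature columns of the same nodes.
rng = np.random.default_rng(0)
n = 300
X1 = rng.normal(size=(n, 3))  # features held by participant 1
X2 = rng.normal(size=(n, 2))  # features held by participant 2
labels = collaborative_clusters([X1, X2], k=3, k_hat=3)
```

With two participants and $\hat{k}=3$, the final clustering step runs on at most 9 virtual nodes rather than 300 original nodes, which is the kind of sample-space reduction the summary refers to.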

Abstract

Graph clustering is an unsupervised machine learning method that partitions the nodes in a graph into different groups. Despite achieving significant progress in exploiting both attributed and structured data information, graph clustering methods often face practical challenges related to data isolation. Moreover, the absence of collaborative methods for graph clustering limits their effectiveness. In this paper, we propose a collaborative graph clustering framework for attributed graphs, supporting attributed graph clustering over vertically partitioned data with different participants holding distinct features of the same data. Our method leverages a novel technique that reduces the sample space, improving the efficiency of the attributed graph clustering method. Furthermore, we compare our method to its centralized counterpart under a proximity condition, demonstrating that the successful local results of each participant contribute to the overall success of the collaboration. We fully implement our approach and evaluate its utility and efficiency by conducting experiments on four public datasets. The results demonstrate that our method achieves comparable accuracy levels to centralized attributed graph clustering methods. Our collaborative graph clustering framework provides an efficient and effective solution for graph clustering challenges related to data isolation.

Paper Structure

This paper contains 27 sections, 4 theorems, 24 equations, 5 figures, 10 tables, 4 algorithms.

Key Result

Theorem 1

Let $\mathcal{T}=\{T_1, T_2, \cdots, T_k\}$ be the set of target clusters of the nodes ${\mathbf{X}}$. For participant $\mathcal{P}_l,1\leq l\leq L$, let $\mathcal{T}^l=\{T^l_1, T^l_2, \cdots, T^l_{\hat{k}}\}$ be the set of target clusters of local nodes ${\mathbf{X}}^l$. Assume that each pair of cl

Figures (5)

  • Figure 1: An example of a collaborative setting. Users sign in to Twitter and TikTok using their Facebook IDs. A user who wants to buy a car finds some text posts about cars on Twitter and some videos about cars on TikTok. With the shared social networks, Twitter and TikTok have access to information about cars purchased by the user's neighbors. They can then provide better recommendations by combining the text information and the video information.
  • Figure 2: Visualization of the intuition. (a) $\mathbb{R}^2$ space separated by 1-d hyperplanes. The red $\times$ marks are the cluster centers. (b) Local nodes are on the axes. $\mathbb{R}^2$ space is separated into $5^2$ subspaces for 5 local clusters in each dimension. Grey nodes are the centers of the subspaces; a darker grey means the subspace contains more nodes. (c) Comparison between the clustering results on the subspaces (grey $\square$) and the centralized results (red $\times$).
  • Figure 3: The t-SNE projection of the features for the "Cora" dataset according to (a) the real labels, (b) AGC, (c) Protocol \ref{alg:collaborative_basic}, and (d) $kCAGC$. We choose $L=2$ for Protocol \ref{alg:collaborative_basic} and $kCAGC$, and for $kCAGC$ we use $\hat{k} = k$. Different colors in each sub-figure denote different clusters.
  • Figure 4: Accuracy of $kCAGC$ as the proportion of shared graph data increases, for the "Cora" dataset
  • Figure 5: Different orders of the graph filter ($\psi$) for $kCAGC$
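
The role of the filter order $\psi$ in Figure 5 can be sketched in code. The summary only names the order, so the filter form below is an assumption: it is the low-pass filter $(I - L_s/2)^{\psi}$ commonly used in AGC-style methods, where $L_s$ is the symmetric normalized Laplacian. The function name `graph_filter` is hypothetical, and the sketch assumes an undirected graph in which every node has at least one edge.

```python
import numpy as np


def graph_filter(A, X, psi):
    """Apply the assumed low-pass filter (I - L_s / 2)^psi to features X."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)  # assumes deg > 0 for every node
    d_inv_sqrt = deg ** -0.5
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    G = 0.5 * (np.eye(len(A)) + S)  # equals I - L_s / 2, since L_s = I - S
    Xf = np.asarray(X, dtype=float)
    for _ in range(psi):  # a higher order smooths features over more hops
        Xf = G @ Xf
    return Xf


# Path graph on three nodes; each participant holds one feature column.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
X1 = np.array([[1.0], [0.0], [-1.0]])  # column held by participant 1
X2 = np.array([[2.0], [2.0], [0.0]])   # column held by participant 2
filtered_jointly = graph_filter(A, np.hstack([X1, X2]), psi=2)
filtered_locally = np.hstack([graph_filter(A, X1, 2), graph_filter(A, X2, 2)])
```

Because the filter is a linear map applied to the rows of the feature matrix, filtering each participant's columns separately gives the same result as filtering the full matrix, so in a vertical partition this step needs only the shared graph and the local features.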

Theorems & Definitions (7)

  • Definition 1
  • Theorem 1
  • Definition 2
  • Lemma 1
  • proof
  • Theorem 2: Theorem 6.2 in bonawitz2017practical
  • Theorem 3: Theorem 6.3 in bonawitz2017practical