A Unified Framework for Exploratory Learning-Aided Community Detection Under Topological Uncertainty

Yu Hou, Cong Tran, Ming Li, Won-Yong Shin

TL;DR

This work tackles overlapping community detection when the true network topology is unknown or only partially known. It introduces META-CODE, a unified framework that iterates over three steps: community-affiliation embedding via a GNN trained with a reconstruction loss, exploration of the hidden network through strategically selected node queries, and network inference using an edge connectivity-based Siamese model. The approach jointly optimizes a community-affiliation matrix and a sequence of queries, leveraging node metadata and progressively revealed edges to refine edge predictions and communities. Theoretical results indicate that querying nodes in overlapping regions accelerates exploration and that META-CODE scales linearly with the number of edges. Experiments on three real-world networks show gains of up to 65.55% in NMI on the Facebook dataset over the best competing method and support the contribution of each module. META-CODE thus offers a scalable approach to uncovering overlapping communities under topological uncertainty, applicable in privacy-constrained or incomplete-network settings.

Abstract

In social networks, the discovery of community structures has received considerable attention as a fundamental problem in various network analysis tasks. However, due to privacy concerns or access restrictions, the network structure is often uncertain, thereby rendering established community detection approaches ineffective without costly network topology acquisition. To tackle this challenge, we present META-CODE, a unified framework for detecting overlapping communities via exploratory learning aided by easy-to-collect node metadata when networks are topologically unknown (or only partially known). Specifically, META-CODE consists of three iterative steps in addition to the initial network inference step: 1) node-level community-affiliation embeddings based on graph neural networks (GNNs) trained by our new reconstruction loss, 2) network exploration via community-affiliation-based node queries, and 3) network inference using an edge connectivity-based Siamese neural network model from the explored network. Through extensive experiments on three real-world datasets including two large networks, we demonstrate: (a) the superiority of META-CODE over benchmark community detection methods, achieving remarkable gains up to 65.55% on the Facebook dataset over the best competitor among our selected competitive methods in terms of normalized mutual information (NMI), (b) the impact of each module in META-CODE, (c) the effectiveness of node queries in META-CODE based on empirical evaluations and theoretical findings, and (d) the convergence of the inferred network.
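
The control flow described in the abstract (an initial network inference, then repeated embedding, querying, and re-inference) can be sketched as a loop over pluggable components. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function `exploratory_loop`, its parameter names, the choice of one query per round, and all `toy_*` stand-ins are hypothetical and exist purely to make the loop runnable.

```python
import numpy as np

def exploratory_loop(metadata, infer_network, embed, pick_query,
                     query_oracle, n_rounds):
    """Hypothetical skeleton: initial inference, then embed -> query -> infer."""
    explored = set()   # edges revealed by node queries so far
    queried = set()    # nodes that have already been queried
    # Initial network inference, before any query has been issued.
    inferred = infer_network(metadata, explored)
    for _ in range(n_rounds):
        # Step 1: community-affiliation embedding on the inferred network.
        F = embed(inferred, metadata)
        # Step 2: network exploration via one node query (an assumption).
        node = pick_query(F, queried)
        queried.add(node)
        explored |= {(min(node, nb), max(node, nb))
                     for nb in query_oracle(node)}
        # Step 3: network inference from the explored edges.
        inferred = infer_network(metadata, explored)
    return embed(inferred, metadata), explored

# Toy stand-ins (all hypothetical): a hidden path graph 0-1-2-3.
TRUE_NEIGHBORS = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
toy_metadata = np.eye(4)

def toy_infer(metadata, explored):
    # Predicts no extra edges; simply returns what has been explored.
    return set(explored)

def toy_embed(inferred, metadata):
    # Single-column "embedding": node degree in the inferred network.
    deg = np.zeros((metadata.shape[0], 1))
    for u, v in inferred:
        deg[u, 0] += 1
        deg[v, 0] += 1
    return deg

def toy_pick(F, queried):
    # Picks the lowest-index node that has not been queried yet.
    return min(i for i in range(F.shape[0]) if i not in queried)

F_final, explored_edges = exploratory_loop(
    toy_metadata, toy_infer, toy_embed, toy_pick,
    lambda n: TRUE_NEIGHBORS[n], n_rounds=2)
```

In a real setting each stand-in would be replaced by the corresponding learned module; the sketch only fixes the order in which the modules feed one another.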

Paper Structure (39 sections, 3 theorems, 18 equations, 9 figures, 8 tables, 1 algorithm)

Key Result

Theorem 1

For any nodes $u$ and $v$ belonging to $M$ and $M'$ communities, respectively, in an underlying network $\mathcal{G}=(\mathcal{V},\mathcal{E})$, where $M > M'$, if $\varepsilon \le \frac{N_{\min}-1}{K} - 1$, then the following inequality holds: where $\mathbb{E}_{u}\left[\mathcal{D}_M\right]$ and $\mathbb{E}_{v}\left[\mathcal{D}_{M'}\right]$ are the expectations of degree dist…
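
The theorem statement breaks off before the inequality, so the sketch below does not reproduce it. It only illustrates the intuition stated in the TL;DR, that nodes in overlapping regions are attractive query targets, by comparing empirical degrees in a small simulated network. Everything here is an assumption for illustration: the function `simulate_degrees`, the edge model (each shared community independently creates an edge with probability `p_in`), and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np

def simulate_degrees(n_nodes=300, n_comms=3, p_in=0.1,
                     overlap_frac=0.2, seed=0):
    """Sample a toy overlapping-community graph; return degrees and
    the number of communities each node belongs to."""
    rng = np.random.default_rng(seed)
    # membership[i, c] is True when node i belongs to community c.
    membership = np.zeros((n_nodes, n_comms), dtype=bool)
    membership[np.arange(n_nodes), rng.integers(0, n_comms, n_nodes)] = True
    # Give a random fraction of nodes a second community.
    n_overlap = int(overlap_frac * n_nodes)
    for i in rng.choice(n_nodes, n_overlap, replace=False):
        others = np.flatnonzero(~membership[i])
        membership[i, rng.choice(others)] = True
    # Assumed edge model: P(edge) = 1 - (1 - p_in)^(shared communities).
    m = membership.astype(int)
    shared = m @ m.T
    prob = 1.0 - (1.0 - p_in) ** shared
    # Sample the upper triangle only, then symmetrize (no self-loops).
    upper = np.triu(rng.random((n_nodes, n_nodes)) < prob, k=1)
    adj = upper | upper.T
    return adj.sum(axis=1), membership.sum(axis=1)

degrees, n_memberships = simulate_degrees()
mean_deg_single = degrees[n_memberships == 1].mean()
mean_deg_overlap = degrees[n_memberships == 2].mean()
```

Under this assumed model a node in two communities shares a community with roughly twice as many other nodes, so `mean_deg_overlap` comes out larger than `mean_deg_single`; a single query to such a node reveals more edges on average.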

Figures (9)

  • Figure 1: The schematic overview of META-CODE consisting of three iterative steps: 1) community-affiliation embedding, generated by GNNs to capture both structure–community and metadata–community relationships; 2) network exploration via node queries, which are selected within areas of overlapping communities and distributed across diverse communities; and 3) network inference, which builds potential edges between nodes based on connectivity information from explored edges. Here, the first and second iterations are executed.
  • Figure 2: An example illustrating how network structure $\mathcal{G}$ and node metadata $\mathcal{X}$ can be reconstructed from the community-affiliation embedding matrix ${\bf F}$.
  • Figure 3: Network exploration with different strategies for query node selection when the underlying true network has three overlapping communities. (a) Selection of query nodes with the highest degree. (b) Selection of query nodes in non-overlapping regions. (c) Selection of query nodes that belong to multiple communities and are distributed across diverse communities.
  • Figure 4: The architecture of our EC-SiamNet for network inference.
  • Figure 5: The number of explored nodes, $N_\text{ex}$, according to different percentages (%) of queried nodes among $N$ nodes on Facebook and Engineering.
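
The strategy in Figure 3(c), choosing query nodes that belong to multiple communities and are spread across diverse communities, can be sketched as a greedy selection over a community-affiliation matrix. This is a hedged illustration rather than the paper's selection rule: the function `select_queries`, the membership `threshold`, and the scoring order (number of memberships first, then preference for communities that earlier queries have touched least) are all assumptions.

```python
import numpy as np

def select_queries(F, n_queries, threshold=0.5):
    """Greedy, hypothetical query selection from an (N, C) affiliation matrix."""
    member = F > threshold                  # boolean community memberships
    n_memberships = member.sum(axis=1)      # communities per node
    covered = np.zeros(F.shape[1], dtype=int)  # queries touching each community
    candidates = set(range(F.shape[0]))
    chosen = []
    for _ in range(min(n_queries, F.shape[0])):
        best, best_score = None, None
        for i in sorted(candidates):
            # Higher is better: more memberships first, then communities
            # that have been touched by fewer earlier queries.
            diversity = -int(covered[member[i]].sum())
            score = (int(n_memberships[i]), diversity)
            if best_score is None or score > best_score:
                best, best_score = i, score
        chosen.append(best)
        candidates.remove(best)
        covered[member[best]] += 1
    return chosen

# Five nodes, three communities; nodes 1, 2, and 4 lie in overlaps.
F_demo = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])
picked = select_queries(F_demo, n_queries=2)
```

Here the first pick is node 1 (two memberships, lowest index among ties). The second pick is node 2 rather than node 4: both lie in an overlap, but node 4 covers the same two communities as node 1, while node 2 reaches the third community.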

Theorems & Definitions (8)

  • Remark 1
  • Remark 2
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof