Top-L Most Influential Community Detection Over Social Networks (Technical Report)

Nan Zhang, Yutong Ye, Xiang Lian, Mingsong Chen

TL;DR

This paper proposes a novel problem, Top-L most Influential Community DEtection (TopL-ICDE) over social networks, which aims to retrieve the top-$L$ seed communities that have the highest influence, are structurally cohesive, and contain user-specified query keywords.

Abstract

In many real-world applications such as social network analysis and online marketing/advertising, community detection is a fundamental task that identifies communities (subgraphs) with high structural cohesiveness in social networks. Previous works focus on detecting communities alone and do not consider the collective influence that the users in these communities exert on other user nodes in the network. Inspired by this, we investigate influence propagation from seed communities and the influenced communities that result from it. We propose a novel problem, named Top-L most Influential Community DEtection (TopL-ICDE) over social networks, which aims to retrieve the top-L seed communities that have the highest influence, are structurally cohesive, and contain user-specified query keywords. To tackle the TopL-ICDE problem efficiently, we design pruning strategies that filter out false alarms of seed communities and propose an index mechanism that facilitates top-L community retrieval. We develop a TopL-ICDE answering algorithm that traverses the index and applies the proposed pruning strategies. We also formulate and tackle a variant of TopL-ICDE, named diversified top-L most influential community detection (DTopL-ICDE), which returns a set of L diversified communities with the highest diversity score (i.e., the collaborative influence of the L communities). We prove that DTopL-ICDE is NP-hard and propose an efficient greedy algorithm with diversity score pruning. Through extensive experiments over real and synthetic social networks under various parameter settings, we verify the efficiency and effectiveness of the proposed TopL-ICDE and DTopL-ICDE approaches.
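To illustrate why a greedy strategy is a natural fit for the diversified variant, the sketch below picks L communities one at a time by marginal gain. It is not the paper's algorithm: it assumes, purely for illustration, that the diversity score of a set of communities is the number of distinct users they influence together, and that the set of users influenced by each candidate community is already known. The function name `greedy_diversified_top_l`, the `influenced` mapping, and the sample data are all hypothetical, and the diversity score pruning mentioned in the abstract is not modeled.

```python
from typing import Dict, List, Set, Tuple


def greedy_diversified_top_l(
    influenced: Dict[str, Set[str]], L: int
) -> Tuple[List[str], Set[str]]:
    """Greedily select up to L communities by marginal coverage gain.

    ASSUMPTION: the diversity score is modeled as the size of the union of
    influenced users; the paper's actual score may be defined differently.
    """
    selected: List[str] = []
    covered: Set[str] = set()
    remaining = dict(influenced)

    for _ in range(min(L, len(remaining))):
        # Iterate in sorted order so that ties are broken deterministically
        # (max returns the first candidate with the largest gain).
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - covered))
        selected.append(best)
        covered |= remaining.pop(best)

    return selected, covered


# Hypothetical candidate communities and the users each one influences.
influenced = {
    "g1": {"a", "b", "c", "d"},
    "g2": {"c", "d", "e"},
    "g3": {"a", "b", "c"},
    "g4": {"f", "g"},
}

selected, covered = greedy_diversified_top_l(influenced, L=2)
print(selected, len(covered))
```

In this toy input, `g2` and `g3` each influence more users than `g4` individually, but they overlap heavily with `g1`, so the greedy step prefers `g4`. That overlap effect is what separates the diversified variant from ranking communities by individual influence.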

Paper Structure

This paper contains 26 sections, 10 theorems, 6 equations, 6 figures, 3 tables, 4 algorithms.

Key Result

Lemma 1

(Keyword Pruning) Given a set $Q$ of query keywords and a candidate subgraph $g$, the subgraph $g$ can be safely pruned if there exists at least one vertex $v_i \in V(g)$ such that $v_i.W \cap Q = \emptyset$, where $v_i.W$ is the keyword set associated with vertex $v_i$.
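The condition in Lemma 1 amounts to a simple set-disjointness test per vertex. The following is a minimal sketch of that test, assuming vertices are identified by strings and keyword sets are stored in a plain dictionary; the function name `can_prune_by_keywords` and the sample data are hypothetical and not taken from the paper.

```python
from typing import Dict, Iterable, Set


def can_prune_by_keywords(
    subgraph_vertices: Iterable[str],
    vertex_keywords: Dict[str, Set[str]],
    query_keywords: Set[str],
) -> bool:
    """Return True if the candidate subgraph can be pruned under Lemma 1.

    The subgraph is pruned as soon as one vertex v_i has a keyword set
    v_i.W that shares no keyword with the query set Q.
    """
    return any(
        vertex_keywords.get(v, set()).isdisjoint(query_keywords)
        for v in subgraph_vertices
    )


# Hypothetical keyword sets v_i.W for three vertices.
vertex_keywords = {
    "v1": {"movies", "music"},
    "v2": {"sports"},
    "v3": {"music", "travel"},
}
Q = {"music", "movies"}

# Every vertex shares a keyword with Q, so the subgraph is kept.
print(can_prune_by_keywords(["v1", "v3"], vertex_keywords, Q))  # False
# v2 shares no keyword with Q, so the subgraph is pruned.
print(can_prune_by_keywords(["v1", "v2"], vertex_keywords, Q))  # True
```

Because `any` short-circuits, the check stops at the first vertex that violates the keyword constraint.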

Figures (6)

  • Figure 1: An example of the Top$L$-ICDE problem over a social network.
  • Figure 2: The Top$L$-ICDE performance on real/synthetic graph data.
  • Figure 3: The robustness evaluation of the Top$L$-ICDE performance.
  • Figure 4: The ablation study of the Top$L$-ICDE performance.
  • Figure 5: The influenced communities from Top$L$-ICDE vs. $k$-core ($k=4$).
  • ...and 1 more figure

Theorems & Definitions (16)

  • Example 1
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • ...and 6 more