Q-DISCO: Query-Centric Densest Subgraphs in Networks with Opinion Information

Tianyi Chen, Atsushi Miyauchi, Charalampos E. Tsourakakis

TL;DR

Q-DISCO addresses the problem of finding densely connected subgraphs whose node opinions align with a given query vector, formalized as maximizing subgraph density subject to a lower bound on the average agreement with the query. The authors prove that the problem is NP-hard and does not admit practically acceptable approximation guarantees, and they propose two principled heuristics: Q-Lagrange, based on Lagrangian relaxation, and Q-Peeling, a greedy peeling method based on the dual of a linear programming relaxation. In experiments on Twitter, DBLP, and Deezer data, the methods identify meaningful, opinion-aligned dense communities and scale favorably compared to LP-based baselines. The work provides practical tools for analyzing opinion dynamics and cohesive substructures in opinion-rich networks, with potential applications in recommender systems and social science research.

Abstract

Given a network $G=(V,E)$, where each node $v$ is associated with a vector $\boldsymbol{p}_v \in \mathbb{R}^d$ representing its opinion about $d$ different topics, how can we uncover subsets of nodes that not only exhibit exceptionally high density but also possess positively aligned opinions on multiple topics? In this paper we focus on this novel algorithmic question, which is essential in an era where digital social networks are hotbeds of opinion formation and dissemination. We introduce a novel methodology anchored in the well-established densest subgraph problem. We analyze the computational complexity of our formulation, indicating that our problem is NP-hard and eludes practically acceptable approximation guarantees. To navigate these challenges, we design two heuristic algorithms: the first is predicated on the Lagrangian relaxation of our formulation, while the second adopts a peeling algorithm based on the dual of a Linear Programming relaxation. We elucidate the theoretical underpinnings of their performance and validate their utility through empirical evaluation on real-world datasets. Among others, we delve into Twitter datasets we collected concerning timely issues, such as the Ukraine conflict and the discourse surrounding COVID-19 mRNA vaccines, to gauge the effectiveness of our methodology. Our empirical investigations verify that our algorithms are able to extract valuable insights from networks with opinion information.
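
To make the question in the abstract concrete, the following minimal Python sketch evaluates the two quantities involved (density and average agreement) and solves a tiny instance by exhaustive search. It is an illustration only, not the authors' Q-Lagrange or Q-Peeling. It assumes that density means induced edges divided by number of nodes, and that a node's agreement with a query $\boldsymbol{q}$ is the inner product $\langle \boldsymbol{q}, \boldsymbol{p}_v \rangle$. The function names, the toy graph `EDGES`, and the opinions `OPINIONS` are all hypothetical.

```python
from itertools import combinations

def density(nodes, edges):
    """Induced edge count divided by node count (assumed density measure)."""
    s = set(nodes)
    if not s:
        return 0.0
    induced = sum(1 for u, v in edges if u in s and v in s)
    return induced / len(s)

def average_agreement(nodes, opinions, query):
    """Mean inner product <q, p_v> over the given nodes (assumed agreement)."""
    members = list(nodes)
    total = sum(
        sum(q * p for q, p in zip(query, opinions[v])) for v in members
    )
    return total / len(members)

def brute_force_best(edges, opinions, query, theta):
    """Exhaustively find the densest subset with average agreement >= theta.

    Exponential in the number of nodes; only meant for toy instances.
    """
    best, best_density = None, -1.0
    nodes = sorted(opinions)
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            if average_agreement(subset, opinions, query) < theta:
                continue  # violates the agreement lower bound
            d = density(subset, edges)
            if d > best_density:
                best, best_density = set(subset), d
    return best, best_density

# Hypothetical toy instance: a triangle {0, 1, 2} whose nodes agree with the
# query, joined by one edge to a 4-clique {3, 4, 5, 6} whose nodes disagree.
EDGES = [(0, 1), (1, 2), (0, 2), (2, 3),
         (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
OPINIONS = {0: (1.0,), 1: (0.5,), 2: (0.5,),
            3: (-1.0,), 4: (-1.0,), 5: (-1.0,), 6: (-1.0,)}

if __name__ == "__main__":
    print(brute_force_best(EDGES, OPINIONS, query=(1.0,), theta=0.5))
    print(brute_force_best(EDGES, OPINIONS, query=(1.0,), theta=-10.0))
```

On this toy instance, a threshold of 0.5 selects the triangle (density 1.0), while a very low threshold effectively removes the constraint and returns the 4-clique (density 1.5). This illustrates how the agreement constraint steers the search away from the unconstrained densest subgraph.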

Paper Structure

This paper contains 22 sections, 5 theorems, 10 equations, 5 figures, 7 tables, 2 algorithms.

Key Result

Proposition 1

Q-DISCO is NP-hard, even for the instances in which $c_v=0$ or $1$ for every $v\in V$.
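
For context, the optimization problem that the proposition refers to can be written out as follows. This is a sketch reconstructed from the TL;DR description (maximize density subject to a lower bound on average agreement), not a quotation from the paper. The notation $e[S]$ for the number of edges induced by $S$ is an assumption, as is reading $c_v$ as the agreement of node $v$ with the query, for example $c_v = \langle \bm{q}, \bm{p}_v \rangle$.

$$\max_{\emptyset \neq S \subseteq V} \; \frac{e[S]}{|S|} \quad \text{subject to} \quad \frac{1}{|S|} \sum_{v \in S} c_v \ge \theta$$

Under this reading, the proposition states that hardness already holds when every node's agreement is binary.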

Figures (5)

  • Figure 1: (a) Demonstration of Q-Lagrange and (b) a bad instance. In the graph in (b), the agreements of nodes $a$, $b$, and $c$ are $1$, $-0.5$, and $-0.5$, respectively. The agreements of nodes in $K_4$ are all $-1$. Let $\theta=0$. The linear relations between $\lambda$ and $H_\lambda(S)$ of some representative subsets are shown in (c). While the optimal solution is $\{a,b,c\}$, the optimal value of $J(\lambda)$ is achieved by only $K_4$ and $\{a\}$. Therefore, Q-Lagrange outputs $\{a\}$, which is a feasible solution but has a density of $0$.
  • Figure 2: Histogram of opinions on $(\textsf{Vax}, \textsf{Ukraine})$ for the (a) full dataset and the outputs of Q-Peeling for $\theta=0.5$ with queries (b) $\bm{q}=(1,1)$, (c) $\bm{q}=(1,-1)$, (d) $\bm{q}=(-1,1)$, and (e) $\bm{q}=(-1,-1)$.
  • Figure 3: Results for the DBLP dataset as the threshold $\theta$ varies: union of the outputs for queries $q_a$ over all areas, with the thresholds (a) $\theta=-100$, (b) $\theta=-1$, (c) $\theta=0$, and (d) $\theta=1.5$. The coloring of nodes is based on the specific query that extracts them, as detailed in (e). Notably, a node is colored gray if it is found by more than one query.
  • Figure 4: Results for query vector $\bm{q}_{\textsf{P}, \neg \textsf{RoDRe}}$. (a) Trade-off between the average agreement and the density of the output of Q-Peeling as $\theta$ varies. (b) The fraction of music fans liking each genre as a function of $\theta$.
  • Figure 5: Histogram of opinions on $(\textsf{Vax}, \textsf{Ukraine})$ for the outputs of Q-Peeling for $\theta=1.1$ with query $\bm{q}=(-1,1)$.

Theorems & Definitions (5)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5