Dense Subgraph Discovery Meets Strong Triadic Closure
Chamalee Wickrama Arachchi, Iiro Kumpulainen, Nikolaj Tatti
TL;DR
The paper addresses dense subgraph discovery under the strong triadic closure (STC) constraint by labeling edges as strong or weak and maximizing q(U, L) = (m_s(U, L) + λ m_w(U, L)) / |U| with λ ∈ [0, 1], where m_s and m_w count the strong and weak edges in the subgraph induced by U under labeling L. It proves NP-hardness for 0 ≤ λ < 1 and shows that λ = 1 yields the polynomial-time densest-subgraph problem, while λ = 0 corresponds to Max-Clique. To solve STC-den, it presents an exact ILP (STC-ILP), an LP relaxation (STC-LP) with rounding, and four practical heuristics including STC-Cut, STC-Peel, and a continuous-relabelling peeling variant. Empirical results on synthetic and real networks show that the methods can recover ground-truth dense components, with STC-ILP delivering the best scores on small graphs and STC-Cut and STC-Peel scaling to larger networks. A DBLP case study indicates that the approach yields interpretable, densely connected, STC-compliant subgraphs. Overall, the work provides a principled framework combining STC with density optimization, and suggests extensions to other density notions and weighted settings.
Abstract
Finding dense subgraphs is a core problem with numerous graph mining applications such as community detection in social networks and anomaly detection. However, in many real-world networks connections are not equal. One way to label edges as either strong or weak is to use strong triadic closure (STC). Here, if one node connects strongly with two other nodes, then those two nodes should be connected by at least a weak edge. STC-labelings are not unique and finding the maximum number of strong edges is NP-hard. In this paper, we apply STC to dense subgraph discovery. More formally, our score for a given subgraph is the number of strong edges plus the number of weak edges, the latter weighted by a user parameter $λ$, divided by the number of nodes of the subgraph. Our goal is to find a subgraph and an STC-labeling maximizing the score. We show that for $λ = 1$, our problem is equivalent to finding the densest subgraph, while for $λ = 0$, our problem is equivalent to finding the largest clique, making our problem NP-hard. We propose an exact algorithm based on integer linear programming and four practical polynomial-time heuristics. We present an extensive experimental study that shows that our algorithms can find the ground truth in synthetic datasets and run efficiently on real-world datasets.
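To make the STC condition and the score concrete, the following is a minimal Python sketch, not the authors' implementation. The function names `satisfies_stc` and `score`, the edge representation (a set of two-element `frozenset`s), and the toy graph are all assumptions chosen for illustration.

```python
from itertools import combinations


def satisfies_stc(edges, strong):
    """Check the STC condition for a labeling (illustrative helper).

    edges:  set of frozenset({u, v}) pairs, all edges of the (sub)graph
    strong: subset of `edges` labeled strong; the rest are weak
    """
    # Collect, for every node, its neighbors along strong edges.
    strong_nbrs = {}
    for e in strong:
        u, v = tuple(e)
        strong_nbrs.setdefault(u, set()).add(v)
        strong_nbrs.setdefault(v, set()).add(u)
    # Any two strong neighbors of the same node must be adjacent
    # (by a strong or a weak edge).
    for nbrs in strong_nbrs.values():
        for v, w in combinations(nbrs, 2):
            if frozenset((v, w)) not in edges:
                return False
    return True


def score(nodes, edges, strong, lam):
    """(m_s + lam * m_w) / |U| on the subgraph induced by `nodes`."""
    nodes = set(nodes)
    induced = {e for e in edges if e <= nodes}  # edges with both ends in U
    m_s = len(induced & set(strong))            # strong edges inside U
    m_w = len(induced) - m_s                    # remaining edges are weak
    return (m_s + lam * m_w) / len(nodes)


# Toy graph (an assumption): a triangle 1-2-3 with a pendant edge 3-4.
edges = {frozenset(p) for p in [(1, 2), (1, 3), (2, 3), (3, 4)]}
triangle = {frozenset(p) for p in [(1, 2), (1, 3), (2, 3)]}

print(satisfies_stc(edges, triangle))  # triangle strong, 3-4 weak
print(satisfies_stc(edges, edges))     # all strong: 1 and 4 are not adjacent
print(score({1, 2, 3}, edges, triangle, lam=0.5))
print(score({1, 2, 3, 4}, edges, triangle, lam=0.5))
```

With `lam=1` every edge counts fully, so the score reduces to the usual density (edges per node) regardless of the labeling. With `lam=0` only strong edges count, which is why the labeling and the STC constraint then determine the optimum.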
