Covering a Graph with Dense Subgraph Families, via Triangle-Rich Sets

Sabyasachi Basu; Daniel Paul-Pena; Kun Qian; C. Seshadhri; Edward W Huang; Karthik Subbian

TL;DR

This paper introduces regularly triangle-rich (RTR) sets to formalize the goal of covering large graphs with many dense subgraphs. It presents RTRExtractor, a provably approximate algorithm that outputs a disjoint family of $\Omega(1)$-RTR sets and guarantees that a constant fraction of any $\alpha$-RTR set is captured, with running time tied to triangle enumeration. The authors provide a practical, fast implementation and demonstrate high coverage on real networks, often achieving dense clusters that cover a substantial portion of the graph (e.g., a quarter of vertices in dense subgraphs on large social graphs). Empirically, RTRExtractor yields many large, dense subgraphs whose vertices group into meaningful, label-free communities, highlighting its utility for unsupervised graph discovery. The work also compares RTRExtractor against established dense-subgraph and community-detection methods, showing superior high-density coverage and scalability, and suggests directions for future work including hierarchical organization of outputs and further sparsification techniques.

Abstract

Graphs are a fundamental data structure used to represent relationships in domains as diverse as the social sciences, bioinformatics, cybersecurity, the Internet, and more. One of the central observations in network science is that real-world graphs are globally sparse, yet contain numerous "pockets" of high edge density. A fundamental task in graph mining is to discover these dense subgraphs. Most common formulations of the problem involve finding a single (or a few) "optimally" dense subsets. But in most real applications, one does not care about optimality. Instead, we want to find a large collection of dense subsets that covers a significant fraction of the input graph. We give a mathematical formulation of this problem, using a new definition of regularly triangle-rich (RTR) families. These families capture the notion of dense subgraphs that contain many triangles and have degrees comparable to the subgraph size. We design a provable algorithm, RTRExtractor, that can discover RTR families that approximately cover any RTR set. The algorithm is efficient and is inspired by recent results that use triangle counts for community testing and clustering. We show that RTRExtractor has excellent behavior on a large variety of real-world datasets. It is able to process graphs with hundreds of millions of edges within minutes. Across many datasets, RTRExtractor achieves high coverage using subgraphs of high edge density. For example, the output covers a quarter of the vertices with subgraphs of edge density more than (say) $0.5$, for datasets with 10M+ edges. We show an example of how the output of RTRExtractor correlates with meaningful sets of similar vertices in a citation network, demonstrating the utility of RTRExtractor for unsupervised graph discovery tasks.

Paper Structure (31 sections, 6 theorems, 3 figures, 8 tables, 1 algorithm)

Introduction
Our Contributions
Formulation through triangle-rich sets.
Theoretical algorithm and analysis.
Fast implementation of RTRExtractor.
High coverage in practice.
Finding many large dense subgraphs.
Qualitative examination of subsets.
Related Work
The main problem
Connection to coverage.
The Main Ideas Behind RTRExtractor
Challenges in the analysis
Analysis
The well separated case
...and 16 more sections

Key Result

theorem 1

Consider an input graph $G = (V,E)$. For any constant $\alpha$, there exist input parameters for the algorithm RTRExtractor with the following guarantees. (i) RTRExtractor outputs a disjoint family of sets $\mathcal{T}$, such that each set is $\Omega(1)$-regularly triangle-rich. (ii) For any $S$ tha

Figures (3)

Figure 1: Coverage of RTRExtractor compared to the two best performing competitors in a variety of networks of different sizes: from thousands to millions of vertices. For each method, we compute the fraction of vertices in sets of 5 vertices or more, and of density more than 0.5 and 0.8. RTRExtractor consistently covers significantly higher percentages in dense clusters.
Figure 2: The 20 largest RTRExtractor subsets produced on some datasets, and their respective densities. We note that even for the largest subsets, which in some cases may have hundreds of vertices, density is still remarkably high.
Figure 3: On the left we have the subgraphs from \ref{tab:dblp-d}, and on the right from \ref{tab:dblp-s}. In both cases, we first draw a Fruchterman-Reingold force-directed drawing of the subgraph, and then a radial illustration of vertices by closeness centrality. The lowest degree vertex is denoted by a triangle. In the first case, the lowest degree vertex induces a one-hop neighborhood that is much larger than the cluster and has many low degree vertices; the vast majority (20 of 35) of these are cleaned away as singletons, marked in pale yellow. The red vertices form our cluster. The blue and green vertices belong to other non-trivial clusters. In terms of closeness centrality, vertices in the cluster are much closer to the 'central' (lowest degree) vertex. In contrast, in the second case, the cluster is almost all of the one-hop neighborhood, which has very few stray vertices. Only two of the vertices are peeled away as singletons.
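
The coverage measure described in the Figure 1 caption (the fraction of vertices lying in output sets of at least 5 vertices whose edge density exceeds a threshold) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names `normalize`, `edge_density`, and `coverage` are hypothetical, the graph is assumed to be simple and undirected, edge density is assumed to mean the fraction of vertex pairs inside a set that are joined by an edge, and the family of sets is assumed to be disjoint, as in the theorem statement.

```python
def normalize(edges):
    """Return undirected edges as a set of sorted pairs, dropping self-loops."""
    return {tuple(sorted((u, v))) for u, v in edges if u != v}


def edge_density(edge_set, subset):
    """Fraction of vertex pairs inside `subset` that are joined by an edge."""
    nodes = set(subset)
    k = len(nodes)
    if k < 2:
        return 0.0
    internal = sum(1 for u, v in edge_set if u in nodes and v in nodes)
    return internal / (k * (k - 1) / 2)


def coverage(edges, num_vertices, family, min_size=5, min_density=0.5):
    """Fraction of all vertices in sets that are large enough and dense enough.

    Assumes the sets in `family` are pairwise disjoint, so sizes can be summed.
    """
    edge_set = normalize(edges)
    covered = 0
    for subset in family:
        if len(subset) >= min_size and edge_density(edge_set, subset) > min_density:
            covered += len(subset)
    return covered / num_vertices


# Toy graph (made up for illustration): a 5-clique on 0..4 and a path 5-6-7-8-9.
clique = [(u, v) for u in range(5) for v in range(u + 1, 5)]
path = [(5, 6), (6, 7), (7, 8), (8, 9)]
toy_edges = clique + path
toy_family = [{0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}]

print(coverage(toy_edges, 10, toy_family, min_density=0.5))
print(coverage(toy_edges, 10, toy_family, min_density=0.8))
```

In this toy input only the clique passes either threshold (the path has density 0.4), so half of the ten vertices count as covered at both 0.5 and 0.8.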

Theorems & Definitions (9)

definition 1
definition 2
theorem 1
definition 3
theorem 2
lemma 1
lemma 2
lemma 3
theorem 3
