Cover Edge-Based Novel Triangle Counting
David A. Bader, Fuhuan Li, Zhihui Du, Palina Pauliuchenka, Oliver Alvarado Rodriguez, Anant Gupta, Sai Sri Vastav Minnal, Valmik Nahata, Anya Ganeshan, Ahmet Gundogdu, Jason Lew
TL;DR
This work addresses efficient triangle counting in large, sparse graphs by introducing a cover-edge set, generated with breadth-first search (BFS), that reduces unnecessary edge checks. The core method, CETC, counts triangles using a compact edge subset with careful handling of duplicates, and its sequential, shared-memory, and distributed-memory variants perform competitively with state-of-the-art approaches. The authors provide an open-source framework with 22 sequential and 11 parallel implementations, plus experiments on Graph500 RMAT and SNAP datasets that show substantial speedups and large estimated communication reductions in distributed settings (e.g., CETC-DM is estimated to reduce communication by 2368x on a scale-42 graph). The results highlight the method's practicality across graph topologies and hardware, and the reproducible framework enables broader adoption and future extensions in high-performance triangle counting. Overall, CETC offers a scalable, communication-aware approach that balances BFS preprocessing, edge intersections, and parallelism, with characterizations such as $O(m \, d_{max})$ and $O(m^{1.5})$-type behavior guiding its performance profile.
Abstract
Listing and counting triangles in graphs is a key algorithmic kernel for network analyses, including community detection, clustering coefficients, k-trusses, and triangle centrality. In this paper, we propose the novel concept of a cover-edge set that can be used to find triangles more efficiently. Leveraging the breadth-first search (BFS) method, we can quickly generate a compact cover-edge set. Novel sequential and parallel triangle counting algorithms that employ cover-edge sets are presented. The novel sequential algorithm performs competitively with the fastest previous approaches on both real and synthetic graphs, such as those from the Graph500 Benchmark and the MIT/Amazon/IEEE Graph Challenge. We implement 22 sequential algorithms for performance evaluation and comparison. At the same time, we employ OpenMP to parallelize 11 sequential algorithms, presenting an in-depth analysis of their parallel performance. Furthermore, we develop a distributed parallel algorithm that can asymptotically reduce communication on massive graphs. In our estimate from massive-scale Graph500 graphs, our distributed parallel algorithm can reduce the communication on a scale 36 graph by 1156x and on a scale 42 graph by 2368x. Comprehensive experiments are conducted on the recently launched Intel Xeon 8480+ processor and shed light on how graph attributes, such as topology, diameter, and degree distribution, can affect the performance of these algorithms.
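To make the cover-edge idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the cover-edge set is taken to be the "horizontal" edges of a BFS, meaning edges whose two endpoints lie on the same BFS level. Under that assumption every triangle contains at least one such edge, because adjacent vertices differ by at most one level and three mutually adjacent vertices cannot all sit on pairwise different levels. The function names `build_adj`, `bfs_levels`, and `count_triangles_cover_edges` are hypothetical and chosen only for illustration.

```python
from collections import deque


def build_adj(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}, skipping self-loops."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_levels(adj):
    """Assign a BFS level to every vertex, restarting BFS in each component."""
    level = {}
    for root in adj:
        if root in level:
            continue
        level[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in level:
                    level[v] = level[u] + 1
                    queue.append(v)
    return level


def count_triangles_cover_edges(adj):
    """Count triangles by intersecting neighborhoods of same-level edges only.

    Assumption (not stated in the abstract): the cover-edge set is the set of
    edges whose endpoints share a BFS level.
    """
    level = bfs_levels(adj)
    count = 0
    for u in adj:
        for v in adj[u]:
            # Visit each same-level (cover) edge once, oriented as u < v.
            if u < v and level[u] == level[v]:
                for w in adj[u] & adj[v]:
                    if level[w] != level[u]:
                        # Triangle has exactly one cover edge: found once.
                        count += 1
                    elif v < w:
                        # All three vertices share a level, so the triangle has
                        # three cover edges; count it only when u < v < w.
                        count += 1
    return count


if __name__ == "__main__":
    # Complete graph on 4 vertices, which contains 4 triangles.
    k4 = build_adj([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
    print(count_triangles_cover_edges(k4))  # prints 4
```

The duplicate handling is the part that needs care: a triangle whose vertices all share one level is reachable from three cover edges, so the sketch counts it only from the edge joining its two smallest vertex labels. Edges that join different levels are never used as intersection seeds, which is where the reduction in edge checks comes from.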
