DTC: Real-Time and Accurate Distributed Triangle Counting in Fully Dynamic Graph Streams

Wei Xuan, Yan Liang, Huawei Cao, Ning Lin, Xiaochun Ye, Dongrui Fan

TL;DR

The paper proposes DTC, a family of single-pass distributed streaming algorithms for global and local triangle counting in graph streams. Its DTC-FD variant is tailored to fully dynamic streams that contain both edge insertions and deletions.

Abstract

Triangle counting is a fundamental problem in graph mining, essential for analyzing graph streams with arbitrary edge orders. However, exact counting becomes impractical due to the massive size of real-world graph streams. To address this, approximate algorithms have been developed, but existing distributed streaming algorithms lack adaptability and struggle with edge deletions. In this article, we propose DTC, a novel family of single-pass distributed streaming algorithms for global and local triangle counting in fully dynamic graph streams. Our DTC-AR algorithm accurately estimates triangle counts without prior knowledge of graph size, leveraging multi-machine resources. Additionally, we introduce DTC-FD, an algorithm tailored for fully dynamic graph streams, incorporating edge insertions and deletions. Using Random Pairing and future edge insertion compensation, DTC-FD achieves unbiased and accurate approximations across multiple machines. Experimental results demonstrate significant improvements over baselines. DTC-AR is up to $2029.4\times$ and $27.1\times$ more accurate in terms of global and local error, respectively, while maintaining the best trade-off between accuracy and storage space. DTC-FD reduces global and local estimation errors by up to $32.5\times$ and $19.3\times$, respectively, and scales linearly with graph stream size. These findings highlight the effectiveness of our proposed algorithms in tackling triangle counting in real-world scenarios. The source code and datasets are available at https://github.com/wayne4s/srds-dtc.git.
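To make the problem statement concrete, the sketch below shows what global and local triangle counts mean in a fully dynamic stream. It is an exact, single-machine reference counter that keeps the whole graph in memory, not DTC, DTC-AR, or DTC-FD; those algorithms approximate these same quantities from a bounded sample of edges spread across machines. The class name `ExactDynamicTriangleCounter`, the `process` helper, and the `("+", u, v)` / `("-", u, v)` stream encoding are assumptions made only for illustration.

```python
from collections import defaultdict


class ExactDynamicTriangleCounter:
    """Exact global/local triangle counts under edge insertions and deletions.

    Illustrative baseline only: it stores every edge, which is exactly what
    sampling-based streaming algorithms are designed to avoid.
    """

    def __init__(self):
        self.adj = defaultdict(set)          # undirected adjacency sets
        self.global_count = 0                # number of triangles in the graph
        self.local_count = defaultdict(int)  # triangles incident to each node

    def _update(self, u, v, sign):
        # Every common neighbor w of u and v closes one triangle (u, v, w).
        for w in self.adj[u] & self.adj[v]:
            self.global_count += sign
            for node in (u, v, w):
                self.local_count[node] += sign

    def insert(self, u, v):
        # Ignore self-loops and duplicate insertions.
        if u == v or v in self.adj[u]:
            return
        self._update(u, v, +1)
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete(self, u, v):
        # Ignore deletions of edges that are not present.
        if v not in self.adj[u]:
            return
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self._update(u, v, -1)


def process(stream):
    """Consume a stream of (op, u, v) tuples, where op is '+' or '-'."""
    counter = ExactDynamicTriangleCounter()
    for op, u, v in stream:
        if op == "+":
            counter.insert(u, v)
        else:
            counter.delete(u, v)
    return counter
```

For example, inserting the edges (1,2), (2,3), (1,3), (3,4), (2,4) creates the triangles {1,2,3} and {2,3,4}; deleting the shared edge (2,3) afterwards removes both. Memory here grows with the number of live edges, which is the cost that motivates sampling-based estimators.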

Paper Structure

This paper contains 31 sections, 6 equations, 7 figures, 3 tables, 2 algorithms.

Figures (7)

  • Figure 1: An illustrative architecture of distributed triangle counting in graph streams.
  • Figure 2: Accuracy and Speed of DTC-AR. DTC-AR demonstrates significantly enhanced accuracy and speed compared to other algorithms. It achieves up to 2029.4$\times$ and 27.1$\times$ higher accuracy than MASCOT in terms of global error and local error, respectively. Moreover, the Pearson coefficient of DTC-AR consistently approaches 1, indicating its superior performance compared to alternative algorithms.
  • Figure 3: Scalability of DTC-AR. The scalability of DTC-AR is directly proportional to the sampling threshold in graph streams, ensuring efficient performance as the threshold increases.
  • Figure 4: Accuracy of DTC-FD. DTC-FD demonstrates significantly higher accuracy compared to MASCOT-FD, achieving up to 32.5$\times$ and 19.3$\times$ improvement in global error and local error, respectively. Additionally, DTC-FD outperforms ThinkDAcc with up to 109.7$\times$ improvement in global variance and 2.6$\times$ improvement in the Pearson coefficient.
  • Figure 5: Speed and accuracy of DTC-FD. DTC-FD achieves the best trade-off between speed and accuracy compared to other algorithms for triangle counting in fully-dynamic graph streams.
  • ...and 2 more figures