
Dupin: A Parallel Framework for Densest Subgraph Discovery in Fraud Detection on Massive Graphs (Technical Report)

Jiaxin Jiang, Siyuan Yao, Yuchen Li, Qiange Wang, Bingsheng He, Min Chen

TL;DR

Fraud detection on billion-scale graphs requires scalable Densest Subgraph Discovery (DSD). Dupin provides a generic parallel peeling framework with global and local pruning that accelerates DSD across multiple density metrics (DG, DW, FD, TDS, kCLiDS) while preserving approximation guarantees. Theoretically, it achieves a $k(1+\epsilon)$-approximation with a logarithmic bound on the number of peeling rounds. Empirically, it detects dense subgraphs up to 100x faster and can raise the share of prevented fraudulent transactions from 45% to 94.5% on real-world, large-scale graphs. Dupin's architecture, APIs, and long-tail pruning support flexible, real-time fraud analytics, making it a practical tool for production fraud-detection pipelines.
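The combination of a $k(1+\epsilon)$ approximation factor and a logarithmic number of peeling rounds is characteristic of batch peeling, where every vertex below a density-scaled threshold is removed in the same round. The sketch below assumes the simplest instance of that idea: plain edge density $\rho(S) = |E(S)|/|S|$ (so $k = 2$) and the removal threshold $2(1+\epsilon)\rho(S)$, a rule of the kind used in MapReduce densest-subgraph algorithms. It is not Dupin's engine and omits its global and local pruning; `from_edges` and `batch_peel` are hypothetical names.

```python
from collections.abc import Hashable, Iterable

Graph = dict[Hashable, set[Hashable]]


def from_edges(edges: Iterable[tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency map (hypothetical helper, no self-loops)."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def batch_peel(adj: Graph, eps: float = 0.1) -> tuple[set[Hashable], float, int]:
    """Round-based peeling for edge density |E(S)| / |S| on a simple graph.

    Each round removes every vertex whose degree inside the current subgraph
    is at most 2 * (1 + eps) * density. The per-vertex tests of a round are
    independent (the parallelizable step); here they run in a plain loop.
    Returns (densest set seen, its density, number of rounds).
    """
    alive = set(adj)
    deg = {v: len(adj[v]) for v in alive}  # degree within the alive set
    edges = sum(deg.values()) // 2
    best_set = set(alive)
    best_density = edges / len(alive) if alive else 0.0
    rounds = 0
    while alive:
        rounds += 1
        density = edges / len(alive)
        if density > best_density:
            best_set, best_density = set(alive), density
        threshold = 2 * (1 + eps) * density
        # Select the whole batch from one consistent snapshot before mutating.
        batch = [v for v in alive if deg[v] <= threshold]
        alive.difference_update(batch)
        # Survivors lose one degree per edge into the removed batch.
        for v in batch:
            for u in adj[v]:
                if u in alive:
                    deg[u] -= 1
        edges = sum(deg[u] for u in alive) // 2
    return best_set, best_density, rounds


# Usage: a 4-clique {0, 1, 2, 3} with a tail 3-4-5 attached.
adj = from_edges([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])
dense_set, density, rounds = batch_peel(adj, eps=0.1)
```

Why the round count is logarithmic: the average degree of $S$ is $2\rho(S)$, so fewer than $|S|/(1+\epsilon)$ vertices can exceed the threshold. Each round therefore shrinks $S$ by a factor of at least $1+\epsilon$, giving $O(\log_{1+\epsilon} n)$ rounds. Removing a batch rather than a single vertex is what loosens the guarantee from 2 to $2(1+\epsilon)$.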

Abstract

Detecting fraudulent activities in financial and e-commerce transaction networks is crucial, and Densest Subgraph Discovery (DSD) is an effective method for doing so. However, deploying DSD methods in production systems faces substantial scalability challenges: existing methods are predominantly sequential, which limits their ability to handle large-scale transaction networks and causes significant detection delays. To address these challenges, we introduce Dupin, a novel parallel processing framework for efficient DSD on billion-scale graphs. Dupin is powered by a processing engine that exploits the unique properties of the peeling process, with theoretical guarantees on detection quality and efficiency. Dupin provides user-friendly APIs for flexible customization of DSD objectives, ensuring robust adaptability to diverse fraud detection scenarios. Empirical evaluations demonstrate that Dupin consistently outperforms several existing DSD methods, achieving speedups of up to 100 times over traditional approaches. On billion-scale graphs, our experimental results show that Dupin can raise the share of prevented fraudulent transactions from 45% to 94.5% and reduce density error from 30% to below 5%. These findings highlight the effectiveness of Dupin in real-world applications, delivering both speed and accuracy in fraud detection.
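Customizable DSD objectives can be illustrated with density functions of the form $g(S) = f(S)/|S|$, where $f(S)$ sums vertex weights in $S$ plus the weights of edges with both endpoints in $S$. This is the form used by Fraudar-style metrics, and peeling removes the vertex whose removal decreases $f(S)$ the least. The sketch assumes that DG is unweighted edge density and DW its edge-weighted counterpart. `DensityMetric`, `vertex_weight`, `edge_weight`, and `peeling_weight` are hypothetical names, not Dupin's actual API, and the naive edge scans trade speed for brevity.

```python
from collections.abc import Callable, Hashable, Sequence
from dataclasses import dataclass

Edge = tuple[Hashable, Hashable]


@dataclass
class DensityMetric:
    """Hypothetical plug-in objective g(S) = f(S) / |S|, where f(S) is the sum
    of vertex weights in S plus the weights of edges with both ends in S."""

    vertex_weight: Callable[[Hashable], float]
    edge_weight: Callable[[Hashable, Hashable], float]

    def density(self, edges: Sequence[Edge], s: set[Hashable]) -> float:
        if not s:
            return 0.0
        f = sum(self.vertex_weight(v) for v in s)
        f += sum(self.edge_weight(u, v) for u, v in edges if u in s and v in s)
        return f / len(s)

    def peeling_weight(self, edges: Sequence[Edge], s: set[Hashable], v: Hashable) -> float:
        """Decrease of f(S) if v is removed; peeling picks the smallest."""
        w = self.vertex_weight(v)
        w += sum(
            self.edge_weight(a, b)
            for a, b in edges
            if v in (a, b) and a in s and b in s
        )
        return w


# Usage: a heavy triangle {0, 1, 2} plus a light edge to vertex 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
weights = {(0, 1): 2.0, (1, 2): 2.0, (0, 2): 2.0, (2, 3): 0.5}
DG = DensityMetric(vertex_weight=lambda v: 0.0, edge_weight=lambda u, v: 1.0)
DW = DensityMetric(vertex_weight=lambda v: 0.0, edge_weight=lambda u, v: weights[(u, v)])
```

Swapping in a different objective only means supplying different weight functions; a peeling loop that consumes `peeling_weight` and `density` stays unchanged.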

Paper Structure

This paper contains 21 sections, 7 theorems, 3 equations, 19 figures, 10 tables, and 4 algorithms.

Key Result

Theorem 2.1

For the vertex set $S^p$ returned by the sequential peeling algorithm and the optimal vertex set $S^*$, it holds that $g(S^p) \geq \frac{g(S^*)}{2}$ when $\mathsf{DG}$, $\mathsf{DW}$, or $\mathsf{FD}$ is the density metric.
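Theorem 2.1 concerns the sequential baseline, which removes one minimum-weight vertex at a time and returns the densest intermediate subgraph. Below is a minimal sketch for $\mathsf{DG}$ (unweighted edge density), assuming a simple undirected graph and using a lazy-deletion binary heap in place of the bucket queues often used in practice; `greedy_peel` is a hypothetical name.

```python
import heapq
from collections.abc import Hashable


def greedy_peel(adj: dict[Hashable, set[Hashable]]) -> tuple[set[Hashable], float]:
    """Sequential peeling for edge density |E(S)| / |S| (simple undirected graph).

    Repeatedly removes a minimum-degree vertex and remembers the densest
    intermediate vertex set.
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    alive = set(adj)
    edges = sum(deg.values()) // 2
    # Heap entries are (degree, tiebreak, vertex); outdated entries are skipped.
    heap = [(d, i, v) for i, (v, d) in enumerate(deg.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    removed: list[Hashable] = []
    best_density = edges / len(alive) if alive else 0.0
    best_prefix = 0  # number of removals made before the densest set
    while alive:
        d, _, v = heapq.heappop(heap)
        if v not in alive or d != deg[v]:
            continue  # stale entry left behind by a degree decrease
        alive.remove(v)
        removed.append(v)
        edges -= deg[v]  # deg[v] counts only still-alive neighbours
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], tiebreak, u))
                tiebreak += 1
        if alive and edges / len(alive) > best_density:
            best_density = edges / len(alive)
            best_prefix = len(removed)
    return set(adj) - set(removed[:best_prefix]), best_density


# Usage: a 4-clique {0, 1, 2, 3} with a tail 3-4-5 attached.
example_adj = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
    3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4},
}
best_set, best_density = greedy_peel(example_adj)
```

Each removal depends on the degrees left by the previous one, so the loop runs $n$ strictly ordered steps; this dependency chain is the sequential bottleneck that the abstract attributes to existing methods.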

Figures (19)

  • Figure 1: Activity analysis from our industry partner, Grab. Data normalized for privacy.
  • Figure 2: Fraudulent coupon abuse vs. normal coupon usage in customer-merchant networks.
  • Figure 3: Example of sequential peeling algorithms.
  • Figure 4: Architecture of $\mathsf{Dupin}$.
  • Figure 5: Illustration of Parallel $\mathsf{DW}$. Nodes marked in red indicate those to be peeled in the current iteration. The grayed areas represent nodes that have been peeled.
  • ...and 14 more figures

Theorems & Definitions (12)

  • Theorem 2.1: Hooi et al. (2016), Fraudar
  • Example 2.1: Sequential Peeling Algorithm Process
  • Theorem 2.2: Tsourakakis (2015), k-clique densest subgraph
  • Example 4.1: Parallel Peeling Algorithm
  • Example 4.2
  • Lemma 4.1
  • Theorem 4.2
  • Definition 5.1: Long-Tail vertex
  • Lemma 5.1
  • Example 5.1
  • ...and 2 more