GraphRPM: Risk Pattern Mining on Industrial Large Attributed Graphs

Sheng Tian, Xintan Zeng, Yifei Hu, Baokun Wang, Yongchao Liu, Yue Jin, Changhua Meng, Chuntao Hong, Tianyi Zhang, Weiqiang Wang

TL;DR

Comprehensive experimental evaluations conducted on real-world datasets of varying sizes substantiate the capability of GraphRPM to adeptly address the challenges inherent in mining patterns from large-scale industrial attributed graphs, thereby underscoring its substantial value for industrial deployment.

Abstract

Graph-based patterns are extensively employed and favored by practitioners within industrial companies due to their capacity to represent the behavioral attributes and topological relationships among users, thereby offering enhanced interpretability in comparison to black-box models commonly utilized for classification and recognition tasks. For instance, within the scenario of transaction risk management, a graph pattern that is characteristic of a particular risk category can be readily employed to discern transactions fraught with risk, delineate networks of criminal activity, or investigate the methodologies employed by fraudsters. Nonetheless, graph data in industrial settings is often characterized by its massive scale, encompassing data sets with millions or even billions of nodes, making the manual extraction of graph patterns not only labor-intensive but also necessitating specialized knowledge in particular domains of risk. Moreover, existing methodologies for mining graph patterns encounter significant obstacles when tasked with analyzing large-scale attributed graphs. In this work, we introduce GraphRPM, an industry-purpose parallel and distributed risk pattern mining framework on large attributed graphs. The framework incorporates a novel edge-involved graph isomorphism network alongside optimized operations for parallel graph computation, which collectively contribute to a considerable reduction in computational complexity and resource expenditure. Moreover, the intelligent filtration of efficacious risky graph patterns is facilitated by the proposed evaluation metrics. Comprehensive experimental evaluations conducted on real-world datasets of varying sizes substantiate the capability of GraphRPM to adeptly address the challenges inherent in mining patterns from large-scale industrial attributed graphs, thereby underscoring its substantial value for industrial deployment.

Paper Structure

This paper contains 15 sections, 6 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Example risk patterns. Risk pattern A describes the behavior of fraudsters who defraud funds from multiple victim users and quickly transfer them to different downstream bank cards. Risk pattern B describes that the fraudster collects the victim's funds multiple times in the name of investment through a shop, giving rewards in the early stage but no longer paying in the later stage. Precision and recall metrics are the evaluation criteria for measuring risk patterns in the industry.
  • Figure 2: An overview of the GraphRPM framework, which consists of subgraph enumeration, two-stage mining, and risk pattern assessment. The Edge-Involved Graph Isomorphism Network (EGIN) based pattern representation mapping is used to identify graph patterns with attributes.
  • Figure 3: An example of graph partitioning, where the nodes are color-coded to indicate their roles. The nodes colored in red serve as the starting nodes, while the nodes colored in blue represent the neighboring nodes of the starting nodes within their respective ego-graphs. To elaborate, $v_1$ and $v_4$ act as the starting nodes. $v_1$'s ego-graph encompasses its neighbor nodes $v_2$ and $v_3$, while $v_4$'s ego-graph includes nodes $v_5$, $v_6$, and $v_7$ as its neighbors. All the aforementioned nodes are designated as master nodes. In addition, nodes colored in gray signify the mirror nodes, which are replicas of the master nodes and carry the same IDs. For instance, the gray node marked $v_3$ within worker $1$ functions as a mirror for the master node $v_3$ located in worker $2$.
  • Figure 4: An example of expansion. (1) Left: an intermediate subgraph with edge set $\{e_{1,3}\}$, generated during the first iteration of expansion from node $v_1$, is transmitted to $v_3$. (2) Middle: in the second iteration of expansion, $v_3$ is activated and adds each of its edges $e_{3,1}$ and $e_{3,2}$ to this subgraph in turn, producing two new intermediate subgraphs with edge sets $\{e_{1,3}, e_{3,1}\}$ and $\{e_{1,3}, e_{3,2}\}$. (3) Right: the two subgraphs are then transmitted to $v_3$'s neighbor nodes $v_1$ and $v_2$.
  • Figure 5: An example of the coordination-free technique. (1) Left: two different workers reach the same subgraph. One starts from edges $\{e_{2,1}, e_{3,2}\}$ and adds $e_{1,3}$; the other starts from edges $\{e_{2,1}, e_{3,1}\}$ and adds $e_{3,2}$. (2) Right: we assign an ID attribute to each edge in advance, so that each subgraph obtains a simple representation based on the ascending order of its edge IDs. Identical subgraphs have the same representation. Each worker computes a hash value of the subgraph's representation and takes it modulo the total number of edges in the subgraph to identify a specific edge within the subgraph. This edge decides which worker will process the subgraph, and all other workers give up the task, which removes redundant subgraphs in a distributed setting without coordination.
  • ...and 4 more figures
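
The coordination-free rule described in the Figure 5 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`canonical_repr`, `owner_edge`, `should_process`), the choice of SHA-256 as the hash, and the `edge_to_worker` mapping from an edge ID to the worker responsible for it are all assumptions introduced here. The caption only states that the selected edge decides which worker keeps the subgraph.

```python
import hashlib


def canonical_repr(edge_ids):
    """Return the edge IDs in ascending order, so that identical
    subgraphs get the same representation regardless of build order."""
    return tuple(sorted(edge_ids))


def owner_edge(edge_ids):
    """Pick one edge of the subgraph deterministically:
    hash the canonical representation, then take it modulo the edge count."""
    rep = canonical_repr(edge_ids)
    # SHA-256 is an assumption; any hash that is stable across workers works.
    digest = hashlib.sha256(",".join(map(str, rep)).encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(rep)
    return rep[index]


def should_process(edge_ids, worker_id, edge_to_worker):
    """A worker keeps the subgraph only if it is responsible for the
    selected edge (edge_to_worker is a hypothetical assignment)."""
    return edge_to_worker[owner_edge(edge_ids)] == worker_id


# Hypothetical setup: three edges with pre-assigned IDs, spread over two workers.
edge_to_worker = {1: 0, 2: 1, 3: 1}

# Two workers reach the same subgraph by adding edges in different orders.
seen_by_worker_0 = [2, 3, 1]
seen_by_worker_1 = [2, 1, 3]

keep_0 = should_process(seen_by_worker_0, 0, edge_to_worker)
keep_1 = should_process(seen_by_worker_1, 1, edge_to_worker)
```

Because both workers compute the same representation and the same hash, they agree on the selected edge without exchanging messages, so exactly one of `keep_0` and `keep_1` is true.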