
SEFraud: Graph-based Self-Explainable Fraud Detection via Interpretative Mask Learning

Kaidi Li, Tianmeng Yang, Min Zhou, Jiahao Meng, Shendi Wang, Yihui Wu, Boshuai Tan, Hu Song, Lujia Pan, Fan Yu, Zhenli Sheng, Yunhai Tong

TL;DR

SEFraud addresses the need for integrated, self-explainable fraud detection on heterogeneous graphs. It introduces a heterogeneous graph transformer with learnable node-feature and edge masks, trained with a contrastive triplet loss. The approach jointly optimizes detection and mask-based interpretability through the loss $L = (1-\lambda)L_{ce} + \lambda L_{tr}$. Here $L_{ce}$ is the cross-entropy detection loss, $L_{tr}$ is the triplet loss, and $\lambda \in [0, 1]$ balances the two. Across multiple datasets, it achieves superior fraud-detection performance and explanatory fidelity. Its millisecond-scale explanations and successful deployment at ICBC show that it is practical for large-scale financial systems. The work advances both the methodology of self-explainable graph models and the deployment of interpretable fraud detection in production environments.
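The following minimal sketch shows how the objective combines its two terms. It computes a binary cross-entropy term over a small hypothetical batch and mixes it with a triplet-loss value using $L = (1-\lambda)L_{ce} + \lambda L_{tr}$. Several details are illustrative assumptions, not details taken from the paper: the function names, the batch, the placeholder `l_tr` value, and the use of binary (rather than multi-class) cross-entropy.

```python
import math
from typing import List, Tuple


def binary_cross_entropy(prob_fraud: float, label: int, eps: float = 1e-12) -> float:
    # Cross-entropy for one node; prob_fraud is the predicted P(fraud), label is 0 or 1.
    p = min(max(prob_fraud, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def mean_ce(batch: List[Tuple[float, int]]) -> float:
    # Average cross-entropy over (probability, label) pairs.
    return sum(binary_cross_entropy(p, y) for p, y in batch) / len(batch)


def combined_loss(l_ce: float, l_tr: float, lam: float) -> float:
    # L = (1 - lambda) * L_ce + lambda * L_tr, with lambda in [0, 1]
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * l_ce + lam * l_tr


# Hypothetical batch of (predicted fraud probability, true label) pairs.
batch = [(0.9, 1), (0.2, 0), (0.6, 1)]
l_ce = mean_ce(batch)
l_tr = 0.35  # placeholder triplet-loss value, for illustration only
print(f"L_ce = {l_ce:.4f}, L = {combined_loss(l_ce, l_tr, lam=0.3):.4f}")
```

With `lam=0`, the objective reduces to plain cross-entropy detection. Larger values put more weight on the mask-learning signal.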

Abstract

Graph-based fraud detection is widely applied in modern industry scenarios, such as spam review and malicious account detection. While considerable effort has been devoted to designing adequate fraud detectors, the interpretability of their results has often been overlooked. Previous works have attempted to generate explanations for specific instances using post-hoc explanation methods such as GNNExplainer. However, post-hoc explanations cannot improve the model's predictions. Moreover, the computational cost of these methods cannot meet practical requirements, which limits their application in real-world scenarios. To address these issues, we propose SEFraud, a novel graph-based self-explainable fraud detection framework that simultaneously addresses fraud detection and result interpretability. Concretely, SEFraud first leverages customized heterogeneous graph transformer networks with learnable feature masks and edge masks to learn expressive representations from informative, heterogeneously typed transactions. A new triplet loss is further designed to improve mask learning. Empirical results on various datasets demonstrate the effectiveness of SEFraud, which shows considerable advantages in both fraud detection performance and the interpretability of its predictions. Moreover, SEFraud has been deployed and provides an explainable fraud detection service for the largest bank in China, the Industrial and Commercial Bank of China Limited (ICBC). Results collected from the ICBC production environment show that SEFraud provides accurate detection results and comprehensive explanations that align with expert business understanding. This confirms its efficiency and applicability in large-scale online services.
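This summary does not specify the exact form of the triplet loss. The sketch below uses the standard margin-based form $\max(0,\, d(a,p) - d(a,n) + m)$ with Euclidean distance. It also assumes, hypothetically, the roles of the three inputs:

- the anchor is a node's embedding from the full graph;
- the positive comes from the mask-weighted graph;
- the negative comes from the complementary (1 - mask) graph.

Treat these roles, the margin, and all names and values as assumptions rather than SEFraud's actual formulation.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    # Standard margin-based triplet loss: pull the positive toward the anchor
    # and push the negative at least `margin` farther away.
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-d embeddings; the roles follow the assumption stated above.
z_full = [0.8, 0.1]        # anchor: embedding from the unmasked graph
z_masked = [0.7, 0.2]      # positive: embedding from the mask-weighted graph
z_complement = [0.1, 0.9]  # negative: embedding from the (1 - mask) graph
print(f"L_tr = {triplet_loss(z_full, z_masked, z_complement):.4f}")
```

A zero loss means the negative is already at least `margin` farther from the anchor than the positive. Only triplets that violate this condition contribute a gradient.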

Paper Structure (30 sections, 10 equations, 4 figures, 7 tables)

Figures (4)

  • Figure 1: As a self-explainable model, SEFraud takes as input a subgraph containing both node features and edges. It outputs a prediction together with consistent explanations in terms of edge weights and node features. For the subgraph of 't1' in the example, SEFraud predicts that 't1' is a fraudster. The explanation is twofold. From the edge perspective, the prediction is attributed to the fund nexus with user 's1'. From the node-feature perspective, the three features contributing most to classifying 't1' as a fraudster are capital balance, subprime loans, and education background.
  • Figure 2: The architecture of SEFraud. A heterogeneous convolution layer aggregates information over the heterogeneous graph and produces a feature embedding for each node. Each node's feature embedding, raw features, and node-type encoding are concatenated to form the input to FNet. An edge embedding is formed from the embeddings of its two end nodes and concatenated with the edge-type encoding to form the input to ENet. The learned feature masks and edge masks are then used to reconstruct a weighted heterogeneous graph, which serves as the input to the GNN/detection model. Finally, a contrastive triplet loss is constructed from the model's outputs for training.
  • Figure 3: Qualitative comparison of explanation examples. Node labels are indicated by color. The explanation for an instance in each dataset is highlighted with bold black edges, ranked by importance weight.
  • Figure 4: Explanation examples from ICBC dataset.
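The mask pipeline described in the figure captions can be sketched in plain Python. The sketch shows four steps:

- how FNet and ENet inputs are concatenated;
- how a feature mask rescales raw features;
- how an edge mask becomes edge weights in a reweighted graph;
- how explanations are read off by ranking mask values.

It rests on a few assumptions. Masks are sigmoid outputs in (0, 1), as in GNNExplainer-style mask learning. FNet and ENet themselves are replaced by fixed hypothetical logits. All feature names, edge types, and values are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str, str]  # (source node, target node, edge type)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fnet_input(node_emb: Sequence[float], raw: Sequence[float], type_enc: Sequence[float]) -> List[float]:
    # FNet input: node embedding || raw features || node-type encoding
    return list(node_emb) + list(raw) + list(type_enc)


def enet_input(src_emb: Sequence[float], dst_emb: Sequence[float], type_enc: Sequence[float]) -> List[float]:
    # ENet input: embeddings of both end nodes || edge-type encoding
    return list(src_emb) + list(dst_emb) + list(type_enc)


def apply_feature_mask(features: Dict[str, float],
                       logits: Dict[str, float]) -> Tuple[Dict[str, float], Dict[str, float]]:
    # Squash hypothetical FNet logits into (0, 1) and rescale each raw feature.
    mask = {name: sigmoid(logits[name]) for name in features}
    masked = {name: value * mask[name] for name, value in features.items()}
    return masked, mask


def weight_edges(edges: List[Edge], logits: List[float]) -> List[Tuple[str, str, str, float]]:
    # Attach a (0, 1) edge-mask weight to every edge of the reweighted graph.
    return [(src, dst, etype, sigmoid(z)) for (src, dst, etype), z in zip(edges, logits)]


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    # Highest-scoring keys first; this ranking is the feature-level explanation.
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


# Hypothetical node 't1' with made-up features and mask logits.
features = {"feat_a": 1.2, "feat_b": 3.0, "feat_c": 0.5, "feat_d": 2.0}
feature_logits = {"feat_a": 2.0, "feat_b": 1.5, "feat_c": -1.0, "feat_d": 0.2}
masked, feature_mask = apply_feature_mask(features, feature_logits)
# `masked` holds the reweighted features that would feed the detection GNN.
print("top-3 features:", top_k(feature_mask, 3))  # prints ['feat_a', 'feat_b', 'feat_d']

# Hypothetical edges around 't1' with made-up edge-mask logits.
edges = [("t1", "s1", "transfer"), ("t1", "s2", "transfer"), ("t1", "m1", "login")]
weighted = weight_edges(edges, [3.0, -0.5, 0.1])
for src, dst, etype, w in sorted(weighted, key=lambda e: e[3], reverse=True):
    print(f"{src} -[{etype}]-> {dst}: weight {w:.3f}")
```

Because sigmoid is monotonic, ranking mask values gives the same order as ranking the logits. Squashing into (0, 1) matters mainly for using the masks as multiplicative weights on features and edges.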