Semi-supervised Credit Card Fraud Detection via Attribute-Driven Graph Representation

Sheng Xiang, Mingzhi Zhu, Dawei Cheng, Enxia Li, Ruihui Zhao, Yi Ouyang, Ling Chen, Yefeng Zheng

TL;DR

This paper tackles credit card fraud detection under limited labeling by modeling transactions as a temporal graph and learning with a semi-supervised, attribute-driven graph neural network called GTAN. GTAN combines attribute embeddings, a gated temporal attention mechanism, risk embedding, and a masking strategy to leverage both labeled and unlabeled data while mitigating label leakage. Empirical results on three datasets show GTAN outperforms strong baselines in AUC and AP, with robustness to varying proportions of labeled data and clear gains from attention and risk propagation components. The approach offers scalable, end-to-end fraud detection that can exploit rich categorical features and unlabeled data for practical deployment.

Abstract

Credit card fraud incurs a considerable cost for both cardholders and issuing banks. Contemporary methods apply machine learning-based classifiers to detect fraudulent behavior from labeled transaction records. However, because labeling is expensive, labeled data usually make up only a small proportion of billions of real transactions, so these methods fail to exploit many natural features of the unlabeled data. Therefore, we propose a semi-supervised graph neural network for fraud detection. Specifically, we use transaction records to construct a temporal transaction graph, which is composed of temporal transactions (nodes) and the interactions (edges) among them. We then pass messages among the nodes through a Gated Temporal Attention Network (GTAN) to learn the transaction representation, and further model fraud patterns through risk propagation among transactions. Extensive experiments are conducted on one real-world transaction dataset and two publicly available fraud detection datasets. The results show that the proposed method, GTAN, outperforms other state-of-the-art baselines on all three datasets. Semi-supervised experiments demonstrate that the model achieves excellent fraud detection performance with only a tiny proportion of labeled data.
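To make the temporal transaction graph concrete, the following sketch builds edges between transactions. It is only a sketch under assumptions: the abstract does not specify how interactions are defined, so the rule used here (link each transaction to up to `k` earlier transactions on the same card) and the record keys `id`, `card`, and `time` are hypothetical choices for illustration.

```python
from collections import defaultdict

def build_temporal_edges(transactions, k=2):
    """Link each transaction to up to k earlier transactions on the same card.

    transactions: list of dicts with hypothetical keys 'id', 'card', 'time'.
    Returns a list of directed (earlier_id, later_id) edges.
    """
    # Group transaction ids by card, in chronological order.
    by_card = defaultdict(list)
    for tx in sorted(transactions, key=lambda t: t["time"]):
        by_card[tx["card"]].append(tx["id"])

    edges = []
    for ids in by_card.values():
        for pos, later in enumerate(ids):
            # Take the k most recent predecessors of this transaction.
            for earlier in ids[max(0, pos - k):pos]:
                edges.append((earlier, later))
    return edges

transactions = [
    {"id": 0, "card": "A", "time": 1},
    {"id": 1, "card": "B", "time": 2},
    {"id": 2, "card": "A", "time": 3},
    {"id": 3, "card": "A", "time": 5},
    {"id": 4, "card": "A", "time": 8},
]
edges = build_temporal_edges(transactions, k=2)
print(edges)
```

Here `k` plays the role of a cap on temporal edges per node. Edges point from earlier to later transactions, so messages passed along them carry only past information to each node.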

Paper Structure

This paper contains 22 sections, 7 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Illustration of a typical credit card fraud detection process. The card issuer's detection system assesses each transaction with an online predictive model once it has passed account checking.
  • Figure 2: Illustration of the proposed model architecture and temporal graph attention mechanism.
  • Figure 3: The result of semi-supervised experiments with different ratios of labeled training data.
  • Figure 4: The ablation study results on three datasets. Gray bars represent the GTAN-A variant, blue bars represent the GTAN-R variant, and red bars represent the GTAN model.
  • Figure 5: Parameter sensitivity analysis with respect to (a) the number of GNN layers; (b) the number of temporal edges per node; (c) hidden dimension; and (d) the batch size.