Application of AI-based Models for Online Fraud Detection and Analysis

Antonis Papasavva, Shane Johnson, Ed Lowther, Samantha Lundrigan, Enrico Mariconti, Anna Markovska, Nilufer Tuptuk

TL;DR

This systematic literature review maps the state-of-the-art AI and NLP techniques applied to text-based online fraud detection from 2019 to 2024, following PRISMA-ScR guidelines. It catalogues 223 included studies across 16 fraud types, with phishing (URLs, emails, SMS) and fake reviews being particularly prominent, and finds a lack of universal models capable of handling multiple fraud types. The review highlights heavy reliance on established data sources and traditional metrics, significant data quality and reproducibility gaps, and a shift toward dynamic, real-time data and hybrid transformer-based approaches. The work provides actionable guidance for researchers, policymakers, and practitioners and emphasizes the need for unsupervised/semi-supervised methods, real-time analytics, transparent reporting, and robust data hosting to improve robustness against evolving fraud tactics.

Abstract

Fraud is a prevalent offence that extends beyond financial loss, causing psychological and physical harm to victims. Advances in online communication technologies have allowed online fraud to thrive, with fraudsters increasingly using these channels for deception. As technologies like AI progress, there is growing concern that fraud will scale up through sophisticated methods, such as deep-fakes in phishing campaigns, generated by language generation models like ChatGPT. However, the application of AI to detecting and analyzing online fraud remains understudied. We conduct a Systematic Literature Review (SLR) of AI and NLP techniques for online fraud detection. The review adhered to the PRISMA-ScR protocol, with eligibility criteria including relevance to online fraud, use of text data, and AI methodologies. We screened 2,457 academic records, of which 350 met our eligibility criteria, and included 223 in the review. We report the state-of-the-art NLP techniques for analysing various online fraud categories; the training data sources; the NLP algorithms and models built; and the performance metrics employed for model evaluation. We find that current research on online fraud is divided by scam activity, and we identify 16 different frauds that researchers focus on. This SLR enhances the academic understanding of AI-based detection methods for online fraud and offers insights for policymakers, law enforcement, and businesses on safeguarding against such activities. We conclude that focusing on specific scams limits generalization, as multiple models are required for different fraud types. The evolving nature of scams also limits the effectiveness of models trained on outdated data. Finally, we identify issues with data limitations, the reporting of training bias, and the selective presentation of performance metrics, all of which can bias model evaluation.
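
To illustrate why selective presentation of metrics can bias model evaluation, the following is a minimal sketch in Python. It is not taken from the review: the toy labels, the hypothetical classifier output, and the helper names `confusion_counts` and `report` are all assumptions made for illustration. It shows how accuracy alone can look strong on imbalanced fraud data while recall stays low.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = fraud, 0 = legitimate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def report(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 together, not a subset."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: 95 legitimate messages and 5 fraudulent ones.
y_true = [0] * 95 + [1] * 5
# Hypothetical classifier that catches only 1 of the 5 fraudulent messages.
y_pred = [0] * 95 + [1] + [0] * 4

metrics = report(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this toy setting, accuracy and precision are both high even though four of the five fraudulent messages are missed, so reporting either one without recall or F1 would overstate the model's usefulness.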

Paper Structure (23 sections, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Common pipeline for NLP-based models
  • Figure 3: PRISMA Chart
  • Figure 4: Percentage of scam types analyzed in the studies included for qualitative analysis