Detecting Scams Using Large Language Models

Liming Jiang

TL;DR

This paper addresses the need for scalable scam detection in cybersecurity by exploring the use of large language models to identify phishing, advance-fee, romance, and related scams. It proposes an end-to-end workflow for building LLM-based scam detectors, detailing data collection, preprocessing, labeling, model selection and fine-tuning, evaluation, thresholding, and deployment. The preliminary evaluation with GPT-3.5 and GPT-4 demonstrates that state-of-the-art LLMs can detect common scam cues in a sample phishing email, indicating feasibility while acknowledging the necessity for broader validation across tasks, domains, and datasets. The work emphasizes ongoing refinement, domain-expert collaboration, and adaptation to evolving threats as essential to practical deployment.

Abstract

Large Language Models (LLMs) have gained prominence in various applications, including security. This paper explores the utility of LLMs in scam detection, a critical aspect of cybersecurity. Beyond these established applications, we propose a novel use case for LLMs: identifying scams such as phishing, advance-fee fraud, and romance scams. We present notable security applications of LLMs and discuss the unique challenges posed by scams. Specifically, we outline the key steps involved in building an effective scam detector using LLMs, emphasizing data collection, preprocessing, model selection, training, and integration into target systems. Additionally, we conduct a preliminary evaluation using GPT-3.5 and GPT-4 on a duplicated email, highlighting their proficiency in identifying common signs of phishing or scam emails. The results show that both models recognize suspicious elements in this email, but we emphasize the need for a comprehensive assessment across various language tasks. The paper concludes by underlining the importance of ongoing refinement and collaboration with cybersecurity experts to adapt to evolving threats.
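
The evaluation and thresholding steps of the workflow can be illustrated with a minimal sketch. It is not the paper's implementation: the prompt wording, the function names build_prompt and precision_recall, and the idea that the model returns a scam score between 0 and 1 are all assumptions made for illustration. No LLM is called here; the scores are hand-written placeholders.

```python
from typing import List, Tuple


def build_prompt(email_text: str) -> str:
    """Wrap a message in a hypothetical scam-detection instruction.

    The wording is an assumption, not the prompt used in the paper.
    """
    return (
        "You are a security assistant. Rate from 0 to 1 how likely the "
        "following message is a scam (phishing, advance-fee, romance), "
        "and list the cues you relied on.\n\n"
        f"Message:\n{email_text.strip()}"
    )


def precision_recall(
    labels: List[int], scores: List[float], threshold: float
) -> Tuple[float, float]:
    """Compute precision and recall when flagging scores >= threshold.

    labels: 1 for scam, 0 for legitimate.
    scores: assumed model-assigned scam scores in [0, 1].
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Placeholder labels and scores, invented for this example only.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    for t in (0.3, 0.5):
        p, r = precision_recall(labels, scores, t)
        print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

Lowering the threshold in this toy data catches more scams at the cost of more false alarms, which is the trade-off a deployed detector would need to tune on labeled data.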

Paper Structure (4 sections, 2 figures)

This paper contains 4 sections, 2 figures.

Figures (2)

  • Figure 1: An example of a scam.
  • Figure 2: Workflow of the method.