Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements

Shu Yang, Shenzhe Zhu, Zeyu Wu, Keyu Wang, Junchi Yao, Junchao Wu, Lijie Hu, Mengdi Li, Derek F. Wong, Di Wang

TL;DR

Fraud-R1 presents a bilingual, multi-round benchmark to test LLM defenses against real-world fraud and phishing across five categories. It combines a base dataset with a rule-based augmentation pipeline to simulate progressive fraud tactics and evaluates models under Helpful Assistant and Role-play settings, using a GPT-4o-mini judge to compute Defense Success Rate and related metrics. The results reveal substantial challenges in detecting fraud, especially in role-play and fake job postings, and show language-induced performance gaps, underscoring the need for multilingual fraud detection enhancements. The work also discusses ethical considerations and safeguards against misuse, aiming to advance safer, more robust AI-powered decision-making.

Abstract

We introduce Fraud-R1, a benchmark designed to evaluate LLMs' ability to defend against internet fraud and phishing in dynamic, real-world scenarios. Fraud-R1 comprises 8,564 fraud cases sourced from phishing scams, fake job postings, social media, and news, categorized into 5 major fraud types. Unlike previous benchmarks, Fraud-R1 introduces a multi-round evaluation pipeline to assess LLMs' resistance to fraud at different stages, including credibility building, urgency creation, and emotional manipulation. Furthermore, we evaluate 15 LLMs under two settings: 1. Helpful-Assistant, where the LLM provides general decision-making assistance, and 2. Role-play, where the model assumes a specific persona, widely used in real-world agent-based interactions. Our evaluation reveals the significant challenges in defending against fraud and phishing inducement, especially in role-play settings and fake job postings. Additionally, we observe a substantial performance gap between Chinese and English, underscoring the need for improved multilingual fraud detection capabilities.
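The multi-round evaluation described above can be illustrated with a minimal sketch. It is one possible reading of the setup, not the authors' implementation. The four level names come from the paper's Figure 3 caption. Everything else is an assumption made for illustration: the `FraudCase` container, the `victim` and `judge` callables (the paper uses GPT-4o-mini as the judge), the early stop once a defense is judged successful, and the cumulative definition of Defense Success Rate as the fraction of cases defended at or before a given level.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Level names as listed in the paper's Figure 3 caption.
LEVELS = [
    "Base",
    "Building Credibility",
    "Creating Urgency",
    "Exploiting Emotional Appeal",
]


@dataclass
class FraudCase:
    """Hypothetical container: one fraud message per level, in order."""
    category: str
    rounds: List[str]


def evaluate_multi_round(
    cases: List[FraudCase],
    victim: Callable[[List[str]], str],   # assumed: conversation so far -> reply
    judge: Callable[[str, str], bool],    # assumed: (message, reply) -> defended?
) -> Dict[str, float]:
    """Return an assumed cumulative Defense Success Rate per level."""
    first_defended: List[Optional[int]] = []
    for case in cases:
        history: List[str] = []
        defended_round: Optional[int] = None
        for k, message in enumerate(case.rounds):
            history.append(message)
            reply = victim(history)
            if judge(message, reply):
                # Assumption: stop escalating once the model has defended.
                defended_round = k
                break
            history.append(reply)
        first_defended.append(defended_round)

    total = len(cases)
    rates: Dict[str, float] = {}
    for k, level in enumerate(LEVELS):
        defended = sum(1 for r in first_defended if r is not None and r <= k)
        rates[level] = defended / total if total else 0.0
    return rates
```

In practice, `victim` would wrap the model under test with either the Helpful-Assistant or the Role-play prompt, and `judge` would wrap the judge model. Both are left as plain callables here so the scoring logic can be checked with simple stand-ins.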

Paper Structure

This paper contains 49 sections, 3 equations, 41 figures, 8 tables.

Figures (41)

  • Figure 1: An overview of the Fraud-R1 evaluation flow. We evaluate LLMs' robustness in identifying and defending against fraud inducement under two settings: Helpful-Assistant and Role-play.
  • Figure 2: Overview of our dataset. Fraud-R1 includes five challenging classes of fraud and phishing inducement: Fraudulent Services, Impersonation, Phishing Scams, Fake Job Postings, and Online Relationships. The dataset is designed to evaluate the ability of "victim" LLMs to detect and defend against these threats.
  • Figure 3: Step-by-step fraud augmentation across four levels: Base, Building Credibility, Creating Urgency, and Exploiting Emotional Appeal.
  • Figure 4: Overview of existing fraud datasets and corresponding examples.
  • Figure 5: Data Construction and Augmentation Pipeline. Our process begins with real-world fraud cases sourced from multiple channels. We then extract key Fraudulent Strategies and Fraudulent Intentions from these cases. Next, we employ Deepseek-R1 to generate fraudulent messages, emails, and posts, which are subsequently filtered to form FP-base (Base Dataset). Finally, through a multi-stage refinement process, we construct FP-levelup (Level-up Dataset) to enable robust evaluation of LLMs against increasingly sophisticated fraudulent scenarios.
  • ...and 36 more figures