Fraud-R1: A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements
Shu Yang, Shenzhe Zhu, Zeyu Wu, Keyu Wang, Junchi Yao, Junchao Wu, Lijie Hu, Mengdi Li, Derek F. Wong, Di Wang
TL;DR
Fraud-R1 is a bilingual (Chinese and English), multi-round benchmark that tests how well LLMs defend against real-world fraud and phishing across five fraud categories. It combines a base dataset with a rule-based augmentation pipeline that simulates progressively escalating fraud tactics, and it evaluates models under two settings, Helpful-Assistant and Role-play, using a GPT-4o-mini judge to compute Defense Success Rate and related metrics. The results show that models struggle to detect fraud, especially in the Role-play setting and on fake job postings, and that performance differs substantially between Chinese and English, underscoring the need for better multilingual fraud detection. The work also discusses ethical considerations and safeguards against misuse, with the aim of making AI-assisted decision-making safer and more robust.
Abstract
We introduce Fraud-R1, a benchmark designed to evaluate LLMs' ability to defend against internet fraud and phishing in dynamic, real-world scenarios. Fraud-R1 comprises 8,564 fraud cases sourced from phishing scams, fake job postings, social media, and news, categorized into 5 major fraud types. Unlike previous benchmarks, Fraud-R1 introduces a multi-round evaluation pipeline to assess LLMs' resistance to fraud at different stages, including credibility building, urgency creation, and emotional manipulation. Furthermore, we evaluate 15 LLMs under two settings: (1) Helpful-Assistant, where the LLM provides general decision-making assistance, and (2) Role-play, where the model assumes a specific persona, as is common in real-world agent-based interactions. Our evaluation reveals significant challenges in defending against fraud and phishing inducements, especially in the Role-play setting and on fake job postings. Additionally, we observe a substantial performance gap between Chinese and English, underscoring the need for improved multilingual fraud detection capabilities.
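To make the multi-round evaluation concrete, the following is a minimal sketch of how a Defense Success Rate could be aggregated from per-case judge verdicts. This section does not give the metric's exact definition, so the cumulative reading used here (a case counts as defended at round k if the judge marked a successful defense at any round up to k) is an assumption, and the names `CaseResult`, `defended_at_round`, and `defense_success_rate` are hypothetical rather than taken from the benchmark's code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CaseResult:
    """Judge verdict for one fraud case (hypothetical structure)."""
    category: str
    # 1-based round at which the judge first marked a successful defense,
    # or None if the model never defended in any round.
    defended_at_round: Optional[int]


def defense_success_rate(results: List[CaseResult], max_round: int) -> float:
    """Fraction of cases defended at or before `max_round`.

    Assumption: the rate is cumulative over rounds; the benchmark's own
    definition may differ.
    """
    if not results:
        raise ValueError("results must not be empty")
    defended = sum(
        1
        for r in results
        if r.defended_at_round is not None and r.defended_at_round <= max_round
    )
    return defended / len(results)


# Illustrative verdicts only; these are not results from the benchmark.
example_results = [
    CaseResult("fake job posting", 1),
    CaseResult("phishing", 3),
    CaseResult("fake job posting", None),
    CaseResult("phishing", 2),
]
```

Calling `defense_success_rate(example_results, max_round=k)` for increasing `k` gives a non-decreasing curve under this cumulative assumption, which is one way to compare how quickly different models recognize an escalating fraud attempt.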
