Distinguishing Scams and Fraud with Ensemble Learning
Isha Chadalavada, Tianhui Huang, Jessica Staddon
TL;DR
This paper addresses the problem of separating scam complaints from non-scam fraud complaints in CFPB complaint narratives, using an ensemble prompting approach across Gemini and GPT-4. The resulting ensemble achieves high precision and high recall; it is validated on a manually labeled set of 300 complaints and evaluated on a larger corpus of 2,569 narratives. The findings indicate that narrative length, redaction, and company-name mentions affect performance, and that LLMs may over-rely on secondary information or on a company's reputation. The work offers practical guidance for safer user interactions with LLMs in scam defense, and it identifies open problems and the data needed for broader evaluation.
Abstract
Users increasingly query LLM-enabled web chatbots for help with scam defense. The Consumer Financial Protection Bureau's (CFPB) complaints database is a rich data source for evaluating LLM performance on user scam queries, but the corpus does not currently distinguish between scam and non-scam fraud. We developed an LLM ensemble approach to distinguishing scam complaints from non-scam fraud complaints in the CFPB data, and we describe initial findings regarding the strengths and weaknesses of LLMs in the scam defense context.
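To make the ensemble idea concrete, the following is a minimal sketch of how labels from several classifiers might be combined and then scored against a manually labeled set. It is not the authors' implementation: the agreement rule (unanimous by default), the "undecided" fallback, the function names `ensemble_label` and `precision_recall`, and the keyword-based stand-in classifiers are all assumptions made for illustration. In practice each classifier would wrap a prompt to a model such as Gemini or GPT-4.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

# A classifier maps a complaint narrative to "scam" or "fraud".
# Hypothetical stand-in for a prompted LLM call.
Classifier = Callable[[str], str]


def ensemble_label(narrative: str,
                   classifiers: Sequence[Classifier],
                   min_agree: Optional[int] = None) -> str:
    """Combine classifier votes; return "undecided" without enough agreement.

    Assumption: unanimous agreement is required unless min_agree is given.
    """
    votes = [clf(narrative) for clf in classifiers]
    label, count = Counter(votes).most_common(1)[0]
    needed = len(votes) if min_agree is None else min_agree
    return label if count >= needed else "undecided"


def precision_recall(predicted: Sequence[str],
                     gold: Sequence[str],
                     positive: str = "scam") -> Tuple[float, float]:
    """Precision and recall for the positive class against gold labels."""
    pairs = list(zip(predicted, gold))
    tp = sum(p == positive and g == positive for p, g in pairs)
    fp = sum(p == positive and g != positive for p, g in pairs)
    fn = sum(p != positive and g == positive for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy keyword classifiers, for illustration only (not real model behavior).
def clf_a(text: str) -> str:
    return "scam" if "gift card" in text.lower() else "fraud"


def clf_b(text: str) -> str:
    lowered = text.lower()
    return "scam" if ("gift card" in lowered or "wire" in lowered) else "fraud"
```

Requiring agreement trades coverage for precision: narratives on which the classifiers disagree are left "undecided" and could be routed to manual review rather than forced into a label.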
