Distinguishing Scams and Fraud with Ensemble Learning

Isha Chadalavada, Tianhui Huang, Jessica Staddon

TL;DR

This paper tackles distinguishing scam from non-scam fraud in CFPB complaint narratives using an ensemble prompting approach across Gemini and GPT-4. It builds a high-precision, high-recall ensemble, validated on a manually labeled set of 300 complaints and evaluated on a larger 2,569-narrative corpus. Key findings reveal that narrative length, redaction, and company-name mentions affect performance, and that LLMs may over-rely on secondary information or reputation. The work offers practical guidance for safer user interactions with LLMs in scam defense and points to open problems and needed data for broader evaluation.

Abstract

Users increasingly query LLM-enabled web chatbots for help with scam defense. The Consumer Financial Protection Bureau's complaints database is a rich data source for evaluating LLM performance on user scam queries, but currently the corpus does not distinguish between scam and non-scam fraud. We developed an LLM ensemble approach to distinguishing scam and fraud CFPB complaints and describe initial findings regarding the strengths and weaknesses of LLMs in the scam defense context.

Paper Structure

This paper contains 13 sections, 2 figures.

Figures (2)

  • Figure 1: Precision and recall of the final model, $F$, as a function of complaint narrative length in characters.
  • Figure 2: Each panel shows the accuracy of the final model, $F$, as a function of the fraction of redaction for narratives grouped by length. The leftmost panel represents the 99 narratives in $L$ of at most 875 characters ("short"); the middle panel represents the 100 narratives in $L$ of length 875 to 1,602 characters ("medium"); the rightmost panel represents the 101 narratives in $L$ with at least 1,602 characters ("long"). Note that performance is more robust to redaction with longer narratives.
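
The grouping behind Figure 2 can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that redacted spans appear in a narrative as runs of the letter "X" (for example "XXXX"), that boundary lengths fall into the shorter group, and that redaction fractions are split into ten equal-width bins. The names `redaction_fraction`, `length_group`, and `accuracy_by_group` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: redacted spans show up as runs of two or more "X" characters.
REDACTION = re.compile(r"X{2,}")


def redaction_fraction(text: str) -> float:
    """Fraction of characters in the narrative that belong to redacted spans."""
    if not text:
        return 0.0
    redacted = sum(len(m.group()) for m in REDACTION.finditer(text))
    return redacted / len(text)


def length_group(text: str, short_max: int = 875, medium_max: int = 1602) -> str:
    """Assign a narrative to the short/medium/long group from the caption.

    Assumption: a narrative exactly at a boundary goes to the shorter group.
    """
    n = len(text)
    if n <= short_max:
        return "short"
    if n <= medium_max:
        return "medium"
    return "long"


def accuracy_by_group(records, n_bins: int = 10):
    """Accuracy per (length group, redaction bin).

    records: iterable of (narrative, true_label, predicted_label) tuples.
    Returns a dict mapping (group, bin_index) to the accuracy in that cell.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for text, truth, predicted in records:
        # Clamp so that a fully redacted narrative lands in the last bin.
        bin_index = min(int(redaction_fraction(text) * n_bins), n_bins - 1)
        key = (length_group(text), bin_index)
        totals[key] += 1
        hits[key] += int(truth == predicted)
    return {key: hits[key] / totals[key] for key in totals}
```

Calling `accuracy_by_group` on labeled narratives paired with model predictions yields one accuracy value per cell, which could then be plotted as one panel per length group.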