Victim as a Service: Designing a System for Engaging with Interactive Scammers

Daniel Spokoyny, Nikolai Vogler, Xin Gao, Tianyi Zheng, Yufei Weng, Jonghyun Park, Jiajun Jiao, Geoffrey M. Voelker, Stefan Savage, Taylor Berg-Kirkpatrick

TL;DR

This work tackles the challenge of studying long-horizon online scams, such as pig butchering, by introducing Chatterbox, a high-interaction honeypot that automates sustained engagement with scammers using LLM-based personas. The system combines victim verisimilitude, cross-platform capabilities, multimedia handling, and human-in-the-loop oversight to collect rich attacker transcripts across weeks and multiple platforms. It contributes a detailed architecture, a persona synthesis pipeline, information-seeking behaviors, seeding infrastructure, and a scalable workflow that enables large-scale data collection for defense, attribution, and policy research. Results from a 7-week deployment show substantial engagement (thousands of scammer interactions, hundreds of multi-day conversations) and reveal how scammers build trust, migrate across platforms, and diversify their monetization tactics, with practical implications for automated defenses and investigative interventions.

Abstract

Pig butchering and similar interactive online scams lower their victims' defenses by building trust over extended periods of conversation, sometimes weeks or months. They have caused increasingly public losses (at least $75B by one recent study). However, because of their long-term conversational nature, they are extremely challenging to investigate at scale. In this paper, we describe the motivation, design, implementation, and experience with Chatterbox, an LLM-based system that automates long-term engagement with online scammers, making large-scale investigations of their tactics possible. We describe the techniques we have developed to attract scam attempts, the system and LLM engineering required to convincingly engage with scammers, and the capabilities required to satisfy or evade "milestones" in scammers' workflow.

Paper Structure

This paper contains 57 sections, 8 figures, 12 tables.

Figures (8)

  • Figure 1: In this example of a scam attempt against Chatterbox, we illustrate three key phases: (left) the cross-platform request, which occurs 5 days after initial contact, (center) casual introduction to the bait, which is an investment opportunity, and (right) the cash-out phase where the scammer attempts to extract financial value using a fraudulent app/website. Each phase presents unique challenges and requirements for our system to effectively mimic human behavior and maintain believability. See Section \ref{sec:motivation} for a detailed walkthrough of this example and the system requirements it motivates.
  • Figure 2: System architecture of Chatterbox. The system attracts initial inbound engagement from scammers by using a seeding module to generate realistic activity (e.g., reposts, likes) on social media accounts (Sec \ref{sec:system_seeding}). A polling module periodically retrieves incoming messages, which are placed into a message queue. From the queue, messages are sent to the honeypot LLM (HLLM, Sec \ref{sec:honeypot_llm}) to generate conversational responses and to a human-in-the-loop annotation interface for oversight and control (Sec \ref{sec:annotate}). The architecture supports the entire engagement lifecycle, including cross-platform migration to WhatsApp accounts to continue the conversation (Sec \ref{sec:system_cross_platform}).
  • Figure 3: Example synthetic selfies for three different personas. The first image in each row is a 'seed' selfie, created by inpainting a synthetic face onto a real, masked photo to preserve a natural background and lighting (Section \ref{sec:selfie_generation}). The subsequent images are generated from the seed using an identity-preserving model to create additional, consistent poses for the same synthetic individual.
  • Figure 4: Annotation interfaces: left, inbound message triage; right, ongoing conversation monitoring.
  • Figure 5: Left: CDF of conversation lengths by number of messages. Middle: CDF of conversation durations (days). Right: Different entity types occur at various points throughout the conversations. For instance, platform and phone numbers tend to occur before financial information (suggesting cross-platform requests), while domains and multimedia occur later to build more trust. Financial instruments are not revealed until deeper into the conversations.
  • ...and 3 more figures