AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning

Ismail Hossain, Sai Puppala, Md Jahangir Alam, Sajedul Talukder

TL;DR

The paper tackles real-time scam prevention by unifying proactive AI-driven scambaiting with privacy-preserving detection. It introduces an instruction-tuned LLM pipeline that generates victim-like responses under a harm-aware utility function, augmented by a three-threshold risk control mechanism and a federated learning framework for on-device adaptation with optional differential privacy. The approach demonstrates strong scam detection and scam-baiting performance across diverse datasets, with robust safety, PII risk controls, and real-time interaction capabilities. Federated updates preserve user privacy while enabling continual improvement, making the system scalable and adaptable to evolving scam tactics. Collectively, the work advances proactive, privacy-conscious defenses against dynamic social-engineering threats and lays groundwork for real-world deployments with careful ethical and safety considerations.

Abstract

Scams exploiting real-time social engineering -- such as phishing, impersonation, and phone fraud -- remain a persistent and evolving threat across digital platforms. Existing defenses are largely reactive, offering limited protection during active interactions. We propose a privacy-preserving, AI-in-the-loop framework that proactively detects and disrupts scam conversations in real time. The system combines instruction-tuned artificial intelligence with a safety-aware utility function that balances engagement with harm minimization, and employs federated learning to enable continual model updates without raw data sharing. Experimental evaluations show that the system produces fluent and engaging responses (perplexity as low as 22.3, engagement $\approx$0.80), while human studies confirm significant gains in realism, safety, and effectiveness over strong baselines. In federated settings, models trained with FedAvg sustain up to 30 rounds while preserving high engagement ($\approx$0.80), strong relevance ($\approx$0.74), and low PII leakage ($\leq$0.0085). Even with differential privacy, novelty and safety remain stable, indicating that robust privacy can be achieved without sacrificing performance. The evaluation of guard models (LlamaGuard, LlamaGuard2/3, MD-Judge) shows a straightforward pattern: stricter moderation settings reduce the chance of exposing personal information, but they also limit how much the model engages in conversation. In contrast, more relaxed settings allow longer and richer interactions, which improve scam detection, but at the cost of higher privacy risk. To our knowledge, this is the first framework to unify real-time scam-baiting, federated privacy preservation, and calibrated safety moderation into a proactive defense paradigm.
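The federated setting mentioned in the abstract relies on FedAvg, where a server combines locally trained parameters without seeing raw conversations. The following is a minimal sketch of the standard sample-weighted averaging step only, not the authors' implementation: the function name `fedavg`, the flat-list parameter representation, and the example client sizes are assumptions made for illustration, and differential-privacy noise is omitted.

```python
from typing import List, Sequence


def fedavg(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighting each by its local sample count.

    Hypothetical helper for illustration: parameters are flat lists of floats.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client, and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must report vectors of the same length")

    global_weights = [0.0] * dim
    for weights, n_samples in zip(client_weights, client_sizes):
        share = n_samples / total  # this client's fraction of all training samples
        for i, value in enumerate(weights):
            global_weights[i] += share * value
    return global_weights


# Two hypothetical clients: the second holds three times as much local data.
new_global = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the parameter vectors and sample counts cross the network in this sketch, which is the property the abstract refers to as model updates "without raw data sharing".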

Paper Structure

This paper contains 57 sections, 2 theorems, 25 equations, 14 figures, 16 tables.

Key Result

Theorem 1

The probability that the scammer continues with scam-like behavior is modeled as:

Figures (14)

  • Figure 1: Threat model showing scammer social engineering on social media and AI intervention via scam detection and scam-baiting.
  • Figure 2: Overview of the proposed real-time scam prevention system architecture. The pipeline includes four primary stages: (1) message monitoring and role identification, (2) scam detection using local LLMs, (3) AI-based scambaiting upon threshold breach, and (4) federated learning-based model aggregation on a global server to enhance detection while preserving privacy.
  • Figure 3: Federated Learning architecture for decentralized, privacy-preserving scam model training.
  • Figure 4: Visualization of the relationship between PII types and their associated risk scores. The plot highlights which canonical PII categories (e.g., email, address, social_security_number (ssn)) tend to be linked with higher average risk.
  • Figure 5: Mean perplexity comparison for our AI scam-baiter vs. a reference baiter over 100 conversations, showing consistently lower and more stable fluency in our model.
  • ...and 9 more figures
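The Figure 2 caption describes a hand-off from local scam detection to AI-based scambaiting once a threshold is breached. A rough sketch of that routing step, under stated assumptions, might look like the following. Everything here is hypothetical: the names `route_message`, `Decision`, and `toy_score`, the 0.5 threshold, and the keyword scorer standing in for the local LLM detector are illustrative only, and the paper's three-threshold risk control is not reproduced.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    score: float  # scam likelihood in [0, 1] reported by the detector
    action: str   # "pass_through" or "scambait"


def route_message(
    message: str,
    score_fn: Callable[[str], float],
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> Decision:
    """Send a message to the scambaiter when its scam score reaches the threshold."""
    score = score_fn(message)
    if not 0.0 <= score <= 1.0:
        raise ValueError("score_fn must return a value in [0, 1]")
    action = "scambait" if score >= threshold else "pass_through"
    return Decision(score=score, action=action)


def toy_score(message: str) -> float:
    """Keyword stand-in for a local LLM detector (illustration only)."""
    keywords = ("gift card", "wire transfer", "verify your account")
    hits = sum(keyword in message.lower() for keyword in keywords)
    return min(1.0, hits / 2)


suspicious = route_message("Please buy a gift card and verify your account", toy_score)
benign = route_message("See you at lunch", toy_score)
```

Passing the scorer in as a callable keeps the routing logic independent of whichever on-device model produces the score.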

Theorems & Definitions (4)

  • Definition 1
  • Theorem 1: Scam Likelihood Inversely Related to Response Utility
  • Proof: Justification
  • Lemma 1: Engagement Without Utility Enables Scams