AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning
Ismail Hossain, Sai Puppala, Md Jahangir Alam, Sajedul Talukder
TL;DR
The paper tackles real-time scam prevention by unifying proactive AI-driven scam-baiting with privacy-preserving detection. It introduces an instruction-tuned LLM pipeline that generates victim-like responses under a harm-aware utility function, augmented by a three-threshold risk control mechanism and a federated learning framework for on-device adaptation with optional differential privacy. The approach demonstrates strong scam detection and scam-baiting performance across diverse datasets, with robust safety, PII risk controls, and real-time interaction capabilities. Federated updates preserve user privacy while enabling continual improvement, making the system scalable and adaptable to evolving scam tactics. Collectively, the work advances proactive, privacy-conscious defenses against dynamic social-engineering threats and lays groundwork for real-world deployments with careful ethical and safety considerations.
Abstract
Scams exploiting real-time social engineering -- such as phishing, impersonation, and phone fraud -- remain a persistent and evolving threat across digital platforms. Existing defenses are largely reactive, offering limited protection during active interactions. We propose a privacy-preserving, AI-in-the-loop framework that proactively detects and disrupts scam conversations in real time. The system combines instruction-tuned artificial intelligence with a safety-aware utility function that balances engagement with harm minimization, and employs federated learning to enable continual model updates without raw data sharing. Experimental evaluations show that the system produces fluent and engaging responses (perplexity as low as 22.3, engagement $\approx$0.80), while human studies confirm significant gains in realism, safety, and effectiveness over strong baselines. In federated settings, models trained with FedAvg remain stable for up to 30 rounds, preserving high engagement ($\approx$0.80), strong relevance ($\approx$0.74), and low PII leakage ($\leq$0.0085). Even with differential privacy, novelty and safety remain stable, indicating that robust privacy can be achieved without sacrificing performance. The evaluation of guard models (LlamaGuard, LlamaGuard2/3, MD-Judge) shows a straightforward pattern: stricter moderation settings reduce the chance of exposing personal information, but they also limit how much the model engages in conversation. In contrast, more relaxed settings allow longer and richer interactions, which improve scam detection, but at the cost of higher privacy risk. To our knowledge, this is the first framework to unify real-time scam-baiting, federated privacy preservation, and calibrated safety moderation into a proactive defense paradigm.
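To make the federated step concrete, the following is a minimal sketch of one FedAvg aggregation round with optional clipping and Gaussian noise in the spirit of differential privacy. It is not the authors' implementation: the function names (`clip_update`, `fedavg_round`), the flat-list weight representation, and the parameters `clip_norm` and `noise_std` are all assumptions made for illustration, and the noise scale here is not calibrated to any formal privacy budget.

```python
import math
import random
from typing import List, Optional, Sequence


def clip_update(update: Sequence[float], max_norm: float) -> List[float]:
    """Scale an update so its L2 norm is at most max_norm (hypothetical helper)."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def fedavg_round(
    global_weights: Sequence[float],
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
    clip_norm: Optional[float] = None,   # assumption: per-client L2 clipping bound
    noise_std: float = 0.0,              # assumption: std of Gaussian noise on the aggregate
    rng: Optional[random.Random] = None,
) -> List[float]:
    """One FedAvg round: average client updates weighted by local sample count.

    Only model weights leave each client; raw conversations never do.
    """
    rng = rng or random.Random(0)
    total = sum(client_sizes)
    aggregated = [0.0] * len(global_weights)

    for weights, n in zip(client_weights, client_sizes):
        # Each client's contribution is its change relative to the global model.
        update = [w - g for w, g in zip(weights, global_weights)]
        if clip_norm is not None:
            update = clip_update(update, clip_norm)
        for i, u in enumerate(update):
            aggregated[i] += (n / total) * u

    if noise_std > 0.0:
        # Illustrative noise only; a real DP mechanism would calibrate this.
        aggregated = [a + rng.gauss(0.0, noise_std) for a in aggregated]

    return [g + a for g, a in zip(global_weights, aggregated)]


if __name__ == "__main__":
    # Two clients holding 1 and 3 local examples respectively.
    new_global = fedavg_round([0.0, 0.0], [[1.0, 1.0], [3.0, 3.0]], [1, 3])
    print(new_global)  # → [2.5, 2.5]
```

Without clipping or noise, the result is the sample-weighted mean of the client models, which is the standard FedAvg update. Setting `clip_norm` and `noise_std` bounds each client's influence and perturbs the aggregate, which is the general mechanism behind the privacy/utility trade-off the abstract reports.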
