Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models
W. K. M Mithsara, Ning Yang, Ahmed Imteaj, Hussein Zangoti, Abdur R. Shahid
TL;DR
The paper tackles data poisoning in wearable IoT HAR systems by leveraging large language models to detect and sanitize poisoned sensor data in zero-/one-/few-shot settings. It introduces a security-oriented, prompt-driven framework that uses role-play and chain-of-thought reasoning to infer poisoning indicators and generate cleaned data for downstream HAR training, reducing reliance on labeled datasets. Through theoretical metrics and extensive experiments on MotionSense, HHAR, and WISDM, the work demonstrates competitive poisoning detection and sanitization performance, while analyzing trade-offs in communication cost, latency, and privacy. The approach offers interpretability via natural-language justifications and shows promise for real-time, adaptable defenses in dynamic IoT environments, with future work focusing on edge deployment and privacy-preserving techniques.
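The prompting strategy can be illustrated with a minimal Python sketch. This is not the authors' prompt. The role text, the instruction wording, the `Example` record, and the helpers `format_window` and `build_prompt` are all hypothetical, and we assume that inputs are windows of tri-axial accelerometer samples. The sketch shows only the structural point: zero-, one-, and few-shot prompting differ in how many worked demonstrations are placed before the query. All three share the same role-play preamble and step-by-step instruction.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical input format: one window of tri-axial accelerometer samples (x, y, z).
Window = Sequence[Tuple[float, float, float]]

# Role-play preamble (hypothetical wording): the LLM is cast as a domain expert.
ROLE = (
    "You are a security analyst specializing in wearable sensor data "
    "and human activity recognition (HAR)."
)

# Step-by-step instruction (hypothetical wording): ask for indicators, a verdict,
# and a plausible cleaned window.
TASK = (
    "Decide whether the accelerometer window below has been poisoned. "
    "Think step by step: describe the expected motion pattern, list any "
    "poisoning indicators, give a verdict (poisoned or clean), and, if "
    "poisoned, propose a plausible cleaned window in the same format."
)


@dataclass
class Example:
    """A worked demonstration used for one-shot or few-shot prompting."""
    window: Window
    verdict: str      # "poisoned" or "clean"
    reasoning: str    # natural-language justification
    cleaned: Window   # sanitized version of the window


def format_window(window: Window) -> str:
    """Serialize samples as '(x, y, z)' triples with two decimals."""
    return "; ".join(f"({x:.2f}, {y:.2f}, {z:.2f})" for x, y, z in window)


def build_prompt(query: Window, examples: Sequence[Example] = ()) -> str:
    """Zero-shot if `examples` is empty, one-shot with one, few-shot with several."""
    parts = [ROLE, TASK]
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}:\n"
            f"Data: {format_window(ex.window)}\n"
            f"Reasoning: {ex.reasoning}\n"
            f"Verdict: {ex.verdict}\n"
            f"Cleaned data: {format_window(ex.cleaned)}"
        )
    # The query ends with an open "Reasoning:" slot so the model reasons before answering.
    parts.append(f"Data: {format_window(query)}\nReasoning:")
    return "\n\n".join(parts)
```

For instance, `build_prompt(window)` yields a zero-shot prompt and `build_prompt(window, [demo])` a one-shot prompt. In either case, the model's free-text reply would still need to be parsed into a verdict and a cleaned window before it could feed downstream HAR training.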
Abstract
The widespread integration of wearable sensing devices into Internet of Things (IoT) ecosystems, particularly in healthcare, smart homes, and industrial applications, has created a need for robust human activity recognition (HAR) techniques that improve functionality and user experience. Although machine learning models have advanced HAR, they are increasingly susceptible to data poisoning attacks that compromise the integrity and reliability of these systems. Conventional defenses against such attacks often require extensive task-specific training on large labeled datasets, which limits their adaptability in dynamic IoT environments. This work proposes a novel framework that uses large language models (LLMs) to detect and sanitize poisoned data in HAR systems under zero-shot, one-shot, and few-shot learning paradigms. Our approach combines "role play" prompting, in which the LLM assumes the role of an expert to contextualize and evaluate sensor anomalies, with "think step-by-step" reasoning, which guides the LLM to identify poisoning indicators in the raw sensor data and propose plausible clean alternatives. These strategies minimize reliance on extensive dataset curation and enable robust, adaptable defenses in real time. We evaluate the framework extensively, quantifying detection accuracy, sanitization quality, latency, and communication cost, and thereby demonstrate the practicality and effectiveness of LLMs in improving the security and reliability of wearable IoT systems.
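The abstract does not give the evaluation formulas. The following Python sketch therefore uses common stand-ins, and these are assumptions rather than the paper's definitions. Detection is scored per window with accuracy, precision, recall, and F1 against ground-truth poisoning labels. Sanitization quality is scored as the root-mean-square error (RMSE) between the LLM's cleaned values and the original clean values. The function names `detection_scores` and `sanitization_rmse` are hypothetical, and latency and communication cost are not modeled here.

```python
import math
from typing import Dict, Sequence


def detection_scores(predicted: Sequence[bool], actual: Sequence[bool]) -> Dict[str, float]:
    """Per-window detection metrics (assumed); True means 'poisoned'."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label sequences of equal length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = len(actual) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def sanitization_rmse(sanitized: Sequence[Sequence[float]],
                      clean: Sequence[Sequence[float]]) -> float:
    """RMSE between sanitized and ground-truth clean windows (assumed metric)."""
    if len(sanitized) != len(clean):
        raise ValueError("window counts differ")
    sq_err, n = 0.0, 0
    for s_win, c_win in zip(sanitized, clean):
        if len(s_win) != len(c_win):
            raise ValueError("window lengths differ")
        for s, c in zip(s_win, c_win):
            sq_err += (s - c) ** 2
            n += 1
    if n == 0:
        raise ValueError("no values to compare")
    return math.sqrt(sq_err / n)
```

Lower RMSE means the cleaned windows lie closer to the original signal. A sanitizer that returns its input unchanged scores exactly the RMSE of the poisoned data against the clean data, which makes that value a simple baseline to beat.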
