Enhancing Robotic Arm Activity Recognition with Vision Transformers and Wavelet-Transformed Channel State Information
Rojin Zandi, Kian Behzad, Elaheh Motamedi, Hojjat Salehinejad, Milad Siami
TL;DR
The paper tackles the privacy and line-of-sight limitations of vision-based robotic arm activity recognition by using passive Wi-Fi sensing of channel state information (CSI). It introduces a three-stage pipeline that denoises CSI with the discrete wavelet transform (DWT), extracts features with a Vision Transformer (ViT) applied to patches of CSI amplitudes, and classifies with an MLP trained with a cross-entropy objective. The ViT-DWT model reaches approximately 96.7%–97.4% accuracy across four activities and generalizes well under leave-one-scenario-out evaluation, outperforming CNN, CNN-LSTM, and standard transformer baselines. The approach enables privacy-preserving recognition of robotic arm activities in indoor environments without cameras or additional sensors, with potential applications in smart homes and robotics where cameras are undesirable or impractical.
Abstract
Vision-based methods are commonly used in robotic arm activity recognition. These approaches typically rely on line-of-sight (LoS) and raise privacy concerns, particularly in smart home applications. Passive Wi-Fi sensing represents a new paradigm for recognizing human and robotic arm activities, using channel state information (CSI) measurements to identify activities in indoor environments. This paper proposes a novel machine learning approach, based on the discrete wavelet transform and vision transformers, for robotic arm activity recognition from CSI measurements in indoor settings. The method outperforms convolutional neural network (CNN) and long short-term memory (LSTM) models in robotic arm activity recognition, particularly when LoS is obstructed by barriers, without relying on external or internal sensors or visual aids. Experiments are conducted using four different data collection scenarios and four different robotic arm activities. The results demonstrate that the wavelet transform can significantly enhance the accuracy of vision transformer networks in robotic arm activity recognition.
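The first two stages of the pipeline, wavelet denoising followed by patch extraction, can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the Haar wavelet, the single decomposition level, the soft-threshold rule, the threshold value, and the patch size are all assumptions chosen for illustration, and every function name is hypothetical. The ViT encoder and MLP classifier are omitted.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # Haar normalization factor

def haar_dwt(signal):
    """One-level Haar DWT of an even-length sequence -> (approx, detail)."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out

def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr (assumed denoising rule)."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def denoise(signal, thr):
    """Denoise by thresholding only the detail (high-frequency) coefficients."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, thr))

def to_patches(matrix, ph, pw):
    """Split a (time x subcarrier) matrix into flattened, non-overlapping patches."""
    rows, cols = len(matrix), len(matrix[0])
    patches = []
    for r in range(0, rows - ph + 1, ph):
        for c in range(0, cols - pw + 1, pw):
            patches.append([matrix[r + i][c + j]
                            for i in range(ph) for j in range(pw)])
    return patches

def csi_to_patches(csi, thr, ph, pw):
    """Denoise each subcarrier's time series, then cut the result into patches.

    csi is a list of time steps, each a list of per-subcarrier amplitudes;
    the number of time steps must be even for the one-level Haar transform.
    """
    columns = [denoise(list(col), thr) for col in zip(*csi)]
    cleaned = [list(row) for row in zip(*columns)]
    return to_patches(cleaned, ph, pw)

# Toy input: 8 time steps x 4 subcarriers of made-up amplitudes.
toy_csi = [[float(t + k) for k in range(4)] for t in range(8)]
tokens = csi_to_patches(toy_csi, thr=0.5, ph=4, pw=2)
print(len(tokens), len(tokens[0]))
```

In a ViT-style model, each flattened patch would then be linearly projected to an embedding and passed, with positional information, to the transformer encoder. A production version would more likely use a wavelet library such as PyWavelets with a multi-level decomposition.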
