
WECAR: An End-Edge Collaborative Inference and Training Framework for WiFi-Based Continuous Human Activity Recognition

Rong Li, Tao Deng, Siwei Feng, He Huang, Juncheng Jia, Di Yuan, Keqin Li

TL;DR

WECAR addresses the need for adaptive, privacy-preserving WiFi-based HAR on resource-constrained devices by decoupling training (edge) from inference (end) and introducing parameter-efficient continual learning. It employs a transformer-based FSM with task-specific dynamic prefix expansion and stability-aware selective retraining, complemented by a two-stage distillation framework (MHSA and prefix relation distillation) to compress models for end-device deployment. The approach is validated on three public CSI-HAR datasets, showing superior average accuracy and reduced parameter counts with modest forgetting compared to strong baselines. The results demonstrate practical viability for continuous sensing in privacy-sensitive, edge-constrained environments, offering a scalable design pattern for end-edge intelligent systems in ubiquitous sensing applications.

Abstract

WiFi-based human activity recognition (HAR) holds significant promise for ubiquitous sensing in smart environments. A critical challenge lies in enabling systems to adapt dynamically to evolving scenarios, learning new activities without catastrophic forgetting of prior knowledge, while adhering to the stringent computational constraints of edge devices. Current approaches struggle to reconcile these requirements because retaining historical data imposes prohibitive storage demands and parameters are used inefficiently. We propose WECAR, an end-edge collaborative inference and training framework for WiFi-based continuous HAR, which decouples computational workloads to overcome these limitations. In this framework, edge devices handle model training, lightweight optimization, and updates, while end devices perform efficient inference. WECAR introduces two key innovations: dynamic continual learning with parameter efficiency, and hierarchical distillation for end deployment. For the former, we propose a transformer-based architecture enhanced by task-specific dynamic model expansion and stability-aware selective retraining. For the latter, we propose a dual-phase distillation mechanism that includes multi-head self-attention relation distillation and prefix relation distillation. We implement WECAR on heterogeneous hardware, using the Jetson Nano as the edge device and the ESP32 as the end device. Our experiments across three public WiFi datasets reveal that WECAR not only surpasses several state-of-the-art methods in both recognition performance and parameter efficiency, but also achieves a substantial reduction in the model's parameter count post-optimization without sacrificing accuracy. This validates its practicality for resource-constrained environments.

Paper Structure

This paper contains 31 sections, 25 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: The initial model was trained to recognize two activities: sitting and standing. However, after the model is retrained with data from two new activities, swimming and jumping, it loses the ability to recognize sitting and standing.
  • Figure 2: The middle layer includes WiFi receivers, which are responsible for data collection and inference. The upper layer represents the server, which is in charge of model training. After the receiver collects the CSI data, it transmits the data to the server for model training. Once training is complete, the trained model is sent back to the receiver for inference.
  • Figure 3: Framework overview.
  • Figure 4: Architecture of FSM. The right part contains the framework. The left part details how the model dynamically expands and selectively retrains during continual learning from training session $1$ to training session $t$. As new tasks are introduced, the model dynamically expands with new prefixes in the MHSA layer. In the MLP, a selective retraining strategy is implemented to adjust neuron weights, preserving learned outcomes from stable neurons while updating unstable neurons to accommodate new tasks.
  • Figure 5: Overview of dual-phase distillation. The left side illustrates the teacher model, which includes complex MLP layers and an MHSA layer with task-specific prefixes. The right side depicts the student model, which employs a simplified MLP layer and an MHSA layer that does not retain historical prefixes. The middle section provides a detailed overview of the distillation process from the initial training stage to the $t$-th training stage. In the initial lightweight phase, knowledge transfer is achieved through MHSA relation distillation. In the incremental lightweight phase, prefix relation distillation compresses the teacher model’s historical prefix knowledge into the student model.
  • ...and 4 more figures
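
The Figure 4 caption describes two mechanisms: new prefixes added to the MHSA layer as tasks arrive, and selective retraining that updates only unstable MLP neurons. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation. The function names, the `(prefix_keys, prefix_values)` layout, the per-neuron `stability` scores, and the `threshold` rule are all assumptions made for illustration; the excerpt does not state how WECAR stores prefixes or measures stability.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def prefix_attention(queries, keys, values, prefixes):
    """Single-head attention in which stored task prefixes add extra key/value rows.

    Assumed layout (hypothetical): `prefixes` is a list with one
    (prefix_keys, prefix_values) pair per training session. Expanding the
    model for a new task means appending one more pair to this list.
    """
    all_keys = [k for pk, _ in prefixes for k in pk] + keys
    all_values = [v for _, pv in prefixes for v in pv] + values
    scale = math.sqrt(len(queries[0]))
    dim = len(all_values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in all_keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, all_values)) for d in range(dim)]
        )
    return outputs


def selective_update(weights, grads, stability, threshold, lr):
    """Gradient step applied only to neurons judged unstable.

    Assumption: a neuron counts as stable when its score is >= threshold,
    in which case its weight is left untouched to preserve old knowledge.
    """
    return [
        w if s >= threshold else w - lr * g
        for w, g, s in zip(weights, grads, stability)
    ]


# Session 1 starts without prefixes; a later session appends one prefix pair.
prefixes = []
base = prefix_attention([[1.0, 0.0]], [[1.0, 0.0]], [[2.0, 3.0]], prefixes)
prefixes.append(([[1.0, 0.0]], [[4.0, 5.0]]))
expanded = prefix_attention([[1.0, 0.0]], [[1.0, 0.0]], [[2.0, 3.0]], prefixes)
updated = selective_update([1.0, 1.0], [0.5, 0.5], [0.9, 0.1], 0.5, 0.5)
```

In this toy setup the appended prefix changes the attention output without touching the original keys and values, and only the second neuron, whose assumed stability score falls below the threshold, receives a gradient step.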