On-Device Training Empowered Transfer Learning For Human Activity Recognition

Pixi Kang, Julian Moosmann, Sizhen Bian, Michele Magno

TL;DR

The paper tackles the challenge of user-induced concept drift in HAR by introducing on-device transfer learning (ODTL) that updates only the classifier on MCU-grade devices, thereby preserving privacy and reducing data transfer. It designs quantized, lightweight on-device training engines for STM32F7 and GAP9, and evaluates them on three sensor modalities (RecGym, QVAR, Ultra) to quantify UICD impact and personalization gains. Results show that ODTL yields accuracy improvements (RecGym +3.73%, QVAR +17.38%, Ultra +3.70%) and that GAP9 dramatically outperforms STM32F7 in both latency (≈20x) and energy (inference up to ≈120x, ODTL up to ≈280x), demonstrating the practicality of edge continual learning for HAR. The study underscores the potential of low-power parallel edge hardware to enable real-time, privacy-preserving personalized HAR on resource-constrained devices and outlines future directions for risk management and efficient use of user data.

Abstract

Human Activity Recognition (HAR) is an attractive topic for perceiving human behavior and supplying assistive services. Besides the classical inertial-unit and vision-based HAR methods, new sensing technologies, such as ultrasound and body-area electric fields, have emerged in HAR to enhance user experience and accommodate new application scenarios. As those sensors are often paired with AI for HAR, they frequently encounter challenges due to limited training data compared to the more widely used IMU- or vision-based HAR solutions. Additionally, user-induced concept drift (UICD) is common in such HAR scenarios. UICD is characterized by deviations in the sample distribution of new users from that of the training participants, leading to deteriorated recognition performance. This paper proposes an on-device transfer learning (ODTL) scheme tailored for energy- and resource-constrained IoT edge devices. Optimized on-device training engines are developed for two representative MCU-level edge computing platforms: STM32F756ZG and GAP9. Based on this, we evaluated the ODTL benefits in three HAR scenarios: body capacitance-based gym activity recognition, and QVAR- and ultrasonic-based hand gesture recognition. We demonstrated improvements of 3.73%, 17.38%, and 3.70% in activity recognition accuracy, respectively. In addition, we observed that the RISC-V-based GAP9 achieves 20x lower latency and 280x lower power consumption than the STM32F7 MCU during ODTL deployment, demonstrating the advantages of employing the latest low-power parallel computing devices for edge tasks.
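
The TL;DR states that ODTL updates only the classifier on the device. The following is a minimal float-precision sketch of that idea, not the paper's quantized training engine. It assumes the classifier is a single dense softmax layer fed by embeddings from a frozen backbone. All names (finetune_classifier, the toy embeddings, the learning rate and epoch count) are hypothetical and chosen for illustration only.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, bias, embedding):
    # Dense layer followed by softmax: one weight row per class.
    logits = [sum(w * x for w, x in zip(row, embedding)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

def finetune_classifier(weights, bias, embeddings, labels, lr=0.1, epochs=50):
    # Plain SGD on the last layer only (assumption: cross-entropy loss).
    # The backbone is never touched; it only supplies the embeddings.
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            probs = predict(weights, bias, x)
            for c in range(len(weights)):
                grad = probs[c] - (1.0 if c == y else 0.0)  # dL/dlogit_c
                bias[c] -= lr * grad
                for j in range(len(x)):
                    weights[c][j] -= lr * grad * x[j]
    return weights, bias

def accuracy(weights, bias, embeddings, labels):
    hits = 0
    for x, y in zip(embeddings, labels):
        probs = predict(weights, bias, x)
        hits += int(probs.index(max(probs)) == y)
    return hits / len(labels)

# Hypothetical pre-trained classifier head (2 classes, 2-D embeddings).
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]

# Toy embeddings from a "new user" whose class-0 samples have drifted
# away from where the pre-trained head expects them.
embeddings = [(1.0, 1.2), (0.9, 1.1), (1.1, 1.3),
              (-1.0, 0.5), (-0.9, 0.6), (-1.1, 0.4)]
labels = [0, 0, 0, 1, 1, 1]

acc_before = accuracy(weights, bias, embeddings, labels)
finetune_classifier(weights, bias, embeddings, labels)
acc_after = accuracy(weights, bias, embeddings, labels)
print(acc_before, acc_after)
```

Because only the final layer's weights and biases change, the memory and compute cost of each update scales with the classifier size rather than with the whole network, which is what makes this style of personalization plausible on MCU-class hardware. The toy data here is synthetic and does not reflect the paper's datasets or results.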

Paper Structure

This paper contains 20 sections, 10 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Network topology. (a) Whole network. (b) Residual block.
  • Figure 2: Architectures of implemented on-device training engines on STM32F7 series MCU and GAP9 processor. (a) STM32F7 MCU. (b) GAP9.
  • Figure 3: Novel sensor-based human activity data sets: devices and defined activities. (a) HBC+IMU-based gym activity recognition. (b) QVAR+IMU-based hand gesture recognition. (c) 40 kHz ultrasonic-based hand gesture recognition.
  • Figure 4: 2-D t-SNE plot of the obtained embeddings in one round of L1PO test with individuals and classes distinguished by markers and colors, respectively.
  • Figure 5: Comparison of the networks' on-device inference execution for the three datasets RecGym, QVAR, and Ultra. The networks deployed on GAP9 run at 200 MHz, while the networks deployed on STM32F756ZG run at 216 MHz.
  • ...and 1 more figures