EgoCHARM: Resource-Efficient Hierarchical Activity Recognition using an Egocentric IMU Sensor

Akhil Padmanabha, Saravanan Govindarajan, Hwanmun Kim, Sergio Ortiz, Rahul Rajan, Doruk Senkal, Sneha Kadetotad

TL;DR

EgoCHARM is a resource-efficient hierarchical HAR framework that uses a single egocentric (head-mounted) IMU to recognize both high level and low level activities. A small low level encoder is trained in a semi-supervised fashion, primarily with high level labels, and its embeddings are then probed for low level classification. The method reaches an F1 score of 0.826 on 9 high level activities and 0.855 on 3 low level activities with 63k high level and 22k low level model parameters and modest FLOPs, a footprint compact enough for on-device deployment. A sensitivity analysis examines how performance varies with the amount of training data, sampling frequency, and window size. The work highlights the opportunities and limitations of egocentric IMU-based HAR for always-on smartglasses applications and points to future extensions with additional modalities and broader class coverage.

Abstract

Human activity recognition (HAR) on smartglasses has various use cases, including health/fitness tracking and input for context-aware AI assistants. However, current approaches for egocentric activity recognition suffer from low performance or are resource-intensive. In this work, we introduce a resource (memory, compute, power, sample) efficient machine learning algorithm, EgoCHARM, for recognizing both high level and low level activities using a single egocentric (head-mounted) Inertial Measurement Unit (IMU). Our hierarchical algorithm employs a semi-supervised learning strategy, requiring primarily high level activity labels for training, to learn generalizable low level motion embeddings that can be effectively utilized for low level activity recognition. We evaluate our method on 9 high level and 3 low level activities, achieving F1 scores of 0.826 and 0.855 on high level and low level activity recognition, respectively, with just 63k high level and 22k low level model parameters, allowing the low level encoder to be deployed directly on current IMU chips that include on-board compute. Lastly, we present results and insights from a sensitivity analysis and highlight the opportunities and limitations of activity recognition using egocentric IMUs.
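
To make the idea of reusing learned embeddings for low level recognition concrete, the following is a minimal sketch of a linear probe on a frozen encoder, the general technique the Figure 2 caption describes. It is not the authors' implementation. The encoder here is a hypothetical hand-written feature function standing in for the trained CNN/GRU encoder, the single-channel synthetic windows and their class shapes are invented for illustration, and all function names and hyperparameters are assumptions.

```python
import math
import random

N_CLASSES = 3   # the source reports 3 low level classes
SAMPLE_HZ = 50  # sampling frequency quoted in the Figure 5 caption


def frozen_encoder(window):
    """Hypothetical stand-in for a pretrained encoder; it has no trainable parameters.

    Maps a 1 s window to a 2-D embedding: (mean, mean absolute deviation).
    """
    mean = sum(window) / len(window)
    spread = sum(abs(x - mean) for x in window) / len(window)
    return [mean, spread]


def make_window(label, rng):
    """Synthetic single-channel 1 s window; the per-class shapes are invented."""
    offset, amp = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)][label]
    return [
        offset + amp * math.sin(2 * math.pi * 2 * i / SAMPLE_HZ) + rng.gauss(0, 0.1)
        for i in range(SAMPLE_HZ)
    ]


def logits(W, b, x):
    """Linear layer: one score per class."""
    return [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(len(b))]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def train_probe(embeddings, labels, lr=0.1, epochs=100, seed=0):
    """Fit only the probe (W, b) by SGD on cross-entropy; the encoder is untouched."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    W = [[0.0] * dim for _ in range(N_CLASSES)]
    b = [0.0] * N_CLASSES
    order = list(range(len(labels)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = embeddings[i], labels[i]
            probs = softmax(logits(W, b, x))
            for c in range(N_CLASSES):
                grad = probs[c] - (1.0 if c == y else 0.0)
                b[c] -= lr * grad
                for j in range(dim):
                    W[c][j] -= lr * grad * x[j]
    return W, b


def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)


def run_demo(seed=0, per_class=20):
    """Train the probe on one synthetic set and return accuracy on a second one."""
    rng = random.Random(seed)

    def dataset():
        labels = [c for c in range(N_CLASSES) for _ in range(per_class)]
        return [frozen_encoder(make_window(c, rng)) for c in labels], labels

    train_x, train_y = dataset()
    test_x, test_y = dataset()
    W, b = train_probe(train_x, train_y)
    hits = sum(predict(W, b, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)
```

Calling `run_demo()` returns the probe's accuracy on held-out synthetic windows. The point of the design is that only `W` and `b` are updated, so the low level head adds very few parameters on top of whatever the encoder already costs.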

Paper Structure

This paper contains 26 sections, 10 figures, 6 tables.

Figures (10)

  • Figure 1: EgoCHARM Low Level Encoder Architecture. Our encoder consists of 1D-CNN (Convolutional Neural Network) layers with variable dilation to capture the periodic patterns present in IMU signals and a GRU (Gated Recurrent Unit) for capturing temporal sequences.
  • Figure 2: Low Level Encoder Probing. To enable low level activity recognition, we freeze our low level encoder's parameters and train a probing layer to map our low level motion embeddings to 3 discrete classes.
  • Figure 3: High Level Activity Recognition Confusion Matrix using EgoCHARM. Values in the confusion matrix are in percentage.
  • Figure 4: Principal component analysis (PCA) on unseen low level activity motion embeddings using EgoCHARM low level encoder trained using only high level activity labels. Distinct clusters are seen for the three low level classes.
  • Figure 5: EgoCHARM Sensitivity Analysis. For all plots, the dotted grey line represents the best F1 score for our EgoCHARM model, presented in Table \ref{tab:results_table}, with all high and low level samples, 50 Hz sampling frequency, and 30s high level window size. A. The effect of the number of 30s samples per class on high level activity classification performance. B. The effect of the number of 1s samples per class on low level activity classification performance. We average results across 4 folds as described in Section \ref{sec:probing}. C. The effect of IMU sampling frequency on high level activity classification performance. D. The effect of high level window size (s) on high level activity classification performance.
  • ...and 5 more figures
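
The Figure 5 caption quotes a 50 Hz sampling frequency, 30s high level samples, and 1s low level samples. As a worked example of what those numbers imply, the sketch below splits one 30s window into consecutive 1s segments. Treating the high level window as a sequence of non-overlapping 1s segments is an assumption made here for illustration; the function name and constants are hypothetical and not taken from the paper.

```python
SAMPLE_HZ = 50    # sampling frequency from the Figure 5 caption
HL_WINDOW_S = 30  # high level window size from the Figure 5 caption
LL_WINDOW_S = 1   # low level sample length from the Figure 5 caption


def split_into_ll_windows(hl_window, sample_hz=SAMPLE_HZ, ll_window_s=LL_WINDOW_S):
    """Split one high level window into consecutive low level windows.

    Assumption: segments do not overlap and the window length is an exact
    multiple of the segment length.
    """
    step = sample_hz * ll_window_s
    if len(hl_window) % step != 0:
        raise ValueError("window length must be a multiple of the segment length")
    return [hl_window[i:i + step] for i in range(0, len(hl_window), step)]


# One channel of a 30 s window at 50 Hz holds 30 * 50 = 1500 samples.
hl_window = list(range(SAMPLE_HZ * HL_WINDOW_S))
ll_windows = split_into_ll_windows(hl_window)
```

Under these assumptions, each high level window yields 30 low level segments of 50 samples per channel. Changing the sampling frequency or the high level window size, as in panels C and D, changes these counts proportionally.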