EgoLog: Ego-Centric Fine-Grained Daily Log with Ubiquitous Wearables

Lixing He, Bufang Yang, Di Duan, Zhenyu Yan, Guoliang Xing

TL;DR

EgoLog tackles the challenge of fine-grained daily logging with egocentric wearables by integrating audio and IMU data through temporal and spatial understanding, enhanced by scenario-aware HAR and LLM-assisted refinement. The approach demonstrates robust multimodal fusion, effective sound localization with movement compensation, and edge-cloud collaboration to improve scenario recognition while maintaining on-device efficiency. Evaluations on public (Ego4D, EgoADL, SAMOSA) and self-collected datasets show significant gains in activity (≈12%) and scenario recognition (≈15%), with favorable user-perceived usefulness and acceptable overhead. This work advances practical, privacy-conscious, long-term daily logging with ubiquitous wearables for healthcare and continuous monitoring.

Abstract

Despite advances in human activity recognition (HAR) across different modalities, a precise and robust daily log system is not yet available. Current solutions primarily rely on controlled, lab-based data collection, which limits their real-world applicability. The challenges towards a fine-grained daily log are 1) contextual awareness, 2) spatial awareness, and 3) effective fusion of multi-modal sensor data. To address them, we propose EgoLog, which integrates effective audio-IMU fusion for daily logging with ubiquitous wearables. Our approach first fuses audio and IMU data from two perspectives: temporal understanding and spatial understanding. We extract scenario-level features and aggregate them along the time dimension, while using motion compensation to improve sound source localization. The knowledge obtained from these steps is then integrated into a multi-modal HAR framework. Here, the scenario provides prior knowledge, and the spatial location helps differentiate the user from the background. Furthermore, we integrate an LLM to enhance scenario recognition through logical reasoning. The knowledge derived from the LLM is subsequently transferred back to the local device to enable efficient, on-device inference. Evaluated on both public and self-collected datasets, EgoLog achieves effective multimodal fusion for both activity and scenario recognition, outperforming the baseline by 12% and 15%, respectively.
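To make the idea that "the scenario provides prior knowledge" concrete, the following is a minimal sketch of one way a scenario could reweight activity predictions. It is not the paper's implementation: the function name `apply_scenario_prior`, the activity labels, and all probability values are hypothetical, and simple Bayesian-style reweighting is assumed here purely for illustration.

```python
def apply_scenario_prior(activity_probs, scenario_prior):
    """Reweight activity probabilities by a scenario-conditioned prior.

    Both arguments are dicts mapping an activity label to a probability.
    Activities missing from the prior are treated as impossible (weight 0).
    """
    weighted = {a: p * scenario_prior.get(a, 0.0) for a, p in activity_probs.items()}
    total = sum(weighted.values())
    if total == 0.0:
        # The prior rules out every candidate: fall back to the raw prediction.
        return dict(activity_probs)
    return {a: w / total for a, w in weighted.items()}


# Hypothetical classifier output for one window: typing and chopping
# sound similar, so the raw prediction is ambiguous.
activity_probs = {"typing": 0.40, "chopping": 0.45, "walking": 0.15}

# Hypothetical priors P(activity | scenario); values are made up.
kitchen_prior = {"typing": 0.05, "chopping": 0.70, "walking": 0.25}
office_prior = {"typing": 0.70, "chopping": 0.05, "walking": 0.25}

in_kitchen = apply_scenario_prior(activity_probs, kitchen_prior)
in_office = apply_scenario_prior(activity_probs, office_prior)

print(max(in_kitchen, key=in_kitchen.get))
print(max(in_office, key=in_office.get))
```

With these made-up numbers, the same ambiguous prediction resolves to a different activity depending on the scenario, which is the intuition behind using scenario recognition as context for HAR.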

Paper Structure

This paper contains 33 sections, 3 equations, 26 figures, 3 tables.

Figures (26)

  • Figure 1: EgoLog leverages microphones and an IMU on commercial mobile devices for both activity and scenario sensing.
  • Figure 2: Motivation for scenario understanding: The similarity of embeddings for intra- and inter-scenario activities.
  • Figure 3: Sound localization [schmidt1986multiple] under motion.
  • Figure 4: Sound volume may not be related to distance.
  • Figure 5: Motivation for spatial understanding: estimate the distance of a sound event by motion.
  • ...and 21 more figures