IMG2IMU: Translating Knowledge from Large-Scale Images to IMU Sensing Applications

Hyungjun Yoon, Hyeongheon Cha, Hoang C. Nguyen, Taesik Gong, Sung-Ju Lee

TL;DR

This work tackles data scarcity in IMU sensing by importing knowledge from large-scale vision datasets. It converts triaxial IMU signals into spectrogram images and pre-trains a vision model with sensor-aware contrastive learning, specifically MoCo with a tailored InfoNCE loss. The resulting IMG2IMU framework improves macro-F1 by about 9.6 percentage points on average over baselines pre-trained on sensor data, across diverse IMU tasks with limited labeled data, while keeping on-device latency feasible. The study also examines how augmentation design and spectrogram-based representations affect cross-modal transfer from vision to sensing, supported by Grad-CAM visualizations and ablation analyses.
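
To make the contrastive objective mentioned above concrete, here is a minimal NumPy sketch of the standard InfoNCE loss as used in MoCo-style training. It is not the paper's tailored variant, whose details are not given in this summary. The function name `info_nce_loss` and the temperature value are assumptions chosen for illustration.

```python
import numpy as np

def info_nce_loss(query, positive_key, negative_keys, temperature=0.07):
    """Standard InfoNCE loss for a single query (illustrative, not the paper's variant).

    query:          (d,)   embedding of one augmented view
    positive_key:   (d,)   embedding of another view of the same image
    negative_keys:  (K, d) embeddings of other images (e.g. a MoCo-style queue)
    """
    def normalize(x):
        # L2-normalize so dot products become cosine similarities
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q = normalize(query)
    k_pos = normalize(positive_key)
    k_neg = normalize(negative_keys)

    pos_logit = np.dot(q, k_pos) / temperature   # scalar similarity to the positive
    neg_logits = (k_neg @ q) / temperature       # (K,) similarities to the negatives
    logits = np.concatenate(([pos_logit], neg_logits))

    # Numerically stable log-softmax; the positive sits at index 0
    logits = logits - logits.max()
    log_prob_pos = logits[0] - np.log(np.exp(logits).sum())
    return -log_prob_pos
```

The loss is small when the query is closer to its positive key than to any negative, and grows as a negative becomes more similar to the query than the positive is.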

Abstract

Representations pre-trained via self-supervised learning can achieve high accuracy even on tasks with little training data. Unlike in the vision and natural language processing domains, pre-training for IMU-based applications is challenging, as few public datasets have sufficient size and diversity to learn generalizable representations. To overcome this problem, we propose IMG2IMU, which adapts representations pre-trained on large-scale images to diverse IMU sensing tasks. We convert the sensor data into visually interpretable spectrograms so that the model can utilize the knowledge gained from vision. We further present a sensor-aware pre-training method for images that enables models to acquire knowledge that is particularly impactful for IMU sensing applications. This involves contrastive learning with an augmentation set customized for the properties of sensor data. Our evaluation on four different IMU sensing tasks shows that IMG2IMU outperforms baselines pre-trained on sensor data by an average of 9.6%p in F1-score, illustrating that vision knowledge can be usefully incorporated into IMU sensing applications where only limited training data is available.
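
The conversion from sensor data to spectrogram images can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' pipeline: the window size, hop length, log scaling, per-channel min-max normalization, and the mapping of one IMU axis to one image channel are all assumed here, as are the helper names `axis_spectrogram` and `imu_to_image`.

```python
import numpy as np

def axis_spectrogram(signal, window_size=64, hop=16):
    """Log-magnitude short-time Fourier transform of a 1-D signal.

    Returns an array of shape (freq_bins, time_frames).
    """
    window = np.hanning(window_size)
    n_frames = 1 + (len(signal) - window_size) // hop
    frames = np.stack([
        signal[i * hop : i * hop + window_size] * window
        for i in range(n_frames)
    ])                                               # (n_frames, window_size)
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, window_size // 2 + 1)
    return np.log1p(magnitude).T

def imu_to_image(imu, window_size=64, hop=16):
    """Turn triaxial IMU data (n_samples, 3) into a 3-channel image in [0, 1]."""
    channels = []
    for axis in range(3):
        spec = axis_spectrogram(imu[:, axis], window_size, hop)
        span = spec.max() - spec.min()
        # Assumed normalization: independent min-max scaling per channel
        channels.append((spec - spec.min()) / span if span > 0 else np.zeros_like(spec))
    return np.stack(channels, axis=-1)               # (freq_bins, time_frames, 3)

# Synthetic example: 512 samples at an assumed 50 Hz rate, one tone per axis
t = np.arange(512) / 50.0
imu = np.stack([np.sin(2 * np.pi * f * t) for f in (2.0, 5.0, 10.0)], axis=1)
image = imu_to_image(imu)
print(image.shape)  # (33, 29, 3)
```

An array of this shape can be resized and passed to an image model in the same way as an RGB picture, which is what lets a vision backbone be reused for sensing tasks.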

Paper Structure (30 sections, 1 equation, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Spectrogram images converted from sensor data of human activity recognition and roadway classification tasks.
  • Figure 2: Flipping and rotating an image from ImageNet (top) and a spectrogram image (bottom). These deformations preserve the label of the ImageNet image, but they change the meaning of the spectrogram by swapping the time and frequency axes or inverting the values along an axis.
  • Figure 3: Overview of IMG2IMU. (1) Using the large-scale image dataset collected from the public domain, pre-training is performed via contrastive learning with specially designed sensor-aware augmentations. (2) The pre-trained model is transferred to sensing tasks using 2D-transformed triaxial IMU sensor data as input.
  • Figure 4: Generation of a 3-channel 2D representation image from triaxial IMU sensing data.
  • Figure 5: Sensor-aware augmentations in IMG2IMU.
  • ...and 3 more figures
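
The point made in the Figure 2 caption can be checked on a toy array. The sketch below assumes a spectrogram stored with rows as frequency bins (row 0 is the lowest frequency) and columns as time frames; that layout and the helper name `dominant_bin` are illustrative choices, not taken from the paper.

```python
import numpy as np

# Toy spectrogram: 8 frequency bins (rows) by 6 time frames (columns),
# with constant energy in low-frequency bin 1
spec = np.zeros((8, 6))
spec[1, :] = 1.0

def dominant_bin(s):
    """Index of the row (frequency bin) holding the most energy."""
    return int(np.argmax(s.sum(axis=1)))

flipped = np.flipud(spec)  # vertical flip: inverts the frequency axis
rotated = np.rot90(spec)   # 90-degree rotation: swaps the time and frequency axes

print(dominant_bin(spec))     # 1
print(dominant_bin(flipped))  # 6
print(rotated.shape)          # (6, 8)
```

After the vertical flip, the low-frequency tone appears as a high-frequency one. After the rotation, a tone that was steady over time appears as a single burst spread across every frequency row. Either change would alter what the signal means for a sensing task, even though the same operations leave a natural photo recognizable.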