IMUDiffusion: A Diffusion Model for Multivariate Time Series Synthetisation for Inertial Motion Capturing Systems

Heiko Oppel, Michael Munz

TL;DR

IMUDiffusion proposes a diffusion-based framework for synthesizing multivariate IMU time series in the frequency domain to augment inertial motion capture HAR datasets. By adapting a UNet-like DDPM to per-sensor schedules and frequency-domain representations, the approach generates realistic synthetic sequences that, when used to train a classifier, significantly improve macro F1 scores under leave-one-subject-out validation, sometimes achieving near-perfect accuracy. The study demonstrates qualitative agreement between real and synthetic data via UMAP and DTW/DBA analyses, and provides a detailed account of dataset processing, model architecture, training, and evaluation. Overall, IMUDiffusion offers a promising data-augmentation tool for HAR in data-scarce scenarios, though it incurs substantial computational costs and shows participant-specific variability that warrants further investigation.

Abstract

Kinematic sensors are often used to analyze movement behaviors in sports and daily activities due to their ease of use and lack of spatial restrictions, unlike video-based motion capturing systems. Still, generating and especially labeling motion data for specific activities can be time-consuming and costly. Additionally, many models struggle with limited data, which restricts their performance in recognizing complex movement patterns. To address these issues, generating synthetic data can help expand the diversity and variability of the available data. In this work, we propose IMUDiffusion, a probabilistic diffusion model specifically designed for multivariate time series generation. Our approach enables the generation of high-quality time series sequences which accurately capture the dynamics of human activities. Moreover, by joining our dataset with synthetic data, we achieve a significant improvement in the performance of our baseline human activity classifier. In some cases, we are able to improve the macro F1-score by almost 30%. IMUDiffusion provides a valuable tool for generating realistic human activity movements and enhancing the robustness of models in scenarios with limited training data.
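For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro F1-score can be computed: the per-class F1 values are averaged with equal weight, so rare activities count as much as frequent ones. The function name `macro_f1`, the toy labels, and the choice to score a class with no true or predicted samples as 0 are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores (illustrative sketch)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_true == c) & (y_pred == c))  # true positives
        fp = np.sum((y_true != c) & (y_pred == c))  # false positives
        fn = np.sum((y_true == c) & (y_pred != c))  # false negatives
        denom = 2 * tp + fp + fn
        # Assumption: a class that never occurs and is never predicted scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

# Hypothetical labels for three activity classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
score = macro_f1(y_true, y_pred, n_classes=3)
print(round(score, 4))
```

Because every class contributes equally, a single poorly recognized activity lowers the score noticeably even when overall accuracy is high.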

Paper Structure

This paper contains 25 sections, 2 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Sequenced sensor information in SI units, separated by activity and sensor axis. The sequences were cut to $160$ timesteps with a shift of $40$ steps. Each sequence contains at least three repetitions of the same activity. The graph visualizes the average over all participants.
  • Figure 2: The network architecture of the IMUDiffusion model. It consists of three blocks: a Down, a Mid and an Up block. Each block is built around two ResNet and two Multihead Self-Attention blocks connected in series.
  • Figure 3: Comparison of synthetic and real sequences from the Cycling class using UMAP (a). The synthetic sequences form two clusters, whereas the real sequences are scattered individually across the embedding space. The subgraphs (b)-(g) visualize one of the $20$ clusters from the Cycling class, obtained with k-means clustering for time series, including the DTW barycenter average computed separately for the real and the synthetic sequences. Each subgraph shows eight randomly chosen sequences belonging to the pre-chosen cluster and represents one of the six IMU axes. The bold lines are the DTW barycenter averages of the real and synthetic sequences, respectively.
  • Figure 4: Visualization of the denoising process on an example from the Cycling class. The background of each image shows the sequence transformed into the frequency domain. The black lines visualize the corresponding sequence in the time domain. The denoising process is shown separately for each IMU axis.
  • Figure 5: Classification results from LOSOCV with and without synthetic sequences in the training set. The two baseline classifiers, 2 Sample and Full-Set, are compared against the classifier trained on synthetic and real sequences (2 Sample Full Synth). The results were generated only on the real sequences from the held-out test participant. Graph (a) shows the macro F1 score as a swarm plot combined with a violin plot. The PIDs denote the IDs of participants who reached a score below $1.0$. (b), (c) and (d) show the confusion matrices from the Full-Set, 2 Sample and 2 Sample Full Synth classifiers, respectively. Each matrix includes the average number of samples and the standard deviation over all participants. The value in brackets gives the number of participants affected by the cell out of all $12$ participants.
  • ...and 4 more figures
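
The windowing described in the Figure 1 caption (sequences of $160$ timesteps with a shift of $40$ steps) can be sketched as follows. This is a minimal illustration and not the authors' preprocessing code: the function name `sliding_windows`, the `(timesteps, channels)` layout, the random placeholder recording, and the choice to drop trailing samples that do not fill a complete window are all assumptions.

```python
import numpy as np

def sliding_windows(signal: np.ndarray, window: int = 160, shift: int = 40) -> np.ndarray:
    """Cut a (timesteps, channels) recording into overlapping windows.

    Returns an array of shape (n_windows, window, channels).
    Assumption: trailing samples that do not fill a full window are dropped.
    """
    if signal.ndim != 2:
        raise ValueError("expected an array of shape (timesteps, channels)")
    n_steps, n_channels = signal.shape
    if n_steps < window:
        # Recording is too short for even one window.
        return np.empty((0, window, n_channels), dtype=signal.dtype)
    starts = range(0, n_steps - window + 1, shift)
    return np.stack([signal[s:s + window] for s in starts])

# Hypothetical recording: 400 timesteps from a six-axis IMU.
recording = np.random.default_rng(0).normal(size=(400, 6))
windows = sliding_windows(recording)
print(windows.shape)  # (7, 160, 6)
```

With a shift of $40$ and a length of $160$, consecutive windows overlap by $120$ timesteps, so each recording yields several partially redundant training sequences.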