Diffusion Model-based Contrastive Learning for Human Activity Recognition

Chunjing Xiao, Yanhui Han, Wei Yang, Yane Hou, Fangzhan Shi, Kevin Chetty

TL;DR

This work tackles the generalization gap in WiFi CSI-based activity recognition caused by subject variability in motion habits. It introduces CLAR, a diffusion-model-based contrastive learning framework that combines a DDPM-based time-series augmentation module with an adaptive weighting strategy for positive sample pairs. The augmentation decomposes reference signals into high- and low-frequency components and injects them with step-dependent weights during diffusion to synthesize plausible new motion patterns, while adaptive weighting uses Dynamic Time Warping–based activity content estimates to emphasize informative positive pairs. Experiments on SignFi and DeepSeg with limited labeled data show that CLAR consistently outperforms state-of-the-art baselines, validating its effectiveness and potential for practical wireless sensing applications.
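
As a rough illustration of the Dynamic Time Warping (DTW) ingredient mentioned above, the following sketch computes the classic DTW distance between two 1-D sequences. It is only the textbook recurrence. How CLAR turns DTW into an activity content estimate or a pair weight is not specified in this summary, and the function name `dtw_distance` is a hypothetical choice for illustration.

```python
import numpy as np

def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D sequences (absolute-difference cost).

    Illustrative only: this is not CLAR's weighting rule, just the standard
    dynamic-programming recurrence that DTW-based methods build on.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(
                cost[i - 1, j],      # advance in a only
                cost[i, j - 1],      # advance in b only
                cost[i - 1, j - 1],  # advance in both
            )
    return float(cost[n, m])
```

Because DTW allows one sample to align with several samples of the other sequence, a signal and a time-stretched copy of it can have zero distance, which is why it is commonly used to compare motions performed at different speeds.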

Abstract

WiFi Channel State Information (CSI)-based activity recognition has attracted numerous studies owing to the wide availability of WiFi and its privacy protection. In practice, however, general CSI-based recognition models may generalize poorly, since individuals with different behavior habits cause different fluctuations in CSI data and it is difficult to gather enough training data to cover all kinds of motion habits. To tackle this problem, we design a diffusion model-based Contrastive Learning framework for human Activity Recognition (CLAR) using WiFi CSI. Building on the contrastive learning framework, we introduce two components for CLAR to enhance CSI-based activity recognition. To generate diverse augmented data and complement limited training data, we propose a diffusion model-based augmentation model specific to time series. In contrast to typical diffusion models that apply conditions directly to the generative process, potentially resulting in distorted CSI data, our tailored model decomposes these conditions into high-frequency and low-frequency components and then applies them to the generative process with varying weights. This can alleviate data distortion and yield high-quality augmented data. To capture differences in sample importance, we present an adaptive weight algorithm. Unlike typical contrastive learning methods, which treat all training samples equally, this algorithm adaptively adjusts the weights of positive sample pairs to learn better data representations. The experiments suggest that CLAR achieves significant gains compared to state-of-the-art methods.
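
The frequency decomposition of the condition can be sketched as follows, under several assumptions that do not come from the paper: an FFT low-pass split with a hypothetical `cutoff_ratio`, and a hypothetical linear schedule in which the low-frequency component dominates at noisy (large `t`) steps and the high-frequency component at nearly clean (small `t`) steps. The abstract only states that the two components are applied "with varying weights", so the schedule and all function names here are placeholders.

```python
import numpy as np

def split_frequencies(x, cutoff_ratio=0.1):
    """Split a 1-D signal into low- and high-frequency parts via an FFT low-pass.

    cutoff_ratio is an assumed parameter: the fraction of rFFT bins kept as "low".
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    cutoff = max(1, int(len(spectrum) * cutoff_ratio))
    low_spec = spectrum.copy()
    low_spec[cutoff:] = 0              # zero out every bin at or above the cutoff
    low = np.fft.irfft(low_spec, n=len(x))
    high = x - low                     # residual, so low + high reconstructs x
    return low, high

def step_weights(t, num_steps):
    """Assumed linear schedule (not from the paper): returns (w_low, w_high)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    frac = t / (num_steps - 1)         # 0 at the cleanest step, 1 at the noisiest
    return frac, 1.0 - frac

def condition_at_step(reference, t, num_steps, cutoff_ratio=0.1):
    """Blend the two components of the reference signal for diffusion step t."""
    low, high = split_frequencies(reference, cutoff_ratio)
    w_low, w_high = step_weights(t, num_steps)
    return w_low * low + w_high * high
```

In a full DDPM pipeline the output of `condition_at_step` would be passed to the denoising network at step `t`. That network and its training loop are outside the scope of this sketch.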

Paper Structure (19 sections, 19 equations, 11 figures, 1 table)

Figures (11)

  • Figure 1: Augmented data by different methods. The orange line denotes the augmented data and the blue line refers to the real one. (a) The waveform of augmented data by Gaussian blur (orange) is almost the same as the original one (blue). (b) The waveform of augmented data by our DDPM-based augmentation method (orange) can combine the characteristics of the two samples (solid and dotted blue).
  • Figure 2: Positive sample pairs extracted from an activity, where the dotted red lines mark the start and end points and there is a pause near the center. Compared to positive pair ($x_3$, $x_4$), positive pair ($x_1$, $x_2$) should provide fewer clues for learning data representations because it contains more pause data.
  • Figure 3: CLAR framework. During the training process, the reference and source samples are fed into our designed DDPM-based augmentation model to generate augmented data with new characteristics. These augmented data are further processed by cropping and resizing to build the contrastive loss. Meanwhile, the weight of each sample pair is computed by our devised adaptive weight algorithm and is then incorporated into the contrastive loss to enhance model performance.
  • Figure 4: The DDPM-based data augmentation model. Red arrows $\color{red} \to$ indicate the forward diffusion, and blue ones $\color{blue} \to$ refer to the reverse diffusion.
  • Figure 5: The activity recognition performance for SignFi data.
  • ...and 6 more figures
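
The idea in Figures 2 and 3, that each positive pair contributes to the contrastive loss in proportion to a weight, can be sketched with a weighted InfoNCE-style loss. This is a minimal NumPy illustration under stated assumptions: the exact loss used by CLAR is not given in this summary, the `weights` are taken as given rather than computed by the paper's adaptive algorithm, and the function name and `temperature` default are hypothetical.

```python
import numpy as np

def weighted_info_nce(z1, z2, weights, temperature=0.5):
    """Weighted InfoNCE-style loss over N positive pairs (z1[i], z2[i]).

    z1, z2  : (N, D) embeddings of the two views; row i of each forms a positive pair.
    weights : (N,) non-negative pair weights (assumed to come from some external
              estimate of how informative each pair is).
    """
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Cosine similarity via L2-normalised embeddings.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                     # (N, N), diagonal = positives
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_pair = -np.diag(log_prob)                        # loss of each positive pair
    return float((w * per_pair).sum() / w.sum())         # weighted average
```

With uniform weights this reduces to the ordinary mean over pairs. Lowering the weight of a pair that consists mostly of pause data, such as ($x_1$, $x_2$) in Figure 2, reduces its influence on the learned representation.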