SelaFD: Seamless Adaptation of Vision Transformer Fine-tuning for Radar-based Human Activity Recognition

Yijun Wang; Yong Wang; Chendong Xu; Shuai Yao; Qisong Wu

TL;DR

This work addresses fall detection and related human activity recognition (HAR) tasks based on radar Time-Doppler spectrograms, and the challenge of transferring pre-trained Vision Transformer (ViT) representations from natural images to radar data. It introduces SelaFD, a joint fine-tuning framework that combines LoRA weight-space updates with a serial-parallel adapter in the feature space to capture both coarse- and fine-grained patterns while keeping the backbone frozen. On the public UOG radar HAR dataset, SelaFD achieves 96.61% accuracy, outperforming recent CNN- and ViT-based approaches and providing evidence that parameter-efficient adaptation can bridge modality gaps. The approach demonstrates the viability of ViT-based, privacy-preserving radar HAR and offers a generalizable pathway for integrating radar signals with large-scale visual or multimodal models in real-world monitoring tasks.

Abstract

Human Activity Recognition (HAR), including fall detection, has become increasingly critical as the population ages, creating a need for effective monitoring systems that prevent the serious injuries and fatalities associated with falls. This study focuses on fine-tuning the Vision Transformer (ViT) for HAR using radar-based Time-Doppler signatures. Unlike traditional image datasets, these signals are challenging because they are non-visual in nature and highly similar across activities. Directly fine-tuning all ViT parameters proves suboptimal for this application. To address this, we propose an approach that applies Low-Rank Adaptation (LoRA) fine-tuning in the weight space to transfer knowledge from pre-trained ViT models. To extract fine-grained features, we additionally integrate a serial-parallel adapter in the feature space. This joint fine-tuning method, tailored to radar-based Time-Doppler signatures, significantly improves HAR accuracy and surpasses existing state-of-the-art methods in this domain. Our code is released at https://github.com/wangyijunlyy/SelaFD.
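
To make the two ingredients named in the abstract concrete, the following is a minimal NumPy sketch of a LoRA-style low-rank weight update combined with bottleneck adapters placed in parallel with and in series after a frozen layer. It is an illustration under stated assumptions, not the released SelaFD implementation: the class names `LoRALinear` and `BottleneckAdapter`, the function `adapted_block`, the exact wiring of the two adapters, and all sizes (`d`, `r`, `alpha`, `hidden`) are hypothetical choices made for this example, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)


class LoRALinear:
    """Frozen weight W plus a low-rank update (alpha / r) * B @ A (weight space)."""

    def __init__(self, d_in, d_out, r=4, alpha=8.0):
        # W stands in for a pre-trained weight and would stay frozen during fine-tuning.
        self.W = rng.standard_normal((d_out, d_in)) * 0.02
        # A and B are the only parameters that would be trained.
        self.A = rng.standard_normal((r, d_in)) * 0.01
        self.B = np.zeros((d_out, r))  # zero init: the update starts at exactly 0
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T


class BottleneckAdapter:
    """Down-project, ReLU, up-project (feature space)."""

    def __init__(self, d, hidden=8):
        self.down = rng.standard_normal((d, hidden)) * 0.01
        self.up = np.zeros((hidden, d))  # zero init: the adapter starts as a no-op

    def __call__(self, x):
        return np.maximum(x @ self.down, 0.0) @ self.up


def adapted_block(x, layer, parallel, serial):
    # Assumed wiring: one adapter branches alongside the layer (parallel),
    # a second one refines the result with a residual connection (serial).
    h = layer(x) + parallel(x)
    return h + serial(h)


d = 64
layer = LoRALinear(d, d)
parallel, serial = BottleneckAdapter(d), BottleneckAdapter(d)

x = rng.standard_normal((5, d))  # 5 tokens of dimension d
y = adapted_block(x, layer, parallel, serial)
frozen_out = x @ layer.W.T

# Count of parameters that would receive gradients in this toy block.
trainable = layer.A.size + layer.B.size + sum(
    a.down.size + a.up.size for a in (parallel, serial)
)
```

Because `B` and both `up` matrices start at zero, the adapted block initially reproduces the frozen layer's output exactly, so fine-tuning starts from the pre-trained behaviour and only the small added matrices need to be learned.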
