Temporal Micro-Doppler Spectrogram-based ViT Multiclass Target Classification

Nghia Thinh Nguyen, Tri Nhu Do

TL;DR

The paper addresses multiclass target classification in cluttered MDS data from mmWave FMCW radar by modeling temporal dynamics with a Vision Transformer. It introduces the Temporal MDS-ViT (T-MDS-ViT) that ingests stacked RVA slices via patch embeddings and cross-axis attention, augmented by mobility-aware constraints and Grad-CAM explainability. The approach achieves superior accuracy and data efficiency compared with CNN baselines, while maintaining real-time deployability, and provides interpretable attention maps that highlight high-energy motion regions in the MDS. This framework advances robust target discrimination under overlaps and occlusions, enabling practical radar-based sensing with cost-effective hardware.

Abstract

In this paper, we propose a new Temporal MDS-Vision Transformer (T-MDS-ViT) for multiclass target classification using millimeter-wave FMCW radar micro-Doppler spectrograms (MDS). Specifically, we design a transformer-based architecture that processes stacked range-velocity-angle (RVA) spatiotemporal tensors via patch embeddings and cross-axis attention mechanisms to explicitly model the sequential nature of MDS data across multiple frames. The T-MDS-ViT exploits mobility-aware constraints in its attention layer correspondences to maintain separability under target overlaps and partial occlusions. Next, we apply an explainability mechanism to examine how the attention layers focus on characteristic high-energy regions of the MDS representations and how that focus relates to class-specific kinematic features. We also demonstrate that our proposed framework outperforms existing CNN-based methods in classification accuracy while achieving better data efficiency and real-time deployability.
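To make the patch-embedding step concrete, the following is a minimal sketch of how a stack of frames could be cut into tokens before a transformer sees them. It is not the authors' implementation: the function name `patchify`, the frame size, the patch size, and the nested-list layout are all assumptions chosen for illustration, and the learned linear projection and positional encoding that a ViT applies to each token are omitted.

```python
def patchify(frames, patch):
    """Split each 2-D frame into non-overlapping patch x patch tiles.

    frames: list of T frames, each a list of H rows holding W values
            (a hypothetical stand-in for stacked RVA slices).
    Returns a list of tokens. Each token is a flat list of patch * patch
    values, ordered frame by frame, then row-major over the tile grid.
    """
    tokens = []
    for frame in frames:
        height, width = len(frame), len(frame[0])
        if height % patch or width % patch:
            raise ValueError("frame size must be divisible by patch size")
        for top in range(0, height, patch):
            for left in range(0, width, patch):
                # Flatten one tile in row-major order.
                tile = [frame[r][c]
                        for r in range(top, top + patch)
                        for c in range(left, left + patch)]
                tokens.append(tile)
    return tokens


# Toy input (assumed sizes): T = 2 frames of 4 x 4 values, filled with
# running indices so the tile contents are easy to follow.
frames = [[[t * 16 + r * 4 + c for c in range(4)] for r in range(4)]
          for t in range(2)]
tokens = patchify(frames, patch=2)
```

With these toy sizes, each frame yields four tokens, so the sequence length grows linearly with the number of stacked frames; in a real model each token would then be projected to the embedding dimension before attention is applied.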

Paper Structure

This paper contains 29 sections, 21 equations, 3 figures, 1 table, 1 algorithm.

Figures (3)

  • Figure 1: The distribution of the MDS of the car class. The distribution varies with each target's location, speed, and direction.
  • Figure 2: (a) Comparison of models based on the accuracy matrix given $\mathcal{D}_{\rm train}$ data. (b) Confusion matrix of $K_{\rm tar} = 3$ classes on $\mathcal{D}_{\rm test}$ data.
  • Figure 3: (a) MDS data of car. (b) Attention heatmap. (c) Overlay heatmap.