RadarFuseNet: Complex-Valued Cross-Attention Fusion of Time-Frequency IQ Radar Features for Robust Classification

Stefan Hägele, Adam Misik, Eckehard Steinbach

TL;DR

The paper proposes a bidirectional cross-attention fusion network that combines IQ signal and FFT-transformed radar features extracted by two distinct complex-valued convolutional neural networks (CNNs). The fusion improves both occluded object classification and material classification over all comparison and ablation models.

Abstract

Millimeter-wave (mmWave) radar has emerged as a compact and powerful sensing modality for advanced perception tasks that leverage machine learning. It is particularly effective in scenarios where vision-based sensors fail to capture reliable information, such as detecting occluded objects or distinguishing between different surface materials in indoor environments. Due to the nonlinear characteristics of mmWave radar signals, deep learning-based methods are well suited for extracting relevant information from in-phase and quadrature (IQ) data. However, the current state of the art in IQ signal-based occluded-object and material classification still offers substantial potential for further improvement. In this paper, we propose a bidirectional cross-attention fusion network that combines IQ signal and FFT-transformed radar features obtained by distinct complex-valued convolutional neural networks (CNNs). In our experiments, we achieve a material classification accuracy of 99.92% on samples collected at the same sensor distances used during training, and an accuracy of 65.56% on samples measured at previously unseen distances, demonstrating improved generalization across varying measurement conditions. Furthermore, our approach improves occluded object classification to 94.20%, outperforming all comparison and ablation models and underscoring the benefit of the proposed fusion strategy.
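
The fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a single head, no learned query/key/value projections, and it assumes that attention scores are the real part of the Hermitian inner product between complex feature vectors, which the abstract does not specify. The names `cross_attend` and `fuse_bidirectional` and the mean-pooling step are hypothetical choices made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of real scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Single-head cross-attention on complex feature vectors.

    Assumption: the score is Re(<q, k>) / sqrt(d), where <q, k> is the
    Hermitian inner product. Keys double as values (no projections).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki.conjugate() for qi, ki in zip(q, k)).real / math.sqrt(d)
            for k in keys_values
        ]
        weights = softmax(scores)
        # Real-weighted sum of the complex value vectors.
        out.append([
            sum(w * v[j] for w, v in zip(weights, keys_values))
            for j in range(d)
        ])
    return out


def fuse_bidirectional(f_iq, f_fft):
    """Attend in both directions, then concatenate the pooled results."""
    iq_ctx = cross_attend(f_iq, f_fft)   # IQ tokens query FFT tokens
    fft_ctx = cross_attend(f_fft, f_iq)  # FFT tokens query IQ tokens

    def mean_pool(tokens):
        n = len(tokens)
        return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]

    return mean_pool(iq_ctx) + mean_pool(fft_ctx)


# Toy features: 3 IQ tokens and 2 FFT tokens, each of dimension 2.
f_iq = [[1 + 1j, 0.5j], [0.2 + 0j, 1 - 1j], [0j, 1 + 0j]]
f_fft = [[1 + 0j, 1j], [0.5 - 0.5j, 0.3 + 0j]]
fused = fuse_bidirectional(f_iq, f_fft)
print(len(fused))  # 4: two pooled branches of dimension 2
```

The two branches may have different token counts, since each direction only requires a shared feature dimension. In a trained model the fused vector would feed a classification head.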

Paper Structure

This paper contains 7 sections, 10 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Dimensions of the raw IQ signal (left) and its FFT-transformed counterpart (right).
  • Figure 2: Model architecture including two complex-valued CNNs for IQ ($\boldsymbol{f}$) and FFT ($\boldsymbol{F}$) feature extraction, and a multi-head cross-attention mechanism to fuse the extracted features.
  • Figure 3: Design of the single-branch complex-valued CNN feature extractor with a complex-valued phasor as input.
  • Figure 4: Setups for surface material data collection (left) and occluded object data collection (right) [smcnet, occnet].
  • Figure 5: RadarFuseNet confusion matrix for occluded objects with overall 94.20% accuracy.
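
Figure 1 contrasts the raw IQ signal with its FFT-transformed counterpart. The following sketch shows that relationship for a one-dimensional toy signal, assuming the I and Q channels are combined into complex samples `I + jQ` before the transform. The helper names `iq_to_complex` and `dft` are hypothetical, and the paper's transform may operate along specific axes of a multidimensional radar tensor. A direct O(n²) DFT is used here only to keep the example dependency-free; a practical pipeline would use an FFT library.

```python
import cmath
import math


def iq_to_complex(i_samples, q_samples):
    """Combine in-phase and quadrature samples into complex phasors."""
    return [complex(i, q) for i, q in zip(i_samples, q_samples)]


def dft(x):
    """Direct discrete Fourier transform; the output has the same length as x."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Toy signal: a complex tone that falls exactly on bin 2 of an 8-point DFT.
n, tone_bin = 8, 2
i_samples = [math.cos(2 * math.pi * tone_bin * t / n) for t in range(n)]
q_samples = [math.sin(2 * math.pi * tone_bin * t / n) for t in range(n)]

spectrum = dft(iq_to_complex(i_samples, q_samples))
peak_bin = max(range(n), key=lambda k: abs(spectrum[k]))
print(peak_bin)  # 2
```

The time-domain samples and the spectrum carry the same information in different bases, which is why the two can serve as complementary inputs to separate feature extractors.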