
Cross-modal semantic segmentation for indoor environmental perception using single-chip millimeter-wave radar raw data

Hairuo Hu, Haiyong Cong, Zhuyu Shao, Yubo Bi, Jinghao Liu

TL;DR

Indoor firefighting perception is challenging under smoke and dim lighting, where cameras and LiDAR underperform. The authors propose a cross-modal semantic segmentation framework based on a single-chip mmWave radar. It pairs an automatic labeling workflow built on LiDAR point clouds and occupancy grids with a lightweight U-Net augmented by a spatial attention module, which segments the unobstructed field of view in bird's-eye view (BEV). The ColoRadar dataset supports training with automatically generated ground truth. Range-Doppler (RD) tensor inputs deliver the best segmentation performance, whereas raw ADC inputs perform poorly and range-azimuth (RA) inputs are feasible but slightly inferior. The work shows that cross-modal segmentation can provide intuitive, robust indoor environmental perception suited to real-time rescue workflows, and it sets the stage for extensions to smoke conditions and SLAM-based navigation.
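The labeling step summarized above can be pictured as ray casting over a BEV occupancy grid: every cell the radar could see before hitting the first obstacle is labeled as free. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure (their label-generation process is the subject of Figure 2). It assumes the LiDAR map has already been rasterized into a 2D boolean occupancy grid in the radar frame with the radar looking along the +row axis, and it uses a hypothetical helper `visible_free_space` with illustrative parameters (`cell_size`, `max_range`, `fov_deg`, `n_rays`).

```python
import numpy as np


def visible_free_space(occupied, origin, cell_size, max_range, fov_deg, n_rays=361):
    """Label BEV cells the radar can see before the first obstacle (hypothetical helper).

    occupied  : 2D bool array, True where the LiDAR-derived occupancy grid is occupied
    origin    : (row, col) grid index of the radar
    cell_size : edge length of one grid cell in metres
    max_range : maximum labeled range in metres
    fov_deg   : total azimuth field of view in degrees, centred on the +row axis
    n_rays    : number of rays cast across the field of view
    """
    occupied = np.asarray(occupied, dtype=bool)
    label = np.zeros(occupied.shape, dtype=np.uint8)
    n_rows, n_cols = occupied.shape
    half_fov = np.deg2rad(fov_deg) / 2.0
    # Half-cell steps: each grid index changes by at most one per step,
    # so a ray cannot jump over an axis-aligned wall one cell thick.
    step = cell_size / 2.0
    for theta in np.linspace(-half_fov, half_fov, n_rays):
        r = 0.0
        while r <= max_range:
            # Polar (range, azimuth) to grid index; positive azimuth rotates toward +col.
            i = int(round(origin[0] + r * np.cos(theta) / cell_size))
            j = int(round(origin[1] + r * np.sin(theta) / cell_size))
            if not (0 <= i < n_rows and 0 <= j < n_cols):
                break  # ray left the map
            if occupied[i, j]:
                break  # first obstacle ends visibility along this ray
            label[i, j] = 1  # visible and unobstructed
            r += step
    return label
```

For example, with a wall across row 10 of the grid and the radar at `(0, 10)`, only cells in front of the wall receive a free-space label; cells behind it stay 0 because every ray stops at the first occupied cell. Diagonal or very thin obstacles can still leak between samples, so a real labeler would likely use a finer step or an exact grid-traversal algorithm.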

Abstract

In the context of firefighting and rescue operations, a cross-modal semantic segmentation model based on a single-chip millimeter-wave (mmWave) radar is proposed and discussed for indoor environmental perception. To obtain high-quality labels efficiently, an automatic label generation method that uses LiDAR point clouds and occupancy grid maps is introduced. The proposed segmentation model is based on U-Net and incorporates a spatial attention module, which improves the model's performance. The results demonstrate that cross-modal semantic segmentation provides a more intuitive and accurate representation of indoor environments. Unlike that of traditional methods, the model's segmentation performance is only minimally affected by azimuth. Although performance declines with increasing distance, this decline can be mitigated by a well-designed model. Additionally, using raw ADC data as input was found to be ineffective, and RD tensors proved more suitable for the proposed model than RA tensors.
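The three input representations compared in the abstract differ in how much of the standard FMCW radar processing chain is applied before the network sees the data. As a minimal sketch of that textbook chain (not the paper's exact preprocessing), assume a complex ADC cube ordered as (virtual antenna, chirp, fast-time sample). A windowed FFT over fast-time samples gives range bins, a further FFT over chirps gives the RD tensor, and an FFT across antennas gives the RA tensor; the raw ADC input is simply the cube itself. The axis order, Hann windows, 64-bin angle FFT, averaging over chirps, and the function names are all assumptions for illustration.

```python
import numpy as np


def range_fft(adc):
    """Hann-windowed FFT over fast-time samples (last axis) -> range bins."""
    window = np.hanning(adc.shape[-1])
    return np.fft.fft(adc * window, axis=-1)


def range_doppler(adc):
    """RD magnitude per antenna, shape (n_ant, n_chirps, n_range), zero Doppler centred."""
    rng = range_fft(adc)
    window = np.hanning(adc.shape[1])[None, :, None]  # slow-time window across chirps
    rd = np.fft.fft(rng * window, axis=1)
    return np.abs(np.fft.fftshift(rd, axes=1))


def range_azimuth(adc, n_angle=64):
    """RA magnitude, shape (n_angle, n_range): angle FFT across antennas, averaged over chirps."""
    rng = range_fft(adc)
    ra = np.fft.fft(rng, n=n_angle, axis=0)  # zero-padded spatial FFT across virtual antennas
    ra = np.fft.fftshift(ra, axes=0)  # broadside (0 degrees) moves to index n_angle // 2
    return np.abs(ra).mean(axis=1)
```

With half-wavelength antenna spacing (another assumption), RA row `m` corresponds to `sin(theta) = 2 * (m - n_angle / 2) / n_angle`. Keeping one RD map per antenna preserves the angular information as channel-to-channel phase differences, which a network could in principle learn from, whereas the RA tensor resolves angle explicitly but averages away Doppler.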

Paper Structure

This paper contains 12 sections, 13 equations, 11 figures.

Figures (11)

  • Figure 1: LiDAR point cloud global maps for four different scenarios [31]
  • Figure 2: The generation process of the ground truth labels
  • Figure 3: Architecture of the semantic segmentation model
  • Figure 4: Comparison between conventional point cloud based and semantic segmentation methods
  • Figure 5: Comparative analysis of the proposed model (baseline) against variants with a CAM module and with a Swin Transformer block across six performance metrics: (a) accuracy, (b) precision, (c) recall, (d) F1 score, (e) IoU and (f) FAR
  • ...and 6 more figures
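The metrics listed for Figure 5 can all be computed from the pixel-wise confusion counts between a predicted BEV mask and its label. The sketch below assumes binary masks in which 1 marks the unobstructed (free) region, and it assumes FAR means false-alarm rate, FP / (FP + TN); the paper may define it differently. `segmentation_metrics` is a hypothetical helper, and the small `eps` only guards against division by zero when a class is absent.

```python
import numpy as np


def segmentation_metrics(pred, gt, eps=1e-12):
    """Binary segmentation metrics from confusion counts (1 = free space)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.sum(pred & gt)    # free predicted as free
    fp = np.sum(pred & ~gt)   # obstacle/unseen predicted as free (false alarm)
    fn = np.sum(~pred & gt)   # free predicted as not free (miss)
    tn = np.sum(~pred & ~gt)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall + eps),
        "iou": tp / (tp + fp + fn + eps),
        "far": fp / (fp + tn + eps),
    }
```

For instance, `segmentation_metrics([[1, 0, 1, 0]], [[1, 1, 0, 0]])` has exactly one pixel of each outcome (TP, FN, FP, TN), so accuracy, precision, recall, F1 and FAR are all about 0.5 while IoU is about 1/3. IoU and FAR are the stricter pair for this task: IoU ignores the large, easy background, and FAR directly counts obstacles mislabeled as passable.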