ASY-VRNet: Waterway Panoptic Driving Perception Model based on Asymmetric Fair Fusion of Vision and 4D mmWave Radar

Runwei Guan, Shanliang Yao, Xiaohui Zhu, Ka Lok Man, Yong Yue, Jeremy Smith, Eng Gee Lim, Yutao Yue

TL;DR

This work addresses robust waterway panoptic driving perception (PDP) under adverse conditions by fusing vision and 4D radar through Asymmetric Fair Fusion (AFF) modules and a Contextual Clustering backbone (VRCoC). ASY-VRNet treats image and radar features as irregular point sets and employs two fusion components, Image-Radar Concatenation (IRC) and Radar-Image Multiplication (RIM), to optimize object detection and semantic segmentation simultaneously. A homoscedastic uncertainty-based multi-task learning strategy balances the detection and segmentation losses, enabling effective joint optimization. Evaluated on the WaterScenes dataset, ASY-VRNet achieves state-of-the-art performance with fewer parameters and FLOPs, demonstrating improved robustness in challenging conditions and offering a plug-and-play fusion approach for multimodal PDP in waterway navigation.
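The homoscedastic uncertainty-based balancing mentioned above can be illustrated with a small sketch. It assumes the commonly used simplified form in which each task `i` carries a learnable scalar `s_i = log(sigma_i^2)` and contributes `exp(-s_i) * L_i + s_i` to the total. The function names and this exact form are assumptions for illustration; the summary does not state the formulation ASY-VRNet uses.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine per-task losses using homoscedastic-uncertainty weights.

    Assumed form (illustrative, not taken from the paper):
        total = sum_i exp(-s_i) * L_i + s_i,   with s_i = log(sigma_i^2)
    A large s_i down-weights task i, while the "+ s_i" term penalizes
    inflating the uncertainty just to ignore the task.
    """
    if len(task_losses) != len(log_variances):
        raise ValueError("one log-variance is required per task loss")
    total = 0.0
    for loss, s in zip(task_losses, log_variances):
        total += math.exp(-s) * loss + s
    return total


def log_variance_gradients(task_losses, log_variances):
    """Gradient of the total with respect to each s_i: 1 - exp(-s_i) * L_i."""
    return [1.0 - math.exp(-s) * loss
            for loss, s in zip(task_losses, log_variances)]


# Hypothetical detection and segmentation losses with neutral weights (s_i = 0).
detection_loss, segmentation_loss = 2.0, 0.5
total = uncertainty_weighted_loss([detection_loss, segmentation_loss], [0.0, 0.0])
```

Under this form the gradient with respect to `s_i` vanishes at `s_i = log(L_i)`, so a task with a larger loss settles at a larger log-variance and a smaller effective weight, which is the balancing behaviour the strategy relies on.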

Abstract

Panoptic Driving Perception (PDP) is critical for the autonomous navigation of Unmanned Surface Vehicles (USVs). A PDP model typically integrates multiple tasks, necessitating the simultaneous and robust execution of various perception tasks to facilitate downstream path planning. The fusion of visual and radar sensors is currently acknowledged as a robust and cost-effective approach. However, most existing research has primarily focused on fusing visual and radar features dedicated to object detection or utilizing a shared feature space for multiple tasks, neglecting the individual representation differences between various tasks. To address this gap, we propose a pair of Asymmetric Fair Fusion (AFF) modules with favorable explainability designed to efficiently interact with independent features from both visual and radar modalities, tailored to the specific requirements of object detection and semantic segmentation tasks. The AFF modules treat image and radar maps as irregular point sets and transform these features into a crossed-shared feature space for multitasking, ensuring equitable treatment of vision and radar point cloud features. Leveraging AFF modules, we propose a novel and efficient PDP model, ASY-VRNet, which processes image and radar features based on irregular super-pixel point sets. Additionally, we propose an effective multitask learning method specifically designed for PDP models. Compared to other lightweight models, ASY-VRNet achieves state-of-the-art performance in object detection, semantic segmentation, and drivable-area segmentation on the WaterScenes benchmark. Our project is publicly available at https://github.com/GuanRunwei/ASY-VRNet.
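To make the idea of task-specific fusion over point sets more concrete, the toy sketch below contrasts the two generic operations that the names Image-Radar Concatenation (IRC) and Radar-Image Multiplication (RIM) suggest: channel concatenation and multiplicative gating. It assumes image and radar features are already aligned point by point and stored as plain lists. The function names and the sigmoid gate are hypothetical; the actual modules in the paper are not described in this summary and are likely more elaborate.

```python
import math


def concat_fusion(image_points, radar_points):
    """Concatenate image and radar channels at each aligned point."""
    if len(image_points) != len(radar_points):
        raise ValueError("point sets must be aligned one-to-one")
    return [img + rad for img, rad in zip(image_points, radar_points)]


def multiplicative_fusion(image_points, radar_points):
    """Scale image features at each point by a gate derived from radar.

    The gate is a sigmoid of the mean radar feature (an assumed choice),
    so radar evidence modulates image features without adding channels.
    """
    if len(image_points) != len(radar_points):
        raise ValueError("point sets must be aligned one-to-one")
    fused = []
    for img, rad in zip(image_points, radar_points):
        if not rad:
            raise ValueError("each radar point needs at least one feature")
        gate = 1.0 / (1.0 + math.exp(-sum(rad) / len(rad)))
        fused.append([gate * value for value in img])
    return fused


# Two aligned points: 2 image channels and 1 radar channel each (made-up values).
image_points = [[1.0, 2.0], [0.5, 0.5]]
radar_points = [[3.0], [0.0]]
concatenated = concat_fusion(image_points, radar_points)   # 3 channels per point
gated = multiplicative_fusion(image_points, radar_points)  # 2 channels per point
```

Concatenation preserves both modalities for a later layer to mix, whereas multiplicative gating lets one modality reweight the other in place; using a different operation per task is one plausible reading of "asymmetric" fusion.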

Paper Structure (21 sections, 10 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Overview of the proposed method. It covers the USVs, the sensors (monocular camera and 4D radar), the data perceived by the sensors, ASY-VRNet, the multi-task training strategy, and the perception results.
  • Figure 2: Several challenging scenes in waterway perception: (a) dark environment, (b) camera malfunction, (c) strong light, (d) radar clutter, (e) adverse weather and (f) small objects.
  • Figure 3: The architecture of the proposed ASY-VRNet. It contains five parts: perception data, VRCoC, VRCoC-FPN, prediction heads, and the Asymmetric Fair Fusion (AFF) modules, namely RIM and IRC. The four stages of VRCoC have 2, 2, 6, and 2 stacked blocks, respectively. VRCoC, AFF, and VRCoC-FPN (Feature Pyramid Network) are the three components designed specifically for this paper.
  • Figure 4: The first stage of VRCoC, including image-like point sets (image and radar), point reducer and contextual clustering blocks.
  • Figure 5: The structure of Image-Radar Concatenation (IRC).
  • ...and 3 more figures