Availability-aware Sensor Fusion via Unified Canonical Space

Dong-Hee Paek, Seung-Hyun Kong

TL;DR

ASF addresses the availability challenge in multi-sensor fusion for autonomous driving by projecting camera, LiDAR, and 4D Radar features into a unified canonical space using Unified Canonical Projection (UCP) and fusing them with Cross-Attention Across Sensors Along Patches (CASAP). A Sensor Combination Loss (SCL) trains across all possible sensor configurations to improve robustness to degradation or failure. On the K-Radar dataset, ASF improves on prior state-of-the-art fusion methods by up to $9.7\%$ in $AP_{BEV}$ and up to $20.1\%$ in $AP_{3D}$ at IoU=$0.5$, while maintaining real-time performance (approximately $20.5$ Hz with LiDAR+4D Radar and $13.5$ Hz with all three sensors) and low memory usage. These results indicate strong resilience to adverse weather and sensor outages, supporting reliable perception in practical deployments.
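The idea of training across all sensor configurations can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes "all possible sensor configurations" means every non-empty subset of the three sensors (7 in total) and that the per-configuration losses are simply summed without weights. The names `sensor_combinations`, `sensor_combination_loss`, and `loss_fn` are hypothetical.

```python
from itertools import combinations

# Assumed sensor names; the three modalities come from the text above.
SENSORS = ("camera", "lidar", "radar")


def sensor_combinations(sensors=SENSORS):
    """Return every non-empty subset of `sensors`, smallest subsets first."""
    return [
        combo
        for size in range(1, len(sensors) + 1)
        for combo in combinations(sensors, size)
    ]


def sensor_combination_loss(loss_fn, sensors=SENSORS):
    """Sum a per-configuration loss over all non-empty sensor subsets.

    `loss_fn` is a hypothetical callable that takes a tuple of available
    sensors and returns the detection loss obtained when only those sensors
    feed the fusion module. The unweighted sum is an assumption.
    """
    return sum(loss_fn(combo) for combo in sensor_combinations(sensors))


if __name__ == "__main__":
    for combo in sensor_combinations():
        print("+".join(combo))
```

With three sensors the loop visits three single-sensor, three two-sensor, and one three-sensor configuration, so the model sees each availability pattern during training.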

Abstract

Sensor fusion of camera, LiDAR, and 4-dimensional (4D) Radar has brought a significant performance improvement in autonomous driving. However, fundamental challenges remain: deeply coupled fusion methods assume continuous sensor availability, making them vulnerable to sensor degradation and failure, whereas sensor-wise cross-attention fusion methods struggle with computational cost and unified feature representation. This paper presents availability-aware sensor fusion (ASF), a novel method that employs unified canonical projection (UCP) to make features from all sensors consistent for fusion, and cross-attention across sensors along patches (CASAP) to enhance the robustness of sensor fusion against sensor degradation and failure. As a result, the proposed ASF outperforms existing state-of-the-art fusion methods in object detection under various weather and sensor degradation (or failure) conditions. Extensive experiments on the K-Radar dataset demonstrate that ASF achieves improvements of 9.7% in $AP_{BEV}$ (87.2%) and 20.1% in $AP_{3D}$ (73.6%) in object detection at IoU=0.5, while requiring a low computational cost. The code is available at https://github.com/kaist-avelab/k-radar.
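To make the fusion step concrete, here is a minimal sketch of attention over sensors at a single patch, assuming the features have already been projected into a shared space of equal dimension. It is not the paper's CASAP: the query vector, the scaled dot-product scoring, and the function names `softmax` and `fuse_patch` are illustrative assumptions. The point it shows is that an unavailable sensor is simply left out of the softmax, so the remaining sensors' weights still sum to one.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_patch(query, sensor_feats):
    """Fuse one patch by attending over the sensors that are available.

    query        -- hypothetical query vector for this patch (list of floats)
    sensor_feats -- dict mapping sensor name -> feature vector for this patch;
                    unavailable sensors are omitted from the dict
    Returns (fused_feature, attention_weights_per_sensor).
    """
    names = sorted(sensor_feats)
    dim = len(query)
    # Scaled dot-product score between the query and each sensor's feature.
    scores = [
        sum(q * k for q, k in zip(query, sensor_feats[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Weighted sum of the available sensors' features.
    fused = [
        sum(w * sensor_feats[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


if __name__ == "__main__":
    query = [1.0, 0.0]
    feats = {"lidar": [2.0, 0.0], "radar": [0.0, 2.0]}  # camera unavailable
    fused, weights = fuse_patch(query, feats)
    print(fused, weights)
```

Because each sensor's feature enters only through its own key and value, dropping a sensor changes the normalization but leaves the other sensors' features untouched, which is the kind of independence between sensors that the abstract contrasts with deeply coupled fusion.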

Paper Structure

This paper contains 24 sections, 8 equations, 8 figures, 12 tables.

Figures (8)

  • Figure 1: Comparison of sensor fusion methods: (a) DCF (e.g., 3D-LRF), (b) SCF (e.g., CMT), and (c) ASF. FV, BEV, Obj., GT, $\mathcal{L}_{cls}$ and $\mathcal{L}_{reg}$ stand for 'front-view', 'bird's eye-view', 'objects', 'ground truths', 'classification loss', and 'regression loss', respectively. 'Feature coupling' refers to methods that combine features from multiple sensors to create new features. Optional components are in dashed lines; for example, PointPainting combines camera and LiDAR features without transforming the camera viewpoint, while CMT fuses features through a transformer decoder head (DETR) without explicit feature coupling. ASF does not apply feature coupling to ensure independence between sensors.
  • Figure 2: Visualization of feature representation with t-SNE at different stages of ASF for 'Sedan' class. Red, green, blue, and gray dots represent features from camera, LiDAR, 4D Radar, and fused features, respectively. Symbols in solid lines such as circle and triangle, square, and star indicate normal, sleet, and heavy snow conditions, respectively. (a) Initial output features from sensor-specific encoders show inconsistent distribution across sensors. (b) After unified canonical projection (UCP), features become better aligned to the fused feature. (c) After cross-attention across sensors along patches (CASAP), features from available sensors form cohesive clusters (in dashed symbols) based on weather conditions. Note that in adverse weather, camera features show larger deviation due to the degradation. Additional visualizations are in Appendix B.
  • Figure 3: Qualitative results of ASF for various sensor combinations. We show results for normal and adverse weather conditions in (a-i) and (j-r), respectively, where employed sensors are noted in the top-left corner (C: Camera, L: LiDAR, R: 4D Radar, C*: damaged camera). Each subplot visualizes front-view camera image, LiDAR point cloud, 4D Radar tensor, and a sensor attention map (SAM) showing attention score distribution from cross-attention in CASAP. In the SAMs, red, green, and blue represent attention scores for Camera, LiDAR, and 4D Radar, respectively. For example, a predominantly blue SAM indicates that 4D Radar receives the highest attention scores, meaning that 4D Radar is used primarily for detection in the scene. The bottom-left corner of each subplot shows the proportion of attention scores in colored percentages (C/L/R[%]). Note that predictions are visualized on all sensor data, even when a sensor is not employed for detection (e.g., predictions from L+R are also visualized on the camera image).
  • Figure 4: Overall sensor fusion framework for camera, LiDAR, and 4D Radar. 'FM' denotes 'feature map'.
  • Figure 5: Visualization of feature representations through t-SNE at different stages of the proposed ASF for the 'Bus or Truck' class. 'UCP' and 'CASAP' denote unified canonical projection and cross-attention across sensors along patches, which are the two main components of the proposed ASF.
  • ...and 3 more figures
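
The C/L/R[%] proportions described in the Figure 3 caption can be illustrated with a short sketch. How the paper aggregates attention scores is not stated in this section, so this assumes the per-patch attention weights are summed over all patches and normalized to percentages. The function name `attention_share` and the input layout are hypothetical.

```python
def attention_share(per_patch_weights, sensors=("camera", "lidar", "radar")):
    """Aggregate per-patch attention weights into per-sensor percentages.

    per_patch_weights -- iterable of dicts, one per patch, mapping sensor
                         name -> attention weight (missing sensors count as 0)
    Returns a dict mapping each sensor to its share of total attention in %.
    """
    totals = {sensor: 0.0 for sensor in sensors}
    for weights in per_patch_weights:
        for sensor, weight in weights.items():
            totals[sensor] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No attention recorded at all; report zero for every sensor.
        return {sensor: 0.0 for sensor in sensors}
    return {sensor: 100.0 * total / grand_total for sensor, total in totals.items()}


if __name__ == "__main__":
    patches = [
        {"lidar": 0.5, "radar": 0.5},
        {"lidar": 0.25, "radar": 0.75},
    ]
    print(attention_share(patches))
```

In this toy input the camera is not employed, so its share is zero and the radar receives the larger share, which would correspond to a predominantly blue attention map in the caption's color scheme.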