MMLF: Multi-modal Multi-class Late Fusion for Object Detection with Uncertainty Estimation

Qihang Yang, Yang Zhao, Hong Cheng

TL;DR

This paper introduces a pioneering Multi-modal Multi-class Late Fusion (MMLF) method that enables multi-class detection at the decision level and incorporates uncertainty analysis into the classification fusion process, making the model more transparent and trustworthy and its category predictions more reliable.

Abstract

Autonomous driving necessitates advanced object detection techniques that integrate information from multiple modalities to overcome the limitations of single-modal approaches. Early fusion makes it difficult to align diverse data, and deep fusion introduces complexity and overfitting issues; both underscore the efficacy of late fusion at the decision level. Late fusion integrates detectors without altering the original detector's network structure. This paper introduces a pioneering Multi-modal Multi-class Late Fusion (MMLF) method that enables multi-class detection. Fusion experiments conducted on the KITTI validation and official test datasets show substantial performance improvements, presenting our model as a versatile solution for multi-modal object detection in autonomous driving. Moreover, our approach incorporates uncertainty analysis into the classification fusion process, rendering our model more transparent and trustworthy and providing more reliable insights into category predictions.

Paper Structure

This paper contains 28 sections, 8 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: System architecture. In step 1, the IoU between each of the $m$ 3D candidates and each of the $n$ 2D candidates is computed to obtain $k$ hypothetical fused pairs, and fused class features with uncertainty are derived from these pairs. In step 2, the hypothetical objective scores are computed by a 2D CNN and then concatenated with the fused class features with uncertainty. In step 3, the concatenated result is used to build the fused matching tensor $\mathbf{M^F}$, from which the final fused prediction is obtained.
  • Figure 2: Example of uncertainty filtering.
  • Figure 3: Visualized results of our MMLF on KITTI validation set. In the 3D BEV representation, the numbers adjacent to the bounding boxes indicate uncertainty. Our model has significantly reduced the original detection uncertainty and demonstrated an improvement in detection capability. (a) Demonstrates the enhanced detection capability of our fused model in the presence of occluded objects. (b) Illustrates the improved detection ability of our model for small objects. (c) Indicates that our model partially addresses the issue of weak long-range object detection inherent in 3D detectors.
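
The pairing in step 1 of Figure 1 can be illustrated with a minimal sketch. It rests on assumptions that this summary does not confirm: the 3D candidates are taken to be already projected into the image plane as axis-aligned boxes `[x1, y1, x2, y2]`, and a pair is kept whenever its IoU is greater than zero. The function names `iou` and `candidate_pairs` are hypothetical and are not taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes [x1, y1, x2, y2]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def candidate_pairs(projected_3d, boxes_2d):
    """Compare each of the m projected 3D boxes with each of the n 2D boxes.

    Returns a list of (index_3d, index_2d, iou) tuples for the overlapping
    pairs. Keeping every pair with IoU > 0 is an assumption of this sketch,
    not a rule stated in the source.
    """
    pairs = []
    for i, box_3d in enumerate(projected_3d):
        for j, box_2d in enumerate(boxes_2d):
            overlap = iou(box_3d, box_2d)
            if overlap > 0:
                pairs.append((i, j, overlap))
    return pairs


# Illustrative boxes only; these values are made up for the example.
projected_3d = [[0, 0, 2, 2], [10, 10, 12, 12]]          # m = 2
boxes_2d = [[1, 1, 3, 3], [20, 20, 21, 21], [0, 0, 2, 2]]  # n = 3
pairs = candidate_pairs(projected_3d, boxes_2d)
k = len(pairs)
```

Here `k` is the number of hypothetical fused pairs that would feed the later steps; how the class features and uncertainty are fused from those pairs is not covered by this sketch.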