
Multimodal-Enhanced Objectness Learner for Corner Case Detection in Autonomous Driving

Lixing Xiao, Ruixiao Shi, Xiaoyang Tang, Yi Zhou

TL;DR

This paper reduces the discrepancy between known and unknown classes and introduces a multimodal-enhanced objectness notion learner, which significantly improves recall for novel classes at a lower training cost.

Abstract

Previous work on object detection has achieved high accuracy in closed-set scenarios, but performance in open-world scenarios remains unsatisfactory. One of the most challenging open-world problems is corner case detection in autonomous driving. Existing detectors struggle with these cases because they rely heavily on visual appearance and generalize poorly. In this paper, we reduce the discrepancy between known and unknown classes and introduce a multimodal-enhanced objectness notion learner. Leveraging both vision-centric and image-text modalities, our semi-supervised learning framework imparts objectness knowledge to the student model, enabling class-aware detection. Our approach, Multimodal-Enhanced Objectness Learner (MENOL) for Corner Case Detection, significantly improves recall for novel classes at a lower training cost. Achieving an mAR-corner of 76.6% and an mAR-agnostic of 79.8% on the CODA-val dataset with just 5100 labeled training images, MENOL outperforms the baseline ORE by 71.3% and 60.6%, respectively. The code will be available at https://github.com/tryhiseyyysum/MENOL.
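The abstract reports recall-based metrics (mAR-corner and mAR-agnostic). To illustrate what a class-agnostic average recall measures, here is a minimal Python sketch. It assumes COCO-style IoU thresholds from 0.5 to 0.95 in steps of 0.05 and greedy one-to-one matching of score-sorted predictions to ground-truth boxes, with class labels ignored. It is not the CODA or MENOL evaluation code, the function names are hypothetical, and details such as a per-image detection cap are omitted.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def matched_count(gts: Sequence[Box], preds: Sequence[Box], thr: float) -> int:
    """Number of GT boxes matched one-to-one by predictions with IoU >= thr.

    Assumes `preds` is already sorted by descending score; each prediction
    greedily claims the still-unmatched GT box it overlaps most.
    """
    matched = [False] * len(gts)
    hits = 0
    for p in preds:
        best_j, best_iou = -1, thr
        for j, g in enumerate(gts):
            if not matched[j]:
                o = iou(p, g)
                if o >= best_iou:
                    best_j, best_iou = j, o
        if best_j >= 0:
            matched[best_j] = True
            hits += 1
    return hits


def class_agnostic_ar(
    images: Sequence[Tuple[Sequence[Box], Sequence[Box]]],
    thresholds: Optional[List[float]] = None,
) -> float:
    """Recall averaged over IoU thresholds; `images` holds (gt_boxes, pred_boxes)."""
    if thresholds is None:
        thresholds = [0.5 + 0.05 * i for i in range(10)]  # assumed COCO-style range
    total_gt = sum(len(gts) for gts, _ in images)
    recalls = [
        sum(matched_count(gts, preds, t) for gts, preds in images) / total_gt
        for t in thresholds
    ]
    return sum(recalls) / len(recalls)
```

Restricting the ground-truth boxes to corner-case objects would give a corner-only variant in the same spirit; how the paper defines mAR-corner exactly should be checked against its evaluation protocol.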

Paper Structure

This paper contains 14 sections, 2 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: An example of a corner case. The 'traffic cone' category is absent from most autonomous driving datasets.
  • Figure 2: Overview of the MENOL framework, which consists of four stages. (i) Stage I: RGB images from the training dataset are first preprocessed by the off-the-shelf Omnidata model to extract geometry cues. (ii) Stage II: the resulting geometry-cue images are used to train the objectness notion learner. (iii) Stage III: the trained objectness notion learner serves as the teacher model and generates pseudo boxes for the depth and normal images from another autonomous driving dataset. (iv) Stage IV: the pseudo-labeled depth and normal images are merged with the fully annotated original RGB images and fed into the DINO-based student model to train the final class-aware open-world object detector. Only the student model is used for inference.
  • Figure 3: t-SNE visualization of CODA li2022coda. Two common classes and two corner-case classes are selected for visualization.
  • Figure 4: Detection results of MENOL on the CODA li2022coda dataset.
  • Figure 5: Comparison of our objectness notion learner with the proposal network of OLN kim2022learning on the CODA-val li2022coda dataset.
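Stages III and IV of Figure 2 amount to a teacher-student data flow: the objectness notion learner labels depth and normal images, and those pseudo-labeled images are merged with annotated RGB images to train the student. The Python sketch below illustrates only that data flow under stated assumptions. The `teacher` callable, the `score_thr` value, the class-agnostic `"object"` label and the `Sample` record are all hypothetical placeholders, not MENOL's actual interfaces or hyperparameters, and no Omnidata or DINO code is involved.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, objectness score)


@dataclass
class Sample:
    image_id: str
    modality: str       # "rgb", "depth" or "normal"
    boxes: List[Box]
    labels: List[str]


def pseudo_label(
    cue_images: Iterable[Tuple[str, str]],
    teacher: Callable[[str, str], Sequence[Detection]],
    score_thr: float = 0.5,   # hypothetical confidence cut-off
    label: str = "object",    # hypothetical class-agnostic label
) -> List[Sample]:
    """Stage III sketch: keep teacher boxes whose objectness score >= score_thr."""
    samples = []
    for image_id, modality in cue_images:
        kept = [box for box, score in teacher(image_id, modality) if score >= score_thr]
        if kept:  # skip cue images with no confident pseudo boxes
            samples.append(Sample(image_id, modality, kept, [label] * len(kept)))
    return samples


def build_student_set(
    rgb_labeled: Sequence[Sample], pseudo: Sequence[Sample]
) -> List[Sample]:
    """Stage IV sketch: merge annotated RGB samples with pseudo-labeled cue images."""
    return list(rgb_labeled) + list(pseudo)
```

A toy teacher that returns one confident and one low-scoring box per cue image would contribute one pseudo box per depth or normal image; the merged list is what a detector's training loop would then consume.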