ALOOD: Exploiting Language Representations for LiDAR-based Out-of-Distribution Object Detection

Michael Kösel; Marcel Schreiber; Michael Ulrich; Claudius Gläser; Klaus Dietmayer

TL;DR

ALOOD (Aligned LiDAR representations for Out-Of-Distribution Detection) is a novel approach that aligns LiDAR object features with language representations from a vision-language model (VLM), so that the detection of OOD objects can be treated as a zero-shot classification task.

Abstract

LiDAR-based 3D object detection plays a critical role in reliable and safe autonomous driving systems. However, existing detectors often produce overly confident predictions for objects that do not belong to any known category, posing significant safety risks. Such out-of-distribution (OOD) objects were not part of the training data, which leads to incorrect predictions. To address this challenge, we propose ALOOD (Aligned LiDAR representations for Out-Of-Distribution Detection), a novel approach that incorporates language representations from a vision-language model (VLM). By aligning the object features from the object detector to the feature space of the VLM, we can treat the detection of OOD objects as a zero-shot classification task. We demonstrate competitive performance on the nuScenes OOD benchmark, establishing a novel approach to OOD object detection in LiDAR using language representations. The source code is available at https://github.com/uulm-mrm/mmood3d.
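
To make the alignment idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear projection (the hypothetical `proj`) that maps detector features into the text embedding space, and it assumes a cosine-distance objective; the loss actually used in the paper may differ. All names are illustrative.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def cosine_alignment_loss(object_feats, text_embeds, proj):
    """Mean cosine distance between projected object features and text embeddings.

    object_feats: (N, D_obj) features taken from the frozen LiDAR detector
    text_embeds:  (N, D_txt) frozen text-encoder embeddings, one per object
    proj:         (D_obj, D_txt) hypothetical learnable projection (assumption)
    """
    aligned = l2_normalize(object_feats @ proj)   # object features in text space
    target = l2_normalize(text_embeds)            # text targets stay fixed
    cos = np.sum(aligned * target, axis=-1)       # per-object cosine similarity
    return float(np.mean(1.0 - cos))              # 0 when directions coincide
```

Under these assumptions, only `proj` would be trained; the detector and the text encoder stay frozen, so the loss pulls each object feature toward the embedding of its text description.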

Paper Structure (21 sections, 11 equations, 3 figures, 5 tables)

Introduction
Related Work
Out-of-Distribution Detection in Classification
OOD Detection in Object Detection
OOD Detection in 3D Object Detection
Vision-Language Models for Autonomous Driving
Method
Preliminaries
Feature Extraction
Modality Alignment
Inference
Experiments
Experimental Setup
Comparisons with State-of-the-art
Ablation Study
...and 6 more sections

Figures (3)

Figure 1: The object features of the LiDAR object detector are aligned to match the embeddings of a frozen text encoder. This allows us to use the zero-shot classification capabilities of VLMs to perform OOD detection.
Figure 2: Overview of our proposed ALOOD framework. Given a frozen LiDAR object detector, we extract object-specific features. These features are aligned to text features from a language model by generating a text description for each object. During inference, the aligned object features are compared to cached ID text embeddings using cosine similarity.
Figure 3: Comparison of different OOD score distributions for different scoring methods. Including the object feature norm in the OOD score consistently leads to better results.
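
The inference step described in the Figure 2 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the released code: the function name, the softmax over similarities, the temperature value of 0.07, and the choice of one minus the maximum probability as the OOD score are all assumptions made for the example.

```python
import numpy as np


def zero_shot_ood_scores(aligned_feats, id_text_embeds, temperature=0.07):
    """Return (predicted ID class index, OOD score) for each object.

    aligned_feats:  (N, D) object features already mapped into the text space
    id_text_embeds: (C, D) cached text embeddings of the in-distribution classes
    temperature:    softmax temperature (assumed value, not from the paper)
    """
    # Normalize both sides so the dot product equals cosine similarity.
    f = aligned_feats / np.linalg.norm(aligned_feats, axis=1, keepdims=True)
    t = id_text_embeds / np.linalg.norm(id_text_embeds, axis=1, keepdims=True)
    sims = f @ t.T                                  # (N, C) cosine similarities

    # Numerically stable softmax over the ID classes.
    logits = sims / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)

    # An object that matches no ID class well gets a high OOD score.
    ood_score = 1.0 - probs.max(axis=1)
    return probs.argmax(axis=1), ood_score
```

Because the ID text embeddings are cached, inference in this sketch needs no text-encoder forward pass. The sketch omits the object feature norm mentioned in Figure 3, since how it enters the score is not specified here.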
