Segment-Level Road Obstacle Detection Using Visual Foundation Model Priors and Likelihood Ratios

Youssef Shoeb, Nazir Nayal, Azarm Nowzad, Fatma Güney, Hanno Gottschalk

TL;DR

The paper tackles threshold-sensitive and fragmented road obstacle detection by moving from pixel-level OoD scoring to segment-level decisions leveraging Segment Anything Model priors. It formulates obstacle detection as a likelihood-ratio test between two learned distributions, $P_{free}$ and $P_{obstacle}$, and evaluates three distribution estimators—Gaussian Mixture Models, Normalizing Flows, and k-Nearest Neighbours—using SAM-derived segment features. The approach achieves state-of-the-art performance on component-level metrics on SMIYC benchmarks and Lost&Found without requiring a predefined threshold, with KNN often delivering the best results, though pixel-level performance trails behind some SOTA pixel-segmentation methods. This work demonstrates robust, threshold-free obstacle detection at the segment level, offering practical benefits for autonomous driving and highlighting avenues for improving small-object detection and reference-feature selection.
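The likelihood-ratio test mentioned above can be written out as a short worked formula. This is a sketch under assumptions rather than the paper's exact notation: the symbol $z$ (the SAM-derived feature vector of one segment), the name $\Lambda$, and the orientation of the ratio (obstacle over free space) are introduced here for illustration. The decision boundary at 1 follows the Figure 4 caption.

```latex
\Lambda(z) = \frac{P_{obstacle}(z)}{P_{free}(z)},
\qquad
\hat{y}(z) =
\begin{cases}
\text{obstacle}   & \text{if } \Lambda(z) > 1 \iff \log P_{obstacle}(z) - \log P_{free}(z) > 0,\\
\text{free space} & \text{otherwise.}
\end{cases}
```

Because the two densities are compared against each other, the boundary $\Lambda(z) = 1$ comes from the test itself and does not need to be tuned per dataset, which is one way to read the "threshold-free" claim.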

Abstract

Detecting road obstacles is essential for autonomous vehicles to navigate dynamic and complex traffic environments safely. Current road obstacle detection methods typically assign a score to each pixel and apply a threshold to generate final predictions. However, selecting an appropriate threshold is challenging, and the per-pixel classification approach often leads to fragmented predictions with numerous false positives. In this work, we propose a novel method that leverages segment-level features from visual foundation models and likelihood ratios to predict road obstacles directly. By focusing on segments rather than individual pixels, our approach enhances detection accuracy, reduces false positives, and offers increased robustness to scene variability. We benchmark our approach against existing methods on the RoadObstacle and LostAndFound datasets, achieving state-of-the-art performance without needing a predefined threshold.
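To make the segment-level likelihood-ratio idea concrete, here is a minimal sketch using a classical k-nearest-neighbour density estimate on toy 2-D features. Everything in it is an assumption for illustration: the function names, the choice `k=5`, the synthetic Gaussian clusters standing in for SAM segment features, and the specific k-NN density formula, which may differ from the estimator the authors use.

```python
import math
import random


def kth_neighbour_distance(query, reference, k):
    """Euclidean distance from `query` to its k-th nearest point in `reference`."""
    distances = sorted(math.dist(query, point) for point in reference)
    return distances[k - 1]


def knn_log_density(query, reference, k):
    """Log k-NN density estimate, up to a constant shared by both models.

    The classical estimate is p(x) ~ k / (N * V_d * r_k(x)^d). The unit-ball
    volume V_d is dropped because it cancels in the likelihood ratio.
    """
    dim = len(query)
    radius = max(kth_neighbour_distance(query, reference, k), 1e-12)
    return math.log(k) - math.log(len(reference)) - dim * math.log(radius)


def log_likelihood_ratio(query, obstacle_feats, free_feats, k=5):
    """log P_obstacle(query) - log P_free(query) under the k-NN estimates."""
    return (knn_log_density(query, obstacle_feats, k)
            - knn_log_density(query, free_feats, k))


def is_obstacle(query, obstacle_feats, free_feats, k=5):
    """Decide at likelihood ratio 1, i.e. log ratio 0."""
    return log_likelihood_ratio(query, obstacle_feats, free_feats, k) > 0.0


# Toy reference sets (hypothetical stand-ins for segment features).
rng = random.Random(0)


def sample_cluster(centre, n, spread=0.5):
    return [tuple(c + rng.gauss(0.0, spread) for c in centre) for _ in range(n)]


free_feats = sample_cluster((0.0, 0.0), 200)
obstacle_feats = sample_cluster((4.0, 4.0), 200)

near_free = (0.1, -0.2)
near_obstacle = (3.9, 4.2)

print(is_obstacle(near_free, obstacle_feats, free_feats))
print(is_obstacle(near_obstacle, obstacle_feats, free_feats))
```

The decision is made once per segment rather than per pixel, so a segment is either kept whole or rejected whole, which is what avoids fragmented predictions in this formulation.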

Paper Structure

This paper contains 18 sections, 5 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Road Obstacle Segmentation Overview. From the input image (anomaly highlighted with a green box), current SOTA per-pixel methods (e.g., UEM; Nayal et al., 2024) produce high anomaly scores for unknown objects (column two), but once a threshold is applied the output is fragmented, with multiple false positives if the threshold is set too low or false negatives if it is set too high (columns three and four). SAM produces high-quality segment masks for all image segments but lacks semantic information. Our method (column five) uses the object priors of SAM to learn the semantic distribution of the segments and detects road obstacle segments based on likelihood ratios.
  • Figure 2: Approach for Segment-Level Road Obstacle Detection. Our approach uses visual foundation models such as SAM (Kirillov et al., 2023) to generate segment-level masks. The segment-level feature representations are obtained from the transformer decoder layer, which processes the image and prompt embeddings. During inference, we generate masks for the entire image using a grid of point prompts and filter out low-quality and duplicate masks outside the region of interest. For each remaining mask, we compute the likelihood ratio of its feature representation under two learned density estimates, one for free space and one for obstacles, to produce the final prediction.
  • Figure 3: Failure Case Examples. The left column shows the input image with the road obstacles highlighted by green bounding boxes, and the right column shows cases where the masks generated by SAM fail to isolate the road obstacle as a separate segment.
  • Figure 4: Comparison of Gaussian Mixture Models (first row), Normalizing Flows (second row), and k-Nearest Neighbours (third row) on the training set. The first column visualizes the learned distribution of the free-space model, the second visualizes the learned distribution of obstacles, and the third visualizes the likelihood ratio between the two. At the threshold value 1, the likelihood ratio provides better separation than either model alone.
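
The mask-filtering step described in the Figure 2 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: masks are represented as plain sets of `(row, col)` pixels, the per-mask quality `scores` are assumed to come from the mask generator, and the names `filter_masks`, `min_score`, `max_iou`, and `min_roi_overlap` along with their default values are hypothetical. Treating the region of interest as a separate overlap criterion is one possible reading of the caption.

```python
def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two masks given as sets of (row, col) pixels."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0


def filter_masks(masks, scores, roi,
                 min_score=0.8, max_iou=0.9, min_roi_overlap=0.5):
    """Return indices of masks that are confident, mostly inside the ROI,
    and not near-duplicates of a higher-scoring mask that was already kept."""
    # Visit candidates from highest to lowest predicted quality.
    order = sorted(range(len(masks)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        mask = masks[i]
        if not mask or scores[i] < min_score:
            continue  # empty or low-quality mask
        if len(mask & roi) / len(mask) < min_roi_overlap:
            continue  # mostly outside the region of interest
        if any(mask_iou(mask, masks[j]) > max_iou for j in kept):
            continue  # near-duplicate of a better mask
        kept.append(i)
    return kept


def rectangle(top, left, height, width):
    """Helper that builds a rectangular mask as a set of pixels."""
    return {(r, c)
            for r in range(top, top + height)
            for c in range(left, left + width)}


# Toy 10x10 image: the lower half stands in for the road region of interest.
roi = rectangle(5, 0, 5, 10)
masks = [
    rectangle(6, 2, 2, 2),  # object on the road
    rectangle(6, 2, 2, 2),  # duplicate produced by a neighbouring point prompt
    rectangle(0, 0, 2, 2),  # confident segment, but outside the ROI
    rectangle(7, 6, 2, 2),  # inside the ROI, but low predicted quality
]
scores = [0.95, 0.90, 0.97, 0.40]

kept = filter_masks(masks, scores, roi)
print(kept)
```

Only the masks that survive this step would then be scored with the likelihood ratio, so the cost of the density estimators scales with the number of retained segments and not with the number of pixels.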