Segment-Level Road Obstacle Detection Using Visual Foundation Model Priors and Likelihood Ratios
Youssef Shoeb, Nazir Nayal, Azarm Nowzad, Fatma Güney, Hanno Gottschalk
TL;DR
The paper tackles threshold-sensitive, fragmented road obstacle detection by moving from pixel-level out-of-distribution (OoD) scoring to segment-level decisions that leverage Segment Anything Model (SAM) priors. It formulates obstacle detection as a likelihood-ratio test between two learned distributions, $P_{free}$ and $P_{obstacle}$, over SAM-derived segment features, and evaluates three estimators for these distributions: Gaussian Mixture Models (GMM), Normalizing Flows, and k-Nearest Neighbours (k-NN). The approach achieves state-of-the-art component-level performance on the SegmentMeIfYouCan (SMIYC) benchmarks and on LostAndFound without requiring a predefined threshold, with k-NN often giving the best results, although its pixel-level performance trails some state-of-the-art pixel-wise segmentation methods. The work shows that threshold-free obstacle detection at the segment level is practical for autonomous driving, and it identifies small-object detection and the selection of reference features as avenues for improvement.
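To make the likelihood-ratio test concrete, here is a minimal sketch of a k-NN variant. It assumes a standard k-NN density estimate $p(x) \approx k / (N\,V_d\,r_k(x)^d)$, where $r_k(x)$ is the distance from $x$ to its $k$-th nearest reference feature, $N$ is the number of reference features and $d$ the feature dimension. With the same $k$ for both classes, $k$ and the unit-ball volume $V_d$ cancel in the ratio. The function names, the value of $k$, and the random stand-in features are hypothetical; they are not SAM features, and the paper's exact k-NN scoring may differ.

```python
import numpy as np

def knn_radius(queries: np.ndarray, reference: np.ndarray, k: int) -> np.ndarray:
    """Distance from each query to its k-th nearest neighbour in `reference`."""
    # Brute-force pairwise Euclidean distances, shape (n_queries, n_reference).
    dists = np.linalg.norm(queries[:, None, :] - reference[None, :, :], axis=-1)
    # k-th smallest distance per row (k is 1-based, so index k - 1).
    return np.partition(dists, k - 1, axis=1)[:, k - 1]

def knn_log_likelihood_ratio(queries, free_ref, obstacle_ref, k=5, eps=1e-12):
    """log P_obstacle(x) - log P_free(x) under k-NN density estimates.

    Assumption: p(x) ~ k / (N * V_d * r_k(x)^d); k and V_d cancel, leaving
    log(N_free / N_obstacle) + d * (log r_free - log r_obstacle).
    """
    d = queries.shape[1]
    r_free = knn_radius(queries, free_ref, k)
    r_obs = knn_radius(queries, obstacle_ref, k)
    return (np.log(len(free_ref)) - np.log(len(obstacle_ref))
            + d * (np.log(r_free + eps) - np.log(r_obs + eps)))

# Hypothetical stand-in segment features (not real SAM embeddings).
rng = np.random.default_rng(0)
free_ref = rng.normal(0.0, 1.0, size=(200, 8))      # "free road" references
obstacle_ref = rng.normal(4.0, 1.0, size=(200, 8))  # "obstacle" references
queries = np.stack([np.zeros(8), np.full(8, 4.0)])  # one road-like, one obstacle-like

llr = knn_log_likelihood_ratio(queries, free_ref, obstacle_ref, k=5)
is_obstacle = llr > 0  # ratio > 1: P_obstacle explains the segment better
```

A segment is labelled an obstacle when the log-ratio is positive, so no score threshold is tuned in this sketch. The brute-force distance matrix only suits small reference sets; larger ones would call for a nearest-neighbour index.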
Abstract
Detecting road obstacles is essential for autonomous vehicles to navigate dynamic and complex traffic environments safely. Current road obstacle detection methods typically assign a score to each pixel and apply a threshold to generate final predictions. However, selecting an appropriate threshold is challenging, and the per-pixel classification approach often leads to fragmented predictions with numerous false positives. In this work, we propose a novel method that leverages segment-level features from visual foundation models and likelihood ratios to predict road obstacles directly. By focusing on segments rather than individual pixels, our approach enhances detection accuracy, reduces false positives, and offers increased robustness to scene variability. We benchmark our approach against existing methods on the RoadObstacle and LostAndFound datasets, achieving state-of-the-art performance without needing a predefined threshold.
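The segment-level idea in the abstract can be sketched as follows, under explicit assumptions: each segment feature is the mean of per-pixel features inside a boolean mask (the masks stand in for SAM proposals), and each class density is a single diagonal-covariance Gaussian, the simplest special case of a GMM. `pool_segment_features`, `DiagonalGaussian`, and `classify_segments` are hypothetical helpers, not the authors' code. Labelling a segment as an obstacle when the log-likelihood ratio exceeds 0 (ratio > 1) corresponds to equal class priors, which is one way such a method can avoid a hand-tuned score threshold.

```python
import numpy as np

def pool_segment_features(feature_map, masks):
    """Average per-pixel features (H, W, D) inside each boolean mask (H, W).

    Assumption: mean pooling; the paper's SAM-based feature extraction may differ.
    """
    return np.stack([feature_map[m].mean(axis=0) for m in masks])

class DiagonalGaussian:
    """Single-component, diagonal-covariance Gaussian (a 1-component GMM)."""

    def __init__(self, samples, min_var=1e-6):
        self.mean = samples.mean(axis=0)
        self.var = samples.var(axis=0) + min_var  # floor avoids division by zero

    def log_prob(self, x):
        # log N(x; mean, diag(var)) summed over feature dimensions.
        return -0.5 * np.sum(np.log(2 * np.pi * self.var)
                             + (x - self.mean) ** 2 / self.var, axis=-1)

def classify_segments(feature_map, masks, p_free, p_obstacle):
    """Return (is_obstacle, log-likelihood ratio) for each segment."""
    seg_feats = pool_segment_features(feature_map, masks)
    llr = p_obstacle.log_prob(seg_feats) - p_free.log_prob(seg_feats)
    return llr > 0.0, llr  # equal priors: no tuned threshold

# Hypothetical toy data: class densities fitted on synthetic features.
rng = np.random.default_rng(0)
p_free = DiagonalGaussian(rng.normal(0.0, 1.0, size=(500, 3)))
p_obstacle = DiagonalGaussian(rng.normal(5.0, 1.0, size=(500, 3)))

feature_map = rng.normal(0.0, 0.3, size=(4, 4, 3))  # mostly road-like pixels
feature_map[1:3, 1:3] += 5.0                        # a small obstacle patch
obstacle_mask = np.zeros((4, 4), dtype=bool)
obstacle_mask[1:3, 1:3] = True
masks = [~obstacle_mask, obstacle_mask]             # stand-ins for SAM masks

decisions, llr = classify_segments(feature_map, masks, p_free, p_obstacle)
```

Because the decision is made once per mask, all pixels of a segment share one label, which is how a segment-level formulation avoids the fragmented, pixel-by-pixel predictions the abstract describes.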
