Robust Video-Based Pothole Detection and Area Estimation for Intelligent Vehicles with Depth Map and Kalman Smoothing
Dehao Wang, Haohang Zhu, Yiwen Xu, Kaiqi Liu
TL;DR
The paper tackles pothole area estimation for autonomous driving, addressing sensitivity to camera angle and flat-road assumptions in prior approaches. It presents a fully vision-based pipeline that fuses monocular depth maps with an improved pothole detector (ACSH-YOLOv8), a robust tracker (BoT-SORT), and a novel MBTP area estimator, followed by Kalman smoothing across frames (CDKF). Key contributions include the ACSH-YOLOv8 architecture with a P2 head and ACmix, the MBTP method that integrates depth with a minimum bounding rectangle and pixel-level triangular facets, and the CDKF algorithm that adaptively weights measurement noise by detection confidence and camera distance using Bayesian-tuned parameters. Experimental results show improved detection of small potholes and more stable, accurate pothole-area estimates, demonstrating practical viability for real-time autonomous driving.
Abstract
Road potholes pose a serious threat to driving safety and comfort, making their detection and assessment a critical task in fields such as autonomous driving. Drivers usually steer around large potholes and slow down when approaching smaller ones, so accurately estimating pothole area is of vital importance. Most existing vision-based methods rely on distance priors to construct geometric models. However, their performance is sensitive to variations in camera angle, and they typically assume a flat road surface, which can lead to significant errors in complex real-world environments. To address these problems, this paper proposes a robust pothole area estimation framework that integrates object detection and monocular depth estimation in a video stream. First, to enhance pothole feature extraction and improve the detection of small potholes, ACSH-YOLOv8 is proposed, which adds an ACmix module and a small-object detection head. Then, the BoT-SORT algorithm is used for pothole tracking, while DepthAnything V2 generates a depth map for each frame. With the obtained depth maps and pothole labels, a novel Minimum Bounding Triangulated Pixel (MBTP) method is proposed for pothole area estimation. Finally, a Kalman Filter based on Confidence and Distance (CDKF) is developed to keep the estimation results consistent across consecutive frames. The results show that the ACSH-YOLOv8 model achieves an AP(50) of 76.6%, representing a 7.6% improvement over YOLOv8. With CDKF optimization across consecutive frames, pothole predictions become more robust, thereby enhancing the method's practical applicability.
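The idea behind the confidence- and distance-aware smoothing step can be illustrated with a minimal sketch. It is not the paper's CDKF: the abstract does not give the weighting formula, so the function measurement_noise, its parameters r0, alpha, and beta, and the process noise q are all hypothetical choices made only for illustration. The sketch assumes a scalar, constant-area state per tracked pothole and inflates the measurement noise when the detector confidence is low or the pothole is far from the camera.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """1-D Kalman filter for the area of one tracked pothole (constant-area model)."""
    x: float           # current area estimate
    p: float           # variance of the estimate
    q: float = 1e-3    # process noise variance (hypothetical value)

    def update(self, z: float, r: float) -> float:
        # Predict: the area is assumed constant, so only the uncertainty grows.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        k = self.p / (self.p + r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return self.x


def measurement_noise(conf: float, dist_m: float, r0: float = 0.05,
                      alpha: float = 1.0, beta: float = 0.1) -> float:
    """Hypothetical weighting: noise rises as confidence drops and distance grows."""
    conf = min(max(conf, 0.0), 1.0)
    return r0 * (1.0 + alpha * (1.0 - conf)) * (1.0 + beta * dist_m)


def smooth_track(measurements):
    """Smooth per-frame (area, confidence, distance_m) tuples of one track ID."""
    it = iter(measurements)
    area0, conf0, dist0 = next(it)
    # Initialise the variance from the noise of the first measurement.
    kf = ScalarKalman(x=area0, p=measurement_noise(conf0, dist0))
    smoothed = [area0]
    for area, conf, dist in it:
        smoothed.append(kf.update(area, measurement_noise(conf, dist)))
    return smoothed


if __name__ == "__main__":
    # Made-up measurements: the second frame is a distant, low-confidence outlier.
    track = [(0.50, 0.90, 5.0), (0.90, 0.20, 20.0), (0.52, 0.90, 4.0)]
    print(smooth_track(track))
```

In this sketch, a distant, low-confidence outlier moves the estimate less than the same outlier observed nearby with high confidence, which is the qualitative behaviour the abstract attributes to CDKF. The paper reports tuning its parameters with Bayesian optimization; the constants here are untuned placeholders.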
