Homography Guided Temporal Fusion for Road Line and Marking Segmentation
Shan Wang, Chuong Nguyen, Jiawei Liu, Kaihao Zhang, Wenhan Luo, Yanhao Zhang, Sundaram Muthu, Fahira Afzal Maken, Hongdong Li
TL;DR
This work tackles occlusion and lighting challenges in road line and marking segmentation for autonomous driving by combining geometric and temporal cues. It introduces HomoFusion, a homography-guided cross-frame attention module, and RSNE, a differentiable road surface normal estimator, which together fuse adjacent frames to recover partially occluded markings. The approach achieves state-of-the-art performance on ApolloScape and ApolloScape Night with far fewer parameters and GFLOPs than prior models. It also transfers to water puddle segmentation, highlighting its efficiency and versatility for real-time driving systems. By exploiting a ground-plane assumption and camera intrinsics, the method achieves robust cross-frame alignment and more accurate segmentation under challenging conditions, making deployment on edge devices more practical.
Abstract
Reliable segmentation of road lines and markings is critical to autonomous driving. Our work is motivated by two observations: road lines and markings are (1) frequently occluded or obscured by moving vehicles, shadows, and glare, and (2) highly structured, with low intra-class shape variance and high overall appearance consistency. Building on these observations, we propose a Homography Guided Fusion (HomoFusion) module that exploits temporally adjacent video frames for complementary cues, enabling correct classification of partially occluded road lines and markings. To reduce computational complexity, we propose a novel surface normal estimator that establishes spatial correspondences between the sampled frames, allowing the HomoFusion module to apply a pixel-to-pixel attention mechanism when updating the representation of occluded road lines or markings. Experiments on ApolloScape, a large-scale lane mark segmentation dataset, and ApolloScape Night, which artificially simulates night-time road conditions, demonstrate that our method outperforms existing SOTA lane mark segmentation models while using less than 9% of their parameters and computational complexity. We show that exploiting available camera intrinsics and the ground-plane assumption for cross-frame correspondence yields a lightweight network with significantly improved speed and accuracy. We also demonstrate the versatility of HomoFusion by applying it to water puddle segmentation, where it achieves SOTA performance.
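The idea of using a ground-plane assumption plus camera intrinsics to obtain pixel-to-pixel cross-frame correspondences, and then attending over the corresponding pixels, can be illustrated with a minimal NumPy sketch. This is not the authors' HomoFusion implementation. It assumes a flat road plane n^T X = d expressed in the reference camera frame, a known relative pose (R, t) mapping reference-camera points into each neighbouring camera (X' = R X + t), nearest-neighbour feature sampling, and a single-head attention with no learned query/key/value projections. In the paper the plane normal would come from its surface normal estimator; here `n` is simply given. All function names (`plane_homography`, `warp_points`, `sample_nearest`, `homography_attention`) are hypothetical.

```python
import numpy as np

def plane_homography(K, R, t, n, d):
    """Plane-induced homography for points on n^T X = d (reference camera coords).
    Assumes (R, t) map reference-camera points to the neighbour camera: X' = R X + t.
    For X on the plane, X' = (R + t n^T / d) X, hence H = K (R + t n^T / d) K^-1."""
    return K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)

def warp_points(H, uv):
    """Map (N, 2) pixel coords (u = column, v = row) through H.
    Points sent behind the camera / to infinity get the sentinel (-1, -1)."""
    p = np.hstack([uv, np.ones((len(uv), 1))]) @ H.T
    z = p[:, 2:3]
    front = z[:, 0] > 1e-9
    out = np.full((len(uv), 2), -1.0)  # -1 is always outside the image
    out[front] = p[front, :2] / z[front]
    return out

def sample_nearest(feat, uv):
    """Nearest-neighbour lookup of feat (H, W, C) at (N, 2) float coords.
    Returns (N, C) features and an (N,) mask of in-bounds samples."""
    h, w, _ = feat.shape
    u = np.rint(uv[:, 0]).astype(int)
    v = np.rint(uv[:, 1]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    return feat[np.clip(v, 0, h - 1), np.clip(u, 0, w - 1)], valid

def homography_attention(ref_feat, nbr_feats, homographies):
    """Each reference pixel attends over its own feature and the features at its
    homography-corresponding pixels in the neighbouring frames (keys == values)."""
    h, w, c = ref_feat.shape
    vs, us = np.mgrid[0:h, 0:w]
    uv = np.stack([us.ravel(), vs.ravel()], axis=1).astype(float)
    q = ref_feat.reshape(-1, c)                        # (N, C) queries
    keys, masks = [q], [np.ones(len(q), dtype=bool)]   # the reference pixel is always valid
    for feat, H in zip(nbr_feats, homographies):
        f, m = sample_nearest(feat, warp_points(H, uv))
        keys.append(f)
        masks.append(m)
    k = np.stack(keys, axis=1)                         # (N, T, C)
    mask = np.stack(masks, axis=1)                     # (N, T)
    logits = np.einsum("nc,ntc->nt", q, k) / np.sqrt(c)
    logits = np.where(mask, logits, -np.inf)           # drop out-of-view correspondences
    logits -= logits.max(axis=1, keepdims=True)        # numerically stable softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("nt,ntc->nc", weights, k).reshape(h, w, c)
```

Because the correspondence is fixed by the homography, each pixel attends over only T candidates (one per frame) rather than over every pixel of every neighbouring frame. This illustrates, under the assumptions above, why a geometry-guided correspondence can keep cross-frame attention cheap.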
