Contrastive Learning for Lane Detection via cross-similarity
Ali Zoljodi, Sadegh Abadijou, Mina Alibeigi, Masoud Daneshtalab
TL;DR
CLLD introduces a self-supervised lane-detection framework that uses cross-similarity between original and masked views to propagate surrounding context into occluded lane regions. By coupling local feature contrastive learning with a cross-view similarity operation, and optimizing a triple-loss objective (consistency, similarity, and instance), it strengthens long-range dependencies and occlusion resilience. Empirical results on CuLane and TuSimple show CLLD surpasses state-of-the-art contrastive methods and often outperforms supervised pretraining in challenging conditions such as shadows, while maintaining competitive performance in normal scenarios. The approach demonstrates practical impact for robust lane detection in real-world driving where visibility varies, and suggests avenues for extending to Vision Transformers and broader SSL applications in autonomous driving.
Abstract
Detecting lane markings in road scenes is challenging because their appearance is intricate and easily degraded by unfavorable conditions. Although lane markings have strong shape priors, their visibility is readily compromised by lighting conditions, occlusion by other vehicles or pedestrians, and fading of colors over time. Detection is further complicated by the variety of lane shapes and natural variations, so training a robust lane detection model that handles diverse scenarios requires large amounts of data. In this paper, we present a novel self-supervised learning method, Contrastive Learning for Lane Detection via cross-similarity (CLLD), to enhance the resilience of lane detection models in real-world scenarios, particularly when lane visibility is compromised. CLLD is a contrastive learning (CL) method that assesses the similarity of local features within the global context of the input image and uses the surrounding information to predict lane markings. This is achieved by integrating local feature contrastive learning with our proposed cross-similarity operation. The local feature CL concentrates on extracting features from small patches, which is necessary for accurately localizing lane segments. Meanwhile, cross-similarity captures global features, enabling the detection of obscured lane segments based on their surroundings. We enhance cross-similarity by randomly masking portions of the input images during augmentation. Extensive experiments on the TuSimple and CuLane benchmarks demonstrate that CLLD outperforms state-of-the-art contrastive learning methods, particularly in visibility-impairing conditions such as shadows, while delivering comparable results under normal conditions. Compared to supervised learning, CLLD still excels in challenging scenarios such as shadows and crowded scenes, which are common in real-world driving.
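To make the masking and cross-similarity idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that the masked view is produced by zeroing random square blocks, that raw pixel patches can stand in for encoder features, and that similarity is measured with cosine similarity. All function names and parameters (`mask_random_patches`, `patch_features`, `cross_similarity`, `patch`, `ratio`) are hypothetical, and the training losses are omitted.

```python
import math
import random


def mask_random_patches(image, patch, ratio, rng):
    """Return a copy of `image` with a random subset of patch x patch blocks zeroed.

    Assumption: zero-filled square blocks; the paper's masking scheme may differ.
    """
    h, w = len(image), len(image[0])
    blocks = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    masked_blocks = rng.sample(blocks, int(len(blocks) * ratio))
    out = [row[:] for row in image]
    for r0, c0 in masked_blocks:
        for r in range(r0, min(r0 + patch, h)):
            for c in range(c0, min(c0 + patch, w)):
                out[r][c] = 0.0
    return out, masked_blocks


def patch_features(image, patch):
    """Flatten each block into a vector (a stand-in for learned local features)."""
    h, w = len(image), len(image[0])
    feats = []
    for r0 in range(0, h, patch):
        for c0 in range(0, w, patch):
            feats.append([image[r][c]
                          for r in range(r0, min(r0 + patch, h))
                          for c in range(c0, min(c0 + patch, w))])
    return feats


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cross_similarity(feats_a, feats_b):
    """Similarity of every local feature in view A to every local feature in view B."""
    return [[cosine(u, v) for v in feats_b] for u in feats_a]


# Usage: an 8x8 random "image" split into four 4x4 patches, half of them masked.
rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]
masked, masked_blocks = mask_random_patches(image, patch=4, ratio=0.5, rng=rng)
sim = cross_similarity(patch_features(image, 4), patch_features(masked, 4))
```

In this toy setting, a masked patch has zero similarity to every patch of the original view, so a model trained to align the two views has to infer the masked content from the surrounding patches. That is the intuition the abstract describes for recovering obscured lane segments from their surroundings.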
