
Contrastive Learning for Lane Detection via cross-similarity

Ali Zoljodi, Sadegh Abadijou, Mina Alibeigi, Masoud Daneshtalab

TL;DR

CLLD introduces a self-supervised lane-detection framework that uses cross-similarity between original and masked views to propagate surrounding context into occluded lane regions. By coupling local feature contrastive learning with a cross-view similarity operation and optimizing a triple-loss objective (consistency, similarity, and instance losses), it strengthens long-range dependencies and resilience to occlusion. Empirical results on CULane and TuSimple show that CLLD surpasses state-of-the-art contrastive methods and often outperforms supervised pretraining in challenging conditions such as shadows, while remaining competitive in normal scenarios. The approach is relevant to robust lane detection in real-world driving, where visibility varies, and suggests extensions to Vision Transformers and broader SSL applications in autonomous driving.
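The TL;DR describes cross-similarity only at a high level. The sketch below is one plausible reading, assuming an attention-style formulation: each local feature of the masked view is compared by cosine similarity against every local feature of the original view, and is then rebuilt as a softmax-weighted mix of original-view features. The function names, the `temperature` parameter, and the softmax aggregation are our assumptions, not the paper's definition.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def cross_similarity(f_orig, f_masked, temperature=0.1):
    """Hypothetical cross-similarity between original and masked views.

    f_orig, f_masked: (N, C) arrays of N local feature vectors (flattened
    feature-map positions) from the original and the masked view.
    Returns an (N, C) array in which every masked-view position is
    re-expressed as a similarity-weighted mix of original-view features.
    """
    q = l2_normalize(f_masked)
    k = l2_normalize(f_orig)
    logits = q @ k.T / temperature                # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True) # row-wise softmax, rows sum to 1
    return weights @ f_orig                       # aggregate original-view context


rng = np.random.default_rng(0)
f_o = rng.normal(size=(16, 8))
f_m = f_o.copy()
f_m[:4] = rng.normal(size=(4, 8))                 # simulate 4 corrupted (masked) positions
out = cross_similarity(f_o, f_m)
```

Because each output row is a convex combination of original-view features, a masked position can only be filled with context that exists elsewhere in the unmasked view, which matches the intuition of inferring an occluded lane segment from its surroundings.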

Abstract

Detecting lane markings in road scenes is challenging because their appearance is highly susceptible to unfavorable conditions. While lane markings have strong shape priors, their visibility is easily compromised by lighting conditions, occlusions by other vehicles or pedestrians, and fading of colors over time. Detection is further complicated by the variety of lane shapes and natural variations, which necessitates large amounts of data to train a robust lane detection model capable of handling diverse scenarios. In this paper, we present a novel self-supervised learning method termed Contrastive Learning for Lane Detection via cross-similarity (CLLD) to enhance the resilience of lane detection models in real-world scenarios, particularly when the visibility of lanes is compromised. CLLD introduces a contrastive learning (CL) method that assesses the similarity of local features within the global context of the input image, using the surrounding information to predict lane markings. This is achieved by integrating local feature contrastive learning with our proposed cross-similarity operation. The local feature CL concentrates on extracting features from small patches, which is necessary for accurately localizing lane segments. Meanwhile, cross-similarity captures global features, enabling the detection of obscured lane segments based on their surroundings. We enhance cross-similarity by randomly masking portions of input images during augmentation. Extensive experiments on the TuSimple and CULane benchmarks demonstrate that CLLD outperforms SOTA contrastive learning methods, particularly in visibility-impairing conditions like shadows, while delivering comparable results under normal conditions. Compared to supervised learning, CLLD still excels in challenging scenarios such as shadows and crowded scenes, which are common in real-world driving.
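The local-feature contrastive learning mentioned above can be illustrated with a generic dense InfoNCE loss: features at the same patch location in two augmented views are positives, and all other locations are negatives. This is a standard formulation used here as a stand-in; the paper's exact consistency, similarity, and instance losses are not specified in this summary, and the `temperature` value is an assumption.

```python
import numpy as np


def dense_info_nce(z1, z2, temperature=0.2):
    """Generic dense InfoNCE over local features (sketch, not CLLD's exact loss).

    z1, z2: (N, C) local feature vectors from two augmented views, where
    row i of z1 and row i of z2 describe the same image patch (positive pair)
    and every other row of z2 acts as a negative for row i of z1.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature               # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # stabilise the log-softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))      # cross-entropy with diagonal targets
```

Matched locations should yield a lower loss than randomly shuffled ones, which is what pushes the encoder toward patch-level features precise enough to localize lane segments.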

Paper Structure

This paper contains 31 sections, 8 equations, 6 figures, 11 tables.

Figures (6)

  • Figure 1: Comparison of the state-of-the-art segmentation-based lane detector RESA [Zheng2021RESA] under three different pretraining strategies. (a) Input image; (b) RESA output with CLLD (ours) pretraining; (c) RESA output with supervised pretraining; (d) RESA output with PixPro [Xie2021PropagateYourself] pretraining. Yellow boxes mark accuracy drops in detecting lanes that are occluded by cars.
  • Figure 2: The CLLD framework.
  • Figure 3: Masking the input image. The input of size $H \times W$ is divided into $\rho \times \rho$ patches, and each pixel of a masked patch is assigned a random value drawn from the zero-mean normal distribution $\mathcal{N}(0,1)$.
  • Figure 4: The cross-similarity operation
  • Figure 5: Qualitative comparison of the results of CLLD with prior SSL methods and supervised learning.
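The masking step described in the Figure 3 caption can be sketched as follows. We read $\rho$ as the patch side length in pixels and assume $H$ and $W$ are divisible by it; the caption does not say how many patches are masked, so `mask_ratio` is a hypothetical parameter.

```python
import numpy as np


def mask_patches(image, rho, mask_ratio=0.3, rng=None):
    """Mask random rho x rho patches of `image` with N(0, 1) noise.

    image: (H, W) or (H, W, C) array; H and W must be divisible by rho.
    mask_ratio: fraction of patches to mask (hypothetical parameter).
    Returns (masked copy, boolean patch-level mask of shape (H//rho, W//rho)).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if h % rho or w % rho:
        raise ValueError("H and W must be divisible by rho")
    gh, gw = h // rho, w // rho
    n_mask = int(round(mask_ratio * gh * gw))

    # Choose which patches to mask, without repetition.
    chosen = rng.choice(gh * gw, size=n_mask, replace=False)
    patch_mask = np.zeros(gh * gw, dtype=bool)
    patch_mask[chosen] = True
    patch_mask = patch_mask.reshape(gh, gw)

    # Upsample the patch-level mask to pixel resolution.
    pixel_mask = patch_mask.repeat(rho, axis=0).repeat(rho, axis=1)

    # Replace every masked pixel (all channels) with standard-normal noise.
    out = image.astype(np.float64, copy=True)
    n_pix = int(pixel_mask.sum())
    out[pixel_mask] = rng.standard_normal((n_pix,) + image.shape[2:])
    return out, patch_mask
```

In a CLLD-style pipeline, the masked copy and the original would presumably be encoded separately and then compared through the cross-similarity operation.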