CLRKDNet: Speeding up Lane Detection with Knowledge Distillation

Weiqing Qi, Guoyang Zhao, Fulong Ma, Linwei Zheng, Ming Liu

TL;DR

CLRKDNet tackles real-time lane detection by distilling knowledge from the heavy CLRNet teacher into a streamlined student. The student uses a lean FPN and a single detection head, and relies on Activation Attention Transfer, Prior Embedding Distillation, and Logit Distillation to recover teacher-level accuracy. On CULane and TuSimple, CLRKDNet delivers up to 60% faster inference with only minor losses in F1-score, outperforming prior distillation baselines and enabling practical deployment in real-time autonomous driving. The work shows how targeted multi-source knowledge transfer can close the accuracy gap while meeting the stringent latency constraints of perception systems.

Abstract

Road lanes are integral components of the visual perception systems in intelligent vehicles, playing a pivotal role in safe navigation. In lane detection tasks, balancing accuracy with real-time performance is essential, yet existing methods often sacrifice one for the other. To address this trade-off, we introduce CLRKDNet, a streamlined model that balances detection accuracy with real-time performance. The state-of-the-art model CLRNet has demonstrated exceptional performance across various datasets, yet its computational overhead is substantial due to its Feature Pyramid Network (FPN) and multi-layer detection head architecture. Our method simplifies both the FPN structure and the detection heads, redesigning them to incorporate a novel teacher-student distillation process alongside a newly introduced series of distillation losses. This combination reduces inference time by up to 60% while maintaining detection accuracy comparable to CLRNet. This balance of accuracy and speed makes CLRKDNet a viable solution for real-time lane detection in autonomous driving applications.

Paper Structure (34 sections, 8 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: FPS vs. F1-score of state-of-the-art methods on the CULane benchmark.
  • Figure 2: The upper part of the model represents the teacher's configuration, featuring a deeper backbone for feature extraction, three layers of FPN for feature fusion, and a detection head connected to each FPN layer. The detection head performs iterative prior refinement, indicated by the dashed line looping back to the prior parameters. The lower part is the student network, CLRKDNet, which typically has a lighter backbone, a feature aggregation module for feature enhancement, and a single detection head for lane prediction output. During training, three types of distillation are applied: (a) Attention Map Transfer, which occurs during multi-scale feature extraction in the backbone network and transfers attention-map information from the teacher to the student model; (b) Prior Knowledge Transfer, which transfers the teacher's refined priors to the student's initial priors; and (c) Logits Transfer, which compares the classification and regression outputs of both models to refine the student's performance.
  • Figure 3: Illustration of the detection head.
  • Figure 4: Selected results comparing the CLRNet teacher model and the CLRKDNet student model against ground-truth annotations and input images. Categories in which the student surpasses the teacher are highlighted, with dotted circles marking the missed lane detections.
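
The Attention Map Transfer step described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes the commonly used activation-based attention transfer formulation (channel-wise sum of |activation|^p, L2-normalised, compared by L2 distance); the exact loss used by CLRKDNet is not given in this excerpt, so the function names, the exponent `p`, and the nested-list feature layout below are illustrative assumptions rather than the authors' implementation.

```python
import math


def attention_map(features, p=2):
    """Collapse a C x H x W activation (nested lists) into a flattened,
    L2-normalised spatial attention vector of length H*W.

    Hypothetical helper: assumes the generic attention-transfer recipe,
    not a formula confirmed by the paper excerpt.
    """
    height, width = len(features[0]), len(features[0][0])
    # Sum |activation|^p over the channel axis at each spatial location.
    flat = [
        sum(abs(channel[i][j]) ** p for channel in features)
        for i in range(height)
        for j in range(width)
    ]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


def attention_transfer_loss(student_stages, teacher_stages, p=2):
    """Sum, over backbone stages, of the L2 distance between the
    normalised student and teacher attention maps."""
    loss = 0.0
    for s_feat, t_feat in zip(student_stages, teacher_stages):
        a_s, a_t = attention_map(s_feat, p), attention_map(t_feat, p)
        loss += math.sqrt(sum((x - y) ** 2 for x, y in zip(a_s, a_t)))
    return loss


# Toy single-stage example: a 2-channel teacher and a 1-channel student,
# both with 2 x 2 spatial resolution.
teacher = [[[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]]
student = [[[[4.0, 3.0], [2.0, 1.0]]]]
loss_value = attention_transfer_loss(student, teacher)
```

Because the channel axis is summed out before comparison, the student and teacher may have different channel counts, which suits a lighter student backbone; the sketch does assume matching spatial sizes at each stage.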