OMR: Occlusion-Aware Memory-Based Refinement for Video Lane Detection

Dongkwon Jin; Chang-Su Kim

TL;DR

This work addresses video lane detection under occlusion by introducing occlusion-aware memory-based refinement (OMR). The method detects latent obstacles, keeps memory across frames, and refines the current-frame features so that occluded lanes can be recovered. It runs in four stages: encoding, latent obstacle detection, OMR, and decoding. A two-step training procedure and a novel synthetic data augmentation strategy improve robustness to occlusion and temporal variations. Results on VIL-100 and OpenLane-V show improved temporal stability and competitive accuracy at real-time speed (~105 FPS).

Abstract

A novel algorithm for video lane detection is proposed in this paper. First, we extract a feature map for the current frame and detect a latent mask for obstacles occluding lanes. Then, we enhance the feature map with an occlusion-aware memory-based refinement (OMR) module. It takes the obstacle mask and feature map of the current frame, the previous output, and memory information as input, and processes them recursively over the video. Moreover, we apply a novel data augmentation scheme to train the OMR module effectively. Experimental results show that the proposed algorithm outperforms existing techniques on video lane datasets. Our code is available at https://github.com/dongkwonjin/OMR.
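The recursive data flow described in the abstract can be sketched as a plain per-frame loop. This is only an illustration under stated assumptions, not the authors' implementation: `run_video` and all `toy_*` functions are hypothetical names, feature maps are stood in for by short lists of floats, and a negative value is a made-up convention for an occluded position.

```python
from typing import Callable, List, Optional, Tuple

Vec = List[float]  # toy stand-in for a feature map (assumption)


def run_video(frames: List[Vec], encode: Callable, detect: Callable,
              refine: Callable, decode: Callable) -> List[List[int]]:
    """Encode -> latent obstacle detection -> refinement -> decode, per frame."""
    memory: Optional[Vec] = None              # recurrent memory carried over time
    prev: Optional[Tuple[List[int], Vec]] = None  # (previous output, previous features)
    outputs = []
    for frame in frames:
        feat = encode(frame)                  # current-frame feature map
        mask = detect(feat)                   # latent obstacle mask
        refined, memory = refine(feat, mask, prev, memory)
        out = decode(refined)                 # lane prediction
        outputs.append(out)
        prev = (out, refined)                 # passed on to the next frame
    return outputs


# Toy components (all hypothetical; real ones would be neural networks).
def toy_encode(frame: Vec) -> Vec:
    return list(frame)


def toy_detect(feat: Vec) -> List[int]:
    # Toy convention: a negative value marks an occluded position.
    return [1 if v < 0 else 0 for v in feat]


def toy_refine(feat, mask, prev, memory):
    # The toy ignores `prev`; a learned module would also use it.
    if memory is None:
        memory = [max(v, 0.0) for v in feat]
    # Inside the mask, fall back on memory; elsewhere keep the current features.
    refined = [m if o else v for v, o, m in zip(feat, mask, memory)]
    new_memory = [0.5 * m + 0.5 * r for m, r in zip(memory, refined)]
    return refined, new_memory


def toy_decode(feat: Vec) -> List[int]:
    return [1 if v > 0.5 else 0 for v in feat]


# The second frame has its last lane position occluded.
frames = [[0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, -1.0]]
result = run_video(frames, toy_encode, toy_detect, toy_refine, toy_decode)
```

In this toy run the occluded position in the second frame is filled from memory, so both frames decode to the same lane pattern. The point of the sketch is the loop structure: only the memory and the previous frame's results are carried forward, rather than a window of past frames.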

Paper Structure (17 sections, 10 equations, 10 figures, 4 tables)

Introduction
Related Work
Image-Based Lane Detection
Video-Based Lane Detection
Proposed Algorithm
Encoding
Decoding
Latent Obstacle Detection
Occlusion-Aware Memory-Based Refinement
Training
Experimental Results
Implementation Details
Datasets
Evaluation Metrics
Comparative Assessment
...and 2 more sections

Figures (10)

Figure 1: Examples of road scenes, in which some lane parts are occluded by several objects. Visible lanes and obstructing objects are depicted by white lines and orange polygons, respectively.
Figure 2: There are three approaches to video lane detection. In (a), the feature maps of a current frame $I_t$ and the past $T$ frames are extracted and mixed to refine the feature map of $I_t$. In (b), only a single previous frame is used to enhance the feature map of $I_t$, and the enhanced one is passed recursively to the subsequent frame. The proposed algorithm in (c) utilizes obstacle and memory information to improve the feature map of $I_t$ via the OMR module. Note that gray, blue, green, and orange boxes represent intra-frame features, refined features, recorded memory, and a latent obstacle mask, respectively.
Figure 3: Overview of the proposed algorithm, which performs four steps: encoding, latent obstacle detection, OMR, and decoding. In this example, the rightmost lane is partially occluded by nearby vehicles, so the encoded features are defective, making lane detection difficult. The proposed algorithm, however, can detect the implicit lane precisely by refining the features within the occluded regions effectively. As depicted by dotted red boxes, we see that the proposed OMR module enhances the features of the occluded lane into more discriminative ones.
Figure 4: Architecture of the encoder and the decoder: (a) Given an image $I$, three coarsest feature maps are extracted using a backbone network. After matching their channel dimensions and resolutions, they are encoded into a combined feature map $F$. (b) From a feature map $F$, a lane probability map $P$ is estimated. Then, by applying a deformable convolution, a lane coefficient map $C$ is predicted from $P$.
Figure 5: Block diagrams of the latent obstacle detection and OMR: (a) From the encoded feature map $F$, a binary probability map $S$ for latent obstacles is predicted. By thresholding $S$, a binary obstacle mask $O$ is determined. To obtain its ground-truth $\bar{S}$, SegFormer (Xie et al., 2021), which is a semantic segmentation algorithm, is employed. (b) In OMR, four input maps $L_{t-1}$, $F_{t-1}$, $\tilde{O}_t$, and $\tilde{F}_t$ are aggregated to $Z$. Then, using the combined feature map $Z$, ConvLSTM (Shi et al., 2015) is used to update $(h_{t-1}, c_{t-1})$ to $(h_t, c_t)$ via the ConvLSTM update equations. Then, $h_t$ is added to $\tilde{F}_t$ to refine it into $F_t$. Blue boxes represent a series of 2D convolution operations with batch normalization and a ReLU function.
...and 5 more figures
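
The refinement step in the Figure 5(b) caption can be written out as follows. This is a sketch that assumes the commonly used peephole-free ConvLSTM gates; the paper's own equation may differ in detail. The aggregation operator $\mathrm{Agg}$, the kernels $W_\ast$, and the biases $b_\ast$ are notation introduced here for illustration, with $*$ denoting convolution, $\odot$ the element-wise product, and $[\cdot,\cdot]$ channel-wise concatenation.

```latex
\begin{align}
Z   &= \mathrm{Agg}\big(L_{t-1},\, F_{t-1},\, \tilde{O}_t,\, \tilde{F}_t\big) \\
i_t &= \sigma\big(W_i * [Z, h_{t-1}] + b_i\big) \\
f_t &= \sigma\big(W_f * [Z, h_{t-1}] + b_f\big) \\
o_t &= \sigma\big(W_o * [Z, h_{t-1}] + b_o\big) \\
g_t &= \tanh\big(W_g * [Z, h_{t-1}] + b_g\big) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t) \\
F_t &= \tilde{F}_t + h_t
\end{align}
```

The last line is the residual refinement stated in the caption: the hidden state $h_t$ is added to $\tilde{F}_t$ to produce $F_t$.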
