Elastic Interaction Energy-Informed Real-Time Traffic Scene Perception

Yaxin Feng; Yuan Lan; Luchan Zhang; Guoqing Liu; Yang Xiang

TL;DR

The paper addresses real-time, multi-class traffic scene perception by introducing EIEGSeg, a plug-in training strategy that couples a topology-aware elastic interaction energy loss (EIEL) with standard cross-entropy. EIEL guides predictions toward ground-truth topology by modeling the segmentation task as a curve-based energy minimization, enabling the recovery of slender, occluded, and irregular structures without extra inference cost. The approach is efficiently implemented using FFTs and is compatible with lightweight backbones, demonstrated on Cityscapes, TuSimple, and CULane with notable gains in mIoU and lane connectivity. The results indicate significant practical impact for autonomous driving, improving both segmentation accuracy of fine-scale objects and the robustness of segmentation-based lane detection in real-time systems.

Abstract

Urban segmentation and lane detection are two important tasks for traffic scene perception. Both accuracy and fast inference are crucial for autonomous driving safety. Fine and complex geometric objects, such as pedestrians, traffic signs, and lanes, are the most challenging yet most important recognition targets in traffic scenes. In this paper, we propose EIEGSeg, a simple and efficient network training strategy based on a topology-aware energy loss function. EIEGSeg is designed for multi-class segmentation in real-time traffic scene perception. Specifically, a convolutional neural network (CNN) extracts image features and produces multiple outputs, and the elastic interaction energy loss function (EIEL) drives the predictions toward the ground truth until they overlap completely. Our strategy performs especially well on fine-scale structures, i.e., small or irregularly shaped objects are identified more accurately, and discontinuities in slender objects are reduced. We quantitatively and qualitatively evaluate our method on three traffic datasets: Cityscapes for urban scene segmentation, and TuSimple and CULane for lane detection. Our results demonstrate that EIEGSeg consistently improves performance, especially on real-time, lightweight networks that are better suited for autonomous driving.
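To make the idea of pairing cross-entropy with an FFT-computed energy term more concrete, here is a minimal NumPy sketch for a single binary class. It is an illustration under stated assumptions, not the paper's EIEL: it assumes the energy is a 1/r-kernel interaction between gradients of the prediction-minus-label field, which in Fourier space reduces (up to constants, dropped here) to a sum of |k| times the squared spectrum. The function names, the `weight` parameter, and its default value are hypothetical.

```python
import numpy as np

def elastic_energy_fft(pred, target):
    """Assumed energy form: sum over frequencies of |k| * |FFT(pred - target)|^2.

    Constants and any level-set or smoothing steps are omitted; this is not
    the exact EIEL definition from the paper.
    """
    diff = pred - target
    h, w = diff.shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies, shape (h, 1)
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies, shape (1, w)
    k = np.sqrt(fx ** 2 + fy ** 2)    # frequency magnitude |k|, shape (h, w)
    spectrum = np.fft.fft2(diff)
    return float(np.sum(k * np.abs(spectrum) ** 2) / (h * w))

def cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities and a 0/1 mask."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))

def total_loss(pred, target, weight=0.1):
    """Cross-entropy plus a weighted energy term (weight is a hypothetical knob)."""
    return cross_entropy(pred, target) + weight * elastic_energy_fft(pred, target)

if __name__ == "__main__":
    # A thin vertical "lane" and a prediction with a gap in the middle.
    target = np.zeros((16, 16))
    target[:, 7:9] = 1.0
    broken = target.copy()
    broken[6:10, 7:9] = 0.0
    print("energy of broken lane:", elastic_energy_fft(broken, target))
    print("total loss:", total_loss(broken, target))
```

Because the energy term is used only during training in this sketch, it adds nothing to inference, which matches the plug-in character described above. A real implementation would use a differentiable framework and handle multiple classes.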

Paper Structure (22 sections, 8 equations, 7 figures, 3 tables)

INTRODUCTION
RELATED WORK
Semantic Segmentation Network Structures
Topology-aware segmentation
METHODOLOGY
Network Structure
Elastic Interaction Energy
Preliminary
EIEL for Image Segmentation
EIEL-Guided Multi Class Segmentation
Efficient Calculation
Total loss function in EIEGSeg
EXPERIMENTS
Implementation Details
Datasets and Evaluation Metrics
...and 7 more sections

Figures (7)

Figure 1: An example of the training process. The pictures from left to right are the predictions of the models without/with EIEL during training. EIEL corrects the misclassified pixels, and the end of the lane is clearer.
Figure 2: Comparison of the output maps of the baseline ERFNet [romera2017erfnet] without (w/o) and with (w) EIEL in (a) lane detection and (b) segmentation. From left to right: raw image, ground truth, output w/o EIEL, and output w EIEL.
Figure 3: Network structure of EIEGSeg, which is jointly guided by EIEL and CE during training. Note that this is only a schematic illustration, and the predicted probability maps of some complicated cases in the figure do not exactly correspond to reality.
Figure 4: Illustration of how EIEL works. (a) shows how the moving curve is attracted to the ground truth. (b) shows how the disconnected prediction of the leftmost lane is corrected by EIEL even though it is occluded by a car.
Figure 5: Comparison of performance on the semantic segmentation task. From left to right: ground truth, label, and baseline OCR without (w/o) and with (w) EIEL. Orange boxes indicate better performance on poles, traffic signs, and traffic lights, and green boxes indicate clearer pedestrian segmentation. Some obvious segmentation errors (w/o EIEL) in the third column are highlighted with red boxes.
...and 2 more figures
