End-to-end Autonomous Vehicle Following System using Monocular Fisheye Camera

Jiale Zhang, Yeqiang Qian, Tong Qin, Mingyang Jiang, Siyuan Chen, Ming Yang

TL;DR

This paper tackles the practical challenge of autonomous vehicle following in general road scenarios using only a monocular fisheye camera. It introduces an end-to-end framework that combines BEV perception, a semantic mask to mitigate causal confusion, a dynamic sampling mechanism for rich historical context, and GRU-based temporal fusion to generate a feasible ego trajectory for the next 3 seconds. The key contributions include the semantic masking approach to block irrelevant gradient flow, dynamic spatial sampling to capture the preceding vehicle’s history, and real-world closed-loop validation showing superior longitudinal and lateral following accuracy versus traditional multi-stage methods. The work demonstrates a cost-effective path toward general-scenario autonomous vehicle platooning with robust performance across diverse driving conditions and could enable scalable deployment with reduced sensor requirements.

Abstract

Rising vehicle ownership has led to heavier traffic congestion, more accidents, and higher carbon emissions. Vehicle platooning is a promising way to address these issues by improving road capacity and reducing fuel consumption. However, existing platooning systems face challenges such as reliance on lane markings and on expensive high-precision sensors, which limit their general applicability. To address these issues, we propose a vehicle-following framework that extends vehicle following from restricted scenarios to general scenarios using only a camera. This is achieved through our newly proposed end-to-end method, which improves overall driving performance. The method incorporates a semantic mask to address causal confusion in multi-frame data fusion. Additionally, we introduce a dynamic sampling mechanism to precisely track the trajectories of preceding vehicles. Extensive closed-loop validation in real-world vehicle experiments demonstrates the system's ability to follow vehicles in various scenarios, outperforming traditional multi-stage algorithms. This makes it a promising solution for cost-effective autonomous vehicle platooning. A video of a complete real-world vehicle experiment is available at https://youtu.be/zL1bcVb9kqQ.
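
To make the semantic-mask idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the mask is a binary grid with 1 where a cell belongs to the preceding vehicle and 0 elsewhere, and that masking is an element-wise product applied to the features before they are projected into BEV space. The function name `apply_semantic_mask` and the toy values are hypothetical. Because the derivative of `m * f` with respect to `f` is `m`, cells with `m = 0` contribute neither activations nor gradients, which is one simple way to read "blocking irrelevant gradient flow".

```python
from typing import List

Grid = List[List[float]]


def apply_semantic_mask(features: Grid, mask: List[List[int]]) -> Grid:
    """Zero out feature cells whose mask value is 0 (hypothetical helper).

    Assumption: `mask` is binary and has the same shape as `features`.
    """
    same_rows = len(features) == len(mask)
    same_cols = all(len(f) == len(m) for f, m in zip(features, mask))
    if not (same_rows and same_cols):
        raise ValueError("features and mask must have the same shape")
    # Element-wise product: cells with mask 0 become 0.0 and therefore
    # also receive zero gradient in any framework that differentiates this.
    return [
        [f * m for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(features, mask)
    ]


# Toy 2x3 feature map; only the cells marked 1 are treated as relevant.
features = [[0.5, 1.2, 0.3],
            [0.9, 2.0, 0.7]]
mask = [[0, 1, 0],
        [0, 1, 1]]
masked = apply_semantic_mask(features, mask)
print(masked)
```

In a real network the same product would be applied to multi-channel feature tensors, with the mask broadcast across channels; that detail is omitted here.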

Paper Structure

This paper contains 23 sections, 14 equations, 9 figures, 3 tables, 1 algorithm.

Figures (9)

  • Figure 1: Causal confusion in BEV temporal fusion, where irrelevant features in BEV space shift corresponding to the ego vehicle's past motion.
  • Figure 2: The overall framework of the system. The network processes multi-frame fisheye images to output the planned trajectory of the ego vehicle.
  • Figure 3: BEV feature extraction with semantic mask. Only relevant features are projected into the BEV space.
  • Figure 4: Real-world experimental platform.
  • Figure 5: Realistic route for the closed-loop experiment: a 1.5 km route covering diverse traffic scenarios (straight roads, curves, intersections, and roundabouts). The preceding vehicle kept its velocity below 6 m/s and its acceleration within 1.5 m/s²; the following parameters were a fixed distance of 4 meters and a time gap of 0.5 seconds.
  • ...and 4 more figures
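
The Figure 5 caption gives a fixed following distance of 4 meters and a time gap of 0.5 seconds but does not say how the two combine. One common reading, assumed here and not confirmed by the source, is a constant time-gap spacing policy in which the desired gap is the fixed distance plus the time gap multiplied by the ego speed. The function name `desired_gap` and its parameter names are hypothetical.

```python
def desired_gap(ego_speed_mps: float,
                fixed_distance_m: float = 4.0,
                time_gap_s: float = 0.5) -> float:
    """Desired gap under an ASSUMED constant time-gap policy.

    gap = fixed_distance + time_gap * ego_speed
    Defaults are the values quoted in the Figure 5 caption.
    """
    if ego_speed_mps < 0:
        raise ValueError("ego speed must be non-negative")
    return fixed_distance_m + time_gap_s * ego_speed_mps


# At standstill the gap reduces to the fixed distance.
gap_at_rest = desired_gap(0.0)
# At the 6 m/s speed bound from the caption: 4 + 0.5 * 6 = 7 m.
gap_at_max_speed = desired_gap(6.0)
print(gap_at_rest, gap_at_max_speed)
```

Under this assumption the target gap on the test route would range from 4 m at standstill to 7 m at the 6 m/s speed bound.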