Energy-Aware Imitation Learning for Steering Prediction Using Events and Frames

Hu Cao, Jiong Liu, Xingzhuo Yan, Rui Song, Yan Xia, Walter Zimmer, Guang Chen, Alois Knoll

Abstract

In autonomous driving, relying solely on frame-based cameras can lead to inaccuracies caused by factors such as long exposure times, high-speed motion, and challenging lighting conditions. To address these issues, we introduce a bio-inspired vision sensor known as the event camera. Unlike conventional cameras, event cameras capture sparse, asynchronous events that provide a complementary modality to mitigate these challenges. In this work, we propose an energy-aware imitation learning framework for steering prediction that leverages both events and frames. Specifically, we design an Energy-driven Cross-modality Fusion Module (ECFM) and an energy-aware decoder to produce reliable and safe predictions. Extensive experiments on two public real-world datasets, DDD20 and DRFuser, demonstrate that our method outperforms existing state-of-the-art (SOTA) approaches. The code and trained models will be released upon acceptance.

Figures (10)

  • Figure 1: Performance comparison on the DDD20 and DRFuser datasets. Our proposed method achieves SOTA performance, outperforming the previous approaches EyEF [zhou2024steering], CAFR [cao2024embracing], DRFuser [munir2023multimodal], and EFNet [EFNet] in terms of RMSE and MAE.
  • Figure 2: The proposed model architecture consists of three main components: a dual-stream backbone network, ECFM modules, and an energy-aware decoder. The backbone comprises two branches: an event-based ResNet (bottom) and a frame-based ResNet [resnet] (top). Each ECFM module enhances features at a different hierarchical scale. The energy-aware decoder then predicts the steering angle.
  • Figure 3: The architecture of the proposed ECFM. 3D attention weights are generated from an energy function. In ECFM, the fused features $F_{f}^{fused}$ and $F_{e}^{fused}$ from the frame and event branches are concatenated and then passed through a $\mathrm{Conv_{1\times1}}$ layer to produce the final output, $F_o$. Notably, this $\mathrm{Conv_{1\times1}}$ layer is the only learnable component within ECFM (a hedged implementation sketch follows the figure list).
  • Figure 4: A quantitative comparison of our proposed method with EyEF [zhou2024steering] and DRFuser [munir2023multimodal] on the DDD20 and DRFuser datasets.
  • Figure 5: Visualization of steering angle predictions against ground truth on the DDD20 (top) and DRFuser (bottom) datasets. Each pair of samples shows frame data on the left and event data on the right.
  • ...and 5 more figures
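The Figure 3 caption pins down the shape of ECFM: 3D attention weights come from an energy function rather than learned parameters, and a single $\mathrm{Conv_{1\times1}}$ fuses the concatenated frame and event features into $F_o$. The PyTorch sketch below illustrates one way such a module could look. It is a minimal sketch under explicit assumptions, not the authors' implementation: the SimAM-style energy function, the cross-modal weighting rule (each branch re-weighted by the other branch's energy weights), and all names (`ECFM`, `energy_weights`, `lam`) are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class ECFM(nn.Module):
    """Sketch of an Energy-driven Cross-modality Fusion Module.

    ASSUMPTION: the energy function follows a SimAM-style formulation,
    so the 3D weights are parameter-free; the only learnable component
    is the final 1x1 convolution, consistent with the Figure 3 caption.
    """

    def __init__(self, channels: int, lam: float = 1e-4):
        super().__init__()
        self.lam = lam  # regularizer of the energy function (assumed value)
        # The single learnable layer: fuses concatenated features into F_o.
        self.conv1x1 = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def energy_weights(self, x: torch.Tensor) -> torch.Tensor:
        # SimAM-style 3D weights: activations that deviate from the
        # channel mean have lower energy and receive higher attention.
        _, _, h, w = x.shape
        n = h * w - 1
        mu = x.mean(dim=(2, 3), keepdim=True)
        d = (x - mu) ** 2
        var = d.sum(dim=(2, 3), keepdim=True) / n
        inv_energy = d / (4 * (var + self.lam)) + 0.5
        return torch.sigmoid(inv_energy)  # (B, C, H, W), in (0, 1)

    def forward(self, f_frame: torch.Tensor, f_event: torch.Tensor) -> torch.Tensor:
        # ASSUMPTION: cross-modal modulation, i.e. each branch is
        # re-weighted by the other branch's energy weights, giving the
        # fused features F_f^fused and F_e^fused from the caption.
        f_frame_fused = f_frame * self.energy_weights(f_event)
        f_event_fused = f_event * self.energy_weights(f_frame)
        # Concatenate and fuse with the single learnable Conv1x1 -> F_o.
        return self.conv1x1(torch.cat([f_frame_fused, f_event_fused], dim=1))


# Example: fuse two 256-channel feature maps of spatial size 28x28.
ecfm = ECFM(channels=256)
f_o = ecfm(torch.randn(2, 256, 28, 28), torch.randn(2, 256, 28, 28))  # (2, 256, 28, 28)
```

Keeping the attention weights parameter-free and concentrating all learning in the $\mathrm{Conv_{1\times1}}$ keeps the fusion module lightweight, which is consistent with the caption's note that the $\mathrm{Conv_{1\times1}}$ is ECFM's only learnable component.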