Optical Flow Matters: an Empirical Comparative Study on Fusing Monocular Extracted Modalities for Better Steering

Fouad Makiyeh, Mark Bastourous, Anass Bairouk, Wei Xiao, Mirjana Maras, Tsun-Hsuan Wang, Marc Blanchon, Ramin Hasani, Patrick Chareyre, Daniela Rus

TL;DR

This research introduces a new end-to-end method that exploits multimodal information from a single monocular camera to improve the steering predictions for self-driving cars.

Abstract

Autonomous vehicle navigation is a key challenge in artificial intelligence, requiring robust and accurate decision-making processes. This research introduces a new end-to-end method that exploits multimodal information from a single monocular camera to improve the steering predictions for self-driving cars. Conventional models either require several sensors, which can be costly and complex, or rely exclusively on RGB images, which may not be robust enough under different conditions. In contrast, our model significantly improves vehicle steering prediction performance using a single visual sensor. By focusing on the fusion of RGB imagery with depth completion information or optical flow data, we propose a comprehensive framework that integrates these modalities through both early and hybrid fusion techniques. We use three distinct neural network models to implement our approach: Convolutional Neural Network - Neural Circuit Policy (CNN-NCP), Variational Auto Encoder - Long Short-Term Memory (VAE-LSTM), and Variational Auto Encoder - Neural Circuit Policy (VAE-NCP). By incorporating optical flow into the decision-making process, our method significantly advances autonomous navigation. Empirical results from our comparative study using Boston driving data show that our model, which integrates image and motion information, is robust and reliable. It outperforms state-of-the-art approaches that do not use optical flow, reducing the steering estimation error by 31%. This demonstrates the potential of optical flow data, combined with advanced neural network architectures (a CNN-based structure for fusing data and a recurrence-based network for inferring a command from the latent space), to enhance steering estimation for autonomous vehicles.
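
As an illustration of the early fusion mentioned in the abstract, the sketch below stacks an RGB frame and a dense optical flow field along the channel axis before any feature extraction. This is a minimal sketch and not the authors' implementation: the function name `early_fuse`, the input sizes, and the normalization choices are assumptions made for the example, and the hybrid fusion variant is not shown.

```python
import numpy as np

def early_fuse(rgb, flow):
    """Stack an RGB frame (H, W, 3) and a flow field (H, W, 2) into (H, W, 5)."""
    if rgb.shape[:2] != flow.shape[:2]:
        raise ValueError("RGB and flow must share height and width")
    # Assumption: 8-bit RGB input, rescaled to [0, 1].
    rgb = rgb.astype(np.float32) / 255.0
    # Assumption: flow is rescaled by its largest absolute displacement.
    flow = flow.astype(np.float32)
    scale = float(np.max(np.abs(flow)))
    if scale > 0.0:
        flow = flow / scale
    # Early fusion: a single 5-channel tensor for the convolutional extractor.
    return np.concatenate([rgb, flow], axis=-1)

# Synthetic stand-ins for a camera frame and its estimated optical flow.
rgb = np.random.default_rng(0).integers(0, 256, size=(48, 64, 3), dtype=np.uint8)
flow = np.random.default_rng(1).normal(size=(48, 64, 2))
fused = early_fuse(rgb, flow)
```

A convolutional feature extractor consuming `fused` would only need its first layer widened from 3 to 5 input channels, which is one reason channel stacking is a common reading of "early fusion".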

Paper Structure

This paper contains 13 sections, 7 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Block diagram of the overall approach. An example of different modalities extracted from an RGB image, such as the depth map and optical flow. Two of these modalities are fused in a convolutional feature extractor followed by a recurrent neural network for vehicle steering estimation.
  • Figure 2: Training and validation MSE across 100 steps when using either RGB or RGB-OF inputs, for the VAE-NCP (black and green lines) and CNN-NCP (red and blue lines) models, corresponding to F.1 in the comparison table (I & II).
  • Figure 3: Steering Error Response to Latent Dimension Perturbations: the left plot depicts the MSE variability under perturbation of each latent dimension in a model using RGB and optical flow, while the right plot displays the variability in MSE when each latent dimension is perturbed in a model using only RGB images. The perturbation value was set to $\sigma=0.3$.
  • Figure 4: Impact Score Distribution for the Top 10% of Predictions with Highest Error: The left plot corresponds to the RGB-OF model, illustrating a consistent impact score across perturbed dimensions, while the right plot corresponds to the RGB-only model, displaying a more varied impact score.
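
The per-dimension perturbation analysis described for Figure 3 can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's procedure: it assumes the perturbation is zero-mean Gaussian noise with standard deviation $\sigma=0.3$ added to one latent dimension at a time, and it replaces the recurrent steering head with a hypothetical fixed linear readout (`linear_decode`) so the example is self-contained.

```python
import numpy as np

def perturbation_mse(latents, targets, decode, sigma=0.3, seed=0):
    """Return one steering MSE per latent dimension, perturbing only that dimension."""
    rng = np.random.default_rng(seed)
    n_samples, n_dims = latents.shape
    mse_per_dim = np.empty(n_dims)
    for d in range(n_dims):
        perturbed = latents.copy()
        # Assumption: additive zero-mean Gaussian noise with std sigma.
        perturbed[:, d] += rng.normal(0.0, sigma, size=n_samples)
        mse_per_dim[d] = np.mean((decode(perturbed) - targets) ** 2)
    return mse_per_dim

# Hypothetical stand-in for the steering head: a fixed linear readout.
weights = np.array([2.0, 0.0, -0.5, 0.1])

def linear_decode(z):
    return z @ weights

latents = np.random.default_rng(1).normal(size=(5000, 4))
targets = linear_decode(latents)  # unperturbed predictions are exact in this toy
mse = perturbation_mse(latents, targets, linear_decode)
```

In this toy setup the MSE for each dimension grows with the squared readout weight of that dimension, so a flat profile across dimensions indicates that no single latent dimension dominates the steering output.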