TEDNet: Twin Encoder Decoder Neural Network for 2D Camera and LiDAR Road Detection

Martín Bayón-Gutiérrez, María Teresa García-Ordás, Héctor Alaiz Moretón, Jose Aveleira-Mata, Sergio Rubio Martín, José Alberto Benítez-Andrades

TL;DR

The paper tackles robust road surface estimation for autonomous driving by fusing RGB camera and LiDAR data in Bird's Eye View using a Twin Encoder-Decoder network (TEDNet). It introduces a base symmetric Encoder-Decoder with skip connections and conducts a thorough ablation across six variants to assess input strategies and encoder designs, finding that independent twin encoders yield the best performance. TEDNet achieves competitive results on the KITTI-Road benchmark and operates at real-time frame rates on standard hardware, demonstrating practical applicability for onboard perception. The study provides evidence that twin-encoder fusion in BEV can improve semantic road segmentation while maintaining efficiency, and it outlines future work to enhance robustness under diverse conditions.

Abstract

Robust road surface estimation is required for autonomous ground vehicles to navigate safely. Although it has become one of the main targets for autonomous mobility researchers in recent years, it remains an open problem, one in which cameras and LiDAR sensors have proven adequate for predicting the position, size and shape of the road a vehicle is driving on in different environments. In this work, a novel Convolutional Neural Network model is proposed for the accurate estimation of the roadway surface. Furthermore, an ablation study has been conducted to investigate how different encoding strategies affect model performance, testing six slightly different neural network architectures. Our model is based on a Twin Encoder-Decoder Neural Network (TEDNet) for independent camera and LiDAR feature extraction, and has been trained and evaluated on the KITTI-Road dataset. Bird's Eye View projections of the camera and LiDAR data are used in this model to perform semantic segmentation, classifying whether each pixel belongs to the road surface. The proposed method is competitive with other state-of-the-art methods and operates at the same frame rate as the LiDAR and cameras, making it suitable for real-time applications.
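To make the Bird's Eye View input concrete, the following is a minimal, dependency-free sketch of how LiDAR points might be rasterized into a BEV grid. It is not the authors' preprocessing pipeline: the function name `lidar_to_bev`, the (forward, lateral, height) point convention, the grid extents, the cell size, and the choice of storing the maximum height per cell are all assumptions made for illustration.

```python
import math
from typing import Iterable, List, Optional, Tuple

# Hypothetical point convention: (forward, lateral, height) in metres,
# with lateral positive to the right of the vehicle.
Point = Tuple[float, float, float]


def lidar_to_bev(
    points: Iterable[Point],
    forward_range: Tuple[float, float] = (6.0, 46.0),    # assumed extent, not from the paper
    lateral_range: Tuple[float, float] = (-10.0, 10.0),  # assumed extent, not from the paper
    cell: float = 0.05,                                  # assumed cell size in metres
    fill: float = 0.0,                                   # value written to empty cells
) -> List[List[float]]:
    """Rasterize points into a BEV grid that stores the maximum height per cell."""
    rows = round((forward_range[1] - forward_range[0]) / cell)
    cols = round((lateral_range[1] - lateral_range[0]) / cell)
    grid: List[List[Optional[float]]] = [[None] * cols for _ in range(rows)]

    for forward, lateral, height in points:
        r = math.floor((forward - forward_range[0]) / cell)
        c = math.floor((lateral - lateral_range[0]) / cell)
        if not (0 <= r < rows and 0 <= c < cols):
            continue  # the point falls outside the grid
        row = rows - 1 - r  # row 0 is the farthest cell ahead of the vehicle
        current = grid[row][c]
        if current is None or height > current:
            grid[row][c] = height

    # Replace cells that received no points with the fill value.
    return [[fill if v is None else v for v in row] for row in grid]
```

With these assumed defaults the grid has 800 rows and 400 columns. A camera image would need a separate perspective-to-BEV warp using the calibration data, which this sketch does not cover.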

Paper Structure

This paper contains 13 sections, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Preprocessed camera and LiDAR data
  • Figure 2: From top to bottom, CNN architecture for models A, C and E
  • Figure 3: Model prediction and true label on one scene of the validation split
  • Figure 4: Model C predictions from the KITTI-Road evaluation server