Dense Road Surface Grip Map Prediction from Multimodal Image Data

Jyri Maanpää; Julius Pesonen; Heikki Hyyti; Iaroslav Melekhov; Juho Kannala; Petri Manninen; Antero Kukko; Juha Hyyppä

TL;DR

This work predicts a dense, pixelwise road-surface grip map so that an autonomous vehicle can anticipate slippery conditions ahead. A CNN with modality-specific encoders fuses forward RGB camera, thermal camera, and LiDAR reflectance images, and is trained with weakly supervised pixelwise ground truth from an optical road weather sensor. The authors collect a 37-hour dataset with the ARVO research vehicle across diverse winter conditions and introduce a pipeline that matches the modalities pixelwise. They report that sensor fusion yields better grip predictions than single-modality baselines (best RMSE: $0.0632$ on validation and $0.0575$ on test for RGB+T+R). The study contributes a data collection and processing pipeline, a dense grip-map predictor, and a public demo, underscoring the practical potential of multimodal sensing for proactive autonomous driving in adverse weather.

Abstract

Slippery road weather conditions are prevalent in many regions and pose a regular risk to traffic. Still, there has been comparatively little research on how autonomous vehicles could detect slippery conditions on the road in order to drive safely. In this work, we propose a method to predict a dense grip map of the area in front of the car, based on postprocessed multimodal sensor data. We trained a convolutional neural network to predict pixelwise grip values from fused RGB camera, thermal camera, and LiDAR reflectance images, based on weakly supervised ground truth from an optical road weather sensor. The experiments show that dense grip values can be predicted with good accuracy from these data modalities, as the produced grip map follows both the ground truth measurements and local weather conditions, such as snowy areas on the road. Models using only the RGB camera or the LiDAR reflectance modality provided good baseline results for grip prediction accuracy, while models fusing the RGB camera, thermal camera, and LiDAR modalities improved the grip predictions significantly.
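Weak supervision from a road weather sensor suggests that only some pixels of each image carry a grip label. The following is a minimal sketch of how an RMSE could be evaluated on labeled pixels only. It assumes that a boolean mask marks the labeled pixels; the function name `masked_rmse`, the array shapes, and the toy values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def masked_rmse(pred, target, mask):
    """RMSE over the pixels where mask is True; NaN if no pixel is labeled."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return float("nan")
    # Boolean indexing keeps only the labeled pixels.
    diff = pred[mask] - target[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy 2x3 grip map (hypothetical values): only the middle column is labeled.
pred = np.array([[0.80, 0.75, 0.30],
                 [0.82, 0.40, 0.35]])
target = np.array([[0.00, 0.70, 0.00],
                   [0.00, 0.50, 0.00]])
mask = np.array([[False, True, False],
                 [False, True, False]])

rmse = masked_rmse(pred, target, mask)
print(f"masked RMSE: {rmse:.4f}")
```

Unlabeled pixels do not contribute to the error at all, so the placeholder zeros in `target` have no effect on the result.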

Paper Structure (15 sections, 1 equation, 8 figures, 1 table)

Introduction
Background
Data
  Dataset Collection
  Datasplit
  Pixelwise Matching of Modalities
Methods
  Model
  Training Setup
  Performance Evaluation
Results
  Validation and Test Set Errors
  Qualitative Performance
Discussion
Conclusions

Figures (8)

Figure 1: Our work presents a grip prediction model, which operates on pixelwise fused RGB camera, thermal camera, and LiDAR reflectance measurements and predicts a dense grip map of the road area. The ground truth for training is obtained with an optical road weather sensor that provides road grip measurements which are postprocessed with GNSS trajectories and external calibrations to match the input data.
Figure 2: The research vehicle ARVO used for data collection. The long-range sensors shown in box A are 1. LiDAR, 2. RGB camera in a weatherproof housing and 3. thermal cameras. The road weather sensor is shown in box B.
Figure 3: The distribution of grip values and road surface states provided by the road weather sensor measurements in the complete unprocessed dataset. We observe that most of the data is collected within dry, wet, or snowy conditions.
Figure 4: The model architecture and training scheme for the model using all data modalities. Each input data modality has a separate encoder and their features are concatenated within each feature scale before the FPN decoder. The loss is evaluated both for the grip and the surface layer thickness prediction tasks.
Figure 5: Scatter plots of predicted grips and layer thicknesses produced by the best, proposed model (RGB+T+R). The x-axis represents the ground truth values and the y-axis the predictions. The plots were generated using 50 000 random measurements and corresponding predictions from the test set. The red dashed line represents the position of correct predictions.
...and 3 more figures
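
The caption of Figure 4 states that each modality has its own encoder and that the features are concatenated within each feature scale before the FPN decoder. A minimal sketch of that per-scale concatenation step follows. It assumes feature maps in channel-first layout `(C, H, W)`; the helper names `fuse_per_scale` and `fake_encoder`, the channel counts, and the image size are hypothetical, and random arrays stand in for real encoder outputs.

```python
import numpy as np


def fuse_per_scale(features_by_modality):
    """Concatenate feature maps channel-wise, separately at each scale.

    features_by_modality maps a modality name to a list of arrays shaped
    (C, H, W), one array per feature scale.
    """
    per_modality = list(features_by_modality.values())
    n_scales = len(per_modality[0])
    if any(len(feats) != n_scales for feats in per_modality):
        raise ValueError("all modalities must provide the same number of scales")
    fused = []
    for scale in range(n_scales):
        maps = [feats[scale] for feats in per_modality]
        # Spatial sizes must agree at a given scale; channels are stacked.
        fused.append(np.concatenate(maps, axis=0))
    return fused


rng = np.random.default_rng(0)


def fake_encoder(channels=(8, 16), size=32):
    """Stand-in for an encoder: random features, halving resolution per scale."""
    return [rng.standard_normal((c, size // 2**i, size // 2**i))
            for i, c in enumerate(channels)]


features = {
    "rgb": fake_encoder(),
    "thermal": fake_encoder(),
    "reflectance": fake_encoder(),
}
fused = fuse_per_scale(features)
shapes = [f.shape for f in fused]
print(shapes)
```

With three modalities, the channel count at each scale is three times that of a single encoder, while the spatial resolution is unchanged. A decoder consuming these maps would need input widths sized accordingly.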
