Dense Road Surface Grip Map Prediction from Multimodal Image Data
Jyri Maanpää, Julius Pesonen, Heikki Hyyti, Iaroslav Melekhov, Juho Kannala, Petri Manninen, Antero Kukko, Juha Hyyppä
TL;DR
This work tackles the prediction of a dense, pixelwise road-surface grip map so that autonomous vehicles can anticipate slippery conditions. It uses a CNN with modality-specific encoders and multi-encoder fusion to combine forward-facing RGB, thermal, and LiDAR reflectance images, trained with weakly supervised pixelwise ground truth from an optical road weather sensor. The authors assemble a 37-hour ARVO dataset across diverse winter conditions and introduce a rigorous pixelwise data matching pipeline. They report that sensor fusion yields better grip predictions than single-modality baselines (best RMSE: $0.0632$ on validation and $0.0575$ on test for RGB+T+R). The study contributes a data collection and processing pipeline, an effective dense grip-map predictor, and a public demo, underscoring the practical potential of multimodal sensing for proactive autonomous driving in adverse weather.
Abstract
Slippery road weather conditions are prevalent in many regions and pose a regular risk to traffic. Still, there has been little research on how autonomous vehicles could detect slippery driving conditions on the road in order to drive safely. In this work, we propose a method to predict a dense grip map of the area in front of the car, based on postprocessed multimodal sensor data. We trained a convolutional neural network to predict pixelwise grip values from fused RGB camera, thermal camera, and LiDAR reflectance images, based on weakly supervised ground truth from an optical road weather sensor. The experiments show that dense grip values can be predicted with good accuracy from these data modalities: the produced grip map follows both the ground truth measurements and local weather conditions, such as snowy areas on the road. Models using only the RGB camera or the LiDAR reflectance modality provided good baseline results for grip prediction accuracy, while models fusing the RGB camera, thermal camera, and LiDAR modalities improved the grip predictions significantly.
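To make the idea of pixelwise evaluation against weak ground truth concrete, the following is a minimal sketch of an RMSE computed only over pixels that carry a grip measurement. It assumes that the road weather sensor labels a sparse subset of image pixels and that unlabelled pixels are simply excluded; the function name `masked_rmse`, the array shapes, and the toy values are hypothetical and are not taken from the paper.

```python
import numpy as np


def masked_rmse(pred, target, valid_mask):
    """RMSE over only those pixels that carry a ground-truth grip value.

    pred, target: 2D arrays of grip values with the same shape.
    valid_mask:   boolean array, True where a sensor measurement exists
                  (assumed sparse; this masking scheme is an illustration).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    valid_mask = np.asarray(valid_mask, dtype=bool)
    if not valid_mask.any():
        raise ValueError("no labelled pixels in mask")
    diff = pred[valid_mask] - target[valid_mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy example: a 4x6 predicted grip map with three labelled pixels.
pred = np.full((4, 6), 0.5)
target = np.zeros((4, 6))
mask = np.zeros((4, 6), dtype=bool)
target[3, 1:4] = [0.4, 0.5, 0.7]  # hypothetical sensor readings
mask[3, 1:4] = True

rmse = masked_rmse(pred, target, mask)
print(f"masked RMSE: {rmse:.4f}")
```

Because unlabelled pixels never enter the error, a dense prediction can be scored against sparse measurements in this way; the same mask could in principle be applied to a training loss, although the abstract does not specify how the loss is formed.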
