NeRO: Neural Road Surface Reconstruction

Ruibo Wang; Song Zhang; Ping Huang; Donghai Zhang; Haoyu Chen

TL;DR

This work tackles road surface reconstruction for applications such as autonomous driving by introducing NeRO, a neural framework that maps world coordinates $(x, y)$ to road height $z$, color $c$, and semantics $s$ using position-encoding MLPs. It demonstrates compatibility with multiple height sources (vehicle pose, LiDAR, SfM) and robustness to semantic noise, while training fast enough for road visualization and 4D labeling. The paper systematically compares standard positional encoding with Multi-Resolution Hash Positional Encoding, showing how hash-based encoding improves detail and speed, especially under sparse or noisy semantic supervision. By integrating height, color, and semantics in a unified, mesh-free representation, NeRO enables efficient rendering of semantically rich road surfaces and supports future 4D labeling and semantic grouping tasks in real-world scenes.

Abstract

Accurately reconstructing road surfaces is pivotal for many applications, especially autonomous driving. This paper introduces a framework of position-encoding Multi-Layer Perceptrons (MLPs) for road surface reconstruction, which takes world coordinates x and y as input and outputs height, color, and semantic information. The method is compatible with a variety of road height sources, such as vehicle camera poses, LiDAR point clouds, and SfM point clouds; it is robust to semantic noise in images, such as sparse labels and noisy semantic predictions; and it trains quickly. These properties make it promising for rendering road surfaces with semantics, particularly in applications that require road surface visualization, 4D labeling, and semantic grouping.
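As a rough illustration of the positional encoding applied to the input coordinates, the following sketch uses the frequency encoding popularized by NeRF. The number of frequencies, the factor of pi, and the choice to keep the raw coordinates in the output are assumptions made for this example, not values taken from the paper.

```python
import math


def positional_encoding(x, y, num_freqs=4):
    """Frequency-encode a 2D point (x, y) in the style of NeRF.

    num_freqs is a hypothetical setting chosen for illustration.
    The output keeps the raw coordinates and appends sin/cos pairs
    at frequencies 2^0 * pi, 2^1 * pi, ..., 2^(num_freqs-1) * pi.
    """
    features = [x, y]
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        for v in (x, y):
            features.append(math.sin(freq * v))
            features.append(math.cos(freq * v))
    return features


encoded = positional_encoding(0.25, -0.5)
print(len(encoded))  # 18 = 2 raw values + 4 frequencies * 2 coordinates * (sin, cos)
```

The encoded vector, rather than the raw (x, y) pair, would then be the input to the MLPs. Higher frequencies let a small network represent finer spatial detail than it could from two raw numbers.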

Paper Structure (29 sections, 2 equations, 7 figures, 3 tables)

Introduction
Related Works
Multi-view 2D Image to 3D Reconstruction
NeRF-based Reconstruction
NeRF with Semantic
NeRF-based Road surface Reconstruction
Method
NeRO Network Structure
Encoding methods
Positional Encoding
Multi-Resolution Hash Positional Encoding
Reconstruction methods
Z-axis Reconstruction
Color Reconstruction
Semantic Reconstruction
...and 14 more sections
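
The outline lists Multi-Resolution Hash Positional Encoding as the second encoding option. A minimal 2D sketch of that idea, following the general scheme of Instant-NGP, could look like the code below. The level count, table size, feature width, base resolution, and growth factor are hypothetical values, and the tables hold random numbers where a real system would hold trained features. For simplicity every level is hashed, including coarse ones that could be indexed directly.

```python
import math
import random

# Per-dimension multipliers for the spatial hash (values used by Instant-NGP).
PRIMES = (1, 2654435761)


def hash_corner(ix, iy, table_size):
    """Map an integer grid corner to a slot in the feature table."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1])) % table_size


def make_tables(num_levels, table_size, feat_dim, rng):
    """One feature table per resolution level, filled with small random values."""
    return [[[rng.uniform(-1e-4, 1e-4) for _ in range(feat_dim)]
             for _ in range(table_size)]
            for _ in range(num_levels)]


def hash_encode(x, y, tables, base_res=4, growth=2.0):
    """Encode a point with x, y in [0, 1] by concatenating per-level features.

    At each level the point falls inside one grid cell; the features stored
    at the cell's four corners are blended by bilinear interpolation.
    """
    out = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)
        gx, gy = x * res, y * res
        x0, y0 = int(math.floor(gx)), int(math.floor(gy))
        tx, ty = gx - x0, gy - y0
        feat = [0.0] * len(table[0])
        for dx, wx in ((0, 1.0 - tx), (1, tx)):
            for dy, wy in ((0, 1.0 - ty), (1, ty)):
                corner = table[hash_corner(x0 + dx, y0 + dy, len(table))]
                for i, v in enumerate(corner):
                    feat[i] += wx * wy * v
        out.extend(feat)
    return out


rng = random.Random(0)
tables = make_tables(num_levels=4, table_size=64, feat_dim=2, rng=rng)
features = hash_encode(0.3, 0.7, tables)
print(len(features))  # 8 = 4 levels * 2 features per level
```

Because the table entries are trainable in a real system, most of the representational capacity sits in the tables and the downstream MLPs can stay small, which is the usual explanation for the speed advantage of hash-based encodings.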

Figures (7)

Figure 1: The result of our method in reconstructing an entire segment of the road from the KITTI Odometry Sequence 00.
Figure 2: NeRO Overview. The world coordinates X = (x, y) are encoded by Positional Encoding or Multi-Resolution Hash Positional Encoding. The encoded features are then passed to three separate MLPs that compute the height z, color, and semantics.
Figure 3: Comparison in reconstructing an incomplete road. The red circles indicate holes in the horizontal and corner regions of the road surface.
Figure 4: Qualitative comparison of PE and Hash PE on datasets from vehicle camera poses, LiDAR points, SfM dense points, and SfM sparse points. The first column shows each input dataset, and the remaining columns show the reconstruction results. Hash PE gives better rendering quality than PE in both color and semantics for every dataset.
Figure 5: Comparison of positional encoding and multi-resolution hash positional encoding under sparse semantic labels. Both methods render the overall structure of the road, but Hash PE (left) gives more detail.
...and 2 more figures
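
The structure described in the Figure 2 caption, one shared encoding feeding three separate MLPs, can be sketched as follows. This is a toy forward pass with random weights, not the authors' implementation: the layer widths, ReLU activations, sigmoid on color, softmax on semantics, `ENC_DIM`, and `NUM_CLASSES` are all assumptions introduced for illustration.

```python
import math
import random


def make_mlp(sizes, rng):
    """Random weights for a fully connected net; sizes = [in, hidden..., out]."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((w, b))
    return layers


def run_mlp(layers, x):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    for i, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


ENC_DIM = 18      # hypothetical size of the encoded (x, y) vector
NUM_CLASSES = 5   # hypothetical number of semantic classes

rng = random.Random(0)
height_mlp = make_mlp([ENC_DIM, 32, 1], rng)
color_mlp = make_mlp([ENC_DIM, 32, 3], rng)
semantic_mlp = make_mlp([ENC_DIM, 32, NUM_CLASSES], rng)


def query_road(encoded_xy):
    """Return (height z, RGB color, class probabilities) for one encoded point."""
    z = run_mlp(height_mlp, encoded_xy)[0]
    rgb = [sigmoid(v) for v in run_mlp(color_mlp, encoded_xy)]
    class_probs = softmax(run_mlp(semantic_mlp, encoded_xy))
    return z, rgb, class_probs


z, rgb, class_probs = query_road([0.1] * ENC_DIM)
```

Keeping the three outputs in separate networks means each can be supervised by a different source, for example height from LiDAR or camera poses and color and semantics from images, without the losses sharing weights.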
