NPSim: Nighttime Photorealistic Simulation From Daytime Images With Monocular Inverse Rendering and Ray Tracing

Shutong Zhang

TL;DR

NPSim addresses the lack of realistic nighttime semantic segmentation data by proposing a physics-based pipeline that converts daytime images to nighttime using monocular inverse rendering and ray tracing. The core contribution is a Geometric Mesh Reconstruction component that leverages depth/normal estimation and Worldsheet to build accurate scene meshes, augmented by depth refinement and mesh post-processing to eliminate artifacts. Although relighting is described in detail, the thesis primarily implements the mesh reconstruction and provides a concrete plan for material prediction, probabilistic light activation, and ray-traced nighttime rendering, aiming to produce diverse nighttime datasets for training and evaluation. The approach promises improved robustness of vision systems in low-light conditions and generalizes across multiple driving datasets, while acknowledging limitations such as manually annotated light-source masks and domain shifts in depth and normal estimation. Overall, NPSim offers a principled, 3D-consistent path to generate high-fidelity nighttime imagery that can steer future advances in nighttime semantic scene understanding.

Abstract

Semantic segmentation is an important task for autonomous driving. A powerful autonomous driving system should be capable of handling images under all conditions, including nighttime. Generating accurate and diverse nighttime semantic segmentation datasets is crucial for enhancing the performance of computer vision algorithms in low-light conditions. In this thesis, we introduce a novel approach named NPSim, which enables the simulation of realistic nighttime images from real daytime counterparts with monocular inverse rendering and ray tracing. NPSim comprises two key components: mesh reconstruction and relighting. The mesh reconstruction component generates an accurate representation of the scene structure by combining geometric information extracted from the input RGB image and semantic information from its corresponding semantic labels. The relighting component integrates real-world nighttime light sources and material characteristics to simulate the complex interplay of light and object surfaces under low-light conditions. The scope of this thesis mainly focuses on the implementation and evaluation of the mesh reconstruction component. Through experiments, we demonstrate the effectiveness of the mesh reconstruction component in producing high-quality scene meshes and their generality across different autonomous driving datasets. We also propose a detailed experiment plan for evaluating the entire pipeline, including both quantitative metrics in training state-of-the-art supervised and unsupervised semantic segmentation approaches and human perceptual studies, aiming to indicate the capability of our approach to generate realistic nighttime images and the value of our dataset in steering future progress in the field.

Paper Structure

This paper contains 33 sections, 10 equations, 16 figures, 3 tables, 1 algorithm.

Figures (16)

  • Figure 1: Method overview. Our pipeline contains two components. The Geometric Mesh Reconstruction component first uses network $\mathbf{F_g}$ to estimate the geometric information of an input RGB image, then reconstructs the scene mesh from depth using Worldsheet \cite{hu2021worldsheet} (A.II: Section \ref{sec:gmr}). It also contains a depth refinement kernel (A.I: Section \ref{sec:drk}) and a mesh post-processing kernel (A.III: Section \ref{sec:mpk}) to optimize depth and mesh, respectively. The Realistic Nighttime Scene Relighting component (B: Section \ref{sec:rnsr}) first generates nighttime light sources using probabilistic light source activation, then predicts the material characteristics using network $\mathbf{F_{ir}}$. Following that, it uses ray tracing to render the clear linear nighttime image. Last, it post-processes the linear nighttime image to simulate artifacts and generates the output nighttime image $\mathbf{I_n}$. In this thesis, we implemented the Geometric Mesh Reconstruction component.
  • Figure 2: Annotation Statistics. We show the annotation statistics of 230 nighttime reference images. The left image shows the number of instances of each type of light source, and the right image shows the number of pixels occupied by each type of light source. All results are presented on a base-10 logarithmic scale.
  • Figure 3: Annotation Examples. We present several examples of our inactive light source annotation. The left column shows the input daytime RGB image, the middle column shows the annotated inactive light source mask, where each instance has its own identity and bounding box, and the right column superimposes the RGB image and the light source mask.
  • Figure 4: Depth comparison before and after the dual-reference cross-bilateral filter. We present two examples for depth comparison. Column (a) shows the original RGB image, column (b) shows the depth estimated by iDisc \cite{piccinelli2023idisc}, and column (c) shows the depth after dual-reference cross-bilateral filter optimization. The comparison shows that the dual-reference cross-bilateral filter improves the depth estimation at the pixel level.
  • Figure 5: Uncertain region detected by the dual-reference variance filter. Column (a) shows the original RGB image, column (b) shows the semantic annotation, and column (c) shows the generated uncertainty map, where the uncertain region is shown in yellow.
  • ...and 11 more figures
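
To make the depth refinement step behind Figure 4 more concrete, the following is a minimal sketch of a cross-bilateral filter guided by two reference images. It is not the thesis implementation. The captions do not state which two references are used, so this sketch assumes they are an intensity image and the semantic label map. The function name, the parameters `radius`, `sigma_s`, and `sigma_c`, and the hard same-class gating are all illustrative assumptions.

```python
import math

def dual_reference_cross_bilateral(depth, gray, labels,
                                   radius=2, sigma_s=2.0, sigma_c=0.1):
    """Smooth `depth` with weights taken from two reference images.

    depth  : 2D list of floats, the estimated depth map
    gray   : 2D list of floats in [0, 1], assumed intensity reference
    labels : 2D list of ints, assumed semantic reference
    All names and parameters here are hypothetical, for illustration only.
    """
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num, den = 0.0, 0.0
            for qy in range(max(0, y - radius), min(h, y + radius + 1)):
                for qx in range(max(0, x - radius), min(w, x + radius + 1)):
                    # Semantic reference: skip neighbors of another class,
                    # so depth does not bleed across object boundaries.
                    if labels[qy][qx] != labels[y][x]:
                        continue
                    # Spatial Gaussian on pixel distance.
                    d2 = (qy - y) ** 2 + (qx - x) ** 2
                    # Range Gaussian on the intensity reference (not on depth).
                    c2 = (gray[qy][qx] - gray[y][x]) ** 2
                    wgt = (math.exp(-d2 / (2.0 * sigma_s ** 2))
                           * math.exp(-c2 / (2.0 * sigma_c ** 2)))
                    num += wgt * depth[qy][qx]
                    den += wgt
            # den > 0 always: the center pixel contributes weight 1.
            out[y][x] = num / den
    return out

# Toy scene: a near object (class 0) with one noisy depth value,
# next to a far object (class 1).
depth = [[10.0, 10.0, 50.0, 50.0],
         [10.0, 14.0, 50.0, 50.0],
         [10.0, 10.0, 50.0, 50.0]]
gray = [[0.2, 0.2, 0.8, 0.8] for _ in range(3)]
labels = [[0, 0, 1, 1] for _ in range(3)]
smoothed = dual_reference_cross_bilateral(depth, gray, labels)
```

In this toy scene the noisy value in the near object is pulled toward its same-class neighbors, while the far object's depth is left unchanged because no weight crosses the semantic boundary. A soft semantic weight instead of hard gating would be an equally plausible design.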