NPSim: Nighttime Photorealistic Simulation From Daytime Images With Monocular Inverse Rendering and Ray Tracing
Shutong Zhang
TL;DR
NPSim addresses the shortage of realistic nighttime semantic segmentation data with a physics-based pipeline that converts daytime images to nighttime ones using monocular inverse rendering and ray tracing. The core contribution is a Geometry Mesh Reconstruction component that uses depth and normal estimation together with Worldsheet to build accurate scene meshes, with depth refinement and mesh post-processing to remove artifacts. Relighting is described in detail, but the thesis primarily implements the mesh reconstruction; material prediction, probabilistic light activation, and ray-traced nighttime rendering are laid out as a concrete plan, with the goal of producing diverse nighttime datasets for training and evaluation. The approach is intended to improve the robustness of vision systems in low-light conditions, and the mesh reconstruction generalizes across multiple driving datasets. Acknowledged limitations include manual light-source masks and domain shift in depth and normal estimation. Overall, NPSim offers a principled, 3D-consistent path toward high-fidelity nighttime imagery that can steer future advances in nighttime semantic scene understanding.
Abstract
Semantic segmentation is an important task for autonomous driving. A powerful autonomous driving system should be capable of handling images under all conditions, including nighttime. Generating accurate and diverse nighttime semantic segmentation datasets is crucial for enhancing the performance of computer vision algorithms in low-light conditions. In this thesis, we introduce a novel approach named NPSim, which simulates realistic nighttime images from real daytime counterparts using monocular inverse rendering and ray tracing. NPSim comprises two key components: mesh reconstruction and relighting. The mesh reconstruction component generates an accurate representation of the scene structure by combining geometric information extracted from the input RGB image with semantic information from its corresponding semantic labels. The relighting component integrates real-world nighttime light sources and material characteristics to simulate the complex interplay of light and object surfaces under low-light conditions. This thesis focuses mainly on the implementation and evaluation of the mesh reconstruction component. Through experiments, we demonstrate the effectiveness of the mesh reconstruction component in producing high-quality scene meshes and its generality across different autonomous driving datasets. We also propose a detailed experiment plan for evaluating the entire pipeline, covering both quantitative evaluation, by training state-of-the-art supervised and unsupervised semantic segmentation approaches, and human perceptual studies. The plan aims to show that our approach can generate realistic nighttime images and that the resulting dataset can help steer future progress in the field.
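To make "mesh reconstruction from a single image" concrete, the sketch below back-projects a per-pixel depth map into 3D vertices and connects neighbouring pixels into triangles. It is only an illustration of the general idea and not the Worldsheet-based method used in the thesis: the function name `depth_to_mesh`, the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, and the dense per-pixel grid are all assumptions made for this example, and it omits depth refinement, semantic labels, and mesh post-processing.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


def depth_to_mesh(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> Tuple[List[Vertex], List[Face]]:
    """Back-project a depth map into a triangle mesh (hypothetical helper).

    Assumes a pinhole camera: pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx, y = (v - cy) * z / fy, z = z.
    """
    height = len(depth)
    width = len(depth[0])

    # One vertex per pixel, stored in row-major order.
    vertices: List[Vertex] = []
    for v in range(height):
        for u in range(width):
            z = depth[v][u]
            vertices.append(((u - cx) * z / fx, (v - cy) * z / fy, z))

    # Two triangles per 2x2 block of neighbouring pixels.
    faces: List[Face] = []
    for v in range(height - 1):
        for u in range(width - 1):
            i = v * width + u  # index of the block's top-left vertex
            faces.append((i, i + width, i + 1))
            faces.append((i + 1, i + width, i + width + 1))

    return vertices, faces


if __name__ == "__main__":
    # Toy 2x2 depth map at constant depth 2.0 (made-up values).
    toy_depth = [[2.0, 2.0], [2.0, 2.0]]
    verts, tris = depth_to_mesh(toy_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
    print(len(verts), "vertices,", len(tris), "triangles")
```

A mesh built this way connects every pair of neighbouring pixels, including across depth discontinuities such as object boundaries, which is one source of the kind of artifacts that refinement and post-processing steps are meant to address.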
