Sun Off, Lights On: Photorealistic Monocular Nighttime Simulation for Robust Semantic Perception

Konstantinos Tzevelekakis, Shutong Zhang, Luc Van Gool, Christos Sakaridis

TL;DR

This work tackles robust semantic perception at night by generating photorealistic synthetic nighttime images from a single daytime image with a monocular inverse-rendering and ray-tracing pipeline. It estimates scene depth $\hat{d}$, normals $\hat{\bm{n}}$, materials $(a,r)$, and semantically informed light sources, then performs semantics-aware probabilistic light-source instantiation and physically based relighting to render the nighttime image $I_n$, followed by ISP-like post-processing. Key contributions include the semantics-aware light-instantiation module, depth refinement guided by surface normals, and an end-to-end monocular nighttime synthesis pipeline, complemented by new datasets for outdoor light sources and nighttime illuminants. Empirical results show photorealistic nighttime outputs that improve day-to-night semantic adaptation in HRDA-based UDA on ACDC, though some 2D diffusion-based methods may surpass SOLO on mIoU under certain conditions due to brightness differences. Overall, SOLO advances realistic nighttime simulation and contributes datasets for nighttime scene understanding.

Abstract

Nighttime scenes are hard for learned models to perceive semantically and hard for humans to annotate. Realistic synthetic nighttime data therefore become all the more important for learning robust semantic perception at night, since their semantic annotations are accurate and cheap. However, existing data-driven or hand-crafted techniques for generating nighttime images from daytime counterparts suffer from poor realism. The reason is the complex interaction between highly spatially varying nighttime illumination, which differs drastically from its daytime counterpart, and objects of spatially varying materials in the scene. This interaction happens in 3D and is very hard to capture with such 2D approaches. The above 3D interaction and illumination shift have proven equally hard to model in the literature, as opposed to other conditions such as fog or rain. Our method, named Sun Off, Lights On (SOLO), is the first to perform nighttime simulation on single images in a photorealistic fashion by operating in 3D. It first explicitly estimates the 3D geometry, the materials and the locations of light sources of the scene from the input daytime image. It then relights the scene by probabilistically instantiating light sources in a way that accounts for their semantics and running standard ray tracing. Not only are the visual quality and photorealism of our nighttime images superior to those of competing approaches, including diffusion models, but our images also prove more beneficial for semantic nighttime segmentation in day-to-night adaptation. Code and data will be made publicly available.
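The probabilistic, semantics-aware light-source instantiation described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `LightGroup` structure, the class names, the activation probabilities in `ACTIVATION_PRIOR`, and the chromaticity values in `ILLUMINANTS_XY`. The sketch only shows the general idea of sampling one on/off variable per group of light sources, with a probability that depends on the semantic class, and assigning each active group a chromaticity drawn from a set of real-world illuminants.

```python
import random
from dataclasses import dataclass


@dataclass
class LightGroup:
    name: str      # hypothetical group id, e.g. all windows of one building
    category: str  # semantic class of the light sources in the group


# Hypothetical per-class activation probabilities (illustrative values only).
ACTIVATION_PRIOR = {"street_light": 0.9, "window": 0.4, "vehicle_light": 0.6}

# Hypothetical CIE xy chromaticities standing in for an illuminants dataset.
ILLUMINANTS_XY = [(0.45, 0.41), (0.38, 0.38), (0.31, 0.33)]


def instantiate_lights(groups, rng):
    """Sample one activation variable and one chromaticity per group."""
    lights = []
    for group in groups:
        # Unknown classes fall back to a neutral probability of 0.5.
        p_on = ACTIVATION_PRIOR.get(group.category, 0.5)
        active = rng.random() < p_on  # Bernoulli draw with probability p_on
        xy = rng.choice(ILLUMINANTS_XY) if active else None
        lights.append({"group": group.name, "active": active, "xy": xy})
    return lights


if __name__ == "__main__":
    scene = [
        LightGroup("lamp_row_1", "street_light"),
        LightGroup("building_3_windows", "window"),
    ]
    for light in instantiate_lights(scene, random.Random(0)):
        print(light)
```

Sampling per group rather than per source keeps related lights (for example, a row of street lamps) consistent with each other; the activated groups would then be handed to a renderer together with the estimated geometry and materials.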

Paper Structure (19 sections, 6 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Overview of SOLO. Our method accepts as input a single daytime image $I_d$. Geometric $(\hat{d}, \bm{n})$ and material $(a, r)$ representations are estimated with the inverse rendering networks $\mathcal{F}_G$ and $\mathcal{F}_M$, respectively. A light source segmentation network $\mathcal{F}_L$ predicts the regions in $I_d$ which correspond to inactive light sources that may emit light at night. The initial depth map $\hat{d}$ is filtered and optimized with guidance from an instance semantic segmentation mask and the estimated surface normal map $\bm{n}$, respectively. The refined depth map $d$ and the camera intrinsics are used to construct the 3D scene mesh $\mathcal{G}$. Nighttime light sources in the scene are instantiated probabilistically group-wise, using the predictions of $\mathcal{F}_L$ to sample their activation variables and the external real-world nighttime illuminants dataset we have collected to set their chromaticities. The activated light sources, the materials $(a, r)$ and the 3D mesh $\mathcal{G}$ are finally fed to the ray tracing module which renders a raw image that is subsequently post-processed in a standard fashion to compute the output nighttime image $I_n$.
  • Figure 2: Qualitative comparison of day-to-night translation methods. From left to right: daytime input images, and synthesized nighttime results of CycleGAN [zhu2017unpaired], ControlNet [zhang2023adding], FDA [yang2020fda], and SOLO (ours).
  • Figure 3: Ablation study of SOLO. Columns (a)-(d) show the results of the ablated versions of SOLO, i.e. versions 0, 1, 2, and 3. Column (e) shows the nighttime images generated by the full SOLO method. Each sample (row) is labeled with a number.
  • Figure 4: Additional qualitative comparisons of day-to-night translation methods. From left to right: daytime input images, and synthesized nighttime results of CycleGAN [zhu2017unpaired], ControlNet [zhang2023adding], FDA [yang2020fda], and SOLO (ours).
  • Figure 5: Nighttime Illuminants dataset samples. The first row shows the gray card images from which the average chromaticity coordinates $(x,y)$ are calculated. The second row displays the color palette generated from the sampled chromaticity coordinates at varying luminance. The third row plots the $(x,y)$ coordinates on the CIE 1931 $2^\circ$ Standard Observer chromaticity diagram.
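
As an illustration of how a palette of varying luminance can be derived from a single chromaticity $(x,y)$, as in the Figure 5 caption, the sketch below converts CIE xyY to sRGB using the standard xyY-to-XYZ relations and the standard XYZ-to-sRGB (D65) matrix and transfer function. The function names and the sample chromaticity are assumptions for illustration; the paper's own palette-generation procedure is not described in this section and may differ.

```python
def xyY_to_srgb(x, y, Y):
    """Convert CIE xyY to gamma-encoded sRGB in [0, 1] (D65 white point)."""
    if y <= 0:
        raise ValueError("y must be positive")
    # xyY -> XYZ
    X = x * Y / y
    Z = (1.0 - x - y) * Y / y
    # XYZ -> linear sRGB (standard D65 matrix)
    r = 3.2406 * X - 1.5372 * Y - 0.4986 * Z
    g = -0.9689 * X + 1.8758 * Y + 0.0415 * Z
    b = 0.0557 * X - 0.2040 * Y + 1.0570 * Z

    def encode(c):
        # Clip out-of-gamut values, then apply the sRGB transfer function.
        c = min(max(c, 0.0), 1.0)
        return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

    return encode(r), encode(g), encode(b)


def luminance_palette(x, y, levels):
    """One sRGB color per luminance level for a fixed chromaticity (x, y)."""
    return [xyY_to_srgb(x, y, Y) for Y in levels]


if __name__ == "__main__":
    # Hypothetical warm illuminant chromaticity, chosen only for the demo.
    for rgb in luminance_palette(0.45, 0.41, [0.1, 0.3, 0.5]):
        print(tuple(round(c, 3) for c in rgb))
```

Keeping $(x,y)$ fixed and sweeping only $Y$ changes brightness without changing hue, which is why a single measured chromaticity is enough to generate a whole palette row.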