Self-Supervised Monocular Depth Estimation in the Dark: Towards Data Distribution Compensation

Haolin Yang, Chaoqiang Zhao, Lu Sheng, Yang Tang

TL;DR

The paper tackles nighttime monocular depth estimation by showing that a model trained exclusively on day images can generalize to night when key day-to-night differences are compensated through physically grounded priors. It introduces a data distribution compensation framework in which a Brightness Peak Generator (BPG) and an Imaging Noise Generator (ING) synthesize night-like inputs from daytime data, and it trains in a one-stage self-supervised manner with an Illuminating Change Net (ICN) to handle illumination changes. Key contributions include training without any nighttime images, explicit modeling of photometric and noise distributions, and SoTA results on nuScenes-Night and RobotCar-Night under a common backbone. The approach offers a practical, physically motivated route to nighttime monocular depth estimation with efficient training and generalization to unseen night scenes.

Abstract

Nighttime self-supervised monocular depth estimation has received increasing attention in recent years. However, using night images for self-supervision is unreliable because the photometric consistency assumption is usually violated in videos taken under complex lighting conditions. Even with domain adaptation or photometric loss repair, performance is still limited by the poor supervision that night images provide to trainable networks. In this paper, we propose a self-supervised nighttime monocular depth estimation method that does not use any night images during training. Our framework utilizes day images as a stable source for self-supervision and applies physical priors (e.g., wave optics, reflection model and read-shot noise model) to compensate for some key day-night differences. With day-to-night data distribution compensation, our framework can be trained in an efficient one-stage self-supervised manner. Though no nighttime images are considered during training, qualitative and quantitative results demonstrate that our method achieves SoTA depth estimation results on the challenging nuScenes-Night and RobotCar-Night compared with existing methods.
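
To make the "read-shot noise model" mentioned in the abstract more concrete, here is a minimal sketch of how such noise could be added to a day image. It is not the paper's ING: the function name `add_read_shot_noise`, the parameters `shot_gain` and `read_std`, and their default values are hypothetical, and the shot noise is approximated by a signal-dependent Gaussian rather than sampled from a Poisson distribution.

```python
import math
import random


def add_read_shot_noise(pixels, shot_gain=0.01, read_std=0.02, seed=None):
    """Add shot and read noise to a flat list of linear intensities in [0, 1].

    shot_gain and read_std are illustrative values, not taken from the paper.
    """
    rng = random.Random(seed)
    noisy = []
    for x in pixels:
        # Shot noise: Gaussian approximation of Poisson noise,
        # with variance proportional to the signal level.
        shot = rng.gauss(0.0, math.sqrt(shot_gain * max(x, 0.0)))
        # Read noise: zero-mean Gaussian, independent of the signal.
        read = rng.gauss(0.0, read_std)
        # Clip back to the valid intensity range.
        noisy.append(min(1.0, max(0.0, x + shot + read)))
    return noisy


if __name__ == "__main__":
    day_row = [0.05, 0.2, 0.5, 0.9]  # toy intensities standing in for an image row
    print(add_read_shot_noise(day_row, seed=0))
```

Under this approximation the noise standard deviation grows with the square root of the pixel intensity (shot term) on top of a constant floor (read term), which is the qualitative behavior usually attributed to low-light sensor noise.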

Paper Structure (58 sections, 24 equations, 16 figures, 6 tables)

Figures (16)

  • Figure 1.1: Nighttime monocular depth estimation results of different self-supervised frameworks. Compared with the existing domain adaptation-based methods RNW and ADDS and the recent large model MonoFormer, our result shows superior performance.
  • Figure 2.1: Overview of our data distribution compensation training framework. The proposed BPG and ING form our compensation stage, whose simple processes are also visualized in the top right. Note that BPG and ING do not participate in backpropagation: their inputs and outputs are detached. The Transformer-CNN hybrid DepthNet, the CNN-based PoseNet, and the CNN-based Illuminating Change Net (ICN) constitute the trainable part of our framework. The two DepthNets share the same weights during training, and the left one is frozen during the whole training. The images fed to the pose network and to the loss computation are not pre-processed by BPG and ING; we discuss this setting in the supplementary material.
  • Figure 3.1: Example visualizations of Re-rendering. Top: Reflection images. Bottom: Re-rendered images.
  • Figure 3.2: Paired visual examples. (Best view with zoom.)
  • Figure 4.1: Qualitative results on nuScenes-Night (first four columns) and RobotCar-Night (last four columns). We leave more visual comparisons to the supplementary material. Compared to domain adaptation (DA) methods, our training uses no images from the nuScenes or Oxford RobotCar datasets.
  • ...and 11 more figures