Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement

Lingyu Zhu, Wenhan Yang, Baoliang Chen, Hanwei Zhu, Zhangkai Ni, Qi Mao, Shiqi Wang

TL;DR

This work tackles low-light video enhancement without paired ground truth by formulating restoration as a Maximum A Posteriori problem with explicit spatial and temporal priors and unrolling it into a deep network. The proposed Unrolled Decomposed Unpaired Network (UDU-Net) separates spatial and temporal degradation via Intra and Inter subnets, incorporating unpaired expert-retouched illumination, controllable human-perception feedback, and temporal cues to enforce consistency. The approach provides an interpretable, end-to-end trainable framework that outperforms existing unpaired methods and approaches several supervised baselines on realistic SDSD outdoor and indoor scenes, while maintaining temporal stability. The results demonstrate the practical value of reference-free, progressive spatial-temporal optimization for robust low-light video enhancement in diverse environments, with efficient inference suitable for real-time style processing.

Abstract

Obtaining paired low/normal-light videos with motion is more challenging than obtaining paired still images, which makes unpaired learning a critical technical route. This paper addresses learning for low-light video enhancement without paired ground truth. Compared with low-light image enhancement, enhancing low-light videos is more difficult because noise, exposure, and contrast are intertwined in the spatial domain and temporal coherence must also be maintained. To address this challenge, we propose the Unrolled Decomposed Unpaired Network (UDU-Net), which unrolls the optimization functions into a deep network that decomposes the signal into spatial and temporal factors and updates them iteratively. First, we formulate low-light video enhancement as a Maximum A Posteriori (MAP) estimation problem with carefully designed spatial and temporal visual regularization. Then, by unrolling the problem, the optimization of the spatial and temporal constraints is decomposed into separate steps and updated in a stage-wise manner. From the spatial perspective, the Intra subnet leverages unpaired prior information from expert photographic retouching to adjust the statistical distribution. Additionally, we introduce a novel mechanism that integrates human perception feedback to guide network optimization, suppressing over- and under-exposure. From the temporal perspective, the Inter subnet fully exploits temporal cues in progressive optimization, which improves the temporal consistency of the enhanced results. Consequently, the proposed method outperforms state-of-the-art methods in video illumination, noise suppression, and temporal consistency across outdoor and indoor scenes.
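
As a reading aid, the MAP formulation and stage-wise unrolling described in the abstract can be sketched in generic form. This is only an illustrative sketch under stated assumptions, not the paper's own equations: the symbols $Y$ (observed low-light video), $X$ (enhanced video), $\Phi_s$ and $\Phi_t$ (spatial and temporal regularizers), the weights $\lambda_s$ and $\lambda_t$, and the operators $\mathcal{F}_{\mathrm{Intra}}$ and $\mathcal{F}_{\mathrm{Inter}}$ are hypothetical names introduced here.

```latex
% Generic MAP estimate (Bayes' rule; the evidence p(Y) does not depend on X)
\hat{X} = \arg\max_{X} \, p(X \mid Y)
        = \arg\min_{X} \, \bigl[ -\log p(Y \mid X) - \log p(X) \bigr]

% Assumption: the negative log-prior splits into a spatial and a temporal term
-\log p(X) = \lambda_s \, \Phi_s(X) + \lambda_t \, \Phi_t(X) + \text{const}

% Assumption: unrolling alternates a spatial and a temporal update at stage k
X^{(k+\frac{1}{2})} = \mathcal{F}_{\mathrm{Intra}}\bigl(X^{(k)}, Y\bigr), \qquad
X^{(k+1)} = \mathcal{F}_{\mathrm{Inter}}\bigl(X^{(k+\frac{1}{2})}\bigr)
```

Under this reading, each unrolled stage corresponds to one pass through the Intra subnet followed by one pass through the Inter subnet; the exact update rules, inputs, and number of stages are defined in the paper itself.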

Paper Structure (19 sections, 16 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Visual comparison of the proposed method, existing methods, and the ground truth (highlighted in red box).
  • Figure 2: Illustration of the proposed Unrolled Decomposed Unpaired Network (UDU-Net), which is derived from the concept of unrolling and employs a stage-by-stage manner from spatial and temporal perspectives to effectively separate intertwined degradation. Both spatial and temporal visual priors are incorporated into the process, where the spatial prior incorporates unpaired retouched illumination and human perception feedback, while the temporal prior involves exploring temporal cues and utilizing guided side information.
  • Figure 3: Illustration of the results based on the gamma correction equation. The proposed method can adaptively select the target based on the BRISQUE score ($\downarrow$, lower is better).
  • Figure 4: Illustration of the estimated mask results computed using the estimated mask equation.
  • Figure 5: Qualitative comparison results on the SDSD outdoor dataset [wang2021seeing]. The supervised methods (*) are provided below the dashed line for reference.
  • ...and 1 more figure
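
The adaptive target selection mentioned in the Figure 3 caption can be illustrated with a minimal Python sketch. It assumes frames are arrays with values in [0, 1] and that gamma correction is the power law `frame ** gamma` (so `gamma < 1` brightens); the paper's exact equation and parameterization may differ. The function names are hypothetical, and `toy_quality_score` is only a stand-in for a no-reference quality metric: it is not BRISQUE, and a real BRISQUE implementation would be passed in as `score_fn`.

```python
import numpy as np


def gamma_correct(frame, gamma):
    """Apply power-law (gamma) correction to a frame with values in [0, 1]."""
    frame = np.clip(np.asarray(frame, dtype=np.float64), 0.0, 1.0)
    return frame ** gamma


def toy_quality_score(frame):
    """Hypothetical stand-in for a no-reference score (lower is better).

    NOT BRISQUE: it only measures how far the mean brightness is from mid-gray.
    """
    return abs(float(np.mean(frame)) - 0.5)


def select_target(frame, gammas, score_fn=toy_quality_score):
    """Return (best_gamma, best_frame, best_score) for the lowest-scoring candidate."""
    scored = []
    for gamma in gammas:
        candidate = gamma_correct(frame, gamma)
        scored.append((score_fn(candidate), gamma, candidate))
    # Compare on the score only, so arrays are never compared against each other.
    best_score, best_gamma, best_frame = min(scored, key=lambda item: item[0])
    return best_gamma, best_frame, best_score


if __name__ == "__main__":
    dark_frame = np.full((4, 4), 0.04)  # uniformly dark toy frame
    gamma, target, score = select_target(dark_frame, gammas=[1.0, 0.5, 0.25])
    print("selected gamma:", gamma, "score:", score)
```

Keeping the scorer as a parameter separates candidate generation from quality assessment, so the toy score can be replaced by a perceptual metric without changing the selection logic.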