Self-Supervised High Dynamic Range Imaging with Multi-Exposure Images in Dynamic Scenes

Zhilu Zhang; Haoyu Wang; Shuai Liu; Xiaotao Wang; Lei Lei; Wangmeng Zuo

TL;DR

The paper tackles ghosting in HDR reconstruction for dynamic scenes when HDR ground truth is unavailable. It proposes SelfHDR, a self-supervised framework that trains a reconstruction network under two complementary supervision signals: a color component derived from aligned multi-exposure frames, and a structure component produced by a structure-focused network that is itself supervised by the color component and a reference image. This decomposition allows training directly on real dynamic multi-exposure data without HDR ground truth, and only the single reconstruction network is needed at inference time. Experiments on real-world data show that SelfHDR outperforms prior self-supervised methods and approaches the performance of supervised HDR methods, with clear improvements in ghosting suppression and texture quality. The work reduces the data collection cost of HDR imaging in dynamic scenes and is accompanied by extensive ablations and public code.

Abstract

Merging multi-exposure images is a common approach for obtaining high dynamic range (HDR) images, with the primary challenge being the avoidance of ghosting artifacts in dynamic scenes. Recent methods have proposed using deep neural networks for deghosting. However, these methods typically rely on sufficient data with HDR ground-truths, which are difficult and costly to collect. In this work, to eliminate the need for labeled data, we propose SelfHDR, a self-supervised HDR reconstruction method that only requires dynamic multi-exposure images during training. Specifically, SelfHDR learns a reconstruction network under the supervision of two complementary components, which can be constructed from multi-exposure images and focus on HDR color and structure, respectively. The color component is estimated from aligned multi-exposure images, while the structure component is generated by a structure-focused network that is supervised by the color component and an input reference (e.g., medium-exposure) image. During testing, the learned reconstruction network is directly deployed to predict an HDR image. Experiments on real-world images demonstrate that SelfHDR achieves superior results to state-of-the-art self-supervised methods and performance comparable to supervised ones. Code is available at https://github.com/cszhilu1998/SelfHDR.
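To make the idea of estimating HDR color from aligned multi-exposure images more concrete, the sketch below shows a classic weighted merge with a triangle (hat) weight, in the spirit of the blending weights mentioned in the Figure 1 caption. It is an illustration under stated assumptions, not the paper's actual formulation: the gamma value of 2.2, the weight peak at 0.5, and the function names `ldr_to_linear`, `triangle_weight`, and `merge_exposures` are all hypothetical, and the inputs are assumed to be already aligned.

```python
import numpy as np

GAMMA = 2.2  # assumed camera response curve; not stated in this summary


def ldr_to_linear(ldr, exposure_time, gamma=GAMMA):
    """Map LDR values in [0, 1] to linear radiance: gamma expansion, then exposure division."""
    return np.power(ldr, gamma) / exposure_time


def triangle_weight(ldr):
    """Hat-shaped weight (assumed shape): 1 at mid-gray, falling linearly to 0 at 0 and 1."""
    return 1.0 - np.abs(2.0 * ldr - 1.0)


def merge_exposures(ldrs, exposure_times, eps=1e-8):
    """Weighted average of per-exposure radiance estimates.

    ldrs: array of shape (N, H, W) or (N, H, W, C), values in [0, 1], assumed aligned.
    exposure_times: sequence of N positive exposure times.
    """
    ldrs = np.asarray(ldrs, dtype=np.float64)
    times = np.asarray(exposure_times, dtype=np.float64)
    # Reshape to (N, 1, 1, ...) so each exposure time broadcasts over its image.
    times = times.reshape(-1, *([1] * (ldrs.ndim - 1)))
    weights = triangle_weight(ldrs)
    radiance = ldr_to_linear(ldrs, times)
    # eps guards against pixels that are under- or over-exposed in every frame.
    return (weights * radiance).sum(axis=0) / (weights.sum(axis=0) + eps)


# Usage: a static scene with constant radiance 0.2, captured at three exposures.
true_radiance = np.full((4, 4), 0.2)
exposure_times = [0.5, 1.0, 2.0]
ldrs = [np.clip(true_radiance * t, 0.0, 1.0) ** (1.0 / GAMMA) for t in exposure_times]
hdr = merge_exposures(ldrs, exposure_times)
```

Well-exposed pixels dominate the average because the weight vanishes near 0 and 1. In a dynamic scene this kind of merge only gives reliable color where alignment succeeds, which is consistent with the abstract's use of a separate structure component.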

Paper Structure (25 sections, 15 equations, 11 figures, 10 tables)

This paper contains 25 sections, 15 equations, 11 figures, 10 tables.

Introduction
Related Work
Supervised HDR Imaging with Multi-Exposure Images
Few-Shot and Self-Supervised HDR Imaging with Multi-Exposure Images
Method
Motivation and Overview
Constructing Color and Structure Components
Constructing Color Component
Constructing Structure Component
Learning HDR with Color and Structure Components
Experiments
Implementation Details
Comparison with State-of-the-Arts
Ablation Study
Effect of Color and Structure Supervision
...and 10 more sections

Figures (11)

Figure 1: The triangle function that we use as the blending weights to generate color components.
Figure 2: Overview of SelfHDR. During training, we first construct color and structure components (i.e., $\bm{Y}_{color}$ and $\bm{Y}_{stru}$), then take $\bm{Y}_{color}$ and $\bm{Y}_{stru}$ for supervising the HDR reconstruction network. During testing, the HDR reconstruction network can be used to predict HDR images from unseen multi-exposure images. Dotted lines with different colors represent different loss terms.
Figure 3: Visual comparison on the Kalantari et al. dataset [Kalantari17]. Red and blue arrows indicate areas with ghosting artifacts from other methods. 'HDR-Tra.' denotes HDR-Transformer.
Figure 3: Effect of loss terms ($\mathcal{L}_{se}$ and $\mathcal{L}_{sp}$) when training structure-focused network.
Figure 4: Visual comparison on the (a) Sen et al. [sen2012robust] and (b) Tursun et al. [tursun2016objective] datasets. Red arrows indicate areas with poor quality from other methods.
...and 6 more figures
