Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network
Yong Shu, Liquan Shen, Xiangyu Hu, Mengyao Li, Zihao Zhou
TL;DR
This work tackles HDR video reconstruction from alternating-exposure sequences in real-world settings, addressing the lack of large-scale real data. It introduces Real-HDRV, a large real-world dataset with 500 LDRs-HDRs video pairs and diverse scenes and motions, and presents a two-stage alignment network consisting of a Global Alignment Module (GAM) and a Local Alignment Module (LAM) to handle global and local motion, respectively. GAM uses pre-defined offset bases to estimate global motion, while LAM employs a multi-scale pyramid and adaptive separable convolution to align features coherently; a fusion-reconstruction cascade then yields HDR video frames. Experiments show that models trained on Real-HDRV generalize better to real scenes than synthetic-trained models, and the proposed two-stage method achieves state-of-the-art HDR video reconstruction with favorable efficiency, highlighting Real-HDRV’s practical impact for real-world HDR video tasks.
Abstract
As an important and practical way to obtain high dynamic range (HDR) video, HDR video reconstruction from sequences with alternating exposures is still less explored, mainly due to the lack of large-scale real-world datasets. Existing methods are mostly trained on synthetic datasets, which perform poorly in real scenes. In this work, to facilitate the development of real-world HDR video reconstruction, we present Real-HDRV, a large-scale real-world benchmark dataset for HDR video reconstruction, featuring various scenes, diverse motion patterns, and high-quality labels. Specifically, our dataset contains 500 LDRs-HDRs video pairs, comprising about 28,000 LDR frames and 4,000 HDR labels, covering daytime, nighttime, indoor, and outdoor scenes. To our best knowledge, our dataset is the largest real-world HDR video reconstruction dataset. Correspondingly, we propose an end-to-end network for HDR video reconstruction, where a novel two-stage strategy is designed to perform alignment sequentially. Specifically, the first stage performs global alignment with the adaptively estimated global offsets, reducing the difficulty of subsequent alignment. The second stage implicitly performs local alignment in a coarse-to-fine manner at the feature level using the adaptive separable convolution. Extensive experiments demonstrate that: (1) models trained on our dataset can achieve better performance on real scenes than those trained on synthetic datasets; (2) our method outperforms previous state-of-the-art methods. Our dataset is available at https://github.com/yungsyu99/Real-HDRV.
