Wavelet-Domain Masked Image Modeling for Color-Consistent HDR Video Reconstruction
Yang Zhang, Zhangkai Ni, Wenhan Yang, Hanli Wang
TL;DR
WMNet reconstructs HDR video from LDR inputs while jointly tackling color inaccuracy and temporal inconsistency. It introduces Wavelet-domain Masked Image Modeling (W-MIM) with curriculum masking for robust color restoration, and improves temporal coherence with a Temporal Mixture of Experts (T-MoE) module and a scene-specific Dynamic Memory Module (DMM). The method is validated on HDRTV4K-Scene, a scene-based reorganization of HDRTV4K, where it achieves state-of-the-art performance across multiple metrics; it also generalizes well to RealHDRV and is rated favorably in a subjective user study. Together, these contributions provide a scalable approach to high-quality HDR video reconstruction with improved color accuracy and temporal stability, along with a practical scene-based benchmark for future research.
Abstract
High Dynamic Range (HDR) video reconstruction aims to recover fine brightness, color, and details from Low Dynamic Range (LDR) videos. However, existing methods often suffer from color inaccuracies and temporal inconsistencies. To address these challenges, we propose WMNet, a novel HDR video reconstruction network that leverages Wavelet-domain Masked Image Modeling (W-MIM). WMNet adopts a two-phase training strategy. In Phase I, W-MIM performs self-reconstruction pre-training by selectively masking color and detail information in the wavelet domain, enabling the network to develop robust color restoration capabilities; a curriculum learning scheme further refines the reconstruction process. In Phase II, the model is fine-tuned from the pre-trained weights to improve the final reconstruction quality. To improve temporal consistency, we introduce the Temporal Mixture of Experts (T-MoE) module and the Dynamic Memory Module (DMM). T-MoE adaptively fuses adjacent frames to reduce flickering artifacts, while DMM captures long-range dependencies, ensuring smooth motion and preservation of fine details. Additionally, since existing HDR video datasets lack scene-based segmentation, we reorganize HDRTV4K into HDRTV4K-Scene, establishing a new benchmark for HDR video reconstruction. Extensive experiments demonstrate that WMNet achieves state-of-the-art performance across multiple evaluation metrics, significantly improving color fidelity, temporal coherence, and perceptual quality. The code is available at https://github.com/eezkni/WMNet
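
As a rough illustration of the Phase I idea, the following Python sketch masks coefficients in a one-level Haar wavelet decomposition and ramps the masking ratio over training. It is not the authors' implementation: the Haar basis, the linear schedule, the 0.1 to 0.5 ratio range, uniform random masking of all four sub-bands, and every function name are assumptions chosen for illustration, since the abstract does not specify these details.

```python
import random


def haar_dwt2(img):
    """One-level 2D Haar transform of an even-sized single-channel image
    (list of rows). Returns (approx, d_rows, d_cols, d_diag), each half-size.
    approx holds local averages; the other three hold detail differences."""
    approx, d_rows, d_cols, d_diag = [], [], [], []
    for i in range(0, len(img), 2):
        ra, rr, rc, rd = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ra.append((a + b + c + d) / 2)  # low-pass in both directions
            rr.append((a + b - c - d) / 2)  # difference between the two rows
            rc.append((a - b + c - d) / 2)  # difference between the two columns
            rd.append((a - b - c + d) / 2)  # diagonal difference
        approx.append(ra); d_rows.append(rr); d_cols.append(rc); d_diag.append(rd)
    return approx, d_rows, d_cols, d_diag


def haar_idwt2(approx, d_rows, d_cols, d_diag):
    """Exact inverse of haar_dwt2."""
    h, w = len(approx), len(approx[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            p, q, r, s = approx[i][j], d_rows[i][j], d_cols[i][j], d_diag[i][j]
            img[2 * i][2 * j] = (p + q + r + s) / 2
            img[2 * i][2 * j + 1] = (p + q - r - s) / 2
            img[2 * i + 1][2 * j] = (p - q + r - s) / 2
            img[2 * i + 1][2 * j + 1] = (p - q - r + s) / 2
    return img


def curriculum_ratio(step, total_steps, start=0.1, end=0.5):
    """Hypothetical curriculum: the masking ratio grows linearly from
    `start` to `end`, so reconstruction gets harder as training proceeds."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * t


def mask_band(band, ratio, rng):
    """Zero out round(ratio * size) randomly chosen coefficients of a sub-band."""
    h, w = len(band), len(band[0])
    masked = set(rng.sample(range(h * w), round(ratio * h * w)))
    return [[0.0 if i * w + j in masked else band[i][j] for j in range(w)]
            for i in range(h)]


def make_wmim_input(img, ratio, rng):
    """Build a masked network input; the pre-training target is `img` itself.
    Assumption: every sub-band is masked at the same ratio."""
    bands = haar_dwt2(img)
    return haar_idwt2(*(mask_band(b, ratio, rng) for b in bands))
```

Under these assumptions, a pre-training loop would feed the network make_wmim_input(channel, curriculum_ratio(step, total_steps), rng) for each image channel and train it to reconstruct the unmasked channel.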
