UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion
Zixuan Chen, Yujin Wang, Xin Cai, Zhiyuan You, Zheming Lu, Fan Zhang, Shi Guo, Tianfan Xue
TL;DR
UltraFusion reframes exposure fusion as guided inpainting in order to fuse images with extreme exposure differences (up to 9 stops), using the under-exposed image as soft guidance and diffusion priors to produce natural, tone-mapped outputs. The method employs a two-stage pipeline (pre-alignment followed by guided inpainting), augmented with a decompose-and-fuse control branch and a fidelity control branch to ensure robust fusion under misalignment and motion. A novel training data synthesis pipeline enables learning from dynamic scenes. Extensive experiments on static and dynamic HDR benchmarks, plus a new UltraFusion benchmark, show superior performance over state-of-the-art HDR methods. The approach enables robust, artifact-free imaging of ultra-high-dynamic-range scenes from practical 2-shot captures, though runtime could be improved with faster alignment and diffusion strategies.
Abstract
Capturing high dynamic range (HDR) scenes is one of the most important issues in camera design. The majority of cameras use exposure fusion, which fuses images captured at different exposure levels, to increase dynamic range. However, this approach can only handle images with a limited exposure difference, normally 3-4 stops. When applied to very high dynamic range scenes, where a large exposure difference is required, this approach often fails due to incorrect alignment, inconsistent lighting between inputs, or tone mapping artifacts. In this work, we propose UltraFusion, the first exposure fusion technique that can merge inputs with a 9-stop exposure difference. The key idea is to model exposure fusion as a guided inpainting problem, where the under-exposed image is used as guidance to fill in the missing highlight information in the over-exposed regions. Because the under-exposed image serves as soft guidance rather than a hard constraint, our model is robust to potential alignment issues and lighting variations. Moreover, by utilizing the image prior of the generative model, our model also generates natural tone mapping, even for very high dynamic range scenes. Our approach outperforms HDR-Transformer on the latest HDR benchmarks. Moreover, to test its performance in ultra high dynamic range scenes, we capture a new real-world exposure fusion benchmark, the UltraFusion dataset, with exposure differences of up to 9 stops, and experiments show that UltraFusion can generate visually pleasing, high-quality fusion results under various scenarios. Code and data will be available at https://openimaginglab.github.io/UltraFusion.
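To put the exposure gap in perspective: each photographic stop doubles the captured light, so a 3-4 stop difference corresponds to an 8-16x exposure ratio, while 9 stops corresponds to 512x. The short sketch below illustrates that arithmetic, together with a simple way to flag the clipped highlights that would need to be filled in from the under-exposed image. The function names and the 0.95 clipping threshold are assumptions made for illustration; they are not taken from the UltraFusion code, which uses a learned diffusion-based model rather than a fixed threshold.

```python
def exposure_ratio(stops: float) -> float:
    """Return the exposure ratio for a given difference in stops.

    Each stop doubles the exposure, so the ratio is 2 ** stops.
    """
    return 2.0 ** stops


def overexposed_mask(pixels, threshold=0.95):
    """Flag clipped highlights in a list of intensities normalized to [0, 1].

    The threshold is a hypothetical choice for illustration only.
    """
    return [p >= threshold for p in pixels]


if __name__ == "__main__":
    # Compare the conventional range (3-4 stops) with the 9-stop case.
    for stops in (3, 4, 9):
        print(f"{stops} stops -> {exposure_ratio(stops):.0f}x exposure ratio")

    # Toy scanline from an over-exposed image: the last two pixels are clipped.
    scanline = [0.20, 0.55, 0.97, 1.00]
    print(overexposed_mask(scanline))
```

The flagged pixels mark where the over-exposed image carries no usable detail; in the guided inpainting formulation, those are the regions where the under-exposed image supplies guidance.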
