SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging

Lingtong Kong, Bo Li, Yike Xiong, Hao Zhang, Hong Gu, Jinwei Chen

TL;DR

SAFNet addresses multi-frame HDR reconstruction under large motion and saturation by introducing a selective alignment fusion approach. It employs a pyramid encoder and a coarse-to-fine decoder that jointly refines cross-exposure motion fields and valuable-region masks, followed by an explicit HDR fusion with reweighted coefficients, and a lightweight refine module to recover details. A window partition cropping strategy and a newly released Challenge123 dataset support efficient training and robust evaluation on challenging motion/saturation scenarios. Empirical results show SAFNet achieves state-of-the-art accuracy with substantial speed advantages on public and proposed datasets, demonstrating practical applicability for resource-constrained devices. Overall, SAFNet advances HDR imaging by combining region-aware motion estimation with explicit fusion in a computationally efficient framework, accompanied by a challenging benchmark for future work.

Abstract

Multi-exposure High Dynamic Range (HDR) imaging is a challenging task in the presence of truncated texture and complex motion. Existing deep learning-based methods have achieved great success by either following the alignment-and-fusion pipeline or utilizing attention mechanisms. However, their large computation cost and inference delay hinder deployment on resource-limited devices. In this paper, to achieve better efficiency, a novel Selective Alignment Fusion Network (SAFNet) for HDR imaging is proposed. After extracting pyramid features, it jointly refines valuable-area masks and cross-exposure motion in selected regions with shared decoders, and then fuses a high-quality HDR image in an explicit way. This approach focuses the model on finding valuable regions while estimating their easily detectable and meaningful motion. For further detail enhancement, a lightweight refine module is introduced, which benefits from the previously estimated optical flow, selection masks, and initial prediction. Moreover, to facilitate learning on samples with large motion, a new window partition cropping method is used during training. Experiments on public and newly developed challenging datasets show that the proposed SAFNet not only exceeds previous SOTA competitors quantitatively and qualitatively, but also runs an order of magnitude faster. Code and dataset are available at https://github.com/ltkong218/SAFNet.
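
To make "fuses a high-quality HDR image in an explicit way" more concrete, the following is a minimal sketch of a generic mask-weighted multi-exposure merge. It is not SAFNet's actual formulation (the reweighted coefficients mentioned in the TL;DR are not reproduced here). The function names, the gamma value of 2.2, and the assumption that the frames are already aligned are all illustrative choices that do not come from the text above.

```python
import numpy as np

GAMMA = 2.2  # assumed camera response exponent; not stated in the source


def ldr_to_linear(ldr, exposure_time, gamma=GAMMA):
    """Map an LDR image in [0, 1] to the linear HDR domain (hypothetical helper)."""
    return np.power(ldr, gamma) / exposure_time


def fuse_explicit(aligned_ldrs, exposure_times, masks, eps=1e-8):
    """Mask-weighted average of linearized, already-aligned LDR frames.

    aligned_ldrs:   list of arrays with identical shape, values in [0, 1]
    exposure_times: list of positive floats, one per frame
    masks:          list of non-negative weight maps (e.g. "valuable region" masks)
    """
    numerator = 0.0
    denominator = 0.0
    for ldr, t, mask in zip(aligned_ldrs, exposure_times, masks):
        numerator = numerator + mask * ldr_to_linear(ldr, t)
        denominator = denominator + mask
    # eps guards against pixels where every mask is zero
    return numerator / (denominator + eps)
```

With this kind of explicit merge, the network only has to predict where each exposure is trustworthy and how it moves; the output pixel values come from the input frames themselves rather than from a learned decoder.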

Paper Structure (8 sections, 9 equations, 16 figures, 7 tables)

This paper contains 8 sections, 9 equations, 16 figures, 7 tables.

Figures (16)

  • Figure 1: Comparison on the Kalantari 17 test dataset [Kalantari_2017_ToG]. The proposed SAFNet achieves state-of-the-art HDR imaging accuracy with fast inference speed and small model size.
  • Figure 1: Details of our ISP simulation pipeline. The left part shows the overall framework of the ISP pipeline for the Qualcomm platform. The right part presents details of the Image Front End (IFE) in the Qualcomm platform. We inject Bayer raw data before 'Pedestal Correction' and dump the simulated LDR image before 'Global Tone Mapping'.
  • Figure 2: Overall architecture of our SAFNet. It contains a pyramid encoder, a coarse-to-fine decoder, and a refinement subnetwork. The linked switch selects the path that includes window partition and window reverse during training, and skips them during evaluation.
  • Figure 2: Visual comparison on the Kalantari 17 test dataset [Kalantari_2017_ToG]. Zoom in for best view.
  • Figure 3: Details of the decoder $\mathcal{D}$ and the refine network $\mathcal{R}$ for SAFNet and SAFNet-S. The arguments of 'Conv', from left to right, are input channels, output channels, and dilation. All convolutions have a 3$\times$3 kernel size. Stride is equal to dilation for each 'Conv'.
  • ...and 11 more figures
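
The Figure 2 caption mentions a window partition and a window reverse step that are active only during training. As a rough illustration of what such a pair of operations typically looks like, here is a generic NumPy sketch; the function names, the (H, W, C) layout, and the requirement that the window size divides the image size are assumptions, and the paper's actual cropping strategy may differ.

```python
import numpy as np


def window_partition(x, win):
    """Split an (H, W, C) array into non-overlapping (win, win, C) windows.

    Returns an array of shape (num_windows, win, win, C), ordered row by row.
    """
    h, w, c = x.shape
    if h % win != 0 or w % win != 0:
        raise ValueError("window size must divide both height and width")
    x = x.reshape(h // win, win, w // win, win, c)
    # bring the two window-grid axes to the front, then flatten them
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, c)


def window_reverse(windows, h, w):
    """Inverse of window_partition: reassemble windows into an (H, W, C) array."""
    _, win, _, c = windows.shape
    x = windows.reshape(h // win, w // win, win, win, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h, w, c)
```

Partitioning lets a later stage work on small windows while an earlier stage still sees a larger patch, and the reverse step restores the original layout so that both stages can be supervised on the same image.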