PASTA: Towards Flexible and Efficient HDR Imaging Via Progressively Aggregated Spatio-Temporal Alignment

Xiaoning Liu, Ao Li, Zongwei Wu, Yapeng Du, Le Zhang, Yulun Zhang, Radu Timofte, Ce Zhu

TL;DR

This work targets HDR deghosting for high-resolution captures that contain motion and exposure variation. It introduces PASTA, a Progressively Aggregated Spatio-Temporal Alignment framework that combines a wavelet-based hierarchical representation with a coarse-to-fine fusion strategy, augmented by an Inter-Frame Temporal Attention module. The approach reports state-of-the-art HDR quality with up to threefold faster inference, enables 2K HDR processing on standard GPUs, and includes an ultra-light variant with even larger speedups. These findings suggest that hierarchical, wavelet-based representations combined with progressive cross-scale fusion can deliver high-fidelity, ghost-free HDR images at high resolution with practical deployment potential.

Abstract

Leveraging Transformer attention has led to great advancements in HDR deghosting. However, the intricate nature of self-attention introduces practical challenges, as existing state-of-the-art methods often demand high-end GPUs or exhibit slow inference speeds, especially for high-resolution images such as 2K. Striking an optimal balance between performance and latency remains a critical concern. In response, this work presents PASTA, a novel Progressively Aggregated Spatio-Temporal Alignment framework for HDR deghosting. Our approach achieves effectiveness and efficiency by harnessing hierarchical representation during feature disentanglement. By exploiting the diverse granularities within the hierarchical structure, our method substantially boosts computational speed and optimizes the HDR imaging workflow. In addition, we explore within-scale feature modeling with local and global attention, gradually merging and refining them in a coarse-to-fine fashion. Experimental results showcase PASTA's superiority over current SOTA methods in both visual quality and performance metrics, accompanied by a substantial 3-fold (3×) increase in inference speed.
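As a rough illustration of the wavelet-based hierarchical representation described above, the sketch below builds a three-level pyramid in plain Python. The Haar filter, the function names `haar_dwt2` and `wavelet_pyramid`, and the toy 16×16 input are assumptions made for this example; the text above does not specify which wavelet or implementation PASTA uses.

```python
def haar_dwt2(img):
    """One level of an orthonormal 2D Haar transform (illustrative assumption).

    `img` is a list of rows with even height and width.
    Returns four subbands (LL, LH, HL, HH), each half the size of `img`.
    Note: LH/HL naming conventions differ between libraries.
    """
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2.0)  # low-pass in both directions
            r_lh.append((a + b - c - d) / 2.0)  # difference between rows
            r_hl.append((a - b + c - d) / 2.0)  # difference between columns
            r_hh.append((a - b - c + d) / 2.0)  # diagonal difference
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def wavelet_pyramid(img, levels=3):
    """Apply haar_dwt2 recursively to the LL band, keeping every level's subbands."""
    pyramid = []
    current = img
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(current)
        pyramid.append({"LL": ll, "LH": lh, "HL": hl, "HH": hh})
        current = ll  # the next, coarser level is built from the low-pass band
    return pyramid


# Toy single-channel "feature map" (hypothetical data).
image = [[float((3 * i + 5 * j) % 11) for j in range(16)] for i in range(16)]
pyramid = wavelet_pyramid(image, levels=3)
for k, bands in enumerate(pyramid, start=1):
    rows, cols = len(bands["LL"]), len(bands["LL"][0])
    print(f"level {k}: each subband is {rows}x{cols}")
```

Each level quarters the number of spatial positions, so any operation whose cost grows quadratically with the number of positions, such as global self-attention, becomes roughly 16 times cheaper per level. This is one plausible reading of why working at coarser granularities helps with speed; it is not a statement of the paper's exact cost model.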

Paper Structure (18 sections, 4 equations, 6 figures, 6 tables)

This paper contains 18 sections, 4 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Visual and quantitative comparison. Our method consistently outperforms on two benchmark HDR datasets, surpassing the state-of-the-art CA-ViT liu2022ghost and the hierarchical method NHDRRnet yan2020deep in preserving content integrity, achieving ghost-free and high-fidelity results. Zoom in to see more details.
  • Figure 2: Framework Overview. The proposed framework mainly consists of three stages, i.e., shallow feature extraction, inter-frame temporal attention, and hierarchical representation & progressive aggregation.
  • Figure 3: (a)-(c) Feature channel correlations between subband coefficients at three levels of wavelet decomposition, measured by Pearson correlation coefficients (PCCs). The channel dimension of each subband coefficient is 48. In each subfigure, from the top left to the bottom right, the correlations among the channels of the subband coefficients $\{LL_k, LH_k, HL_k, HH_k\}_{k=1}^{3}$ themselves are shown. (d)-(e) Dependencies between wavelet coefficients across different scales give rise to a quad-tree structure, where not only the parent but also the child coefficients contain pertinent information.
  • Figure 4: Visual comparison of large-scale foreground and dense background motion with SOTA methods on Kalantari et al.'s dataset kalantari2017deep.
  • Figure 5: Visual comparison of large disparities with SOTA methods on Tel et al.'s dataset tel2023alignment. The bottom row displays the 1D intensity shift along the line marked in the green close-up regions. Zoom in for better viewing.
  • ...and 1 more figures
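
The Pearson correlation coefficients mentioned in the Figure 3 caption can be computed between feature channels roughly as in the sketch below. The function names `pearson` and `channel_correlation`, the flattening of each channel into a 1D list, the zero-variance convention, and the toy values are assumptions for illustration; the caption does not state how the authors computed the PCCs.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumed convention for constant channels
    return cov / math.sqrt(var_x * var_y)


def channel_correlation(channels):
    """Return the C x C matrix of PCCs for a list of flattened channels."""
    return [[pearson(ci, cj) for cj in channels] for ci in channels]


# Three hypothetical flattened channels of one subband.
channels = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # a scaled copy of channel 0
    [4.0, 3.0, 2.0, 1.0],  # a reversed copy of channel 0
]
matrix = channel_correlation(channels)
for row in matrix:
    print([round(v, 3) for v in row])
```

Values near +1 or -1 indicate strongly (anti-)correlated channels, while values near 0 indicate channels that carry largely independent information.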