LoCAtion: Long-time Collaborative Attention Framework for High Dynamic Range Video Reconstruction

Qianyu Zhang, Bolun Zheng, Lingyu Zhu, Aiai Huang, Zongpeng Li, Shiqi Wang

Abstract

Prevailing High Dynamic Range (HDR) video reconstruction methods are fundamentally trapped in a fragile alignment-and-fusion paradigm. While explicit spatial alignment can recover fine details in controlled environments, it becomes a severe bottleneck in unconstrained dynamic scenes: by forcing rigid alignment across unpredictable motions and varying exposures, these methods inevitably translate registration errors into ghosting artifacts and temporal flickering. In this paper, we rethink this conventional prerequisite. Recognizing that explicit alignment is inherently vulnerable to real-world complexities, we propose LoCAtion, a Long-time Collaborative Attention framework that reformulates HDR video reconstruction from a fragile spatial warping task into a robust, alignment-free collaborative feature routing problem. Guided by this formulation, our architecture explicitly decouples the highly entangled reconstruction task into two stages. First, rather than rigidly warping neighboring frames, we anchor the scene on a continuous medium-exposure backbone and use collaborative attention to dynamically harvest and inject reliable irradiance cues from unaligned exposures. Second, we introduce a learned global sequence solver that leverages bidirectional context and long-range temporal modeling to propagate corrective signals and structural features across the entire sequence, enforcing whole-video coherence and eliminating jitter. Extensive experiments demonstrate that LoCAtion achieves state-of-the-art visual quality and temporal stability while offering a highly competitive balance between accuracy and computational efficiency.
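The idea of anchoring on medium-exposure features and "harvesting" cues from unaligned exposures can be illustrated with plain scaled dot-product cross-attention. The following is a minimal, hypothetical sketch and not the authors' implementation. It assumes that features are already flattened into per-position token vectors, that there are no learned query/key/value projections, that attention is single-head, and that the harvested cue is injected by simple residual addition. The names `collaborative_attention`, `softmax`, and `dot` are illustrative only.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def collaborative_attention(anchor: List[Vector], others: List[List[Vector]]) -> List[Vector]:
    """Hypothetical sketch: every medium-exposure (anchor) token queries every token
    of the unaligned short/long-exposure frames. No flow or warping is computed."""
    if not anchor:
        return []
    # Pool tokens from all non-anchor exposures into one key/value set,
    # so a query may match content at any spatial position.
    pool = [tok for frame in others for tok in frame]
    dim = len(anchor[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in anchor:
        weights = softmax([dot(q, k) * scale for k in pool])
        # Keys and values share the same features here (assumption: no projections).
        harvested = [sum(w * v[d] for w, v in zip(weights, pool)) for d in range(dim)]
        # Residual injection: keep the anchor feature and add the harvested cue.
        fused.append([a + h for a, h in zip(q, harvested)])
    return fused
```

Because the key/value pool is an unordered set of tokens without positional encoding, reordering or spatially displacing content in the non-anchor frames leaves the output unchanged. That is the sense in which this toy version is "alignment-free". A practical model would typically add learned projections and multiple heads.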

Paper Structure

This paper contains 14 sections, 8 equations, 11 figures, and 5 tables.

Figures (11)

  • Figure 1: Speed-performance trade-off comparison with recent state-of-the-art HDR methods on the Cinematic Video dataset. Our method achieves state-of-the-art reconstruction quality while maintaining competitive inference latency.
  • Figure 2: An overview of the proposed LoCAtion framework. Our alignment-free architecture naturally decouples the HDR video reconstruction task into two synergistic stages. First, the Collaborative Feature Attention stage operates on a continuous medium-exposure backbone to perform motion-aware collaborative attention, reliably integrating unaligned multi-exposure features to recover local dynamic range. Second, the Global Sequence Consistency stage acts as a sequence-level solver. By leveraging bidirectional context and long-range temporal modeling to explicitly broadcast corrective and structural information, it enforces whole-sequence consistency and effectively eliminates visual flicker.
  • Figure 3: Visual comparisons on POKER FULLSHOT sequences. Image deghosting methods (e.g., AHRNet, DomainPlus) leave residual ghosting in saturated highlight regions (red box) and blur in dark areas due to insufficient detail recovery (blue box), while video reconstruction methods (e.g., LAN-HDR, HDRFlow) reduce ghosts but introduce noticeable noise and grain. Our method reduces both ghosting and noise, preserving fine structures and contrast in both highlights and shadows. Zoom in for details.
  • Figure 4: Visual comparisons on CAROUSEL FIREWORKS sequences. In fast-motion scenes, video reconstruction baselines (e.g., LAN-HDR, HDRFlow) often break down and exhibit motion tearing. Image deghosting approaches (e.g., AHRNet, DomainPlus) tend to leave color fringing and halos around moving light trails. LoCAtion handles rapid motion better, markedly suppressing ghosting and color fringing.
  • Figure 5: Visual comparisons on public real-world videos.
  • ...and 6 more figures
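The Global Sequence Consistency stage in Figure 2 is described as a sequence-level solver that uses bidirectional context to broadcast corrective information across the whole video. A very loose illustration of why bidirectional propagation suppresses flicker follows. This sketch swaps in a fixed exponential moving average for the learned recurrent cell, which is an assumption and is not the paper's solver. It runs one forward pass and one backward pass over per-frame feature vectors and averages them, so every output frame depends on both past and future frames. The names `ema_step` and `bidirectional_propagate` and the `alpha` parameter are hypothetical.

```python
from typing import List

Vector = List[float]


def ema_step(state: Vector, x: Vector, alpha: float) -> Vector:
    # Stand-in for a learned recurrent cell: blend the carried state with the current frame.
    return [alpha * s + (1.0 - alpha) * xi for s, xi in zip(state, x)]


def bidirectional_propagate(frames: List[Vector], alpha: float = 0.5) -> List[Vector]:
    """Hypothetical sketch: forward and backward recurrent passes, then averaged,
    so each output frame receives information from the entire sequence."""
    n = len(frames)
    if n == 0:
        return []
    # Forward pass: carry context from the first frame toward the last.
    fwd: List[Vector] = []
    state = frames[0]
    for t in range(n):
        state = ema_step(state, frames[t], alpha)
        fwd.append(state)
    # Backward pass: carry context from the last frame toward the first.
    bwd: List[Vector] = [[] for _ in range(n)]
    state = frames[-1]
    for t in reversed(range(n)):
        state = ema_step(state, frames[t], alpha)
        bwd[t] = state
    # Fuse both directions (simple mean; a learned fusion would replace this).
    return [[(f + b) / 2.0 for f, b in zip(fwd[t], bwd[t])] for t in range(n)]
```

For example, an alternating sequence `[[0.0], [1.0], [0.0], [1.0], [0.0]]` (a stand-in for per-frame brightness flicker) is smoothed so that the largest frame-to-frame jump drops well below 1.0. Changing only the last frame also changes the first output, which shows the whole-sequence dependency.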