Ultra-High-Definition Dynamic Multi-Exposure Image Fusion via Infinite Pixel Learning

Xingchi Chen, Zhuoran Zheng, Xuerui Li, Yuying Chen, Shu Wang, Wenqi Ren

TL;DR

This work tackles UHD dynamic multi-exposure image fusion on resource-constrained hardware by introducing Infinite Pixel Learning (IPL), a chunk-cache-quantization pipeline inspired by long-sequence processing in LLMs. IPL leverages a Slice Cyclic Scanner for dimensional attention, an Attention Cache to avoid redundant computation, and Quantization Compression to manage memory, complemented by a Dimensional Rolling Transformation Module to preserve global context. The authors present the 4K-DMEF UHD benchmark and demonstrate that IPL achieves full-resolution UHD fusion on a single GPU at real-time speeds with substantial gains in PSNR, SSIM, and perceptual quality over state-of-the-art methods. This approach provides a practical path to high-quality UHD MEF on commodity hardware and offers a robust benchmark to accelerate future UHD dynamic fusion research.

Abstract

With imaging resolution on consumer devices steadily improving, Ultra-High-Definition (UHD) images are becoming increasingly common. Unfortunately, existing methods for fusing multi-exposure images in dynamic scenes are designed for low-resolution images, which makes them inefficient for generating high-quality UHD images on resource-constrained devices. To overcome the limitations imposed by extremely long input sequences, and inspired by how Large Language Models (LLMs) process infinitely long texts, we propose Infinite Pixel Learning (IPL), a novel learning paradigm that achieves UHD multi-exposure dynamic scene image fusion on a single consumer-grade GPU. Our approach has three key components. First, we slice the input sequences to relieve the pressure on the model as it processes the data stream. Second, we develop an attention cache technique, similar to the KV cache, for infinite data stream processing. Third, we design an attention cache compression method to reduce the storage burden of the cache on the device. In addition, we provide a new UHD benchmark to evaluate the effectiveness of our method. Extensive experimental results show that our method maintains high-quality visual performance while fusing UHD dynamic multi-exposure images in real time (>40 fps) on a single consumer-grade GPU.
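
The three components named in the abstract (slicing, attention caching, and cache compression) can be illustrated with a toy sketch. This is not the authors' implementation. The function names `quantize`, `dequantize`, and `chunked_attention` are hypothetical, each token serves as its own query, key, and value (there are no learned projections), and uniform 8-bit min-max quantization is an assumed stand-in for the paper's compression scheme, which the abstract does not specify.

```python
import math

def quantize(vec, levels=255):
    """Uniform min-max quantization of a list of floats (assumed scheme).

    Returns integer codes in [0, levels] plus the offset and step needed
    to reconstruct approximate values.
    """
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / levels or 1.0  # avoid a zero step for constant vectors
    codes = [round((v - lo) / scale) for v in vec]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Reconstruct approximate floats from quantized codes."""
    return [lo + c * scale for c in codes]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def chunked_attention(tokens, chunk_size):
    """Process a long token sequence chunk by chunk.

    Each chunk is quantized and appended to a cache; queries in the current
    chunk attend over everything cached so far (earlier chunks plus the
    current one), so earlier chunks are stored once and never re-encoded.
    """
    d = len(tokens[0])
    cache = []    # compressed tokens from all chunks seen so far
    outputs = []
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]            # 1) slice
        cache.extend(quantize(t) for t in chunk)            # 2) cache, 3) compress
        keys = [dequantize(*entry) for entry in cache]      # decompress for use
        for query in chunk:
            scores = [
                sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                for key in keys
            ]
            weights = softmax(scores)
            outputs.append(
                [sum(w * key[j] for w, key in zip(weights, keys)) for j in range(d)]
            )
    return outputs

# Example: 8 three-dimensional "pixel tokens" processed in chunks of 3.
tokens = [[0.1 * i, 1.0 - 0.05 * i, 0.5] for i in range(8)]
fused = chunked_attention(tokens, chunk_size=3)
```

In this sketch the cache holds integer codes rather than floats, which is where the memory saving would come from in a real system. The cached keys are dequantized again at every chunk here for simplicity; a practical implementation would avoid that repeated work.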

Paper Structure

This paper contains 23 sections, 9 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Model performance and efficiency comparison between our proposed IPL and other MEF methods on our proposed dataset. Since most methods are unable to process UHD images directly, the calculation is performed based on the maximum resolution U:16 that these algorithms can handle on a single GPU. Our method has approximately 46% higher PSNR and 48% higher SSIM than the second-best method, and the inference speed reaches real time (>40 fps), achieving an optimal trade-off between performance and efficiency.
  • Figure 2: The overall architecture of IPL, which extracts features using a series of Feature Integration Blocks (FIBs). The FIB mainly contains a Dimensional Attention Enhancement Module (DAEM) and a Dimensional Rolling Transformation Module (DRTM). DAEM has three key components: Slice Cyclic Scanner, Attention Cache Technique, and Quantization Compression, forming a chunk-cache-quantization pipeline to process infinite input pixels efficiently. DRTM associates features from different views by permuting feature maps to compensate for global features.
  • Figure 3: Qualitative comparison on our proposed 4K-DMEF dataset. All methods are trained using our training set on a single RTX 4090 GPU. Our method, IPL, outperforms all SOTA methods in processing UHD multi-exposure image inputs. It effectively avoids ghosting and blurring, achieving full-resolution inference with remarkable clarity and precision.
  • Figure 4: Visual comparison on the public non-UHD dataset. In the first comparison (left) with the Kalantari dataset Kalantari2017DeepHD, our IPL method shows competitive performance. In the second comparison (right) with the Mobile-HDR dataset Mobile-HDRcvpr23, our IPL method excels in detail restoration and performs well in relatively low-light conditions.
  • Figure 5: Visualization results of the ablation experiments. As shown, without DAEM, the model suffers severe performance degradation, failing to capture overall features and resulting in a slightly different color tone from the GT.