Multi-Exposure Image Fusion via Distilled 3D LUT Grid with Editable Mode

Xin Su, Zhuoran Zheng

TL;DR

The paper tackles real-time multi-exposure image fusion (MEF) for ultra-high-definition (UHD) imagery on resource-constrained devices. It introduces a distillation-based framework that learns a robust, editable $3\mathrm{D}$ LUT grid via a teacher–student network, with an implicit neural representation enabling flexible grid sizes. The method uses a lightweight architecture (approximately $0.52\mathrm{M}$ parameters) and achieves real-time performance of about $33$ frames per second on a single GPU, validated on SICE, NTIRE HDR, and MED UHD datasets, including mobile deployment. Key contributions include modeling input uncertainty with a distilled LUT, enabling editable grids, and comprehensive ablations showing the roles of the teacher, the long-range regularizer, and grid size; results demonstrate favorable efficiency and accuracy for UHD MEF. This work offers a practical, configurable approach for UHD image enhancement in real-world scenarios, bridging high-quality HDR fusion with deployment-ready speed.

Abstract

With the rising imaging resolution of handheld devices, existing multi-exposure image fusion algorithms struggle to generate high dynamic range images at ultra-high resolution in real time. In addition, there is a trend toward manageable, editable algorithms that can meet the differing needs of real application scenarios. To tackle these issues, we introduce 3D LUT technology, which can enhance images with ultra-high-definition (UHD) resolution in real time on resource-constrained devices. However, the fusion of information from multiple images with different exposure rates is uncertain, and this uncertainty significantly challenges the generalization power of the 3D LUT grid. To address this issue and ensure a robust learning space for the model, we propose using a teacher-student network to model the uncertainty on the 3D LUT grid. Furthermore, we provide an editable mode for the multi-exposure image fusion algorithm by using an implicit representation function to match the requirements of different scenarios. Extensive experiments demonstrate that our proposed method is highly competitive in efficiency and accuracy.
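
For readers unfamiliar with 3D LUTs, the following is a minimal sketch of how a lookup grid maps each RGB pixel to an output colour by trilinear interpolation. It is a generic illustration in NumPy, not the authors' implementation: the function names `identity_lut` and `apply_lut` are hypothetical, and the sketch covers only the lookup step, not the fusion of several exposures or the teacher-student training.

```python
import numpy as np

def identity_lut(size):
    """Build an identity 3D LUT of shape (size, size, size, 3), values in [0, 1]."""
    axis = np.linspace(0.0, 1.0, size)
    r, g, b = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([r, g, b], axis=-1)

def apply_lut(image, lut):
    """Apply a 3D LUT to an (H, W, 3) RGB image in [0, 1] via trilinear interpolation.

    Assumes a cubic LUT with at least 2 nodes per axis.
    """
    size = lut.shape[0]
    # Continuous grid position of every pixel along each colour axis.
    pos = np.clip(image, 0.0, 1.0) * (size - 1)
    # Lower corner of the enclosing cell, clamped so that lo + 1 stays in range.
    lo = np.minimum(np.floor(pos).astype(int), size - 2)
    frac = pos - lo
    out = np.zeros(image.shape, dtype=float)
    # Blend the 8 corners of the cell, weighted by distance along each axis.
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                wr = frac[..., 0] if dr else 1.0 - frac[..., 0]
                wg = frac[..., 1] if dg else 1.0 - frac[..., 1]
                wb = frac[..., 2] if db else 1.0 - frac[..., 2]
                corner = lut[lo[..., 0] + dr, lo[..., 1] + dg, lo[..., 2] + db]
                out += (wr * wg * wb)[..., None] * corner
    return out
```

Because each pixel costs a fixed number of memory reads and multiplications regardless of how the grid was produced, the lookup cost grows only with the number of pixels, which is why LUT-based methods are attractive for UHD inputs.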

Paper Structure

This paper contains 6 sections, 12 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: The top figure shows the results of our method on a multi-exposure dataset. Size denotes the resolution of the 3D LUT grid; for example, 32 means a $32 \times 32 \times 32$ grid. PSNR increases gradually with grid size. Note that the grid size is editable because we use an implicit neural representation. The bottom figure compares average running time on MEFB, a dataset containing 50 image pairs of average size $3 \times 551 \times 707$.
  • Figure 2: The architecture of our approach, a teacher-student learning paradigm. First, the teacher network learns a high-quality 3D LUT. Next, the 3D LUT in the student network is constrained by the teacher network. Finally, the student network generates a robust 3D LUT. F denotes the weighted fusion strategy and L denotes the restricted loss term.
  • Figure 3: The architecture of the implicit neural network.
  • Figure 4: Our method obtains better visual quality and recovers more image details than other state-of-the-art methods on the SICE dataset.
  • Figure 5: Our method obtains better visual quality and recovers more image details than other state-of-the-art methods on the NTIRE22 HDR dataset.
  • ...and 6 more figures
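
The editable grid size mentioned in Figure 1 rests on a general idea: a continuous function of colour coordinates can be sampled at any resolution to produce a LUT of that size. The sketch below illustrates only that idea. The names `sample_lut` and `tone_curve` are hypothetical, and the fixed gamma curve is an assumed stand-in for the paper's learned implicit network, whose details are not given in this section.

```python
import numpy as np

def sample_lut(color_fn, size):
    """Sample a continuous RGB -> RGB function on a size x size x size grid."""
    axis = np.linspace(0.0, 1.0, size)
    r, g, b = np.meshgrid(axis, axis, axis, indexing="ij")
    coords = np.stack([r, g, b], axis=-1)  # shape (size, size, size, 3)
    return color_fn(coords)

def tone_curve(rgb):
    """Assumed stand-in for a learned implicit function: a simple gamma curve."""
    return rgb ** (1.0 / 2.2)

# The same function yields LUTs of different sizes without retraining anything.
for size in (9, 17, 33):
    lut = sample_lut(tone_curve, size)
    print(size, lut.shape, lut.size)
```

A larger grid stores more entries and approximates the underlying function more closely, which is consistent with the caption's observation that PSNR rises with grid size, at the price of more memory.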