Inter-Image Pixel Shuffling for Multi-focus Image Fusion

Huangxing Lin, Rongrong Ma, Cheng Wang

TL;DR

This paper introduces Inter-image Pixel Shuffling (IPS), a novel method that allows neural networks to learn multi-focus image fusion without requiring actual multi-focus images.

Abstract

Multi-focus image fusion aims to combine multiple partially focused images into a single all-in-focus image. Although deep learning has shown promise in this task, its effectiveness is often limited by the scarcity of suitable training data. This paper introduces Inter-image Pixel Shuffling (IPS), a novel method that allows neural networks to learn multi-focus image fusion without requiring actual multi-focus images. IPS reformulates the task as a pixel-wise classification problem, where the goal is to identify the focused pixel from a pixel group at each spatial position. In this method, pixels from a clear optical image are treated as focused, while pixels from a low-pass filtered version of the same image are considered defocused. By randomly shuffling the focused and defocused pixels at identical spatial positions in the original and filtered images, IPS generates training data that preserves spatial structure while mixing focus-defocus information. The model is trained to select the focused pixel from each spatially aligned pixel group, thus learning to reconstruct an all-in-focus image by aggregating sharp content from the input. To further enhance fusion quality, IPS adopts a cross-image fusion network that integrates the localized representation power of convolutional neural networks with the long-range modeling capabilities of state space models. This design effectively leverages both spatial detail and contextual information to produce high-quality fused results. Experimental results indicate that IPS significantly outperforms existing multi-focus image fusion methods, even without training on multi-focus images.
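The data-generation step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the box blur is a stand-in for the unspecified low-pass filter, the swap mask is drawn independently per pixel with probability 0.5, and the names `box_blur`, `shuffle_pair`, and `select_focused` are hypothetical.

```python
import numpy as np


def box_blur(img: np.ndarray, radius: int = 2) -> np.ndarray:
    """Low-pass filter a 2-D image with a (2r+1)x(2r+1) mean filter.

    Assumption: a box blur stands in for whatever filter the paper uses.
    """
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def shuffle_pair(sharp: np.ndarray, blurred: np.ndarray, rng: np.random.Generator):
    """Swap pixels between two aligned images at randomly chosen positions.

    Returns the two recombined images and a boolean mask m with the same
    shape as the image; m is True where the first output holds the sharp pixel.
    Assumption: each position is swapped independently with probability 0.5.
    """
    m = rng.integers(0, 2, size=sharp.shape).astype(bool)
    first = np.where(m, sharp, blurred)
    second = np.where(m, blurred, sharp)
    return first, second, m


def select_focused(first: np.ndarray, second: np.ndarray, m: np.ndarray) -> np.ndarray:
    """Pick the focused pixel at every position, given a per-pixel decision m."""
    return np.where(m, first, second)


rng = np.random.default_rng(0)
sharp = rng.random((8, 8))          # stand-in for a clear optical image
blurred = box_blur(sharp)           # its "defocused" counterpart
first, second, m = shuffle_pair(sharp, blurred, rng)

# With the true mask as the decision, the clear image is recovered exactly.
restored = select_focused(first, second, m)
```

In training, the network would see only `first` and `second` and would have to predict `m` at each position; the clear image `sharp` then serves as the reconstruction target.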

Paper Structure (20 sections, 7 equations, 9 figures, 6 tables)

Figures (9)

  • Figure 1: Cartoon schematic of multi-focus image fusion. Red boxes indicate focused pixels, while blue boxes represent defocused pixels.
  • Figure 2: Illustration of pixel shuffling. Red boxes indicate focused pixels, while blue boxes represent defocused pixels. (a) Applying pixel shuffling to the multi-focus images yields a pair of recombined images. Since both the multi-focus and recombined image pairs share the same pixels, they correspond to the same all-in-focus image. (b) Filtering and pixel shuffling a single optical image can produce recombined images identical to those in (a). These recombined images can thus serve as substitutes for learning multi-focus image fusion, with the optical image in (b) used as the ground-truth.
  • Figure 3: IPS training pipeline. Red boxes indicate focused pixels, while blue boxes represent defocused pixels. The mask $m$ shares the same dimensions as the image.
  • Figure 4: Architecture of the Cross-Image Fusion Network.
  • Figure 5: Visual comparison of fused images on the Lytro dataset. Corresponding difference maps with respect to Source Image 1 are also presented.
  • ...and 4 more figures
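
The claim in the Figure 2(a) caption, that shuffling leaves the pixel group at each position unchanged, can be checked numerically. The sketch below is illustrative only: the toy source pair, the affine stand-in for defocus, and the helper `pixel_groups` are assumptions made for the check and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
sharp = rng.random((4, 6))
# Hypothetical stand-in for defocus: any image that differs from `sharp` works here.
defocused = 0.5 * sharp + 0.25

# Toy multi-focus pair: source_1 is focused on the left half, source_2 on the right.
focus_left = np.zeros(sharp.shape, dtype=bool)
focus_left[:, :3] = True
source_1 = np.where(focus_left, sharp, defocused)
source_2 = np.where(focus_left, defocused, sharp)

# Shuffle the pair with a random per-pixel swap mask.
swap = rng.integers(0, 2, size=sharp.shape).astype(bool)
recombined_1 = np.where(swap, source_2, source_1)
recombined_2 = np.where(swap, source_1, source_2)


def pixel_groups(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Unordered pixel group at each position, as a sorted (H, W, 2) array."""
    return np.sort(np.stack([a, b], axis=-1), axis=-1)


# The source pair and the recombined pair hold the same pixel group everywhere.
same_groups = np.array_equal(
    pixel_groups(source_1, source_2),
    pixel_groups(recombined_1, recombined_2),
)
```

Because the groups are identical, a selector that picks the focused pixel at every position produces the same all-in-focus image from either pair.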