Large model enhanced computational ghost imaging

Yifan Chen, Hongjun An, Zhe Sun, Tong Tian, Mingliang Chen, Christian Spielmann, Xuelong Li

TL;DR

This paper tackles the challenge of reconstructing high-quality images from ghost imaging (GI) data under noise and scattering by introducing GILM, a large imaging model with 1.4B parameters that embeds GI physics directly into a neural architecture. GILM combines skip connections, which stabilize the training of deep networks, with a Transformer2DModel that captures spatial dependencies across pixels, and it operates in a latent space with a VAE decoder for efficient high-resolution reconstruction. Self-supervised learning aligns the predicted speckle-intensity signals with the collected measurements, removing the need for scene-specific labels and improving generalization. Simulations and real-world tests in both free-space and underwater environments show that GILM outperforms classical and DL-based GI methods, maintains high fidelity at low sampling rates, and can be deployed on portable hardware for practical GI applications.
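
The self-supervised idea summarized above can be illustrated with a minimal sketch. It assumes the standard linear GI forward model, in which each measured signal is the pattern-weighted sum of the object, and uses a mean squared error as the mismatch measure. The function names, the toy sizes, and the choice of loss are illustrative assumptions, not details taken from the paper.

```python
import random

# Hypothetical helper names; the linear forward model and the MSE loss are
# assumptions made for illustration, not the paper's stated implementation.

def predict_bucket(patterns, image):
    """Forward GI model: one signal value per pattern, sum_j P[i][j] * x[j]."""
    return [sum(p * x for p, x in zip(pattern, image)) for pattern in patterns]

def measurement_loss(patterns, image, measured):
    """Mean squared error between predicted and measured signals."""
    predicted = predict_bucket(patterns, image)
    return sum((b - m) ** 2 for b, m in zip(predicted, measured)) / len(measured)

random.seed(0)
n_pixels, n_patterns = 16, 8

# Toy 1D "object" and random speckle patterns (flattened images).
truth = [1.0 if 5 <= j < 11 else 0.0 for j in range(n_pixels)]
patterns = [[random.random() for _ in range(n_pixels)] for _ in range(n_patterns)]

# Simulated noise-free measurements of the true object.
measured = predict_bucket(patterns, truth)

# The loss needs only patterns and measurements, never a ground-truth label.
loss_truth = measurement_loss(patterns, truth, measured)
loss_blank = measurement_loss(patterns, [0.0] * n_pixels, measured)
print(loss_truth, loss_blank)
```

A candidate image that explains the measurements scores a lower loss than one that does not, which is what lets a network be trained without scene-specific labels. In a real setup the candidate image would be the network output and the loss would be minimized by gradient descent.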

Abstract

Ghost imaging (GI) reconstructs a 2D image from the high-order correlation of 1D bucket signals with 2D light-field information; its efficient photon collection yields enhanced detection sensitivity and high-quality reconstruction in scattering media. Recent work has established that deep learning (DL) can substantially improve GI reconstruction quality. Furthermore, large models such as SDXL and GPT-4 have moved beyond the parameter and architectural constraints of conventional DL, enabling models to explore relationships among all positions within a feature sequence. This shift has significantly advanced the ability of DL to restore severely degraded and low-resolution imagery, making it particularly well suited to noise-robust image reconstruction in GI. In this paper, we propose GILM, the first large imaging model that incorporates the physical principles of GI, with 1.4 billion parameters. GILM uses a skip connection mechanism to mitigate the gradient explosion inherent in deep architectures, ensuring sufficient parametric capacity to capture intricate correlations among the single-pixel measurements of an object. Moreover, GILM uses a multi-head attention mechanism to learn spatial dependencies across pixels during image reconstruction, facilitating the extraction of comprehensive object information. We validated GILM through a series of experiments: imaging simulated objects, imaging objects in free space, and imaging an object located 52 meters away in an underwater environment. The experimental results show that GILM effectively analyzes the fluctuation trends of the collected signals, thereby improving the recovery of the object's image from the acquired data.
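
To make the correlation step concrete, the sketch below computes a basic second-order correlation estimate, G(x) = <B I(x)> - <B><I(x)>, from simulated bucket signals and random patterns. This is the textbook correlation form of GI, shown only as background; it is neither GILM nor any of the baseline methods compared in the paper, and all names and sizes are illustrative assumptions.

```python
import random

# Hypothetical function name; this is the textbook correlation estimate,
# not the paper's reconstruction method.

def correlation_gi(patterns, buckets):
    """Estimate each pixel as <B * I_j> - <B> * <I_j> over all measurements."""
    m = len(buckets)
    n = len(patterns[0])
    mean_b = sum(buckets) / m
    recon = []
    for j in range(n):
        mean_i = sum(p[j] for p in patterns) / m
        mean_bi = sum(b * p[j] for b, p in zip(buckets, patterns)) / m
        recon.append(mean_bi - mean_b * mean_i)
    return recon

random.seed(1)
n_pixels, n_patterns = 16, 2000

# Toy 1D object: pixels 5..10 transmit light, the rest are dark.
obj = [1.0 if 5 <= j < 11 else 0.0 for j in range(n_pixels)]
patterns = [[random.random() for _ in range(n_pixels)] for _ in range(n_patterns)]

# Each bucket value is the total light passed by the object for one pattern.
buckets = [sum(p * x for p, x in zip(pattern, obj)) for pattern in patterns]

recon = correlation_gi(patterns, buckets)
object_mean = sum(recon[5:11]) / 6
background_mean = sum(recon[:5] + recon[11:]) / 10
print(object_mean, background_mean)
```

With enough patterns, pixels inside the object correlate with the bucket signal and rise above the background. The estimate degrades as the number of measurements falls, which is the low-sampling regime that learned reconstruction methods aim to improve.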

Paper Structure

This paper contains 18 sections, 3 equations, 9 figures.

Figures (9)

  • Figure 1: Overview of GILM. (a) The software framework of the GILM algorithm. (b) The network structure of the proposed large imaging model.
  • Figure 2: The network architecture of the modules in the proposed large imaging model. (a) The structure of ResNetBlock2D. (b) The structure of Transformer2DModel. (c) The structure of the multi-head attention mechanism embedded in the Transformer2DModel.
  • Figure 3: Reconstructions of the simulated binary and grayscale object images, with the corresponding PSNR and SSIM, using the DGI, GICS, CNN-based GI, UNet-based GI, and GILM methods.
  • Figure 4: Reconstructions of the simulated grayscale object images at different numbers of measurements, with the corresponding PSNR and SSIM, using the DGI, GICS, CNN-based GI, UNet-based GI, and GILM methods.
  • Figure 5: Experimental reconstructions of the USAF resolution target, with the grayscale distribution at the marked position, using the DGI, GICS, CNN-based GI, UNet-based GI, and GILM methods.
  • ...and 4 more figures
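
The figure captions report PSNR alongside SSIM. For reference, a minimal sketch of the standard PSNR definition, 10 log10(peak^2 / MSE), is given here. It assumes intensities normalized to [0, 1] and flattened images; the function name and toy values are illustrative and do not come from the paper.

```python
import math

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized flattened images."""
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy example: every pixel is off by about 0.1, so the MSE is about 0.01
# and the PSNR is about 20 dB.
reference = [0.0, 0.5, 1.0, 0.5]
noisy = [0.1, 0.4, 0.9, 0.6]
score = psnr(reference, noisy)
print(round(score, 2))
```

Higher values indicate a reconstruction closer to the reference. PSNR measures only pixel-wise error, which is why it is usually reported together with a structural metric such as SSIM.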