Large model enhanced computational ghost imaging
Yifan Chen, Hongjun An, Zhe Sun, Tong Tian, Mingliang Chen, Christian Spielmann, Xuelong Li
TL;DR
This paper tackles the challenge of reconstructing high-quality images from ghost imaging data under noise and scattering by introducing GILM, a large-scale imaging model with 1.4B parameters that embeds GI physics directly into a neural architecture. GILM combines skip connections (to stabilize deep training) with a Transformer2DModel to capture spatial dependencies across pixel points, operating in a latent space with a VAE decoder for efficient high-resolution reconstruction. Self-supervised learning aligns predicted speckle-intensity signals with collected measurements, removing the need for scene-specific labels and enhancing generalization. Extensive simulations and real-world tests in both free-space and underwater environments demonstrate that GILM outperforms classical and DL-based GI methods, maintaining high fidelity at low sampling and enabling deployment on portable hardware for practical GI applications.
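The self-supervised idea described above, matching predicted bucket signals to the measured ones, can be illustrated with a deliberately small sketch. This is not the GILM training procedure: it assumes a linear forward model (bucket value = sum of pattern times object) and optimizes the image pixels directly with gradient descent, standing in for the network. The names `predict_bucket`, `fit_image`, the step size, and the toy object are all hypothetical choices made for illustration.

```python
import numpy as np

def predict_bucket(patterns, image):
    """Assumed GI forward model: one bucket value per illumination pattern."""
    # patterns: (M, H, W), image: (H, W) -> (M,)
    return np.tensordot(patterns, image, axes=([1, 2], [0, 1]))

def fit_image(patterns, bucket, lr=0.03, steps=1000):
    """Minimize the mean squared error between predicted and measured bucket
    signals. No ground-truth image is used, only the measurements."""
    m = len(bucket)
    image = np.zeros(patterns.shape[1:])
    losses = []
    for _ in range(steps):
        residual = predict_bucket(patterns, image) - bucket
        losses.append(float(np.mean(residual ** 2)))
        # Gradient of mean(residual^2) with respect to the image pixels.
        grad = (2.0 / m) * np.tensordot(residual, patterns, axes=(0, 0))
        image -= lr * grad
    return image, losses

# Toy data (assumption): 8x8 object, 512 uniform random speckle patterns.
rng = np.random.default_rng(0)
truth = np.zeros((8, 8))
truth[2:6, 2:6] = 1.0
patterns = rng.random((512, 8, 8))
bucket = predict_bucket(patterns, truth)

estimate, losses = fit_image(patterns, bucket)
```

In a learned setting, the image would be produced by a network and the same measurement-consistency loss would be backpropagated through it; the step size here is tuned only for this toy problem.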
Abstract
Ghost imaging (GI) reconstructs a 2D image from the high-order correlation between 1D bucket signals and 2D light-field information. Because it collects photons efficiently, it offers enhanced detection sensitivity and high-quality image reconstruction in scattering media. Recent investigations have established that deep learning (DL) can substantially enhance GI reconstruction quality. Furthermore, the emergence of large models such as SDXL and GPT-4 has lifted the constraints of conventional DL on parameter count and architecture, enabling models to explore relationships among all positions within a feature sequence. This shift has significantly advanced the capability of DL to restore severely degraded and low-resolution imagery, which makes it particularly advantageous for noise-robust image reconstruction in GI applications. In this paper, we propose GILM, the first large imaging model, with 1.4 billion parameters, that incorporates the physical principles of GI. GILM uses a skip connection mechanism to mitigate the gradient explosion challenges inherent in deep architectures, ensuring sufficient parametric capacity to capture intricate correlations among the single-pixel measurements of an object. Moreover, GILM leverages a multi-head attention mechanism to learn spatial dependencies across pixels during image reconstruction, facilitating the extraction of comprehensive object information for subsequent reconstruction. We validated the effectiveness of GILM through a series of experiments: imaging simulated objects, imaging objects in free space, and imaging an object located 52 meters away in an underwater environment. The experimental results show that GILM effectively analyzes the fluctuation trends of the collected signals, thereby optimizing the recovery of the object's image from the acquired data.
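For readers unfamiliar with the correlation step mentioned at the start of the abstract, the following is a minimal sketch of classical second-order GI reconstruction, G(x, y) = <B I(x, y)> - <B><I(x, y)>, where B is the bucket signal and I(x, y) the illumination pattern. It is a textbook baseline, not the GILM method, and it assumes a noiseless simulated bucket detector; the function names and the toy object are hypothetical.

```python
import numpy as np

def simulate_bucket(patterns, obj):
    """Assumed noiseless bucket detector: total light passed by the object."""
    # patterns: (M, H, W), obj: (H, W) -> (M,)
    return np.tensordot(patterns, obj, axes=([1, 2], [0, 1]))

def correlation_reconstruct(patterns, bucket):
    """Second-order correlation: covariance between the 1D bucket signal
    and the intensity at each pixel of the 2D patterns."""
    d_bucket = bucket - bucket.mean()
    d_patterns = patterns - patterns.mean(axis=0)
    return np.tensordot(d_bucket, d_patterns, axes=(0, 0)) / len(bucket)

# Toy data (assumption): 16x16 binary object, 4000 uniform random patterns.
rng = np.random.default_rng(0)
obj = np.zeros((16, 16))
obj[4:12, 6:10] = 1.0
patterns = rng.random((4000, 16, 16))

bucket = simulate_bucket(patterns, obj)
recon = correlation_reconstruct(patterns, bucket)

# Pearson correlation between reconstruction and object, as a rough quality check.
similarity = np.corrcoef(recon.ravel(), obj.ravel())[0, 1]
```

This baseline needs many more patterns than pixels to give a clean image, which is the low-sampling limitation that learned reconstruction methods aim to relax.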
