HoloPASWIN: Robust Inline Holographic Reconstruction via Physics-Aware Swin Transformers

Gökhan Koçmarlı, G. Bora Esmer

Abstract

In-line digital holography (DIH) is a widely used lensless imaging technique, valued for its simplicity and its ability to image samples at high throughput. However, because only the intensity of the interference pattern is captured during recording, the reconstruction contains unwanted terms such as the cross-term and the twin-image. The cross-term can be suppressed by adjusting the intensity of the reference wave, but the twin-image problem remains. The twin-image is a spectral artifact that superimposes a defocused conjugate wave onto the reconstructed object, severely degrading image quality. While deep learning has recently emerged as a powerful tool for phase retrieval, traditional Convolutional Neural Networks (CNNs) are limited by their local receptive fields, which makes them less effective at capturing the global diffraction patterns inherent in holography. In this study, we introduce HoloPASWIN, a physics-aware deep learning framework based on the Swin Transformer architecture. By leveraging hierarchical shifted-window attention, our model efficiently captures both the local details and the long-range dependencies essential for accurate holographic reconstruction. We propose a comprehensive loss function that integrates frequency-domain constraints with physical consistency via a differentiable angular spectrum propagator, ensuring high spectral fidelity. Validated on a large-scale synthetic dataset of 25,000 samples with diverse noise configurations (speckle, shot, read, and dark noise), HoloPASWIN demonstrates effective twin-image suppression and robust reconstruction quality.
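
To make the twin-image problem and the role of the angular spectrum propagator concrete, the following is a minimal NumPy sketch, not the authors' implementation. The function name `angular_spectrum_propagate`, the wavelength (532 nm), the pixel size (4.65 µm), the 128-pixel grid, and the test object are all assumptions chosen for illustration; only the 20 mm distance is taken from the Figure 5 caption. Back-propagating the square root of the recorded intensity is likewise an assumed, commonly used baseline rather than a step confirmed by the paper. The same operations written in an automatic-differentiation framework would yield a differentiable propagator.

```python
import numpy as np

def angular_spectrum_propagate(field, z, wavelength, pixel_size):
    """Propagate a complex 2D field by distance z (metres) with the angular spectrum method."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)  # spatial frequencies in cycles per metre
    fy = np.fft.fftfreq(ny, d=pixel_size)
    FX, FY = np.meshgrid(fx, fy)
    # Squared longitudinal frequency; non-positive values are evanescent and are dropped.
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    propagating = arg > 0
    kz = 2 * np.pi * np.sqrt(np.where(propagating, arg, 0.0))
    transfer = np.where(propagating, np.exp(1j * kz * z), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# Hypothetical optical parameters (assumptions, not values from the paper).
wavelength = 532e-9   # metres
pixel_size = 4.65e-6  # metres
z = 20e-3             # metres; the training distance quoted for Figure 5

# Toy object: unit background with a weakly absorbing, phase-shifting disc.
n = 128
yy, xx = np.mgrid[:n, :n] - n // 2
obj = np.ones((n, n), dtype=complex)
obj[xx**2 + yy**2 < 10**2] = 0.5 * np.exp(1j * 0.5)

# Recording: the sensor keeps only the intensity, so the phase is lost.
sensor_field = angular_spectrum_propagate(obj, z, wavelength, pixel_size)
hologram = np.abs(sensor_field) ** 2

# Back-propagating the complex sensor field recovers the object ...
round_trip = angular_spectrum_propagate(sensor_field, -z, wavelength, pixel_size)
# ... whereas back-propagating the intensity-only measurement leaves a twin-image residual.
naive = angular_spectrum_propagate(np.sqrt(hologram), -z, wavelength, pixel_size)

round_trip_error = np.mean(np.abs(round_trip - obj) ** 2)
twin_error = np.mean(np.abs(naive - obj) ** 2)
print(f"round-trip error: {round_trip_error:.3e}")
print(f"intensity-only error: {twin_error:.3e}")
```

With these assumed parameters every sampled spatial frequency is propagating, so the transfer function has unit magnitude and the round trip is exact up to floating-point error. The intensity-only reconstruction is not, and that gap is the artifact a learned correction would have to remove.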

Paper Structure

This paper contains 18 sections, 13 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: HoloPASWIN Network Architecture. The input hologram is first processed by the ASM physics module, then decomposed into real and imaginary parts. A Swin Transformer U-Net with hierarchical feature extraction and shifted-window attention predicts a complex residual correction to reconstruct the clean complex field. Skip connections between encoder and decoder stages preserve multi-scale features for high-frequency detail recovery.
  • Figure 2: Qualitative comparison of reconstruction results. Rows correspond to different test samples. From left to right: Input Hologram, GT Amplitude, Predicted Amplitude, GT Phase, Predicted Phase. The model effectively removes twin-images and background noise while preserving object sharpness.
  • Figure 3: Detailed reconstruction analysis showing Ground Truth (GT), Prediction, Error maps, and zoomed-in regions for both amplitude and phase. The zoomed regions highlight the model's ability to recover fine structural details of the objects while suppressing reconstruction noise. The error maps demonstrate high fidelity with minimal residuals at the object boundaries, even in the presence of synthetic experimental noise.
  • Figure 4: Visual comparison of architecture configurations on a representative test sample. Both amplitude and phase images are independently normalized (1st-99th percentile) to enhance visual contrast. The B/S (Background-to-Signal) ratio quantifies background cleanliness. Note that the ground truth (GT) has B/S=0.968, reflecting the inherent background signal in the simulated object field. The trained models achieve lower B/S ratios (0.79-0.87), indicating they have learned to suppress background noise beyond what is present in the GT, likely through learned denoising and twin-image artifact removal.
  • Figure 5: Model sensitivity to propagation distance errors. Performance is optimal at the training distance (z=20 mm) and degrades sharply with $\pm 0.5$ mm or larger offsets, indicating the importance of accurate distance calibration.
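
The 1st-99th percentile normalization mentioned in the Figure 4 caption can be sketched as follows. This is an illustrative guess at the display step, not the authors' code: the helper name `percentile_normalize`, the clipping to [0, 1], and the handling of constant images are assumptions.

```python
import numpy as np

def percentile_normalize(image, low=1.0, high=99.0):
    """Rescale an image so its low/high percentiles map to 0 and 1, clipping values outside."""
    lo, hi = np.percentile(image, [low, high])
    if hi <= lo:
        # Constant (or nearly constant) image: no contrast to stretch (assumed behaviour).
        return np.zeros_like(image, dtype=float)
    return np.clip((image - lo) / (hi - lo), 0.0, 1.0)

# Toy example: a linear ramp from 0 to 100.
ramp = np.arange(101, dtype=float)
normalized = percentile_normalize(ramp)
print(normalized.min(), normalized[50], normalized.max())
```

Because amplitude and phase are normalized independently, the displayed contrast of one panel says nothing about the absolute scale of the other. Quantitative comparisons should use the unnormalized fields.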