PtychoFormer: A Transformer-based Model for Ptychographic Phase Retrieval

Ryuma Nakahata, Shehtab Zaman, Mingyuan Zhang, Fake Lu, Kenneth Chiu

TL;DR

This work presents PtychoFormer, a hierarchical transformer-based model for data-driven single-shot ptychographic phase retrieval, which exhibits tolerance to sparsely scanned diffraction patterns and achieves up to 3600 times faster imaging speed than the extended ptychographic iterative engine (ePIE).

Abstract

Ptychography is a computational method of microscopy that recovers high-resolution transmission images of samples from a series of diffraction patterns. While conventional phase retrieval algorithms can iteratively recover the images, they require oversampled diffraction patterns, incur significant computational costs, and struggle to recover the absolute phase of the sample's transmission function. Deep learning algorithms for ptychography are a promising approach to resolving the limitations of iterative algorithms. We present PtychoFormer, a hierarchical transformer-based model for data-driven single-shot ptychographic phase retrieval. PtychoFormer processes subsets of diffraction patterns, generating local inferences that are seamlessly stitched together to produce a high-quality reconstruction. Our model exhibits tolerance to sparsely scanned diffraction patterns and achieves up to 3600 times faster imaging speed than the extended ptychographic iterative engine (ePIE). We also propose the extended-PtychoFormer (ePF), a hybrid approach that combines the benefits of PtychoFormer with the ePIE. ePF minimizes global phase shifts and significantly enhances reconstruction quality, achieving state-of-the-art phase retrieval in ptychography.
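The difficulty with absolute phase mentioned above can be illustrated with a small numerical sketch. It assumes the standard far-field model in which the measured intensity is the squared magnitude of the Fourier transform of the probe times the transmission function. The Gaussian probe, the array sizes, the function name `far_field_intensity`, and the amplitude-domain sum squared error are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def far_field_intensity(probe, transmission):
    """Far-field (Fraunhofer) intensity |FFT(P * T)|^2 (assumed forward model)."""
    exit_wave = probe * transmission
    return np.abs(np.fft.fftshift(np.fft.fft2(exit_wave))) ** 2

rng = np.random.default_rng(0)
n = 32

# Hypothetical random sample: amplitude A(x, y) and phase phi(x, y).
amplitude = rng.uniform(0.5, 1.0, size=(n, n))
phase = rng.uniform(-np.pi / 2, np.pi / 2, size=(n, n))
transmission = amplitude * np.exp(1j * phase)

# Hypothetical Gaussian probe P(x, y); the paper's probes are not reproduced here.
y, x = np.mgrid[-n // 2 : n // 2, -n // 2 : n // 2]
probe = np.exp(-(x**2 + y**2) / (2 * 6.0**2)).astype(complex)

# Add a constant (global) phase offset to the whole sample.
global_shift = 0.7
shifted = transmission * np.exp(1j * global_shift)

intensity = far_field_intensity(probe, transmission)
intensity_shifted = far_field_intensity(probe, shifted)

# An error measured on diffraction amplitudes cannot see the global shift.
sse = float(np.sum((np.sqrt(intensity) - np.sqrt(intensity_shifted)) ** 2))
print("SSE between the two diffraction patterns:", sse)
```

Under these assumptions the two diffraction patterns agree to floating-point precision even though the two phase maps differ by 0.7 rad everywhere, so an objective defined only on diffraction data cannot pin down the absolute phase.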

Paper Structure

This paper contains 16 sections, 4 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: Comparison of the ground truth against the phase reconstructions from our proposed methods, PtychoFormer and extended-PtychoFormer (ePF), and from ePIE. Sum squared error (SSE), the objective function for ePIE, cannot distinguish between globally shifted phase values, so ePF and ePIE show comparable SSE values despite distinct line profiles. The ePIE reconstruction is affected by a substantial global phase shift, while ePF achieves a better estimate by leveraging PtychoFormer for initialization.
  • Figure 2: Comprehensive overview of the simulation using PtychoFormer and extended-PtychoFormer (ePF). (a) depicts the transmission function $T(x,y)$, characterized by its amplitude $A(x,y)$ and phase $\phi(x,y)$. The light probe $P(x,y)$ propagates through the sample to produce the diffraction pattern $I(u,v)$ in the far field. The diffraction patterns $I(u,v)$ are then grouped into sets of nine, with each pattern placed in a separate channel. (b) illustrates how PtychoFormer processes the sets in parallel to reconstruct local patches of $T(x,y)$ and then stitches them to complete the reconstruction. The ePF framework builds on this approach by introducing an additional step (c), in which the initial estimate from PtychoFormer is fed into ePIE for iterative refinement, further improving the reconstruction accuracy of $T(x,y)$.
  • Figure 3: The input scheme depicted in (a) groups the diffraction patterns into subsets of nine and places each pattern in a separate channel, which preserves the spatial relation between the patterns. (b) showcases the stitching process, in which local predictions are cropped and feathered at the edges. (c) compares the reconstructions with and without feathering: feathering effectively eliminates the grid artifacts present in the reconstruction without it.
  • Figure 4: PtychoFormer leverages a Mix Transformer (MiT) encoder and a convolutional decoder. The encoder includes four MiT stages that progressively reduce spatial resolution while increasing feature channels. The decoder upsamples the encoder outputs, adjusts feature channels, and refines the resolution to produce the amplitude and phase estimates.
  • Figure 5: A subset of the probe functions we used is shown in (a). Probe A is the primary probe on which PtychoFormer is trained, probe B appears in the finetuning dataset, and probe C is reserved for testing. The grid formation in (b) is used for pre-training, and the other scan configurations are used for finetuning.
  • ...and 7 more figures
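
The input grouping and feathered stitching described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`group_nine`, `feather_window`, `stitch`), the linear ramp, the patch size, and the stride are all hypothetical choices, and the model that would turn each nine-channel input into a local prediction is omitted.

```python
import numpy as np

def group_nine(patterns, row, col):
    """Stack a 3x3 neighbourhood of diffraction patterns into 9 channels.

    `patterns` has shape (scan_rows, scan_cols, H, W); the result has shape
    (9, H, W), ordered row by row so neighbouring scans stay adjacent.
    """
    block = patterns[row : row + 3, col : col + 3]
    return block.reshape(9, *patterns.shape[2:])

def feather_window(size, ramp):
    """2D weight rising linearly from the patch edge to 1 in the interior."""
    edge = np.minimum(np.arange(1, size + 1), np.arange(size, 0, -1))
    w1d = np.clip(edge / ramp, 0.0, 1.0)
    return np.outer(w1d, w1d)

def stitch(patches, corners, out_shape, ramp=8):
    """Blend overlapping square patches using feathered weights."""
    acc = np.zeros(out_shape)
    weight = np.zeros(out_shape)
    for patch, (r, c) in zip(patches, corners):
        size = patch.shape[0]
        win = feather_window(size, ramp)
        acc[r : r + size, c : c + size] += win * patch
        weight[r : r + size, c : c + size] += win
    # Normalise by the accumulated weight so overlaps become weighted averages.
    return acc / np.maximum(weight, 1e-12)

# Toy demo: local "predictions" are crops of a smooth image, each with a
# small hypothetical per-patch offset standing in for local inconsistencies.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
truth = np.sin(xx / 10.0) + np.cos(yy / 7.0)
corners = [(r, c) for r in (0, 16, 32) for c in (0, 16, 32)]
patches = [truth[r : r + 32, c : c + 32] + rng.normal(0, 0.05) for r, c in corners]
stitched = stitch(patches, corners, truth.shape)
print("max abs error after feathered stitching:", np.abs(stitched - truth).max())
```

Dividing by the accumulated weight makes every output pixel a weighted average of the patches that cover it, so the contribution of each patch fades toward its border instead of ending at a hard seam, which is the kind of grid artifact the caption attributes to stitching without feathering.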