PtychoDV: Vision Transformer-Based Deep Unrolling Network for Ptychographic Image Reconstruction

Weijie Gan; Qiuchen Zhai; Michael Thompson McCann; Cristina Garcia Cardona; Ulugbek S. Kamilov; Brendt Wohlberg

TL;DR

PtychoDV addresses the high computational cost of nonlinear ptychographic phase retrieval by combining two components: a vision transformer (ViT) that jointly processes the overlapping measurements to produce an informative initial image, and a deep unrolling (DU) network that refines this image by enforcing the ptychographic forward model together with learned priors. The DU refinement is based on Wirtinger flow, and the whole network is trained end-to-end with a dual loss that targets both image-level accuracy and patch-level consistency. On simulated data, PtychoDV outperforms existing deep learning baselines and is competitive with iterative methods at a substantially lower computation time, especially for sparse sampling patterns. Its output can also serve as an initialization that accelerates the iterative PMACE algorithm, even for probes not seen during training. The approach is promising for real-time reconstruction and for initializing traditional iterative schemes; future work will extend it to real data and self-supervised learning.
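The dual loss above combines an image-level term with a patch-level term. The NumPy sketch below shows one way such a loss could be assembled; it is not the paper's definition. It assumes mean squared error for both terms, assumes that "patch-level" means comparing per-measurement patch estimates (one per scan position) against ground-truth object patches cropped at the same scan positions, and uses a hypothetical weight `lam`. The names `crop_patches` and `dual_loss` are illustrative only.

```python
import numpy as np


def crop_patches(x, positions, m):
    """Stack of m x m object patches S_j x, one per scan position (r, c)."""
    return np.stack([x[r:r + m, c:c + m] for r, c in positions])


def dual_loss(x_hat, patches_hat, x_true, positions, lam=1.0):
    """Image-level MSE plus lam * patch-level MSE (hypothetical weighting).

    x_hat:       full reconstructed image, shape (n, n)
    patches_hat: per-measurement patch estimates, shape (J, m, m)
    x_true:      ground-truth image, shape (n, n)
    """
    m = patches_hat.shape[-1]
    image_term = np.mean(np.abs(x_hat - x_true) ** 2)
    patch_term = np.mean(np.abs(patches_hat - crop_patches(x_true, positions, m)) ** 2)
    return image_term + lam * patch_term


# Toy check: a 32x32 phase object and a 3x3 grid of 16x16 patches.
rng = np.random.default_rng(0)
x_true = np.exp(1j * rng.uniform(-1, 1, (32, 32)))
positions = [(r, c) for r in (0, 8, 16) for c in (0, 8, 16)]
true_patches = crop_patches(x_true, positions, 16)
print(dual_loss(x_true, true_patches, x_true, positions))        # perfect estimate
print(dual_loss(x_true + 0.1, true_patches, x_true, positions))  # image term only
```

Keeping the two terms separate makes it easy to see which stage of a two-stage network is being penalized: the patch term only sees the per-measurement estimates, the image term only sees the final image.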

Abstract

Ptychography is an imaging technique that captures multiple overlapping snapshots of a sample illuminated coherently by a moving, localized probe. Image recovery from ptychographic data is generally performed by an iterative algorithm that solves a nonlinear phase retrieval problem derived from the measured diffraction patterns. However, these iterative approaches have a high computational cost. In this paper, we introduce PtychoDV, a novel deep model-based network designed for efficient, high-quality ptychographic image reconstruction. PtychoDV comprises a vision transformer that generates an initial image from the set of raw measurements, taking their mutual correlations into account. This is followed by a deep unrolling network that refines the initial image using learnable convolutional priors and the ptychography measurement model. Experimental results on simulated data demonstrate that PtychoDV outperforms existing deep learning methods for this problem and substantially reduces computational cost compared with iterative methods, while maintaining competitive performance.
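To make the measurement model concrete, the sketch below assumes the common far-field ptychography model y_j = |F(P * S_j x)|, where S_j crops the object patch at scan position j, P is the probe, and F is a unitary 2-D FFT. It takes gradient steps on the amplitude loss f(x) = 1/2 sum_j || |F(P * S_j x)| - y_j ||^2, in the spirit of the Wirtinger-flow refinement mentioned above, and alternates them with a `prior` callable in an unrolled loop. This is a hedged illustration, not PtychoDV's implementation: the paper's data term, step sizes, and learned CNN prior may differ, and `forward`, `wf_step`, `unrolled_refine`, the Gaussian probe, and the raster scan grid are all hypothetical.

```python
import numpy as np


def forward(x, probe, positions):
    """Far-field amplitudes y_j = |F(P * S_j x)|, one per scan position j."""
    m = probe.shape[0]
    return np.stack([
        np.abs(np.fft.fft2(probe * x[r:r + m, c:c + m], norm="ortho"))
        for r, c in positions
    ])


def data_loss(x, y, probe, positions):
    """f(x) = 1/2 * sum_j || |F(P * S_j x)| - y_j ||^2."""
    return 0.5 * np.sum((forward(x, probe, positions) - y) ** 2)


def wf_step(x, y, probe, positions, gamma):
    """One gradient step on f using the Wirtinger gradient A^H(Ax - proj(Ax))."""
    m = probe.shape[0]
    grad = np.zeros_like(x)
    for (r, c), y_j in zip(positions, y):
        z = np.fft.fft2(probe * x[r:r + m, c:c + m], norm="ortho")
        # z minus its projection onto {|z| = y_j}; np.angle(0) == 0 keeps it defined.
        resid = z - y_j * np.exp(1j * np.angle(z))
        # Adjoint of x -> F(P * S_j x): inverse FFT, times conj(P), scatter back.
        grad[r:r + m, c:c + m] += np.conj(probe) * np.fft.ifft2(resid, norm="ortho")
    return x - gamma * grad


def unrolled_refine(x0, y, probe, positions, prior=lambda v: v, n_iters=10):
    """Alternate data-consistency and prior steps; `prior` is a stand-in
    for a learned CNN (identity by default)."""
    m = probe.shape[0]
    # Diagonal of sum_j S_j^T |P|^2 S_j. With the identity prior,
    # gamma = 1 / max(cover) guarantees that f never increases.
    cover = np.zeros(x0.shape)
    for r, c in positions:
        cover[r:r + m, c:c + m] += np.abs(probe) ** 2
    gamma = 1.0 / cover.max()
    x = x0.astype(complex)
    for _ in range(n_iters):
        x = prior(wf_step(x, y, probe, positions, gamma))
    return x


# Toy example: 32x32 phase object, 16x16 Gaussian probe, 4-pixel raster step.
rng = np.random.default_rng(0)
n, m, step = 32, 16, 4
positions = [(r, c) for r in range(0, n - m + 1, step)
             for c in range(0, n - m + 1, step)]
yy, xx = np.mgrid[:m, :m] - (m - 1) / 2
probe = np.exp(-(xx**2 + yy**2) / (2 * 5.0**2)).astype(complex)
x_true = np.exp(1j * rng.uniform(-1, 1, (n, n)))
y = forward(x_true, probe, positions)
x0 = np.ones((n, n), dtype=complex)
x_hat = unrolled_refine(x0, y, probe, positions, n_iters=20)
print(data_loss(x0, y, probe, positions), data_loss(x_hat, y, probe, positions))
```

With the identity prior, each iteration is a majorize-minimize step on f, because the quadratic surrogate has the diagonal Hessian `cover`. In an unrolled network, `prior` would be a trained module and that monotonicity guarantee no longer applies.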

Paper Structure

This paper contains 19 sections, 14 equations, 7 figures, 7 tables.

Introduction
Related Work
Problem Formulation
Iterative Methods
Deep Learning Approaches
Proposed Method: PtychoDV
Vision Transformer
Deep Unrolling Network
Loss Function
Numerical Validation
Experimental Setup
Dataset
Implementation
Evaluation
Comparison
...and 4 more sections

Figures (7)

Figure 1: The PtychoDV pipeline, which consists of two main components: (a) a vision transformer module that reconstructs an initial image from the raw measurements while accounting for their interdependencies, and (b) a deep unrolling (DU) network that refines the initial image using the ptychographic forward model and CNN priors. The iterative update of the physical consistency module is given by the paper's Wirtinger flow update equation.
Figure 2: Magnitude of the ground truth image, the two simulated ground truth probes (top row: magnitude; bottom row: phase), and the 256:5 sampling pattern. Probe A was used to synthesize measurements for both training and testing, while probe B was used exclusively for testing the pre-trained models.
Figure 3: Visual results of PtychoDV and baseline methods on noisy test data with the 64:11 sampling pattern. The magnitude and phase of the reconstructed images are shown in the top and bottom rows, respectively, with NRMSE values in the bottom-right corner of each image. The figure highlights the superior performance of PtychoDV under sparse sampling: PtychoDV reconstructs images that are consistent with the ground truth, whereas the results of the other baselines exhibit noise and blurring artifacts.
Figure 4: Visual results of PtychoDV and its variants on noisy test data with the 64:11 sampling pattern. The magnitude and phase of the reconstructed images are shown in the top and bottom rows, respectively, with the NRMSE of each method in the bottom-right corner of each image. The figure shows that PtychoDV outperforms its ablated variants.
Figure 5: Visual results of PMACE on noise-free test data generated with different probes and different initializations. The magnitude and phase of the reconstructed images are shown in the top and bottom rows, respectively, with the NRMSE of each method in the bottom-right corner of each image. With a small number of iterations, PMACE achieves better performance when initialized with PtychoDV than without it. The figure also shows that PtychoDV can provide an initialization even when the test probe differs from the probe used in training.
...and 2 more figures
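The captions above report NRMSE values. A common definition is ||x_hat - x|| / ||x||; because phase retrieval determines the object only up to a global phase factor, the sketch below optionally removes that phase offset before comparing. Whether the paper aligns the global phase (or normalizes differently) is not stated here, so treat `nrmse` and its `align_phase` option as a hypothetical illustration.

```python
import numpy as np


def nrmse(x_hat, x_true, align_phase=True):
    """||x_hat - x_true||_2 / ||x_true||_2, optionally after removing the
    global phase offset that phase retrieval cannot recover."""
    if align_phase:
        # exp(1j * phi) with phi = angle(<x_hat, x_true>) minimizes the error.
        phi = np.angle(np.vdot(x_hat, x_true))
        x_hat = x_hat * np.exp(1j * phi)
    return np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)


rng = np.random.default_rng(0)
x_true = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))
x_rot = x_true * np.exp(1.3j)  # same image up to a global phase
print(nrmse(x_rot, x_true), nrmse(x_rot, x_true, align_phase=False))
```

Without alignment, a reconstruction that differs from the ground truth only by a global phase of 1.3 rad would be scored as 2 sin(0.65), roughly 1.21, even though it is physically equivalent.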
