PtychoDV: Vision Transformer-Based Deep Unrolling Network for Ptychographic Image Reconstruction
Weijie Gan, Qiuchen Zhai, Michael Thompson McCann, Cristina Garcia Cardona, Ulugbek S. Kamilov, Brendt Wohlberg
TL;DR
PtychoDV addresses the high computational cost of nonlinear ptychographic phase retrieval by combining two stages. A vision transformer (ViT) jointly processes the overlapping measurements to produce an informative initial image, and a deep unrolling (DU) network based on Wirtinger flow refines that image by enforcing the ptychographic forward model together with learned priors. The two stages are trained end-to-end with a dual loss that targets both image-level accuracy and patch-level consistency. On simulated data, PtychoDV outperforms existing deep learning baselines and is competitive with iterative methods while significantly reducing computation time, particularly in sparse-sampling scenarios. Its output can also serve as an initialization that accelerates the iterative PMACE algorithm, even for probes not seen during training. The approach is promising for real-time reconstruction and for initializing traditional iterative schemes; future work includes extension to real data and self-supervised learning.
Abstract
Ptychography is an imaging technique that captures multiple overlapping snapshots of a sample, illuminated coherently by a moving localized probe. Image recovery from ptychographic data is generally achieved via an iterative algorithm that solves a nonlinear phase retrieval problem derived from the measured diffraction patterns. However, these iterative approaches have a high computational cost. In this paper, we introduce PtychoDV, a novel deep model-based network designed for efficient, high-quality ptychographic image reconstruction. PtychoDV comprises a vision transformer that generates an initial image from the set of raw measurements, taking into consideration their mutual correlations. This is followed by a deep unrolling network that refines the initial image using learnable convolutional priors and the ptychography measurement model. Experimental results on simulated data demonstrate that PtychoDV outperforms existing deep learning methods for this problem and significantly reduces computational cost compared to iterative methods, while maintaining competitive performance.
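To make the measurement model and the iterative baseline concrete, the following is a minimal NumPy sketch, not the PtychoDV implementation. It assumes a far-field model in which each measurement is the Fourier amplitude of the probe multiplied by one image patch, an amplitude-based least-squares loss, and plain gradient descent with a conservative step size. The function names, the orthonormal FFT scaling, and the step-size rule are all illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def forward(image, probe, positions):
    """Simulate diffraction amplitudes |F(probe * patch)| for each scan position.

    Assumption: square probe, far-field propagation modeled by an orthonormal 2D FFT.
    """
    size = probe.shape[0]
    amplitudes = []
    for top, left in positions:
        exit_wave = probe * image[top:top + size, left:left + size]
        amplitudes.append(np.abs(np.fft.fft2(exit_wave, norm="ortho")))
    return np.stack(amplitudes)


def amplitude_loss(image, probe, positions, measurements):
    """Data-fit term 0.5 * sum_j || |F(probe * patch_j)| - y_j ||^2."""
    return 0.5 * np.sum((forward(image, probe, positions) - measurements) ** 2)


def amplitude_gradient(image, probe, positions, measurements):
    """Wirtinger-style gradient of the amplitude loss (up to a constant factor)."""
    size = probe.shape[0]
    grad = np.zeros_like(image, dtype=complex)
    for (top, left), y in zip(positions, measurements):
        patch = image[top:top + size, left:left + size]
        z = np.fft.fft2(probe * patch, norm="ortho")
        # Keep the current phase but swap in the measured amplitude.
        phase = z / np.maximum(np.abs(z), 1e-12)
        residual = z - y * phase
        # Back-project the residual through the adjoint of the forward operator.
        back = np.conj(probe) * np.fft.ifft2(residual, norm="ortho")
        grad[top:top + size, left:left + size] += back
    return grad


def gradient_descent(initial, probe, positions, measurements, num_steps=50):
    """Refine an initial image with fixed-step gradient descent on the amplitude loss."""
    size = probe.shape[0]
    # Accumulated probe intensity per pixel; its maximum bounds the curvature.
    weight = np.zeros(initial.shape)
    for top, left in positions:
        weight[top:top + size, left:left + size] += np.abs(probe) ** 2
    step = 1.0 / weight.max()
    image = initial.astype(complex)
    for _ in range(num_steps):
        image = image - step * amplitude_gradient(image, probe, positions, measurements)
    return image
```

In this sketch every iteration touches all scan positions, which illustrates why long iterative runs are costly. A deep unrolling network in the spirit described above would run only a small, fixed number of such data-consistency steps, interleave them with learned convolutional priors, and start from a learned initial image instead of a generic one; none of those learned components are shown here.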
