Fourier-Based 3D Multistage Transformer for Aberration Correction in Multicellular Specimens

Thayer Alshaabi, Daniel E. Milkie, Gaoxiang Liu, Cyna Shirazinejad, Jason L. Hong, Kemal Achour, Frederik Görlitz, Ana Milunovic-Jevtic, Cat Simmons, Ibrahim S. Abuzahriyeh, Erin Hong, Samara Erin Williams, Nathanael Harrison, Evan Huang, Eun Seok Bae, Alison N. Killilea, David G. Drubin, Ian A. Swinburne, Srigokul Upadhyayula, Eric Betzig

TL;DR

High-resolution tissue imaging is limited by sample-induced aberrations that reduce resolution and contrast. We present AOViFT (Adaptive Optical Vision Fourier Transformer), a 3D multistage Vision Transformer that operates on Fourier-domain embeddings to infer wavefront distortions and restore diffraction-limited performance without guide stars or wavefront sensors. The method relies on synthetic Fourier-embedded training data and fiducial puncta (AP2) to map spatially varying aberrations, with experimental validation in beads, cultured cells, and live zebrafish embryos, including post-acquisition spatially varying deconvolution. This approach reduces hardware complexity, memory use, and training time, and enables rapid, noninvasive aberration correction with potential for scalable 4D foundation models in volumetric microscopy.

Abstract

High-resolution tissue imaging is often compromised by sample-induced optical aberrations that degrade resolution and contrast. While wavefront sensor-based adaptive optics (AO) can measure these aberrations, such hardware solutions are typically complex, expensive to implement, and slow when serially mapping spatially varying aberrations across large fields of view. Here, we introduce AOViFT (Adaptive Optical Vision Fourier Transformer) -- a machine learning-based aberration sensing framework built around a 3D multistage Vision Transformer that operates on Fourier domain embeddings. AOViFT infers aberrations and restores diffraction-limited performance in puncta-labeled specimens with substantially reduced computational cost, training time, and memory footprint compared to conventional architectures or real-space networks. We validated AOViFT on live gene-edited zebrafish embryos, demonstrating its ability to correct spatially varying aberrations using either a deformable mirror or post-acquisition deconvolution. By eliminating the need for a guide star and wavefront sensing hardware and simplifying the experimental workflow, AOViFT lowers technical barriers for high-resolution volumetric microscopy across diverse biological samples.

Paper Structure

This paper contains 52 sections, 21 equations, 32 figures, 8 tables.

Figures (32)

  • Figure 1: AOViFT workflow. A. AOViFT correction. An aberrated 3D volume is preprocessed and cast into a Fourier embedding, which is passed to a 3D vision transformer model to predict the detection wavefront. A deformable mirror (DM) compensates for this aberration, enabling acquisition of a corrected volume. B. The Fourier embedding, $\mathcal{E}$. The Fourier transform of the 3D volume is embedded into a lower-dimensional space ($\mathcal{E} \in \mathbb{R}^{\ell \times d \times d}$), consisting of 3 amplitude planes ($\alpha_1,\alpha_2,\alpha_3$) and 3 phase planes ($\varphi_1,\varphi_2,\varphi_3$). C. AOViFT model. The Fourier embedding is input to a dual-stage 3D vision transformer model. At each stage, the $\ell$ Fourier planes are tiled into $k$ patches (Patchify), applying a radially encoded positional embedding to each patch. These patches are passed through $n$ Transformer layers. At the end of each stage, a residual connection is added, and the patches are merged back to the shape matching the stage input (Merge patches). After all stages, the resulting patches are pooled (GlobalAvgPool) and connected with a dense layer to output the $z$ Zernike coefficients.
  • Figure 2: Comparison of different state-of-the-art architectures when applied to 3D aberration sensing. A. Total number of trainable parameters. B. Maximum predictions per second, using a batch size of 1024 on a single A100 GPU. Higher values are better. C. Training time on eight H100 GPUs. D. Median $\lambda$ RMS residuals over 10K test samples after one correction, with aberrations ranging from $0.2\lambda$ to $0.4\lambda$, simulated with 50K to 200K integrated photons. E--F. Median $\lambda$ RMS residuals using our Small model for a single bead over a wide range of SNR. G--H. Median $\lambda$ RMS residuals using our Small model for several beads (up to 150 beads), simulated at photon levels from 50K to 200K per bead. Lower values are better for all performance indicators listed here, except for B.
  • Figure 3: Experimental correction of beads with initial artificial aberrations. A--L. Four examples, O-Astig & H-Coma $(Z^{m=\text{-}2}_{n=2} + Z^{m=1}_{n=3})$, O-Quadrafoil & P-Spherical $(Z^{m=\text{-}4}_{n=4} + Z^{m=0}_{n=4})$, V-Astig & V-Trefoil $(Z^{m=2}_{n=2} + Z^{m=\text{-}3}_{n=3})$, V-Coma & O-Astig2 $(Z^{m=\text{-}1}_{n=3} + Z^{m=\text{-}2}_{n=4})$, where the initial aberration was artificially applied by the DM. Iteration 0 shows the XY maximum projection of four beads with the initial aberration imaged using LLS, providing initial conditions for AOViFT predictions. Iteration 1 shows the resulting field of beads after applying the AOViFT prediction to the DM. Iteration 2 shows the results after applying the AOViFT prediction measured from Iteration 1. Insets show the AOViFT predicted wavefront over the $\text{NA}=1.0$ pupil with a dashed line at $\text{NA}=0.85$. M. Heatmap of the residual aberrations (measured via PhaseRetrieval on an isolated bead) after applying AOViFT predictions, starting with a single Zernike mode up to Mode 14 ($Z^{m=4}_{n=4}$) across up to 5 iterations.
  • Figure 4: Correction of aberrations in live SUM159-AP2 cells expressing $\sigma$2-eGFP. A. 3D volume of SUM159-AP2 cells represented as XY and YZ MIPs covering a $15.7 \times 55.6 \times 25.6\,\mu$m$^3$ FOV after applying a 2.9$\lambda$ peak-to-valley (P-V) aberration to the DM. This aberration combines horizontal coma ($Z_3^1$) and oblique trefoil ($Z_3^3$). B. XY and YZ MIPs of a similar FOV with 3.1$\lambda$ P-V aberration composed of horizontal coma ($Z_3^1$) and primary spherical ($Z_4^0$). In both cases, near diffraction-limited performance was recovered after two iterations. The insets show FFTs and corresponding wavefronts for each iteration. Scale bar, $5\mu$m.
  • Figure 5: In vivo, in situ correction of native aberrations in zebrafish embryos. A. XY MIP of a 72 hpf gene-edited zebrafish embryo expressing endogenous AP2-mNeonGreen, exhibiting native and spatially varying aberrations near the notochord. Scale bar, $10\mu$m. B. Enlarged view of the dashed blue box in (A). The XY and YZ MIPs, along with the corresponding FFTs of a $12.5 \times 12.5 \times 12.8 \mu$m$^3$ FOV, show $\sim$2$\lambda$ P-V of sample-induced aberration without AO (top row), corrected by SH (middle row), and corrected by AOViFT (bottom row). The contrast for each volume was scaled to its 1st and 99.99th percentile intensity values. Scale bar, $2\mu$m. C. XY and YZ MIPs of a different gene-edited zebrafish embryo expressing exogenous AP2-mNeonGreen and injected mRNA for mChilada-Cox8a (to visualize mitochondria). The AP2 signal was used to infer the underlying aberration, and the same correction was applied to both channels. The top row shows $\sim$1.5$\lambda$ P-V aberration; the second row shows AOViFT correction after two iterations. The third and fourth rows present the results of OTF masked Wiener (OMW) deconvolution without and with AOViFT corrected volumes, respectively. Scale bar, $2\mu$m.
  • ...and 27 more figures
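
The Fourier embedding and Patchify steps described in the Figure 1 caption can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the caption only says the embedding holds 3 amplitude planes and 3 phase planes of shape $d \times d$, so the choice of the three orthogonal planes through the frequency origin, the max-normalization of the amplitude, the function names `fourier_embedding` and `patchify`, and the square patch layout are all hypothetical.

```python
import numpy as np


def fourier_embedding(volume: np.ndarray) -> np.ndarray:
    """Reduce a cubic 3D volume to a (6, d, d) stack: 3 amplitude + 3 phase planes."""
    if volume.ndim != 3 or len(set(volume.shape)) != 1:
        raise ValueError("expected a cubic volume of shape (d, d, d)")
    d = volume.shape[0]
    # Centered 3D Fourier transform; zero frequency ends up at index d // 2.
    spectrum = np.fft.fftshift(np.fft.fftn(np.fft.ifftshift(volume)))
    c = d // 2
    # Assumption: take the three orthogonal planes through the frequency origin.
    planes = [spectrum[c, :, :], spectrum[:, c, :], spectrum[:, :, c]]
    amplitude = np.stack([np.abs(p) for p in planes])
    # Assumption: scale amplitudes by their maximum so values lie in [0, 1].
    amplitude = amplitude / (amplitude.max() + 1e-12)
    phase = np.stack([np.angle(p) for p in planes])
    return np.concatenate([amplitude, phase], axis=0).astype(np.float32)


def patchify(embedding: np.ndarray, patch: int) -> np.ndarray:
    """Tile each (d, d) plane into k = (d // patch)**2 flattened square patches."""
    n_planes, d, _ = embedding.shape
    if d % patch != 0:
        raise ValueError("patch size must divide the plane size")
    g = d // patch  # patches per side
    tiles = embedding.reshape(n_planes, g, patch, g, patch).transpose(0, 1, 3, 2, 4)
    return tiles.reshape(n_planes, g * g, patch * patch)


# Usage: a single centered point source has a flat spectrum with zero phase.
volume = np.zeros((8, 8, 8))
volume[4, 4, 4] = 1.0
embedding = fourier_embedding(volume)   # shape (6, 8, 8)
patches = patchify(embedding, patch=4)  # shape (6, 4, 16)
```

In this sketch each plane is patchified independently, giving an array of shape $(\ell, k, p^2)$ for patch side $p$; the positional embedding, Transformer layers, and patch merging from the caption are omitted.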