Bridging the Simulation-to-Reality Gap in Electron Microscope Calibration via VAE-EM Estimation

Jilles S. van Hulst, W. P. M. H. Heemels, Duarte J. Antunes

Abstract

Electron microscopy has enabled many scientific breakthroughs across multiple fields. A key challenge is the tuning of microscope parameters based on images to overcome optical aberrations that deteriorate image quality. This calibration problem is challenging due to the high-dimensional and noisy nature of the diagnostic images, and the fact that optimal parameters cannot be identified from a single image. We tackle the calibration problem for Scanning Transmission Electron Microscopes (STEM) by employing variational autoencoders (VAEs), trained on simulated data, to learn low-dimensional representations of images, whereas most existing methods extract only scalar values. We then simultaneously estimate the model that maps calibration parameters to encoded representations and the optimal calibration parameters using an expectation maximization (EM) approach. This joint estimation explicitly addresses the simulation-to-reality gap inherent in data-driven methods that train on simulated data from a digital twin. We leverage the known symmetry property of the optical system to establish global identifiability of the joint estimation problem, ensuring that a unique optimum exists. We demonstrate that our approach is substantially faster and more consistent than existing methods on a real STEM, achieving a 2x reduction in estimation error while requiring fewer observations. This represents a notable advance in automated STEM calibration and demonstrates the potential of VAEs for information compression in images. Beyond microscopy, the VAE-EM framework applies to inverse problems where simulated training data introduces a reality gap and where non-injective mappings would otherwise prevent unique solutions.

Paper Structure

This paper contains 25 sections, 2 theorems, 26 equations, 8 figures, 1 algorithm.

Key Result

Lemma 1

Let $\Phi: \mathbb{R}^n \to \mathbb{R}^m$ be a vector of basis functions, and let $(x_0^*, \theta^*)$ be a solution to the joint estimation problem (eq:joint_estimation). If [...] holds for some shift $\delta \in \mathbb{R}^n$, $\delta \neq 0$, and some vector $\bar{\theta} \in \mathbb{R}^m$, then the pair $(x_0^* + \delta, \bar{\theta})$ is also a solution, which makes the optimization problem (eq:joint_estimation) ill-posed.
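
The ambiguity described by Lemma 1 can be illustrated numerically. The following is a minimal sketch under assumptions that are not taken from the paper: a one-dimensional quadratic polynomial basis, a measurement model $y_k = \Phi(x_0 + u_k)^\top \theta$ with known perturbations $u_k$, and the reading that the lemma's condition amounts to $\Phi(x + \delta)^\top \bar{\theta} = \Phi(x)^\top \theta^*$ for all $x$. All function names are hypothetical.

```python
import numpy as np

def phi(x):
    """Assumed basis (not from the paper): Phi(x) = [1, x, x^2]."""
    return np.array([1.0, x, x**2])

def shift_matrix(delta):
    """Matrix T(delta) satisfying Phi(x + delta) = T(delta) @ Phi(x)."""
    return np.array([
        [1.0,      0.0,         0.0],
        [delta,    1.0,         0.0],
        [delta**2, 2.0 * delta, 1.0],
    ])

def predict(x0, theta, u):
    """Assumed measurement model: y_k = Phi(x0 + u_k)^T theta."""
    return np.array([phi(x0 + uk) @ theta for uk in u])

# A "true" solution and a nonzero shift.
x0_star = 0.7
theta_star = np.array([0.5, -1.0, 2.0])
delta = 0.3

# Compensating parameters: theta_bar = T(-delta)^T theta_star, so that
# Phi(x + delta)^T theta_bar = Phi(x)^T theta_star for every x.
theta_bar = shift_matrix(-delta).T @ theta_star

# Both pairs explain the same data for any set of known perturbations.
u = np.linspace(-1.0, 1.0, 7)
y_star = predict(x0_star, theta_star, u)
y_alt = predict(x0_star + delta, theta_bar, u)
print("max output difference:", np.max(np.abs(y_star - y_alt)))
```

Because a shifted polynomial is again a polynomial of the same degree, this basis can absorb any shift into its coefficients, so the data alone cannot separate $x_0$ from $\theta$. This is the kind of ill-posedness that a structural constraint on the model, such as the symmetry property used in the paper, is meant to rule out.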

Figures (8)

  • Figure 1: Grid of Ronchigram images (left) and their Fourier power spectra (right) for different aberration values. The horizontal axis varies defocus ($C1$) from $-200$ to $+200$ nm, while the vertical axis varies two-fold astigmatism ($A1x$) over the same range. The center images correspond to well-tuned aberrations ($C1=0$, $A1x=0$). Colored rectangles group images in specific aberration regions. Note that the Fourier power spectra exhibit symmetry about the origin, with images at $(x_1, x_2)$ and $(-x_1, -x_2)$ appearing similar (e.g., purple and orange, cyan and red).
  • Figure 2: Calibration pipeline overview. High-dimensional Ronchigram images are preprocessed and then encoded to a low-dimensional latent space using a VAE. The EM algorithm then iteratively refines both the aberration state estimate and the mapping from aberrations to latent representations. In the E-step, the distribution over aberration states is estimated by evaluating the likelihood of the observed latent space values for different candidate states. These candidates ($\bullet$$\bullet$$\bullet$) are highlighted in the aberration space, where the color indicates their estimated likelihood (low $\bullet$ to high $\bullet$). The $\boldsymbol{\times}$ indicates the ground truth aberration state. Note that this state and its symmetric counterpart have an increased density of candidates due to the adaptive candidate refinement strategy. The most likely candidates are also clustered around the ground truth state. The M-step then updates the Gaussian process model of the mapping from aberrations to the latent space. The current model belief based on 5 data points ($\circ$) is illustrated as a function of the aberrations. Note that this model belief is symmetric due to the symmetry constraint in the model structure. The symmetry point is not necessarily at the origin (in the true aberration coordinates) because the state estimate is not yet perfect. The E- and M-steps are iterated until convergence.
  • Figure 3: Variational autoencoder architecture. The encoder maps preprocessed Ronchigram power spectra to a low-dimensional latent distribution, from which samples are decoded to reconstruct the input. The weights and biases of the encoder and decoder networks are optimized to maximize the ELBO objective (eq:elbo), which balances reconstruction accuracy and latent space regularization. The example reconstruction to the right of the decoder resembles the input and shows the denoising effect of the VAE.
  • Figure 4: Learned VAE latent space for simulated data with two varying aberration parameters: defocus ($C1$) and two-fold astigmatism in the x-direction ($A1x$). The color of each data point is determined by its aberration region, with the assignment shown in the top right of the figure and in Fig. 1. Similar aberrations are grouped together. Additionally, different aberration values that produce similar Ronchigrams are mapped to the same latent space values (for example, purple and orange), demonstrating the non-injectivity of the observation model.
  • Figure 5: Latent space distribution for Ronchigram datasets gathered on a real STEM on different days. Left: dataset 1, middle: dataset 2, right: dataset 3. The latent space data points are colored according to their aberration values, with the color-coding scheme taken from Fig. 1. The distributions are consistent across datasets, as well as with the simulated data from Fig. 4, indicating that the VAE has learned a meaningful latent representation that reflects the underlying aberration states.
  • ...and 3 more figures
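
The E-step described in the Figure 2 caption, together with the symmetry noted for Figures 1 and 4, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the even function `latent_map` is a hypothetical stand-in for the learned aberration-to-latent mapping (the paper uses a Gaussian process with a symmetry constraint), the likelihood is an isotropic Gaussian with an assumed `sigma`, the candidates form a fixed grid rather than an adaptively refined set, and the mapping is held fixed (no M-step).

```python
import numpy as np

def latent_map(x):
    """Hypothetical even map, g(x) = g(-x), from 2-D aberrations to a 3-D latent."""
    x = np.atleast_2d(x)
    return np.stack([x[:, 0] ** 2, x[:, 1] ** 2, x[:, 0] * x[:, 1]], axis=1)

def e_step(candidates, shifts, z_obs, sigma=0.05):
    """Normalized weights over candidate states under a Gaussian likelihood."""
    log_lik = np.zeros(len(candidates))
    for u, z in zip(shifts, z_obs):
        # Each observation is taken at the unknown state plus a known shift u.
        resid = latent_map(candidates + u) - z
        log_lik += -0.5 * np.sum(resid ** 2, axis=1) / sigma ** 2
    log_lik -= log_lik.max()  # subtract the maximum for numerical stability
    weights = np.exp(log_lik)
    return weights / weights.sum()

# Ground-truth state and a grid of candidate states (noise-free for clarity).
x0_true = np.array([0.6, -0.4])
grid = np.linspace(-1.0, 1.0, 21)
candidates = np.array([[a, b] for a in grid for b in grid])
i_plus = np.argmin(np.linalg.norm(candidates - x0_true, axis=1))
i_minus = np.argmin(np.linalg.norm(candidates + x0_true, axis=1))

# One image: the state and its mirror image are equally likely.
shifts_1 = np.zeros((1, 2))
w_1 = e_step(candidates, shifts_1, latent_map(x0_true + shifts_1))

# Three images at known, distinct shifts: the mirror candidate is ruled out.
shifts_3 = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.3]])
w_3 = e_step(candidates, shifts_3, latent_map(x0_true + shifts_3))

print("one image:    w(+x0), w(-x0) =", w_1[i_plus], w_1[i_minus])
print("three images: w(+x0), w(-x0) =", w_3[i_plus], w_3[i_minus])
```

With a single image the weights at the true state and at its mirror image coincide, which matches the abstract's point that the optimal parameters cannot be identified from one image. Adding observations at known, distinct shifts breaks the tie in this toy setting.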

Theorems & Definitions (3)

  • Lemma 1
  • Theorem 2: Global Identifiability of $(x_0, \theta)$
  • Remark 3