Quality-Controlled Active Learning via Gaussian Processes for Robust Structure-Property Learning in Autonomous Microscopy

Jawad Chowdhury, Ganesh Narasimha, Jan-Chi Yang, Yongtao Liu, Rama Vasudevan

Abstract

Autonomous experimental systems are increasingly used in materials research to accelerate scientific discovery, but their performance is often limited by low-quality, noisy data. This issue is especially problematic in data-intensive structure-property learning tasks such as Image-to-Spectrum (Im2Spec) and Spectrum-to-Image (Spec2Im) translation, where standard active learning strategies can mistakenly prioritize poor-quality measurements. We introduce a gated active learning framework that combines curiosity-driven sampling with a physics-informed quality control filter based on Simple Harmonic Oscillator (SHO) model fits, allowing the system to automatically exclude low-fidelity data during acquisition. Evaluations on a pre-acquired dataset of band-excitation piezoresponse spectroscopy (BEPS) data from PbTiO3 thin films with spatially localized noise show that the proposed method outperforms random sampling, standard active learning, and multitask learning strategies. The gated approach improves both Im2Spec and Spec2Im by handling noise during training and acquisition, leading to more reliable forward and inverse predictions. In contrast, standard active learners often misinterpret noise as uncertainty and acquire bad samples that hurt performance. We further deployed the framework in real-time experiments on BiFeO3 thin films, demonstrating its effectiveness in autonomous microscopy. Overall, this work supports a shift toward hybrid autonomy in self-driving labs, where physics-informed quality assessment and active decision-making work hand in hand for more reliable discovery.
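
The gating idea can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel, the constant-mean GP, the threshold `r2_min`, the function names, and the toy numbers are all assumptions made for illustration. A GP is fit to SHO-fit $R^2$ values at already-measured locations, candidates whose predicted quality falls below the threshold are removed from the pool, and the remaining candidates are ranked by a precomputed curiosity score.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of 2-D coordinates."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_predict_mean(x_train, y_train, x_query, length_scale=1.0, noise=1e-2):
    """GP posterior mean, using the training mean as a constant prior mean."""
    mu = y_train.mean()
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train, length_scale)
    return mu + k_qt @ np.linalg.solve(k_tt, y_train - mu)

def gated_select(candidates, curiosity, measured_xy, measured_r2,
                 r2_min=0.5, batch=2, length_scale=1.0):
    """Drop candidates with low predicted quality, then rank the rest by curiosity.

    r2_min is a hypothetical threshold on the GP-predicted SHO-fit R^2.
    """
    quality = gp_predict_mean(measured_xy, measured_r2, candidates, length_scale)
    score = np.where(quality >= r2_min, curiosity, -np.inf)  # the gate
    order = np.argsort(score)[::-1]                          # best first
    keep = [i for i in order[:batch] if np.isfinite(score[i])]
    return np.array(keep, dtype=int), quality

# Toy data: clean region near x = 0, noisy region near x = 10.
measured_xy = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
measured_r2 = np.array([0.95, 0.90, 0.05, 0.10])
candidates = np.array([[1.0, 0.0], [9.0, 0.0], [0.5, 1.0], [9.5, 1.0]])
curiosity = np.array([0.2, 0.9, 0.4, 0.8])  # noisy sites look "interesting"

selected, quality = gated_select(candidates, curiosity, measured_xy, measured_r2,
                                 length_scale=2.0)
```

In this toy setup the two candidates with the highest curiosity lie in the low-quality region, so an ungated learner would acquire them; the gate selects the two candidates on the clean side instead.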

Paper Structure

This paper contains 13 sections, 5 equations, 14 figures, 2 tables.

Figures (14)

  • Figure 1: Overview of the proposed ActiveQC framework for the Im2Spec task. (a) Training of the base Im2Spec model: an input image patch is passed through an encoder-decoder network to generate the corresponding spectrum, from which the spectral prediction error is computed. (b) Training of the surrogate error model: using the latent representation produced by the trained encoder, a separate model is trained to predict the spectral error estimated in panel (a). (c) ActiveQC acquisition pipeline: a Gaussian Process (GP)-based quality filter predicts spectral fidelity across the spatial domain and removes low-quality candidates from the acquisition pool. For the remaining high-quality samples, an acquisition score is computed by combining predicted error, distance to the existing training set, and representativeness of each candidate. Top-ranked candidates are then selected for exploration in the next iteration.
  • Figure 2: Overview of the ActiveMT (multitask learner) baseline framework for the Im2Spec task. The input image patch is encoded into a latent representation, which is passed through two decoder branches. The upper branch predicts spectra and is trained using noisy spectral measurements. The lower branch reconstructs the original input image, providing a clean supervision signal to regularize the latent space. This multitask design mitigates the impact of noise and promotes more robust learning.
  • Figure 3: Initial distribution of spectral intensities across train, validation, and test subsets. Kernel density estimates confirm that all splits preserve the overall distribution, enabling fair and representative evaluation.
  • Figure 4: Two representative samples from clean and noise-impacted regions. Subplots (a)–(d) show the patch location, extracted image patch, piezoresponse spectrum over spectral dimension, and piezoresponse spectrum over applied voltage, respectively, for a sample unaffected by induced noise. Subplots (e)–(h) present the same visualizations for a sample from a noise-affected region. The noisy spectra deviate severely from the expected ferroelectric hysteresis loop shape, leading to very low SHO-fit $R^2$-scores and highlighting the importance of incorporating quality-aware filters during learning and acquisition. Scale bar indicates a length of 100 nm.
  • Figure 5: Quality estimation using Gaussian Process regression. (a) Ground-truth noise map applied to the BEPS spectra, where brighter colors indicate stronger corruption. (b) Mean SHO-fit $R^2$-scores computed for the initial set of sampled locations, based on fitting complex BEPS spectra across all DC-bias points. (c) GP-predicted spatial map of spectral quality (mean $R^2$) at the initial acquisition step. (d) GP-predicted quality map after the final acquisition step, showing improved delineation between high- and low-fidelity regions. Scale bar indicates a length of 100 nm.
  • ...and 9 more figures
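
The SHO-fit $R^2$ score referenced in Figures 4 and 5 can be sketched as follows. This is an illustrative sketch, not the authors' code: the SHO parameterization is one common form from the band-excitation literature, and the function names, frequency band, resonance parameters, and noise levels are hypothetical. The fitted parameters are assumed to be available already from a separate fitting routine, so only the scoring step is shown.

```python
import numpy as np

def sho_response(freq, amp, f0, q, phase):
    """Complex SHO response in one common parameterization (assumed here)."""
    return amp * np.exp(1j * phase) * f0**2 / (freq**2 - 1j * freq * f0 / q - f0**2)

def complex_r2(measured, fitted):
    """Coefficient of determination for a complex-valued spectrum."""
    ss_res = np.sum(np.abs(measured - fitted) ** 2)
    ss_tot = np.sum(np.abs(measured - measured.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mean_r2(measured_stack, fitted_stack):
    """Average R^2 over all DC-bias points measured at one location."""
    return float(np.mean([complex_r2(m, f)
                          for m, f in zip(measured_stack, fitted_stack)]))

rng = np.random.default_rng(0)
freq = np.linspace(340e3, 400e3, 64)  # hypothetical excitation band in Hz

# Stand-in for fitted curves at three DC-bias points (hypothetical parameters).
fitted = np.stack([sho_response(freq, a, 370e3, 100.0, 0.3)
                   for a in (1.0, 0.8, 0.6)])

def add_noise(x, sigma):
    """Add complex Gaussian noise with standard deviation sigma per component."""
    return x + sigma * (rng.standard_normal(x.shape)
                        + 1j * rng.standard_normal(x.shape))

clean_r2 = mean_r2(add_noise(fitted, 0.5), fitted)    # mild noise: R^2 near 1
noisy_r2 = mean_r2(add_noise(fitted, 100.0), fitted)  # heavy noise: R^2 near 0
```

A per-location mean $R^2$ of this kind is the sort of scalar that a GP quality map could be trained on, with low values flagging locations to exclude from acquisition.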