Benchmarking machine learning models for multi-class state recognition in double quantum dot data

Valeria Díaz Moreno, Ryan P Khalili, Daniel Schug, Patrick J. Walsh, Justyna P. Zwolak

TL;DR

The paper benchmarks four ML architectures (CNN, U-Net, ViT, MDN) for multi-class state recognition in double quantum dot charge-stability diagrams using the QFlow 2.0 dataset under varying data budgets and normalization schemes. It demonstrates that high-capacity models (U-Net, ViT) achieve top performance on synthetic data but struggle to generalize to experimental data, while CNNs strike the best practical balance with moderate compute and strong experimental performance; MDNs are computation-friendly but sacrifice peak accuracy. Normalization critically shapes training stability and accuracy: min–max often yields higher accuracy but more variability, whereas z-score offers stable training with somewhat reduced accuracy. The study provides actionable guidance for autotuning pipelines in quantum dot devices and releases a reproducible benchmarking framework to standardize future comparisons.
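The two normalization schemes compared in the study can be sketched as follows. This is a minimal illustration that assumes each charge-stability diagram (or patch) is a 2D NumPy array normalized independently with its own statistics. The function names and the small `eps` guard against constant inputs are hypothetical choices and are not taken from the paper's released framework.

```python
import numpy as np


def min_max_normalize(csd: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Rescale a CSD patch to [0, 1] using its own minimum and maximum.

    Per-patch scaling is an assumption; dataset-wide statistics are also possible.
    """
    lo, hi = csd.min(), csd.max()
    return (csd - lo) / max(hi - lo, eps)  # eps avoids division by zero on flat patches


def z_score_normalize(csd: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shift a CSD patch to zero mean and scale it to unit standard deviation."""
    return (csd - csd.mean()) / max(csd.std(), eps)


# Toy 2x2 "patch" with made-up values, for illustration only.
patch = np.array([[1.0, 2.0], [3.0, 5.0]])
print(min_max_normalize(patch))  # all values fall in [0, 1]
print(z_score_normalize(patch))  # mean 0, standard deviation 1
```

Min-max maps every patch onto the same bounded range regardless of offset or contrast, while z-score preserves the relative spread of values around zero without bounding them.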

Abstract

Semiconductor quantum dots (QDs) are a leading platform for scalable quantum processors. However, scaling to large arrays requires reliable, automated tuning strategies for device bootstrapping, calibration, and operation, and many tuning steps depend on accurately identifying QD device states from charge-stability diagrams (CSDs). In this work, we present a comprehensive benchmarking study of four modern machine learning (ML) architectures for multi-class state recognition in double-QD CSDs. We evaluate their performance across different data budgets and normalization schemes using both synthetic and experimental data. We find that the more resource-intensive models, U-Nets and vision transformers (ViTs), achieve the highest MSE scores (defined as $1-\mathrm{MSE}$) on synthetic data (over $0.98$) but fail to generalize to experimental data. MDNs are the most computationally efficient and exhibit highly stable training, but with substantially lower peak performance. CNNs offer the most favorable trade-off on experimental CSDs, achieving strong accuracy with two orders of magnitude fewer parameters than the U-Nets and ViTs. Normalization plays a nontrivial role: min-max scaling generally yields higher MSE scores but less stable convergence, whereas z-score normalization produces more predictable training dynamics but at reduced accuracy for most models. Overall, our study shows that CNNs with min-max normalization are a practical approach for QD CSDs.
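The headline metric, the MSE score $1-\mathrm{MSE}$, compares a model's predicted state vector with the label vector ${\rm\bf{p}}(V_{\mathcal{R}})$. The sketch below assumes the mean is taken over every entry of the (patches × 5 states) arrays; the paper may aggregate differently, and `mse_score` is a hypothetical helper name.

```python
import numpy as np


def mse_score(p_true, p_pred) -> float:
    """MSE score = 1 - MSE between label and predicted state vectors.

    Assumes the mean runs over every entry (all patches and all five states).
    """
    p_true = np.asarray(p_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    if p_true.shape != p_pred.shape:
        raise ValueError(f"shape mismatch: {p_true.shape} vs {p_pred.shape}")
    return 1.0 - float(np.mean((p_true - p_pred) ** 2))


# Label from Figure 1 (c-i) paired with a made-up prediction, for illustration only.
label = [0, 0, 0.1, 0, 0.9]
prediction = [0, 0, 0.2, 0, 0.8]
print(mse_score(label, prediction))  # approximately 0.996
```

A perfect prediction scores exactly 1, so higher is better, matching how the abstract reports results.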

Paper Structure

This paper contains 12 sections, 1 equation, 4 figures, 2 tables.

Figures (4)

  • Figure 1: (a-b) Two large and (c) two small experimentally acquired charge-stability diagrams included in the QFlow 2.0 dataset ("QFlow 2.0: Quantum dot data for machine learning") [qf-data]. The manually assigned labels for the small charge-stability diagrams are ${\rm\bf{p}}(V_{{\mathcal{R}}_1})=(0, 0, 0.1, 0, 0.9)$ for (c-i) and ${\rm\bf{p}}(V_{{\mathcal{R}}_2})=(0, 0, 1, 0, 0)$ for (c-ii).
  • Figure 2: (a) A sample synthetic charge-stability diagram and (b) the corresponding state map from the QFlow 2.0 dataset [qf-data]. The state map labels correspond to the five possible states of a double-QD device: no-dot (ND), left (SD$_L$), central (SD$_C$), and right (SD$_R$) single-dot, and double-dot (DD). (c-i) An example patch $V_{\mathcal{R}}$ sampled from the CSD, highlighted in panel (a) with a white rectangle. (c-ii) The state-label patch corresponding to the example patch in panel (c-i), highlighted in panel (b) with a white rectangle. The assigned state label vector for this patch is ${\rm\bf{p}}(V_{\mathcal{R}})=(0, 0.3, 0, 0, 0.7)$.
  • Figure 3: Box plots illustrating the distribution of the number of epochs for each class of models trained on data normalized using (a) min-max and (b) z-score. The colors correspond to the four data budgets: $25~\%$ (green), $50~\%$ (blue), $75~\%$ (yellow), and $100~\%$ (gray), respectively.
  • Figure 4: MSE score for (a-b) simulated and (c-d) experimental test datasets across all models. Models trained using min-max normalization are shown in panels (a) and (c), while panels (b) and (d) show models trained using z-score normalization. The colors correspond to the four data budgets: $25~\%$ (green), $50~\%$ (blue), $75~\%$ (yellow), and $100~\%$ (gray), respectively.
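Figure 2 describes how a training example pairs a CSD patch $V_{\mathcal{R}}$ with a state label vector derived from the matching region of the state map. The sketch below assumes the state map stores one integer class index per pixel, ordered as in the caption (ND, SD$_L$, SD$_C$, SD$_R$, DD), and that each entry of ${\rm\bf{p}}(V_{\mathcal{R}})$ is the fraction of patch pixels in that state. `crop` and `state_label_vector` are hypothetical helpers, not the dataset's API.

```python
import numpy as np

# Assumed class order, following the Figure 2 caption.
STATES = ("ND", "SD_L", "SD_C", "SD_R", "DD")


def crop(arr: np.ndarray, row: int, col: int, height: int, width: int) -> np.ndarray:
    """Cut the same rectangular region out of a CSD or its state map."""
    return arr[row:row + height, col:col + width]


def state_label_vector(state_patch: np.ndarray, n_states: int = len(STATES)) -> np.ndarray:
    """Fraction of pixels in a state-map patch belonging to each state.

    Assumes (hypothetically) that pixels hold integer class indices 0..n_states-1.
    """
    counts = np.bincount(state_patch.ravel(), minlength=n_states)
    return counts / counts.sum()


# Toy 10x10 state-map patch: top 3 rows SD_L (index 1), remaining 7 rows DD (index 4).
toy = np.full((10, 10), 4)
toy[:3, :] = 1
print(state_label_vector(toy))  # SD_L fraction 0.3 and DD fraction 0.7, as in Figure 2

# Pairing a CSD patch with its label at the same (hypothetical) coordinates.
csd = np.random.default_rng(0).normal(size=(30, 30))
state_map = np.zeros((30, 30), dtype=int)
x_patch = crop(csd, 5, 5, 10, 10)
y_label = state_label_vector(crop(state_map, 5, 5, 10, 10))
```

Cropping the CSD and the state map with identical coordinates keeps each input patch aligned with its label vector.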