Benchmarking machine learning models for multi-class state recognition in double quantum dot data
Valeria Díaz Moreno, Ryan P Khalili, Daniel Schug, Patrick J. Walsh, Justyna P. Zwolak
TL;DR
The paper benchmarks four ML architectures (CNN, U-Net, ViT, MDN) for multi-class state recognition in double quantum dot charge-stability diagrams, using the QFlow 2.0 dataset under varying data budgets and normalization schemes. High-capacity models (U-Net, ViT) achieve top performance on synthetic data but struggle to generalize to experimental data, while CNNs strike the best practical balance, combining moderate compute with strong experimental performance; MDNs are computationally efficient but sacrifice peak accuracy. Normalization critically shapes training stability and accuracy: min–max scaling often yields higher accuracy but less stable training, whereas z-score normalization trains stably at somewhat reduced accuracy. The study provides actionable guidance for autotuning pipelines in quantum dot devices and releases a reproducible benchmarking framework to standardize future comparisons.
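The two normalization schemes compared in the benchmark can be sketched as follows. This is a minimal NumPy sketch, assuming each charge-stability diagram is a 2D array normalized independently (per image); whether the paper computes statistics per image or over the whole dataset is not stated here, and the function names are hypothetical.

```python
import numpy as np


def min_max_normalize(csd: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Rescale one charge-stability diagram to the range [0, 1].

    Assumption: per-image statistics. `eps` guards against division by
    zero for a constant image.
    """
    lo, hi = csd.min(), csd.max()
    return (csd - lo) / max(hi - lo, eps)


def z_score_normalize(csd: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shift one diagram to zero mean and scale it to unit standard deviation.

    Assumption: per-image statistics. `eps` guards against a zero std.
    """
    mu, sigma = csd.mean(), csd.std()
    return (csd - mu) / max(sigma, eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-in for a measured or simulated diagram (not real data).
    csd = rng.normal(loc=5.0, scale=2.0, size=(32, 32))
    mm = min_max_normalize(csd)
    zs = z_score_normalize(csd)
    print("min-max range:", mm.min(), mm.max())
    print("z-score mean/std:", zs.mean(), zs.std())
```

Min–max bounds every diagram to [0, 1], whereas z-score output is centered with unit spread but unbounded; the sketch only illustrates the transforms, not how either affects training.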
Abstract
Semiconductor quantum dots (QDs) are a leading platform for scalable quantum processors. However, scaling to large arrays requires reliable, automated tuning strategies for device bootstrapping, calibration, and operation, and many tuning steps depend on accurately identifying QD device states from charge-stability diagrams (CSDs). In this work, we present a comprehensive benchmarking study of four modern machine learning (ML) architectures for multi-class state recognition in double-QD CSDs. We evaluate their performance across different data budgets and normalization schemes using both synthetic and experimental data. We find that the more resource-intensive models, U-Nets and vision transformers (ViTs), achieve the highest MSE score (defined as $1-\mathrm{MSE}$) on synthetic data (over $0.98$) but fail to generalize to experimental data. MDNs are the most computationally efficient and exhibit highly stable training, but with substantially lower peak performance. CNNs offer the most favorable trade-off on experimental CSDs, achieving strong accuracy with two orders of magnitude fewer parameters than the U-Nets and ViTs. Normalization plays a nontrivial role: min-max scaling generally yields higher MSE scores but less stable convergence, whereas z-score normalization produces more predictable training dynamics at reduced accuracy for most models. Overall, our study shows that CNNs with min-max normalization are a practical choice for state recognition in QD CSDs.
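Because the MSE score is $1-\mathrm{MSE}$, higher is better and a perfect prediction scores 1; a score above $0.98$ therefore corresponds to an MSE below $0.02$. A minimal sketch of the metric follows, assuming labels and predictions are per-class vectors (for instance, the fraction of each device state in a diagram) and that the mean is taken over all samples and classes. The reduction, the function name `mse_score`, and the five-class example values are assumptions for illustration, not the paper's code.

```python
import numpy as np


def mse_score(y_true, y_pred) -> float:
    """Return 1 - MSE, averaging squared errors over all samples and classes.

    The averaging scheme is an assumption; the paper may reduce differently.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")
    return 1.0 - float(np.mean((y_true - y_pred) ** 2))


if __name__ == "__main__":
    # Hypothetical 5-class label vectors for two diagrams (each row sums to 1).
    y_true = [[1.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0, 0.0]]
    y_pred = [[0.9, 0.1, 0.0, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0, 0.0]]
    print(mse_score(y_true, y_pred))  # approximately 0.998
```

Here only two of the ten entries are off by $0.1$, giving $\mathrm{MSE} = 0.02/10 = 0.002$ and a score of about $0.998$.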
