AiDE-Q: Synthetic Labeled Datasets Can Enhance Learning Models for Quantum Property Estimation

Xinbiao Wang, Yuxuan Du, Zihan Lou, Yang Qian, Kaining Zhang, Yong Luo, Bo Du, Dacheng Tao

TL;DR

AiDE-Q addresses the practicality gap in DL-based quantum property estimation by iteratively generating high-quality synthetic labels from a hybrid dataset with limited measurements. It uses a consistency-check to filter synthetic labels and updates the DL model across iterations, achieving up to $14.2\%$ improvement on ground-state properties in Heisenberg XXZ, cluster-Ising, and molecular H$_4$ systems up to $50$ qubits. The framework is compatible with supervised, semi-supervised, and self-supervised paradigms and demonstrates that a basic SL model with AiDE-Q can outperform more complex baselines. This work suggests synthetic data, when quality-filtered, can meaningfully extend DL utility for quantum property estimation when hardware resources are scarce.

Abstract

Quantum many-body problems are central to various scientific disciplines, yet their ground-state properties are intrinsically challenging to estimate. Recent advances in deep learning (DL) offer potential solutions in this field, complementing prior purely classical and quantum approaches. However, existing DL-based models typically assume access to a large-scale and noiseless labeled dataset collected by infinite sampling. This idealization raises fundamental concerns about their practical utility, especially given the limited availability of quantum hardware in the near term. To unleash the power of these DL-based models, we propose AiDE-Q (automatic data engine for quantum property estimation), an effective framework that addresses this challenge by iteratively generating high-quality synthetic labeled datasets. Specifically, AiDE-Q uses a consistency-check method to assess the quality of synthetic labels and continuously improves the employed DL models with the identified high-quality synthetic dataset. To verify the effectiveness of AiDE-Q, we conduct extensive numerical simulations on a diverse set of quantum many-body and molecular systems, with up to 50 qubits. The results show that AiDE-Q enhances prediction performance for various reference learning models, with improvements of up to $14.2\%$. Moreover, we show that a basic supervised learning model integrated with AiDE-Q outperforms advanced reference models, highlighting the importance of a synthetic dataset. Our work paves the way for more efficient and practical applications of DL for quantum property estimation.

Paper Structure

This paper contains 21 sections, 25 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Coefficient of determination $\mathrm{R}^2$ of DL models for predicting entanglement entropy of $50$-qubit Heisenberg models. The prediction performance decreases as the ratio of noisy labels in training datasets increases or the number of measurements decreases.
  • Figure 2: Framework of AiDE-Q. AiDE-Q follows an iterative pipeline consisting of three primary stages: (a) data labeling and collection: this stage first uses the trained model $f_{\bm{\theta}(t)}$ at the $t$-th iteration to generate labels for the data in $\mathcal{S}_{\mathrm{hyd}}\backslash \mathcal{S}_{h}(t)$, and then uses the consistency check to collect each data point $(p,\bm{M},\bm{o})$ and its synthetic label $\hat{\bm{y}}$ for which the $s$ labels $\hat{\bm{y}}_{\mathcal{I}_k}$ generated from the masked data $(\bm{M}_{\mathcal{I}_k},\bm{o}_{\mathcal{I}_k})$ have small variance, as defined in Eq. \ref{eq:variance}; (b) model training: this stage fine-tunes the DL model on the updated dataset $\mathcal{S}_{h}(t+1)$ and obtains a new DL model $f_{\bm{\theta}(t+1)}$; (c) model evaluation: the updated DL model $f_{\bm{\theta}(t+1)}$ is evaluated on a validation dataset $\mathcal{S}_{\mathrm{val}}$ to examine whether the prediction performance has improved compared to $f_{\bm{\theta}(t)}$.
  • Figure 3: $\mathrm{R}^2$ of reference models with and without AiDE-Q in predicting the entanglement entropy $S_A$ and the two-point correlations $\mathcal{C}_{1j}^x$ and $\mathcal{C}_{1j}^z$ of the $10$-qubit XXZ model, where $A=[j]$ and $j\in[N-1]$. The initial ratio of high-quality data and the number of measurements for low-quality data are set to $r=0.4$ and $m_u=2^6$.
  • Figure 4: $\mathrm{R}^2$ in entanglement entropy prediction for the Heisenberg XXZ model. (a) $\mathrm{R}^2$ values of AiDE-Q-integrated SL models across varying quantum system sizes $N$ with different total training dataset sizes and fixed $m_u=2^6$. (b) Evolution of $\mathrm{R}^2$ across AiDE-Q's iterations for the $50$-qubit XXZ model with fixed $m_u=2^6$. (c) $\mathrm{R}^2$ values with a varying number of measurements $m_u$ for low-quality data points in the $50$-qubit XXZ model.
  • Figure 5: $\mathrm{R}^2$ in entanglement entropy prediction for the $50$-qubit Heisenberg XXZ model without using the physical parameters for constructing the training dataset. The panels from left to right correspond to the prediction performance for the number of measurements $m_u\in\{2^6,2^7,2^8\}$.
  • ...and 1 more figure
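
The consistency check in the Figure 2 caption and the $\mathrm{R}^2$ metric used throughout the figures can be illustrated with a minimal Python sketch. It is not the authors' implementation. The callable `predict` stands in for the trained model $f_{\bm{\theta}(t)}$, and the mask ratio `keep`, the threshold `tau`, and the choice to average the variance over label components are assumptions made here for illustration, since the variance equation the caption refers to is not reproduced in this excerpt. Treating the full-record prediction as the synthetic label $\hat{\bm{y}}$ is likewise an assumption.

```python
import numpy as np


def consistency_check(predict, p, M, o, s=4, keep=0.5, tau=1e-2, seed=None):
    """Label one data point (p, M, o) and decide whether to keep the label.

    predict(p, M, o) -> 1-D array is a hypothetical stand-in for the model.
    M and o hold one row per measurement; a mask I_k is a random row subset.
    keep, tau, and the component-averaged variance are illustrative
    assumptions, not values or definitions taken from the paper.
    """
    rng = np.random.default_rng(seed)
    M, o = np.asarray(M), np.asarray(o)
    # Synthetic label generated from the full measurement record (assumed).
    y_hat = np.asarray(predict(p, M, o), dtype=float)

    n_rows = len(o)
    n_keep = max(1, int(keep * n_rows))
    masked = []
    for _ in range(s):
        # Draw the index set I_k without replacement and predict on it.
        idx = rng.choice(n_rows, size=n_keep, replace=False)
        masked.append(np.asarray(predict(p, M[idx], o[idx]), dtype=float))

    # Spread of the s masked predictions, averaged over label components.
    variance = float(np.var(np.stack(masked), axis=0).mean())
    return y_hat, variance, variance <= tau


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

In a loop over $\mathcal{S}_{\mathrm{hyd}}\backslash \mathcal{S}_{h}(t)$, points for which the third return value is true would be added to $\mathcal{S}_{h}(t+1)$ before fine-tuning, and `r2_score` on a validation set would then decide whether the updated model is kept.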