Persistent-DPO: A novel loss function and hybrid learning for generative quantum eigensolver

Junya Nakamura, Shinichiro Sanji

TL;DR

The paper targets learning efficiency in generative quantum eigensolvers (GQE) with two contributions. The first is Persistent-DPO (P-DPO), a loss whose gradient stays away from zero in the regime where the DPO gradient vanishes. The second is a hybrid offline-online training scheme that reuses past low-energy samples to accelerate convergence. The paper also proposes a loss-masking method that imposes upper bounds on the number of occurrences of selected gates, enabling constrained circuit generation and potentially facilitating circuit cutting. Using a transformer-decoder GQE on BeH₂, the authors show that P-DPO yields lower ground-state energies than DPO. They also show that the hybrid method further improves convergence and final energies, especially with P-DPO. These methods extend GQE's applicability to larger operator pools and constrained circuit designs, with practical benefits for quantum chemistry and circuit-reduction strategies.
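The vanishing-update problem can be made concrete with the standard DPO loss. For a preferred/dispreferred pair, the loss is $-\log\sigma(\beta z)$. Here $z$ is the difference of the two log-probability ratios against a reference model, and in GQE we assume the lower-energy circuit of the pair is the preferred one. The derivative with respect to $z$ is $-\beta\,\sigma(-\beta z)$. By the chain rule this factor scales every parameter gradient, and it decays roughly like $\beta e^{-\beta z}$ once $\beta z \gg 1$.

The sketch below is an illustration under stated assumptions, not the authors' implementation. It evaluates only standard DPO. P-DPO is not reproduced because its definition is not given in this summary. We also assume, without confirmation, that this margin is the $z$ on the horizontal axis of Figure 2.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dpo_loss(z: float, beta: float) -> float:
    """Standard DPO loss -log(sigmoid(beta * z)) for a preference margin z."""
    x = beta * z
    # -log(sigmoid(x)) = log(1 + exp(-x)); split by sign to avoid overflow
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_grad(z: float, beta: float) -> float:
    """Analytic derivative dL/dz = -beta * sigmoid(-beta * z)."""
    return -beta * sigmoid(-beta * z)


if __name__ == "__main__":
    beta = 0.1
    # |dL/dz| shrinks roughly like beta * exp(-beta * z) once beta * z >> 1
    for z in [0.0, 10.0, 50.0, 100.0]:
        print(f"z={z:6.1f}  loss={dpo_loss(z, beta):.3e}  |dL/dz|={abs(dpo_grad(z, beta)):.3e}")
```

In this sketch the loss also tends to zero as $z$ grows. Once the preferred circuits are already strongly favored, further DPO updates therefore become negligible. Figure 2 reports that P-DPO keeps the gradient away from zero in this regime.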

Abstract

We study the generative quantum eigensolver (GQE)~\cite{nakaji2024generative}, which trains a classical generative model to produce quantum circuits with desired properties, such as preparing molecular ground states. We introduce two methods to improve GQE. First, we identify a limitation of direct preference optimization (DPO) when it is used as the loss function in GQE, and we propose Persistent-DPO (P-DPO) to overcome it. Second, to improve online learning during the training phase of GQE, we introduce a hybrid approach that combines online and offline learning. Using a transformer-decoder implementation of GQE, we evaluate both methods through ground-state search experiments on the $\mathrm{BeH_2}$ molecule and observe that P-DPO achieves lower energies than DPO. The hybrid approach further improves convergence and final energy values, particularly with P-DPO. We also present a method for imposing upper bounds on the number of occurrences of specific gates, which broadens the applicability of GQE.
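The abstract describes the hybrid online/offline scheme only at a high level. The sketch below assumes that "offline" learning means re-training on a stored set of the lowest-energy circuits found so far, which matches the TL;DR's "reuses past low-energy samples". The names `LowEnergyBuffer`, `hybrid_iteration`, `capacity`, and `offline_passes` are hypothetical. We do not claim they correspond to the paper's $(C, R)$ hyperparameters or to its exact update schedule. `train_step` stands in for whatever loss (DPO or P-DPO) is used.

```python
import heapq
from typing import Callable, List, Set, Tuple

Circuit = Tuple[int, ...]  # a generated circuit as a sequence of operator IDs
Sample = Tuple[Circuit, float]  # (circuit, measured energy)


class LowEnergyBuffer:
    """Hypothetical helper: keeps the `capacity` lowest-energy unique circuits seen so far."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Store negated energies so the heap root is the highest-energy (worst) kept circuit.
        self._heap: List[Tuple[float, Circuit]] = []
        self._members: Set[Circuit] = set()

    def add(self, circuit: Circuit, energy: float) -> None:
        if circuit in self._members:
            return  # already stored; ignore duplicates
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (-energy, circuit))
            self._members.add(circuit)
        elif energy < -self._heap[0][0]:
            # Evict the current worst circuit in favor of the lower-energy one.
            _, evicted = heapq.heapreplace(self._heap, (-energy, circuit))
            self._members.discard(evicted)
            self._members.add(circuit)

    def samples(self) -> List[Sample]:
        """Stored circuits sorted from lowest to highest energy."""
        return sorted(((c, -neg) for neg, c in self._heap), key=lambda s: s[1])


def hybrid_iteration(
    fresh: List[Sample],
    buffer: LowEnergyBuffer,
    train_step: Callable[[List[Sample]], None],
    offline_passes: int,
) -> None:
    """One hypothetical hybrid iteration: an online update, then offline updates on the buffer."""
    train_step(fresh)  # online: learn from circuits sampled by the current model
    for circuit, energy in fresh:
        buffer.add(circuit, energy)
    for _ in range(offline_passes):
        train_step(buffer.samples())  # offline: revisit the best circuits found so far


if __name__ == "__main__":
    buf = LowEnergyBuffer(capacity=2)
    seen: List[List[Sample]] = []
    # Toy energies for illustration only (not results from the paper).
    fresh = [((1, 4, 2), -1.0), ((3, 3), -0.5), ((2, 5, 1), -2.0)]
    hybrid_iteration(fresh, buf, seen.append, offline_passes=2)
    print(buf.samples())  # [((2, 5, 1), -2.0), ((1, 4, 2), -1.0)]
    print(len(seen))      # 3
```

In this sketch, deduplicating stored circuits keeps the offline batches from being dominated by a single repeatedly sampled sequence. Whether the original method deduplicates is not stated in this summary.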

Paper Structure

This paper contains 11 sections, 7 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Procedure for computing the log probability of $\vec{j}=[0, 4, 1, 6, 3, 1]$. The operator pool size is set to $L=6$, and only the operators with IDs $\{5, 6\}$ are assumed to be constrained to occur at most once. Refer to the main text for further details.
  • Figure 2: Mean absolute values of the gradient $\nabla_{\theta} {\cal L}_{\mathrm{P-DPO}}$, plotted as a function of $z$. Panels (a), (b), and (c) show DPO ($\alpha=0$), P-DPO ($\alpha=0.25$), and P-DPO ($\alpha=0.5$), respectively. For DPO the gradient approaches zero as $z$ increases, whereas for P-DPO the minimum gradient value moves further from zero as $\alpha$ increases.
  • Figure 3: Minimum energy (in Hartree) reached up to each iteration step, plotted as a function of the iteration step. The solid line shows the mean over five runs with different random seeds, and the shaded region spans the minimum and maximum of those five runs. The black dashed line marks the exact ground-state energy, and the black dotted line marks the VQE energy from the PennyLane dataset Utkarsh2023Chemistry. Panel (a) compares DPO with P-DPO ($\alpha=0.5$); panel (b) compares P-DPO ($\alpha=0.5$) with P-DPO ($\alpha=1$).
  • Figure 4: Dependence on the hyperparameter $\beta$, compared across three values, $\beta \in \{0.5, 0.1, 0.05\}$. Panels (a), (b), and (c) show DPO, P-DPO ($\alpha=0.5$), and P-DPO ($\alpha=1$), respectively. The plotting conventions are the same as in Figure 3.
  • Figure 5: Comparison of online learning (blue) with the hybrid approach under two hyperparameter settings, $(C, R)=(25, 2)$ (orange) and $(C, R)=(100, 4)$ (green). Panels (a), (b), and (c) show DPO, P-DPO ($\alpha=0.5$), and P-DPO ($\alpha=1$), respectively. The plotting conventions are the same as in Figure 3.
  • ...and 3 more figures
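The masking idea behind Figure 1 can be sketched as follows. We assume, without confirmation from the paper, that the upper bound is enforced in a specific way. At each step, the logit of every constrained operator that has already reached its cap is set to $-\infty$ before the softmax, so the remaining operators are renormalized. The function names, the `caps` dictionary format, and the vocabulary layout (index 0 reserved, operator IDs 1 to 6) are hypothetical. We also treat the leading 0 in the caption's $\vec{j}$ as a start token that is conditioned on rather than scored.

```python
from typing import Dict, Sequence

import numpy as np


def apply_occurrence_mask(row: np.ndarray, counts: Dict[int, int], caps: Dict[int, int]) -> np.ndarray:
    """Return a copy of one step's logits with capped-out operators set to -inf."""
    masked = row.copy()
    for op_id, limit in caps.items():
        if counts[op_id] >= limit:
            masked[op_id] = -np.inf
    return masked


def masked_sequence_log_prob(logits: np.ndarray, tokens: Sequence[int], caps: Dict[int, int]) -> float:
    """Log-probability of `tokens` when capped operators are masked out step by step.

    logits: array of shape (T, V); row t scores the operator chosen at position t.
    tokens: the T operator IDs actually generated.
    caps:   {operator_id: maximum number of occurrences} (hypothetical format).
    """
    logits = np.asarray(logits, dtype=float)
    counts = {op_id: 0 for op_id in caps}
    total = 0.0
    for t, tok in enumerate(tokens):
        row = apply_occurrence_mask(logits[t], counts, caps)
        if not np.isfinite(row[tok]):
            raise ValueError(f"operator {tok} at position {t} is masked (e.g. exceeds its occurrence cap)")
        # Log-softmax over the operators that are still allowed at this step.
        m = np.max(row)
        log_norm = m + np.log(np.sum(np.exp(row - m)))
        total += float(row[tok] - log_norm)
        if tok in counts:
            counts[tok] += 1
    return total


if __name__ == "__main__":
    # Setting from the Figure 1 caption: pool size L = 6 (IDs 1..6), IDs 5 and 6 capped at one use.
    # Index 0 is assumed to be reserved (e.g. a start token), so it is masked at every step.
    logits = np.zeros((5, 7))
    logits[:, 0] = -np.inf
    lp = masked_sequence_log_prob(logits, [4, 1, 6, 3, 1], {5: 1, 6: 1})
    # Uniform logits: probability 1/6 per step until ID 6 is used, then 1/5 per step.
    print(lp, 3 * np.log(1 / 6) + 2 * np.log(1 / 5))
```

In this sketch, applying the same mask at sampling time would ensure that generated circuits never exceed the caps. Computing the training log-probability with that mask keeps the loss consistent with the distribution the circuits were actually drawn from.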