Persistent-DPO: A novel loss function and hybrid learning for generative quantum eigensolver
Junya Nakamura, Shinichiro Sanji
TL;DR
The paper tackles learning efficiency in generative quantum eigensolvers by introducing Persistent-DPO (P-DPO), which preserves learning pressure across samples to avoid vanishing updates, and a hybrid offline-online training scheme that reuses past low-energy samples to accelerate convergence. It also proposes a loss-masking method to impose upper bounds on the occurrence of selected gates, enabling constrained circuit generation and potential circuit cutting. Using a transformer-decoder GQE on BeH2, the authors show that P-DPO yields lower ground-state energies than DPO and that the hybrid method further improves convergence and final energies, especially with P-DPO. These methods extend GQE’s applicability to larger operator pools and constrained circuit designs, offering practical gains for quantum chemistry and circuit-reduction strategies.
Abstract
We study the generative quantum eigensolver (GQE)~\cite{nakaji2024generative}, which trains a classical generative model to produce quantum circuits with desired properties, such as describing molecular ground states. We introduce two methods to improve GQE. First, we identify a limitation of direct preference optimization (DPO) when it is used as the loss function in GQE, and propose Persistent-DPO (P-DPO) to address it. Second, to improve the online learning used during GQE training, we introduce a hybrid approach that combines online and offline learning. Using a transformer-decoder implementation of GQE, we evaluate both methods in ground-state search experiments on the $\mathrm{BeH_2}$ molecule and observe that P-DPO achieves lower energies than DPO. The hybrid approach further improves convergence and final energy values, particularly with P-DPO. We also present a method for imposing upper bounds on the number of occurrences of specific gates, which broadens the applicability of GQE.
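
For orientation, the baseline that P-DPO modifies can be sketched as follows. This is the standard DPO loss for a single preference pair, not the P-DPO loss, whose form is not given in this abstract. Treating the lower-energy circuit as the preferred ("winning") sample is an assumption about how preferences are formed in GQE, and the names `dpo_loss` and `dpo_grad_weight`, the zero reference log-probabilities, and the default `beta` are hypothetical choices made only for illustration.

```python
import math


def dpo_loss(logp_win, logp_lose, ref_logp_win=0.0, ref_logp_lose=0.0, beta=1.0):
    """Standard DPO loss for one preference pair.

    logp_win / logp_lose: model log-probabilities of the preferred and
    rejected samples (assumed here: lower-energy vs. higher-energy circuit).
    ref_logp_*: log-probabilities under a fixed reference model
    (set to 0.0 by default purely for illustration).
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(margin)) == softplus(-margin), in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def dpo_grad_weight(margin):
    """Magnitude of d(loss)/d(margin), which equals sigmoid(-margin)."""
    return 1.0 / (1.0 + math.exp(margin))


if __name__ == "__main__":
    # The per-pair gradient weight shrinks as the margin grows.
    for m in (0.0, 2.0, 5.0, 10.0):
        print(f"margin={m:5.1f}  grad_weight={dpo_grad_weight(m):.6f}")
```

The gradient weight `sigmoid(-margin)` decays toward zero once a pair is already well separated. This is a general property of the standard DPO loss; whether it is exactly the limitation the paper identifies is not stated in the abstract.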
