Reinforcement Learning for Charging Optimization of Inhomogeneous Dicke Quantum Batteries

Xiaobin Song, Siyuan Bai, Da-Wei Wang, Hanxiao Tao, Xizhe Wang, Rebing Wu, Benben Jiang

TL;DR

This work addresses fast charging of inhomogeneous Dicke quantum batteries under partial observability by casting charging as a discrete-time reinforcement-learning problem. Using Soft Actor–Critic, the agent learns piecewise-constant charger coupling policies that maximize terminal ergotropy $\mathcal{E}(\tau)$ while using observables from four regimes, from full state to experimentally accessible correlations. The key finding is that second-order correlations in observables recover most of the performance gap caused by partial information, achieving 94–98% of the full-state baseline, with learned schedules that are nonmyopic and terminal-focused. The results offer a practical route to efficient fast-charging protocols under realistic readout constraints and motivate future POMDP-driven or robust-control extensions for experimental quantum batteries.

Abstract

Charging optimization is a key challenge to the implementation of quantum batteries, particularly under inhomogeneity and partial observability. This paper employs reinforcement learning to optimize piecewise-constant charging policies for an inhomogeneous Dicke battery. We systematically compare policies across four observability regimes, from full-state access to experimentally accessible observables (energies of individual two-level systems (TLSs), first-order averages, and second-order correlations). Simulation results demonstrate that full observability yields near-optimal ergotropy with low variability, while under partial observability, access to only single-TLS energies or energies plus first-order averages lags behind the fully observed baseline. However, augmenting partial observations with second-order correlations recovers most of the gap, reaching 94%-98% of the full-state baseline. The learned schedules are nonmyopic, trading temporary plateaus or declines for superior terminal outcomes. These findings highlight a practical route to effective fast-charging protocols under realistic information constraints.
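The figure of merit throughout is ergotropy, the maximum energy that can be extracted from a state by unitary operations. As a minimal sketch, assuming the standard textbook definition (state energy minus the energy of the corresponding passive state) and not the authors' implementation, it could be computed as follows. The function name `ergotropy` and the single-TLS Hamiltonian in the usage lines are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ergotropy(rho, H):
    """Ergotropy of density matrix rho with respect to Hamiltonian H.

    Assumes the standard definition: Tr(rho H) minus the energy of the
    passive state, which pairs the largest populations of rho with the
    lowest energy levels of H.
    """
    energies = np.linalg.eigvalsh(H)             # ascending energy levels
    populations = np.linalg.eigvalsh(rho)[::-1]  # descending populations
    state_energy = float(np.real(np.trace(rho @ H)))
    passive_energy = float(np.dot(populations, energies))
    return state_energy - passive_energy

# Illustrative single two-level system with unit level spacing (an assumption).
H = np.diag([0.0, 1.0])
excited = np.diag([0.0, 1.0])  # fully charged: all energy is extractable
mixed = np.eye(2) / 2          # maximally mixed: already passive

print(ergotropy(excited, H))
print(ergotropy(mixed, H))
```

For a pure state the passive energy reduces to the ground-state energy, so the ergotropy equals the mean energy above the ground state. A mixed state can hold energy that no unitary can extract, which is why ergotropy rather than stored energy is the natural terminal reward.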

Paper Structure

This paper contains 14 sections, 16 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Average ergotropy versus training episodes under four settings of available information, averaged over five random seeds, for three inhomogeneous quantum batteries: (a) Env1, (b) Env2, (c) Env3. The shaded regions represent the standard deviation over five seeds.
  • Figure 2: Best charging protocols among five seeds for the four observable regimes [(a): $E_j(t)$, (b): $E_j(t) + \langle\hat{\sigma}_\alpha^{(i)}\rangle_t$, (c): $E_j(t) + \langle\hat{\sigma}_\alpha^{(i)}\rangle_t + \langle \hat{\sigma}_\alpha ^{(i)}\hat{\sigma}_\beta^{(j)}\rangle_t$, (d): $|\psi(t)\rangle$] and their ergotropy trajectories (e) in Env1.