Reinforcement Learning for Charging Optimization of Inhomogeneous Dicke Quantum Batteries

Xiaobin Song; Siyuan Bai; Da-Wei Wang; Hanxiao Tao; Xizhe Wang; Rebing Wu; Benben Jiang

TL;DR

This work addresses fast charging of inhomogeneous Dicke quantum batteries under partial observability by casting charging as a discrete-time reinforcement-learning problem. Using Soft Actor–Critic, the agent learns piecewise-constant charger-coupling policies that maximize the terminal ergotropy $\mathcal{E}(\tau)$. Policies are compared across four observability regimes, ranging from full-state access to experimentally accessible observables (energies of individual two-level systems, first-order averages, and second-order correlations). The key finding is that adding second-order correlations to the observations recovers most of the performance lost to partial information, reaching 94–98% of the full-state baseline. The learned schedules are nonmyopic and terminal-focused. The results offer a practical route to efficient fast-charging protocols under realistic readout constraints and motivate future POMDP-driven or robust-control extensions for experimental quantum batteries.

Abstract

Charging optimization is a key challenge in the implementation of quantum batteries, particularly under inhomogeneity and partial observability. This paper employs reinforcement learning to optimize piecewise-constant charging policies for an inhomogeneous Dicke battery. We systematically compare policies across four observability regimes, from full-state access to experimentally accessible observables (energies of individual two-level systems (TLSs), first-order averages, and second-order correlations). Simulation results demonstrate that full observability yields near-optimal ergotropy with low variability, whereas policies with access to only single-TLS energies, or to energies plus first-order averages, lag behind the fully observed baseline. However, augmenting partial observations with second-order correlations recovers most of the gap, reaching 94%–98% of the full-state baseline. The learned schedules are nonmyopic, trading temporary plateaus or declines for superior terminal outcomes. These findings highlight a practical route to effective fast-charging protocols under realistic information constraints.
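
For readers unfamiliar with the figure of merit: ergotropy is the maximum work extractable from a state by a unitary operation, i.e. the state's energy minus the energy of its passive counterpart (largest populations assigned to the lowest energy levels). The following is a minimal sketch of that standard definition, not the paper's implementation. The function name `ergotropy` and the single two-level system with a unit energy gap are illustrative assumptions; the battery Hamiltonian used in the paper is not reproduced here.

```python
import numpy as np


def ergotropy(rho, H):
    """Ergotropy of state rho under Hamiltonian H: Tr(rho H) minus passive-state energy."""
    energy = float(np.real(np.trace(rho @ H)))
    # Passive state: populations sorted in descending order are paired
    # with energy levels sorted in ascending order.
    populations = np.sort(np.linalg.eigvalsh(rho))[::-1]
    energies = np.sort(np.linalg.eigvalsh(H))
    passive_energy = float(np.dot(populations, energies))
    return energy - passive_energy


# Hypothetical single two-level system with unit gap (an assumption for illustration).
H = np.diag([0.0, 1.0])
excited = np.diag([0.0, 1.0])   # fully charged: all energy is extractable
ground = np.diag([1.0, 0.0])    # empty battery
mixed = np.eye(2) / 2.0         # maximally mixed: energy stored but none extractable

e_excited = ergotropy(excited, H)
e_ground = ergotropy(ground, H)
e_mixed = ergotropy(mixed, H)
```

The maximally mixed case illustrates why ergotropy, rather than stored energy, is the natural charging objective: the state carries half the gap in energy, yet no unitary can extract any of it.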
