Improved Offline Reinforcement Learning via Quantum Metric Encoding

Outongyi Lv, Yewei Yuan, Nana Liu

TL;DR

This work tackles offline reinforcement learning under severe data constraints by introducing Quantum Metric Encoding (QME), a quantum-inspired encoder–decoder built from parameterized unitary circuits that learns a compact state metric and decoded rewards. By replacing raw states with QME embeddings and using decoded rewards to train SAC and IQL, the approach achieves substantial improvements in maximum evaluation returns across three 100-sample datasets, with average gains of $116.2\%$ for SAC and $117.6\%$ for IQL. The authors link these gains to a geometry change in the embedding space, notably lower $Δ$-hyperbolicity, which correlates with better RL training under limited data. Importantly, QME operates in a fully classical simulation framework while remaining compatible with quantum hardware and quantum-native data, offering a practical quantum-inspired tool for data-scarce offline RL scenarios.

Abstract

Reinforcement learning (RL) with limited samples is common in real-world applications. However, offline RL performance under this constraint is often suboptimal. We consider an alternative approach to dealing with limited samples by introducing the Quantum Metric Encoder (QME). In this methodology, instead of applying the RL framework directly to the original states and rewards, we embed the states into a more compact and meaningful representation, where the structure of the encoding is inspired by quantum circuits. For classical data, QME is a classically simulable, trainable unitary embedding and thus serves as a quantum-inspired module on a classical device. For quantum data in the form of quantum states, QME can be implemented directly on quantum hardware, allowing for training without measurement or re-encoding. We evaluate QME on three datasets, each limited to 100 samples. We use Soft Actor-Critic (SAC) and Implicit Q-Learning (IQL), two well-known RL algorithms, to demonstrate the effectiveness of our approach. The experimental results show that training offline RL agents on QME-embedded states with decoded rewards yields significantly better performance than training on the original states and rewards. On average across the three datasets, for maximum reward performance, we achieve a 116.2% improvement for SAC and 117.6% for IQL. We further investigate the $Δ$-hyperbolicity of our framework, a geometric property of the state space known to be important for RL training efficacy. The QME-embedded states exhibit low $Δ$-hyperbolicity, suggesting that the improvement after embedding arises from the modified geometry of the state space induced by QME. Thus, the low $Δ$-hyperbolicity and the corresponding effectiveness of QME could provide valuable information for developing efficient offline RL methods under limited-sample conditions.
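To make the geometric claim more concrete, the sketch below estimates a hyperbolicity value for a small point cloud. It assumes that the $Δ$-hyperbolicity in the abstract refers to Gromov's four-point condition, and that the value is scaled by the diameter of the set; the abstract does not state the definition, the metric, or the normalization, so these are assumptions, and the function names `four_point_hyperbolicity` and `relative_hyperbolicity` are hypothetical.

```python
import itertools
import numpy as np


def four_point_hyperbolicity(dist):
    """Gromov four-point hyperbolicity of a finite metric space.

    `dist` is a symmetric (n, n) matrix of pairwise distances.
    For every quadruple, the three pairwise-sum pairings are sorted and
    half the gap between the two largest is recorded; the result is the
    maximum over all quadruples. Tree-like metrics give 0.
    """
    n = dist.shape[0]
    worst = 0.0
    for x, y, z, w in itertools.combinations(range(n), 4):
        sums = sorted((
            dist[x, y] + dist[z, w],
            dist[x, z] + dist[y, w],
            dist[x, w] + dist[y, z],
        ))
        worst = max(worst, (sums[2] - sums[1]) / 2.0)
    return worst


def relative_hyperbolicity(points):
    """Scale-invariant variant: 2 * hyperbolicity / diameter.

    Assumption: Euclidean distance between embedded states. The source
    may use a different metric on the QME embedding.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    return 2.0 * four_point_hyperbolicity(dist) / dist.max()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.normal(size=(20, 4))  # stand-in for embedded states
    print(relative_hyperbolicity(states))
```

The loop visits every quadruple, so the cost grows as $n^4$. For a 100-sample dataset this is roughly 3.9 million quadruples, which is feasible but slow in pure Python; subsampling or vectorizing would be a natural next step.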

Paper Structure

This paper contains 18 sections, 23 equations, 3 figures.

Figures (3)

  • Figure 1: The structure of our model, comprising the QME and Offline RL Training parts. The details of the QME structure are discussed in Section \ref{sec:method}.
  • Figure 2: The structure and operation of the Quantum Metric Encoder (QME). Starting from quantum-encoded data, the encoder concentrates information into $n_{\text{latent}}$ qubits and routes redundancy to $n_{\text{trash}}$; the disposer $U_t(\theta_t)$ resets trash to $\ket{0}^{\otimes n_{\text{trash}}}$. A reward rotation writes the normalized reward $g_i$ onto a target qubit; after $U_d(\theta_d)$, an inverse rotation returns it to $\ket{0}$ when decoded correctly. The loss (\ref{eq: li}) maximizes the $\ket{0}$ probability for target and trash registers, with the trade-off controlled by $\delta$.
  • Figure 3: Algorithm 1: QME and Offline RL Training Structure. Part 1 illustrates the process of training the QME circuit, as shown in Section \ref{sec:Quantum supervised training}. Part 2 describes the subsequent RL training process using QME-embedded states and decoded reward, detailed in Section \ref{sec:Application in downstream RL tasks}.
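
The reward-rotation idea in the Figure 2 caption can be illustrated with a single-qubit toy model. This is a sketch under stated assumptions, not the paper's circuit or loss: it assumes the reward rotation is an $R_y$ gate with angle $\pi g$, it replaces the decoder $U_d(\theta_d)$ with a hypothetical decoded reward value, and it assumes $\delta$ enters as a convex weight between the target and trash $\ket{0}$ probabilities. The names `target_zero_prob` and `qme_style_loss` are illustrative only.

```python
import numpy as np


def ry(theta):
    """Standard single-qubit R_y rotation matrix."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]])


def target_zero_prob(g_true, g_decoded):
    """Probability of measuring |0> on the target qubit.

    Assumption: the reward g in [0, 1] is written as R_y(pi * g), and the
    inverse rotation uses the decoded reward. A perfect decode undoes the
    rotation exactly, so the probability is 1.
    """
    zero = np.array([1.0, 0.0])
    state = ry(-np.pi * g_decoded) @ ry(np.pi * g_true) @ zero
    return float(state[0] ** 2)


def qme_style_loss(p_target_zero, p_trash_zero, delta):
    """Hypothetical trade-off loss: 0 when both registers return to |0>.

    Assumption: delta weights the target term and (1 - delta) the trash
    term. The actual form of the loss is given in the paper, not here.
    """
    return 1.0 - (delta * p_target_zero + (1.0 - delta) * p_trash_zero)


if __name__ == "__main__":
    p_target = target_zero_prob(g_true=0.6, g_decoded=0.5)
    print(qme_style_loss(p_target, p_trash_zero=0.9, delta=0.5))
```

In this toy model the target probability equals $\cos^2\!\big(\pi (g_{\text{true}} - g_{\text{decoded}})/2\big)$, so it falls smoothly as the decoded reward drifts from the true one, which is what makes maximizing the $\ket{0}$ probability a usable training signal.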

Theorems & Definitions (2)

  • Definition 1
  • Definition 2