Quantum contextual bandits and recommender systems for quantum data

Shrigyan Brahmachari; Josep Lumbreras; Marco Tomamichel

TL;DR

This work frames quantum data recommendation as a quantum contextual bandit (QCB) problem in which contexts are Hamiltonians and actions are unknown quantum states. It adapts the linear upper confidence bound algorithm (LinUCB) to the quantum setting by expanding states and observables in a Pauli basis, which makes the expected reward linear in an unknown parameter vector and enables dimension-reduced, online recommendation of low-energy states. A lower bound shows that no strategy can achieve regret below $\Omega(\sqrt{kT} \cdot \min(d,\sqrt{c}))$, while the proposed Gram-Schmidt-augmented LinUCB achieves near-optimal performance with space complexity $O(k d_\text{eff}^2)$. The authors demonstrate the method on Ising and generalized cluster Hamiltonians and find that the recommendations align with the phases of the Hamiltonians, so the strategy effectively classifies phases online. The framework offers a principled, scalable way to select quantum state preparations for energy-minimization tasks in NISQ-era workflows and a foundation for phase-aware recommender systems for quantum data.
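The linearity that the Pauli expansion provides can be checked numerically. The following is a minimal sketch, not taken from the paper: the two-qubit Hamiltonian, the random state, and all function names are illustrative assumptions. It verifies that $\mathrm{Tr}(H\rho)$ equals the inner product of the Pauli coefficient vector of $H$ with the vector of Pauli expectation values of $\rho$.

```python
import itertools
import numpy as np

# Single-qubit Pauli matrices.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def pauli_basis(n):
    """Return all 4**n tensor products of single-qubit Paulis on n qubits."""
    basis = []
    for combo in itertools.product([I2, X, Y, Z], repeat=n):
        op = np.array([[1.0 + 0j]])
        for p in combo:
            op = np.kron(op, p)
        basis.append(op)
    return basis


def observable_coeffs(H, basis):
    """Coefficients h_i such that H = sum_i h_i P_i."""
    d = H.shape[0]
    return np.array([np.trace(P @ H).real / d for P in basis])


def state_coeffs(rho, basis):
    """Pauli expectation values r_i = Tr(P_i rho)."""
    return np.array([np.trace(P @ rho).real for P in basis])


def random_state(d, rng):
    """A random density matrix (illustrative stand-in for an unknown state)."""
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho)


rng = np.random.default_rng(0)
n = 2
basis = pauli_basis(n)
# Hypothetical two-qubit context: Z Z coupling plus a transverse field.
H = np.kron(Z, Z) + 0.7 * (np.kron(X, I2) + np.kron(I2, X))
rho = random_state(2**n, rng)

energy_direct = np.trace(H @ rho).real
energy_linear = observable_coeffs(H, basis) @ state_coeffs(rho, basis)
```

Here `energy_direct` and `energy_linear` agree up to floating-point error, which is the property a linear bandit algorithm relies on: the context supplies the known vector and the state supplies the unknown one.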

Abstract

We study a recommender system for quantum data using the linear contextual bandit framework. In each round, a learner receives an observable (the context) and has to recommend which state to measure from a finite set of unknown quantum states (the actions). The learner's goal is to maximize the reward in each round, that is, the outcome of the measurement on the unknown state. Using this model we formulate the low-energy quantum state recommendation problem, where the context is a Hamiltonian and the goal is to recommend the state with the lowest energy. For this task, we study two families of contexts: the Ising model and a generalized cluster model. We observe that if we interpret the actions as different phases of the models, then the recommendation is done by classifying the correct phase of the given Hamiltonian, and the strategy can be interpreted as an online quantum phase classifier.
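The round structure described above can be sketched as a disjoint LinUCB loop. This is an illustrative simulation under stated assumptions, not the authors' implementation: the class name `DisjointLinUCB`, the exploration parameter `alpha`, the three hidden single-qubit states (given by their $\langle Z\rangle$ and $\langle X\rangle$ values), and the context family $H = \cos\varphi\, Z + \sin\varphi\, X$ are all hypothetical choices made to keep the example small.

```python
import numpy as np


class DisjointLinUCB:
    """LinUCB with one ridge-regression model per action (disjoint models)."""

    def __init__(self, n_actions, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha  # exploration strength (assumed value, not from the paper)
        self.A = np.stack([reg * np.eye(dim) for _ in range(n_actions)])
        self.b = np.zeros((n_actions, dim))

    def select(self, x):
        """Pick the action with the largest upper confidence bound for context x."""
        scores = []
        for A_a, b_a in zip(self.A, self.b):
            A_inv = np.linalg.inv(A_a)
            theta_hat = A_inv @ b_a
            scores.append(x @ theta_hat + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, action, x, reward):
        """Rank-one update of the chosen action's statistics."""
        self.A[action] += np.outer(x, x)
        self.b[action] += reward * x


def run(T=3000, seed=0):
    """Simulate T rounds and return the cumulative regret after each round."""
    rng = np.random.default_rng(seed)
    # Hypothetical hidden actions: (<Z>, <X>) of three single-qubit states.
    thetas = np.array([[-0.9, 0.0], [0.0, -0.9], [0.6, 0.6]])
    bandit = DisjointLinUCB(n_actions=len(thetas), dim=2)
    regret = np.zeros(T)
    total = 0.0
    for t in range(T):
        phi = rng.uniform(0.0, 2.0 * np.pi)
        x = np.array([np.cos(phi), np.sin(phi)])  # context H = cos(phi) Z + sin(phi) X
        energies = thetas @ x                      # Tr(H rho_a) for each action
        a = bandit.select(x)
        # Measuring H (eigenvalues +1 and -1) on the chosen state.
        outcome = 1.0 if rng.random() < (1.0 + energies[a]) / 2.0 else -1.0
        bandit.update(a, x, -outcome)              # reward = negative energy outcome
        total += energies[a] - energies.min()
        regret[t] = total
    return regret


if __name__ == "__main__":
    print(run()[-1])
```

Storing one `dim x dim` matrix per action gives memory quadratic in the feature dimension and linear in the number of actions. The learner never sees `thetas` directly; it only observes single measurement outcomes, as in the setting described above.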

Paper Structure (12 sections, 1 theorem, 25 equations, 5 figures, 2 algorithms)

Introduction
The model
Lower bound
Algorithm
Linear disjoint single context bandits and QCB
Linear Upper Confidence Bound algorithm
Low energy quantum state recommender system
Gram-Schmidt method
Phase classifier
Numerical simulations
Generalised Cluster Model
Outlook

Key Result

Theorem 2

Consider a quantum contextual bandit with underlying dimension $d = 2^n$ and $n\in\mathbb{N}$, context size $c\geq 1$ and $k\geq 2$ actions. Then, for any strategy $\pi$, there exist a context set $\mathcal{C}$ with $|\mathcal{C}|=c$, a probability distribution $P_{\mathcal{C}}$ over the context set $\mathcal{C}$, and an environment $\gamma$ such that the expected regret of the QCB $(\gamma,\mathcal{C})$ is $\Omega\big(\sqrt{kT\cdot\min\lbrace c,d^2 \rbrace}\big)$ for $T\geq k\min \lbrace c,d^2 \rbrace$.
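As a short worked step (a reading aid, not part of the theorem statement), the scaling quoted in the TL;DR can be written under a single square root, which makes the two regimes explicit:

$$
\sqrt{kT}\cdot\min\big(d,\sqrt{c}\big) \;=\; \sqrt{kT\cdot\min\lbrace c, d^2\rbrace} \;=\;
\begin{cases}
\sqrt{c\,kT}, & c \leq d^2,\\
d\sqrt{kT}, & c > d^2.
\end{cases}
$$

The first equality holds because the square root is monotone, so $\min(d,\sqrt{c}) = \sqrt{\min\lbrace c,d^2\rbrace}$. For few contexts the bound grows with $\sqrt{c}$, and once $c$ exceeds $d^2$ it saturates at the dimension-dependent value. At the smallest horizon covered by the condition, $T = k\min\lbrace c,d^2\rbrace$, the expression evaluates to $k\min\lbrace c,d^2\rbrace$.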

Figures (5)

Figure 1: Sketch of a recommender system for quantum data. The learner sequentially receives quantum contexts and feeds them to the classical processing system. The context is also fed to the measurement system. The classical processing system uses the information about the context to pick one of the quantum processes (nothing is known about these processes apart from measurement outcomes). The chosen quantum process is applied to the measurement system, and the measurement outcome is fed to the classical processing system and added to the cumulative reward.
Figure 2: Regret and classifier regret for the QCB $(\gamma,\mathcal{C})$, where the Hamiltonians in $\mathcal{C}$ are a specific form of generalised cluster model acting on 10 and 100 qubits, respectively. The performance differs little between the two cases because $d_\text{eff}=3$ in both. The action set consists of approximate ground states of some generalised cluster Hamiltonians.
Figure 3: These plots illustrate how the recommender system identifies the phases of the generalised cluster Hamiltonian. The x- and y-axes represent the coupling coefficients of the generalised cluster Hamiltonian received as context. As in the Ising model simulations, we associate a color with each action. For any context $H_\text{cluster}(j_1,j_2)$ received in one of the $T$ rounds, one of these actions is picked by the algorithm. We plot the corresponding colored dot (blue for the ground state of $H_\text{cluster}(-\infty,0)$, orange for $H_\text{cluster}(0,\infty)$, red for $H_\text{cluster}(\infty,0)$, green for $H_\text{cluster}(0,-\infty)$ and purple for $H_\text{cluster}(0,0)$) at the appropriate coordinates, for rounds after the bandit has "learned" the actions, i.e., after the growth in regret has slowed down.
Figure 4: Regret and classifier regret for the QCB $(\gamma,\mathcal{C})$, where the Hamiltonians in $\mathcal{C}$ are Ising Hamiltonians acting on 10 and 100 qubits, respectively. The performance differs little between the two cases because $d_\text{eff}=2$ in both. The action set consists of approximate ground states of some Ising Hamiltonians.
Figure 5: These plots illustrate how the recommender system identifies the phases of the Ising Hamiltonian. The x-axis represents the external field coefficient of the Ising Hamiltonian received as context. A blue, green, or yellow mark indicates that the algorithm plays the $1^\text{st}$, $2^\text{nd}$, or $3^\text{rd}$ action. We plot the corresponding colored dot at the appropriate coordinates, for rounds after the bandit has "learned" the actions, i.e., after the growth in regret has slowed down.
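The captions of Figures 2 and 4 note that the effective dimension is the same for 10 and 100 qubits. One way to see why, assuming that $d_\text{eff}$ denotes the dimension of the linear span of the context coefficient vectors (the defining equation is not reproduced on this page), is to run an online Gram-Schmidt procedure over a stream of contexts. The sketch below is illustrative only: the Hamiltonian form $\sum_i Z_i Z_{i+1} + h\sum_i X_i$, the class `OnlineGramSchmidt`, and the tolerance are assumptions, not the paper's exact construction.

```python
import numpy as np


def ising_context(n, h):
    """Coefficient vector of a hypothetical open-chain Ising Hamiltonian
    sum_i Z_i Z_{i+1} + h * sum_i X_i over its 2n - 1 Pauli terms."""
    return np.concatenate([np.ones(n - 1), h * np.ones(n)])


class OnlineGramSchmidt:
    """Maintain an orthonormal basis of the span of the vectors seen so far."""

    def __init__(self, tol=1e-9):
        self.tol = tol      # relative threshold for accepting a new direction
        self.basis = []

    def reduce(self, v):
        """Return the coordinates of v in the current basis, extending the basis
        first if v has a component outside its span. The returned vector grows
        in length whenever a new direction is added."""
        residual = np.asarray(v, dtype=float).copy()
        for q in self.basis:
            residual -= (q @ residual) * q
        norm = np.linalg.norm(residual)
        if norm > self.tol * max(1.0, np.linalg.norm(v)):
            self.basis.append(residual / norm)
        return np.array([q @ v for q in self.basis])


def effective_dimension(n, fields):
    """Number of orthonormal directions needed for the given stream of contexts."""
    gs = OnlineGramSchmidt()
    for h in fields:
        gs.reduce(ising_context(n, h))
    return len(gs.basis)
```

Under these assumptions, every context is a combination of the same two fixed vectors (the coupling part and the field part), so the basis stops growing after two directions regardless of the number of qubits. A bandit algorithm can then work with the reduced coordinates returned by `reduce` instead of the full Pauli coefficient vector.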

Theorems & Definitions (3)

Definition 1: Quantum contextual bandit
Theorem 2
proof
