Quantum Algorithm for Apprenticeship Learning

Andris Ambainis, Debbie Lim

Abstract

Apprenticeship learning is a method commonly used to train artificial intelligence systems to perform tasks that are challenging to specify directly using traditional methods. Based on the work of Abbeel and Ng (ICML'04), we present a quantum algorithm for apprenticeship learning via inverse reinforcement learning. As an intermediate step, we give a classical approximate apprenticeship learning algorithm to demonstrate the speedup obtained by our quantum algorithm. We prove convergence guarantees for our classical approximate apprenticeship learning algorithm, which also extend to our quantum apprenticeship learning algorithm. We also show that, compared with its classical counterpart, our quantum algorithm improves the per-iteration time complexity by a quadratic factor in the dimension of the feature vectors $k$ and the size of the action space $A$.

Paper Structure

This paper contains 15 sections, 9 theorems, 54 equations, 1 figure, 1 table.

Key Result

Lemma 2

Let there be given an $\mathrm{MDP}\backslash R$, feature vectors $\phi:\mathcal{S} \rightarrow [0, 1]^k$ and a set of policies $\tilde{\Pi}$, $\bar{\mu}^{(i)}\in \tilde{M}$. Let $\epsilon _{\operatorname{RL}} \in (0, 1)$ be such that $\left\Vert \hat{\mu}_E\right\Vert_2^2\geq 2\epsilon_{\operatorname{RL}}$ and the point $\tilde{\mu}^{(i+1)}$ is a convex combination of $\bar{\mu}^{(i)}$ and $\mu^{(i+1)}$.
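
To make the "convex combination" in the lemma concrete, here is a minimal sketch of one way such a point can be chosen: the point on the segment between $\bar{\mu}^{(i)}$ and $\mu^{(i+1)}$ that is closest to the expert's feature expectations, in the spirit of the projection step in Abbeel and Ng's original algorithm. This is an illustration under stated assumptions, not necessarily the update rule used in this paper. The function name `project_toward_expert`, the clipping of the weight to $[0, 1]$, and the toy vectors are all hypothetical.

```python
def project_toward_expert(mu_bar, mu_new, mu_expert):
    """Return the point on the segment [mu_bar, mu_new] closest to mu_expert.

    The result is mu_bar + lam * (mu_new - mu_bar) with lam in [0, 1],
    i.e. a convex combination of mu_bar and mu_new. (Assumed update,
    shown only for illustration.)
    """
    direction = [n - b for n, b in zip(mu_new, mu_bar)]
    denom = sum(d * d for d in direction)
    if denom == 0.0:
        # mu_new coincides with mu_bar, so there is nothing to move toward.
        return list(mu_bar), 0.0
    gap = [e - b for e, b in zip(mu_expert, mu_bar)]
    # Unconstrained line-projection coefficient, then clipped so the
    # result stays on the segment (a convex combination).
    lam = sum(d * g for d, g in zip(direction, gap)) / denom
    lam = min(1.0, max(0.0, lam))
    return [b + lam * d for b, d in zip(mu_bar, direction)], lam


# Toy feature expectations in [0, 1]^2 (made-up values).
mu_bar = [0.0, 0.0]
mu_new = [1.0, 0.0]
mu_expert = [0.5, 0.5]
mu_next, lam = project_toward_expert(mu_bar, mu_new, mu_expert)
```

In this toy case the weight is 0.5 and the new point is `[0.5, 0.0]`, which is strictly closer to the expert vector than `mu_bar` was.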

Figures (1)

  • Figure 1: An illustration of the apprenticeship learning algorithm.

Theorems & Definitions (18)

  • Lemma 2: Per-iteration improvement of Algorithm \ref{AL}
  • proof
  • Theorem 3: Convergence guarantee of Algorithm \ref{AL}
  • proof
  • Lemma 5
  • proof
  • Lemma 8
  • proof
  • Lemma 9
  • proof
  • ...and 8 more