Optimal Control with Natural Images: Efficient Reinforcement Learning using Overcomplete Sparse Codes

Peter N. Loxley

TL;DR

This work addresses optimal control with sequences of natural images by treating images as potential sufficient statistics and showing that overcomplete sparse codes enable efficient, scalable reinforcement learning. It introduces a scalable image-patch benchmark based on target-tracking dynamics and demonstrates that an overcomplete sparse-code representation greatly enlarges the state space that can be handled, while training with Fitted Value Iteration remains tractable. The key findings are that such representations accelerate learning, increase storage capacity, and allow exact or near-exact policy solutions on tasks orders of magnitude larger than those solvable with complete codes, without requiring deep networks. The practical impact is a principled route to efficient vision-based control and a testbed for comparing image representations in RL, with theoretical justification for the observed gains.

Abstract

Optimal control and sequential decision making are widely used in many complex tasks. Optimal control over a sequence of natural images is a first step towards understanding the role of vision in control. Here, we formalize this problem as a reinforcement learning task, and derive general conditions under which an image includes enough information to implement an optimal policy. Reinforcement learning is shown to provide a computationally efficient method for finding optimal policies when natural images are encoded into "efficient" image representations. This is demonstrated by introducing a new reinforcement learning benchmark that easily scales to large numbers of states and long horizons. In particular, by representing each image as an overcomplete sparse code, we are able to efficiently solve an optimal control task that is orders of magnitude larger than those tasks solvable using complete codes. Theoretical justification for this behaviour is provided. This work also demonstrates that deep learning is not necessary for efficient optimal control with natural images.

Paper Structure

This paper contains 13 sections, 1 theorem, 33 equations, 10 figures, 4 tables.

Key Result

Proposition 1

A state of the benchmark can be described by $i$ or $\phi(i)$, and either state is a sufficient statistic.

Figures (10)

  • Figure 1: Neural network for reinforcement learning with natural images. The first two layers form a sparse autoencoder that generates an overcomplete sparse code $\phi$ by reconstructing the image input $I$ using an overcomplete basis of Gabor functions adapted to natural image statistics. The output $r\boldsymbol{\cdot}\phi$ then approximates the cost-to-go $\beta$ using weights $r$. The network storage capacity has increased from $d$ to (close to) $m$ by using an overcomplete sparse code.
  • Figure 2: A target tracking sequence (from left to right) showing optimal and suboptimal trackers. A tracker can move either "up" or "right", while the target can move "up", "right", or "diagonally". The suboptimal tracker (red square) follows the target (T) as closely as possible at each step (a greedy approach), causing it to fall behind when the target moves diagonally. The optimal tracker (blue square) follows the target by anticipating a diagonal move following a time period where the target is stationary.
  • Figure 3: A Markov chain for generating the target dynamics shown in Fig 1. In Fig 1, the states $\mathrm{s}$, $\mathrm{d}$, and $\mathrm{r}$ correspond to "same position", "diagonal move", and "up" or "right", respectively. The non-zero transition probabilities are: $p(\Delta t_k=\mathrm{d}|\Delta t_{k-1}=\mathrm{s})=1$, $p(\Delta t_k=\mathrm{r}|\Delta t_{k-1}=\mathrm{d})=1$, $p(\Delta t_k=\mathrm{s}|\Delta t_{k-1}=\mathrm{r})=1-p$, and $p(\Delta t_k=\mathrm{r}|\Delta t_{k-1}=\mathrm{r})=p$.
  • Figure 4: Expected total cost of the optimal policy from Table \ref{t2} (blue circles), and the greedy policy from Table \ref{t3} (red pluses), as the horizon goes from $N=1$ to 30 time periods for $p=0$ and $p=1$.
  • Figure 5: Expected total cost of optimal and greedy policies as the horizon goes from $N=1$ to $N=30$ time periods for $p=0$ with different initial states. The bottom pair of curves start at state $((0,1),(0,0))$; the middle pair start at state $((3,0),(0,0))$; and the top pair start at state $((4,0),(0,0))$.
  • ...and 5 more figures
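
The Markov chain described in the Figure 3 caption can be sketched in a few lines of Python. This is only an illustration built from the four transition probabilities listed there. The names `transition_matrix` and `sample_moves`, the default starting state `"s"`, and the fixed random seed are assumptions made for the example and are not taken from the paper.

```python
import random

# Move types from the Figure 3 caption:
# "s" = same position, "d" = diagonal move, "r" = "up" or "right".
STATES = ("s", "d", "r")


def transition_matrix(p):
    """Return P[prev][next] for the target-move chain with parameter p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return {
        "s": {"s": 0.0, "d": 1.0, "r": 0.0},    # s -> d with probability 1
        "d": {"s": 0.0, "d": 0.0, "r": 1.0},    # d -> r with probability 1
        "r": {"s": 1.0 - p, "d": 0.0, "r": p},  # r -> s w.p. 1-p, r -> r w.p. p
    }


def sample_moves(p, n_steps, start="s", seed=0):
    """Sample a sequence of move types; `start` and `seed` are assumptions."""
    rng = random.Random(seed)
    P = transition_matrix(p)
    moves = [start]
    for _ in range(n_steps):
        row = P[moves[-1]]
        weights = [row[s] for s in STATES]
        moves.append(rng.choices(STATES, weights=weights)[0])
    return moves


if __name__ == "__main__":
    print(sample_moves(0.0, 6))  # deterministic cycle s, d, r, s, ...
    print(sample_moves(1.0, 5))  # stays in r once it is reached
```

For the two values of $p$ used in Figure 4 the chain is deterministic: with $p=0$ the move types cycle through s, d, r, and with $p=1$ the target keeps making "up" or "right" moves once it reaches r. Intermediate values of $p$ give random sequences.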

Theorems & Definitions (2)

  • Proposition 1
  • proof