Value Iteration Networks
Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, Pieter Abbeel
TL;DR
This paper addresses the gap between reactive neural policies and explicit planning in sequential decision tasks. It introduces the Value Iteration Network (VIN), a differentiable module that implements an approximate value-iteration planner inside a CNN and can be trained end-to-end. The authors demonstrate VIN-based policies across grid-world navigation, Mars terrain navigation, continuous control, and WebNav, showing improved generalization to unseen task instances. They discuss extensions such as hierarchical VI networks (HVIN) to speed planning and combine multiple planning computations. The work provides a general planning primitive that can be integrated with perception and control in RL/IL systems.
Abstract
We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network, and trained end-to-end using standard backpropagation. We evaluate VIN-based policies on discrete and continuous path-planning domains, and on a natural-language-based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.
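
To make the idea of value iteration expressed as a convolutional network more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a small deterministic grid world, one 3x3 kernel pair per action, and a discount of 0.9. The kernels are fixed by hand here so the recurrence is easy to follow, whereas in a VIN they would be learned by backpropagation. The names `conv3x3`, `one_hot_kernel`, `KERNELS`, and `vi_module` are hypothetical and chosen for illustration only.

```python
GAMMA = 0.9  # discount factor (an assumption of this sketch)


def conv3x3(image, kernel):
    """3x3 'same' cross-correlation with zero padding on nested lists."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        total += kernel[di + 1][dj + 1] * image[ni][nj]
            out[i][j] = total
    return out


def one_hot_kernel(di, dj, weight):
    """3x3 kernel that reads a single cell at offset (di, dj)."""
    k = [[0.0] * 3 for _ in range(3)]
    k[di + 1][dj + 1] = weight
    return k


# One (reward kernel, value kernel) pair per action:
# stay, up, down, left, right. Hand-set here; learned in a real VIN.
MOVES = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
KERNELS = [
    (one_hot_kernel(0, 0, 1.0), one_hot_kernel(di, dj, GAMMA))
    for di, dj in MOVES
]


def vi_module(reward, kernels, num_iters):
    """Run num_iters rounds of: Q_a = conv(reward) + conv(value); V = max_a Q_a."""
    h, w = len(reward), len(reward[0])
    value = [[0.0] * w for _ in range(h)]
    for _ in range(num_iters):
        q_maps = []
        for reward_k, value_k in kernels:
            q_r = conv3x3(reward, reward_k)
            q_v = conv3x3(value, value_k)
            q_maps.append(
                [[q_r[i][j] + q_v[i][j] for j in range(w)] for i in range(h)]
            )
        # Max over the action channel plays the role of a max-pooling layer.
        value = [
            [max(q[i][j] for q in q_maps) for j in range(w)] for i in range(h)
        ]
    return value


# Example: 4x4 grid with a single rewarding goal cell in the top-right corner.
reward = [[0.0] * 4 for _ in range(4)]
reward[0][3] = 1.0
value = vi_module(reward, KERNELS, num_iters=10)
```

In this sketch the value map peaks at the goal cell and decays with grid distance from it, so a greedy policy that moves toward the highest-valued neighbor heads for the goal. Every step is a convolution, a sum, or a max, so the same computation could be written with differentiable tensor operations and trained end-to-end.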
