
Highway Value Iteration Networks

Yuhui Wang, Weida Li, Francesco Faccio, Qingyuan Wu, Jürgen Schmidhuber

TL;DR

This work tackles the difficulty of long-horizon planning in Value Iteration Networks (VINs) by introducing Highway Value Iteration Networks (highway VINs). By embedding highway value iteration into VINs and adding an aggregate gate, a value exploration (VE) module, and a filter gate, the authors make it possible to train networks with hundreds of layers effectively and to plan over hundreds of steps. Empirical results on 2D maze navigation and 3D ViZDoom show that highway VINs outperform traditional VINs and several deep baselines, especially as planning horizons lengthen; ablations confirm that the VE module and the gating components are necessary. The approach also clarifies the connection between highway RL and highway networks, offering a scalable, end-to-end framework for deep planning in complex environments.

Abstract

Value iteration networks (VINs) enable end-to-end learning for planning tasks by employing a differentiable "planning module" that approximates the value iteration algorithm. However, long-term planning remains a challenge because training very deep VINs is difficult. To address this problem, we embed highway value iteration -- a recent algorithm designed to facilitate long-term credit assignment -- into the structure of VINs. This improvement augments the "planning module" of the VIN with three additional components: 1) an "aggregate gate," which constructs skip connections to improve information flow across many layers; 2) an "exploration module," crafted to increase the diversity of information and gradient flow in spatial dimensions; 3) a "filter gate" designed to ensure safe exploration. The resulting novel highway VIN can be trained effectively with hundreds of layers using standard backpropagation. In long-term planning tasks requiring hundreds of planning steps, deep highway VINs outperform both traditional VINs and several advanced, very deep NNs.
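One mechanical reason a shallow VIN struggles on long paths can be made concrete with a minimal sketch. It assumes a deterministic grid with 4-neighbour moves, a reward map that is 1 at the goal and 0 elsewhere, and hand-set transitions in place of the learned convolution kernels of a real VIN; `vi_layer` and `plan` are hypothetical names for illustration. Each layer is one value iteration step: compute Q for every action from neighbouring values, then take the max over the action axis. Because value moves at most one cell per layer, an N-layer planner assigns zero value to every cell more than N - 1 steps from the goal.

```python
# Minimal sketch (assumed setup, not the paper's code): value iteration on a
# grid with deterministic 4-neighbour moves, standing in for the learned
# convolutions of a VIN planning module. vi_layer and plan are hypothetical.

GAMMA = 0.99
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def vi_layer(reward, value):
    """One value iteration layer: per-action Q from neighbours, then max over actions."""
    rows, cols = len(value), len(value[0])
    new_value = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            q_values = []
            for dr, dc in MOVES:
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols):
                    nr, nc = r, c  # moving into the border leaves the agent in place
                q_values.append(reward[r][c] + GAMMA * value[nr][nc])
            new_value[r][c] = max(q_values)  # max over the action axis
    return new_value


def plan(reward, num_layers):
    """Stack num_layers value iteration layers, starting from an all-zero value map."""
    value = [[0.0] * len(row) for row in reward]
    for _ in range(num_layers):
        value = vi_layer(reward, value)
    return value


# A 1x10 corridor with the goal (reward 1) in the leftmost cell; the start cell
# in the rightmost column is 9 moves away from the goal.
corridor = [[1.0] + [0.0] * 9]
print(plan(corridor, 9)[0][9])   # 0.0: after 9 layers no value has reached the start
print(plan(corridor, 10)[0][9])  # positive: the 10th layer delivers the goal's value
```

In this toy setting the required depth grows linearly with the shortest path length, which is why planning over hundreds of steps calls for hundreds of layers and, in turn, for an architecture that can actually be trained at that depth.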

Paper Structure (29 sections, 12 equations, 11 figures, 5 tables)

This paper contains 29 sections, 12 equations, 11 figures, and 5 tables.

Figures (11)

  • Figure 1: Success rates of reaching the goal in a $25\times 25$ maze problem. The success rate of a 30-layer VIN decreases considerably as the shortest path length increases, and a 300-layer VIN is difficult to train and performs poorly.
  • Figure 2: Architecture of VIN and highway VIN, and architecture of the planning module of VIN, which consists of $N$ layers of value iteration modules. The architecture of the value iteration module is detailed in Figure 4.
  • Figure 3: Planning module of highway VIN. Here, we demonstrate the planning module of highway VIN using a highway block of depth $N_b=4$ and incorporating $N_{p} =2$ embedded policies.
  • Figure 4: Architectures of the value iteration module and the VE module, respectively. The operation $\max_{\overline{\mathcal{A}} \times 1 \times 1}$ denotes a max over the action axis (the equation computing $V$ from $Q$). The operation $\mathrm{linear}_{\overline{\mathcal{A}} \times 1 \times 1}$ denotes a linear combination of the input Q matrix $\overline{Q}_{n_{p}}$ over the action axis, weighted by the policy matrix $\overline{\pi}_{n_{p}}$ (the corresponding equation of the highway planning module).
  • Figure 5: Success rates of the algorithms as a function of the shortest path length. For each algorithm, the best result across a range of depths is shown; results for all depths are given in a figure in the Appendix.
  • ...and 6 more figures
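Figures 3 and 4 name the pieces of a highway block: a block of depth $N_b$, $N_p$ embedded policies, a value iteration readout that takes a max over actions, and a VE readout that weights Q by a policy. A hypothetical sketch of how these could be wired together on a tabular MDP follows. It is one reading, not the paper's equations: the names `highway_block`, `v_greedy`, `v_policy`, the toy corridor, and both embedded policies are ours; the aggregate gate is assumed to be an elementwise max over the output at every depth of every branch (our reading of highway value iteration's max over lookahead lengths and policies); and the filter gate is omitted because the captions do not specify its form.

```python
# Hypothetical sketch of one highway block on a tabular MDP. Grounded in the
# figure captions: V is read out from Q either by a max over actions (value
# iteration module) or by a policy-weighted sum over actions (VE module), and a
# block has depth N_b and N_p embedded policies. ASSUMED, not from the paper:
# the aggregate gate is an elementwise max over the outputs of every depth of
# every branch, and the filter gate is omitted.

GAMMA = 0.99


def q_values(reward, next_state, value):
    """Q[s][a] = R[s] + GAMMA * V[next_state[s][a]] for deterministic moves."""
    return [[reward[s] + GAMMA * value[s2] for s2 in next_state[s]]
            for s in range(len(reward))]


def v_greedy(q):
    """Value iteration module readout: max over the action axis."""
    return [max(row) for row in q]


def v_policy(q, policy):
    """VE module readout: sum over actions of Q weighted by the policy."""
    return [sum(p * qa for p, qa in zip(policy[s], q[s])) for s in range(len(q))]


def highway_block(reward, next_state, value, policies, depth):
    """One greedy layer, then depth - 1 policy layers per embedded policy.

    Every intermediate result is routed to the aggregate gate (the skip
    connections), which in this sketch simply takes an elementwise max.
    """
    greedy = v_greedy(q_values(reward, next_state, value))
    candidates = [greedy]
    for policy in policies:
        v = greedy
        for _ in range(depth - 1):
            v = v_policy(q_values(reward, next_state, v), policy)
            candidates.append(v)
    return [max(vals) for vals in zip(*candidates)]


# Toy 10-state corridor, goal in state 0; actions are (left, right).
N = 10
reward = [1.0] + [0.0] * (N - 1)
next_state = [[max(s - 1, 0), min(s + 1, N - 1)] for s in range(N)]
go_left = [[1.0, 0.0] for _ in range(N)]  # embedded policy 1 (hypothetical)
uniform = [[0.5, 0.5] for _ in range(N)]  # embedded policy 2 (hypothetical)

# N_p = 2 embedded policies and block depth N_b = 4, as in Figure 3.
v1 = highway_block(reward, next_state, [0.0] * N, [go_left, uniform], depth=4)
print([round(x, 3) for x in v1])
```

In this toy run, the `go_left` branch makes states 0 through 3 positive within one block of depth 4, while the greedy candidate alone makes only the goal state positive. Because the max includes that greedy candidate, the block's output in this sketch never falls below a plain value iteration layer's output; whether the paper's gates share that property is not something the captions establish.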

Theorems & Definitions (2)

  • Remark 1
  • Remark 2