
Learning Accurate Extended-Horizon Predictions of High Dimensional Trajectories

Brian Gaudet, Richard Linares, Roberto Furfaro

TL;DR

The paper presents a predictive-coding architecture that makes accurate extended-horizon open-loop predictions of high-dimensional trajectories, with the aim of improving sample efficiency in model-based RL. It introduces two key innovations: learning a mapping from the first observation of an episode to the initial hidden state, and running the model open loop in the forward pass during training. Together, these let the trained model make accurate long-horizon predictions from the start of an episode. Empirically, the model outperforms a standard predictive coding model on multi-step prediction, and a Dyna-style framework built on it gives a 2x improvement in PPO sample efficiency. The work advances model-based planning for complex, high-dimensional domains, with practical implications for autonomous Mars landing and similar high-stakes control tasks.
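The two innovations can be made concrete with a minimal sketch. The class below is a hypothetical, generic recurrent predictor written in NumPy, not the authors' predictive coding architecture: the single tanh layer used for the initial-state mapping, the transition and readout forms, the action input, and all names (`OpenLoopPredictor`, `init_hidden`, `rollout`, `open_loop_loss`) are illustrative assumptions. It shows (1) a learned map from the first observation to the initial hidden state, in place of a constant initial state, and (2) an open-loop forward pass in which every input after the first observation is the model's own previous prediction. Training would backpropagate the loss through the entire rollout with an autodiff framework; only the forward pass and the loss are shown.

```python
import numpy as np


class OpenLoopPredictor:
    """Hypothetical recurrent predictor; layer forms and sizes are illustrative assumptions."""

    def __init__(self, obs_dim, act_dim, hid_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1  # small random init; a trained model would learn these weights
        # Learned map from the FIRST observation to the initial hidden state
        # (replaces the constant initial hidden state of standard models).
        self.W_init = rng.normal(0.0, s, (hid_dim, obs_dim))
        self.b_init = np.zeros(hid_dim)
        # Recurrent transition driven by the previous input and the action.
        self.W_h = rng.normal(0.0, s, (hid_dim, hid_dim))
        self.W_x = rng.normal(0.0, s, (hid_dim, obs_dim))
        self.W_a = rng.normal(0.0, s, (hid_dim, act_dim))
        # Readout from hidden state to predicted observation.
        self.W_o = rng.normal(0.0, s, (obs_dim, hid_dim))

    def init_hidden(self, obs0):
        return np.tanh(self.W_init @ obs0 + self.b_init)

    def rollout(self, obs0, actions):
        """Open loop: only obs0 is observed; later inputs are the model's own predictions."""
        h = self.init_hidden(obs0)
        x = obs0
        preds = []
        for a in actions:
            h = np.tanh(self.W_h @ h + self.W_x @ x + self.W_a @ a)
            x = self.W_o @ h  # predicted next observation, fed back as the next input
            preds.append(x)
        return np.stack(preds)


def open_loop_loss(model, obs, actions):
    """Mean squared error over the whole horizon; obs has shape (T + 1, obs_dim)."""
    preds = model.rollout(obs[0], actions)
    return float(np.mean((preds - obs[1:]) ** 2))


# Usage with synthetic stand-in data.
model = OpenLoopPredictor(obs_dim=4, act_dim=2, hid_dim=16)
T = 50
obs = np.random.default_rng(1).normal(size=(T + 1, 4))
actions = np.zeros((T, 2))
print(model.rollout(obs[0], actions).shape)  # (50, 4)
print(open_loop_loss(model, obs, actions))
```

Because `rollout` never reads `obs[1:]`, this loss measures the error the model would incur when predicting a whole episode from its first observation alone.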

Abstract

We present a novel predictive model architecture, based on the principles of predictive coding, that enables open-loop prediction of future observations over extended horizons. There are two key innovations. First, whereas current methods typically learn to make long-horizon open-loop predictions using a multi-step cost function, we instead run the model open loop in the forward pass during training. Second, current predictive coding models initialize the representation layer's hidden state to a constant value at the start of an episode and consequently typically require multiple steps of interaction with the environment before the model begins to produce accurate predictions. Instead, we learn a mapping from the first observation in an episode to the hidden state, allowing the trained model to produce accurate predictions immediately. We compare the performance of our architecture to that of a standard predictive coding model and demonstrate the model's ability to make accurate long-horizon open-loop predictions of simulated Doppler radar altimeter readings during a six-degree-of-freedom Mars landing. Finally, we demonstrate a 2x reduction in sample complexity by using the model to implement a Dyna-style algorithm that accelerates policy learning with proximal policy optimization (PPO).
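A Dyna-style algorithm augments real experience with rollouts imagined by a learned model, so each policy update sees more transitions than were collected from the environment. The sketch below shows only that control flow, under heavy assumptions: a toy one-dimensional linear system stands in for the Mars landing environment, a least-squares linear model stands in for the predictive model, imagined rollouts start from real initial observations, rewards and advantages are omitted, and `ppo_update` is a hypothetical placeholder that merely counts transitions where an actual PPO update would run. None of these names come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


def real_env_rollout(policy, T):
    """Toy stand-in for the real environment: x' = 0.9 x + 0.5 a + small noise."""
    x = rng.normal(size=1)
    obs, acts = [x], []
    for _ in range(T):
        a = policy(x)
        x = 0.9 * x + 0.5 * a + 0.01 * rng.normal(size=1)
        obs.append(x)
        acts.append(a)
    return np.array(obs), np.array(acts)


def fit_model(trajs):
    """Least-squares fit of x' ~ [x, a] @ theta (stand-in for the learned predictive model)."""
    X = np.concatenate([np.hstack([o[:-1], a]) for o, a in trajs])
    Y = np.concatenate([o[1:] for o, _ in trajs])
    theta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return theta  # shape (obs_dim + act_dim, obs_dim)


def model_rollout(theta, x0, policy, T):
    """Imagined rollout: each model prediction is fed back as the next state (open loop)."""
    x = x0
    obs, acts = [x], []
    for _ in range(T):
        a = policy(x)
        x = np.concatenate([x, a]) @ theta
        obs.append(x)
        acts.append(a)
    return np.array(obs), np.array(acts)


def ppo_update(trajectories):
    """HYPOTHETICAL placeholder: a real implementation would run PPO on these trajectories."""
    return sum(len(a) for _, a in trajectories)


def dyna_iteration(policy, update_policy, n_real=4, n_imagined=4, T=20):
    # 1) Collect real experience.
    real = [real_env_rollout(policy, T) for _ in range(n_real)]
    # 2) Fit the model on the real data.
    theta = fit_model(real)
    # 3) Imagined rollouts from real initial observations cost no environment interactions.
    starts = [real[i % n_real][0][0] for i in range(n_imagined)]
    imagined = [model_rollout(theta, x0, policy, T) for x0 in starts]
    # 4) Update the policy on the mixed real + imagined batch.
    return update_policy(real + imagined), theta


def policy(x):
    # Linear feedback plus exploration noise (toy stand-in for a PPO policy).
    return -0.5 * x + 0.1 * rng.normal(size=1)


n_transitions, theta = dyna_iteration(policy, ppo_update)
print(n_transitions)  # 160 (4 real + 4 imagined rollouts, 20 steps each)
```

In this toy setting, half of the transitions passed to the update come from the model, so each update uses twice as many transitions as were drawn from the environment. It illustrates how model rollouts can stand in for environment samples; it does not reproduce the paper's experiment.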

Paper Structure

This paper contains 13 sections, 2 equations, 2 figures, 7 tables, and 2 algorithms.

Figures (2)

  • Figure 1: 6-DOF Open-Loop PCM Prediction over an Entire Episode
  • Figure 2: Dyna versus Standard PPO Performance