MPC-Inspired Neural Network Policies for Sequential Decision Making
Marcus Pereira, David D. Fan, Gabriel Nakajima An, Evangelos Theodorou
TL;DR
This work tackles sequential decision making in continuous state-action spaces by integrating planning into neural policies via MPC-inspired structures. It proposes an extension to DAgger, enabling efficient, end-to-end training of PI-Nets, which implement differentiable path-integral control. The authors compare MPC-type policies (PI-Net and MPC-DAgger variants) with reactive baselines across quadcopter and cart-pole tasks, showing improved robustness to disturbances and generalization to model errors. The results indicate that planning-based recurrent policies outperform purely reactive policies, suggesting a practical route to scalable planning architectures in continuous domains.
Abstract
In this paper we investigate the use of MPC-inspired neural network policies for sequential decision making. We introduce an extension to the DAgger algorithm for training such policies and show that these policies exhibit improved training performance and generalization. We take advantage of this extension to show scalable and efficient training of complex planning policy architectures in continuous state and action spaces. We provide an extensive comparison of neural network policies, considering feedforward policies, recurrent policies, and recurrent policies with a planning structure inspired by the path integral control framework. Our results suggest that MPC-type recurrent policies are more robust to disturbances and modeling error.
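For readers unfamiliar with path integral control, the planning structure referred to above can be illustrated with the standard sampling-based update: perturb a nominal control sequence, score each perturbed rollout, and average the perturbations with weights that decay exponentially in cost. The sketch below is a generic NumPy illustration of that textbook update and is not the PI-Net architecture or the training procedure of this paper. The function name `path_integral_update`, its arguments, and the `temperature` parameter are assumptions introduced here for illustration.

```python
import numpy as np


def path_integral_update(u_nominal, noise, costs, temperature=1.0):
    """One generic path-integral (MPPI-style) update of a control sequence.

    Hypothetical helper for illustration only, not the paper's implementation.

    u_nominal:   (T, m) nominal control sequence over a horizon of T steps
    noise:       (K, T, m) control perturbations, one per sampled rollout
    costs:       (K,) total trajectory cost of each perturbed rollout
    temperature: positive scalar; smaller values concentrate weight on
                 the lowest-cost rollouts
    """
    # Shift by the minimum cost so the exponentials stay well-conditioned.
    shifted = costs - np.min(costs)
    # Exponentiated negative cost, normalized to sum to one.
    weights = np.exp(-shifted / temperature)
    weights /= np.sum(weights)
    # Cost-weighted average of the perturbations, added to the nominal controls.
    return u_nominal + np.einsum("k,ktm->tm", weights, noise)
```

Every operation in this update is differentiable with respect to the costs and the nominal controls, which is the property that allows such a planner to be embedded in a network and trained end to end.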
