Increasing Information for Model Predictive Control with Semi-Markov Decision Processes
Rémy Hosseinkhan Boucher, Onofrio Semeraro, Lionel Mathelin
TL;DR
The paper tackles data efficiency in Learning-Based Model Predictive Control by enabling temporally extended actions through Semi-Markov Decision Processes (SMDPs). It formulates a trajectory-focused Expected Information Gain (EIG) criterion under the SM-TIP extension and shows that temporally abstracted sampling increases the information gathered within a fixed sampling budget, improving sample efficiency on the Inverted Pendulum and the Lorenz attractor. While the approach boosts early information gain, it also introduces non-causal estimation challenges and bootstrapping trade-offs when inter-decision times are very long. Overall, temporal abstraction via SMDPs is a promising way to improve data-driven MPC for systems with multiple time scales and rich dynamics.
Abstract
Recent works in Learning-Based Model Predictive Control of dynamical systems achieve impressive sample complexity by using criteria from Information Theory to accelerate the learning procedure. However, the sequential exploration opportunities are limited by the system's local state, which restricts the amount of information carried by the observations along the current exploration trajectory. This article addresses this limitation by introducing temporal abstraction through the framework of Semi-Markov Decision Processes. The framework increases the total information of the gathered data for a fixed sampling budget, thus reducing the sample complexity.
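To make the idea of temporally extended actions concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the pendulum model, the semi-implicit Euler integrator, the step size `dt`, and the helper names `pendulum_step`, `smdp_transition`, and `rollout` are all hypothetical. Each decision holds one torque for a chosen number of base steps, so a fixed budget of decisions covers a longer stretch of the dynamics. The sketch does not compute any information gain; it only shows how the inter-decision time becomes part of each recorded sample.

```python
import math


def pendulum_step(theta, omega, torque, dt=0.05, g=9.81, length=1.0, mass=1.0):
    """One semi-implicit Euler step of a torque-driven pendulum.

    Assumed toy model: theta = 0 is the downward rest position.
    """
    alpha = -(g / length) * math.sin(theta) + torque / (mass * length ** 2)
    omega = omega + dt * alpha
    theta = theta + dt * omega
    return theta, omega


def smdp_transition(state, torque, hold_steps, dt=0.05):
    """Hold one action for `hold_steps` base steps.

    Returns the state at the next decision time and the elapsed
    inter-decision time.
    """
    theta, omega = state
    for _ in range(hold_steps):
        theta, omega = pendulum_step(theta, omega, torque, dt)
    return (theta, omega), hold_steps * dt


def rollout(decisions, state=(0.1, 0.0)):
    """Apply a list of (torque, hold_steps) decisions.

    Exactly one sample (state, torque, elapsed_time, next_state) is
    recorded per decision, so len(decisions) is the sampling budget.
    """
    samples, elapsed = [], 0.0
    for torque, hold_steps in decisions:
        next_state, tau = smdp_transition(state, torque, hold_steps)
        samples.append((state, torque, tau, next_state))
        state, elapsed = next_state, elapsed + tau
    return samples, elapsed


budget = 10  # number of decisions, i.e. recorded samples
mdp_samples, t_mdp = rollout([(1.0, 1)] * budget)    # one base step per decision
smdp_samples, t_smdp = rollout([(1.0, 4)] * budget)  # four base steps per decision
print(f"budget={budget}: MDP covers {t_mdp:.2f}s, SMDP covers {t_smdp:.2f}s")
```

Both rollouts record the same number of samples, but the second spans four times as much simulated time. In the paper's setting the inter-decision time would be selected by the planner rather than fixed in advance as it is here.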
