Increasing Information for Model Predictive Control with Semi-Markov Decision Processes

Rémy Hosseinkhan Boucher, Onofrio Semeraro, Lionel Mathelin

TL;DR

The paper tackles data efficiency in Learning-Based Model Predictive Control by enabling temporally extended actions through Semi-Markov Decision Processes (SMDPs). It formulates a trajectory-focused Expected Information Gain (EIG) criterion under the SM-TIP extension and demonstrates that temporally abstracted sampling increases the information gathered within a fixed budget, improving sample efficiency on the Inverted Pendulum and the Lorenz attractor. While the approach boosts early information gain, it also introduces non-causal estimation challenges and bootstrapping trade-offs for very long inter-decision times. Overall, temporal abstraction via SMDPs offers a promising avenue to enhance data-driven MPC for systems with multiple time scales and rich dynamics.

Abstract

Recent works in Learning-Based Model Predictive Control of dynamical systems show impressive sample complexity performances using criteria from Information Theory to accelerate the learning procedure. However, the sequential exploration opportunities are limited by the system local state, restraining the amount of information of the observations from the current exploration trajectory. This article resolves this limitation by introducing temporal abstraction through the framework of Semi-Markov Decision Processes. The framework increases the total information of the gathered data for a fixed sampling budget, thus reducing the sample complexity.

Paper Structure

This paper contains 14 sections, 10 equations, and 4 figures.

Figures (4)

  • Figure 1: $(\mathrm{Cov}(X_0, X_k))_{k \in \mathbb{N}}$ for the $x_3$ component of the controlled Lorenz system under multiple control intensities.
  • Figure 2: Evolution of the Expected Information Gain $\text{EIG}^\text{SM-TIP}$ over the number of sampling iterations.
  • Figure 3: Inter-decision time $\tau$ chosen by the SMDP during training.
  • Figure 4: Evolution during training of the objective function $J^{\widehat{\pi}^{\text{MPC}}}$ used to evaluate the system.
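
The covariance sequence $(\mathrm{Cov}(X_0, X_k))_{k}$ in Figure 1 can be illustrated with a minimal sketch. This is not the authors' code. It assumes the uncontrolled Lorenz system with the classical parameters ($\sigma = 10$, $\rho = 28$, $\beta = 8/3$), a fixed-step RK4 integrator, and a time-average estimator of the covariance along a single trajectory; the function names `lorenz_x3` and `autocovariance` are hypothetical. The control term and the control intensities used in the paper are not reproduced here.

```python
def lorenz_x3(n_steps, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Integrate the uncontrolled Lorenz system with RK4 and return the x3 series.

    Assumption: classical parameters and initial state (1, 1, 1); the paper's
    controlled variant is not modelled here.
    """
    def f(s):
        x1, x2, x3 = s
        return (sigma * (x2 - x1), x1 * (rho - x3) - x2, x1 * x2 - beta * x3)

    def shift(s, k, h):
        # Return s + h * k, component-wise.
        return tuple(si + h * ki for si, ki in zip(s, k))

    state = (1.0, 1.0, 1.0)
    series = []
    for _ in range(n_steps):
        k1 = f(state)
        k2 = f(shift(state, k1, 0.5 * dt))
        k3 = f(shift(state, k2, 0.5 * dt))
        k4 = f(shift(state, k3, dt))
        state = tuple(
            s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
        series.append(state[2])
    return series


def autocovariance(series, max_lag):
    """Time-average estimate of Cov(X_0, X_k) for k = 0, ..., max_lag.

    Assumption: the series is treated as stationary, so the covariance at
    lag k is estimated by averaging centered products over the trajectory.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / (n - k)
        for k in range(max_lag + 1)
    ]


if __name__ == "__main__":
    x3 = lorenz_x3(n_steps=5000)
    acov = autocovariance(x3, max_lag=200)
    # Print the estimated covariance at a few lags (lag * dt is physical time).
    for lag in (0, 50, 100, 200):
        print(lag, acov[lag])
```

Under these assumptions, lag 0 gives the variance of the $x_3$ series, and the values at larger lags indicate how strongly observations separated by $k$ steps remain correlated. This is the quantity that motivates choosing longer inter-decision times when consecutive samples are highly redundant.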