Bisimulation metric for Model Predictive Control
Yutaka Shimizu, Masayoshi Tomizuka
TL;DR
BS-MPC addresses gaps in training stability, noise robustness, and computational efficiency in model-based RL. It supervises the encoder directly with a $\pi^*$-bisimulation loss and combines this with a model-predictive control framework. It retains the TD-MPC architecture but adds explicit encoder supervision and a parallelizable computation flow, which enables faster training and gives stronger guarantees on how faithfully the latent space represents the original state space. A theoretical analysis bounds the difference in cumulative reward between the original state space and the learned latent space. Experiments on the DeepMind Control Suite show improved performance and robustness, including when the inputs contain distractions. The approach offers a practical, scalable route to robust model-based planning in high-dimensional, noisy environments.
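The model-predictive control side can be illustrated with a simplified, MPPI-style sampling planner of the kind TD-MPC uses. This is a hypothetical sketch, not the authors' code. `latent_dynamics` and `reward` are toy stand-ins for learned networks. The terminal value estimate that TD-MPC adds at the end of the horizon is omitted, and all hyperparameters are illustrative. Every candidate action sequence is rolled out through the latent model as one batched array operation. That is the usual way such planners exploit parallel hardware, though it is not necessarily the parallelization BS-MPC introduces.

```python
import numpy as np

def latent_dynamics(z, a):
    # Hypothetical toy latent model standing in for a learned network.
    return z + 0.1 * a

def reward(z, a):
    # Hypothetical toy reward: stay near the origin, penalize large actions.
    return -np.sum(z ** 2, axis=-1) - 0.01 * np.sum(a ** 2, axis=-1)

def mppi_plan(z0, horizon=5, num_samples=256, action_dim=2, iters=4,
              temperature=0.5, gamma=0.99, rng=None):
    """Return the first action of a planned sequence, starting from latent z0."""
    rng = np.random.default_rng(0) if rng is None else rng
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iters):
        # Sample candidate sequences: (num_samples, horizon, action_dim) in [-1, 1].
        noise = rng.standard_normal((num_samples, horizon, action_dim))
        actions = np.clip(mean + std * noise, -1.0, 1.0)
        # Roll out all candidates in parallel through the latent model.
        z = np.repeat(z0[None, :], num_samples, axis=0)
        returns = np.zeros(num_samples)
        for t in range(horizon):
            returns += gamma ** t * reward(z, actions[:, t])
            z = latent_dynamics(z, actions[:, t])
        # Softmax weights over returns (max subtracted for numerical stability).
        w = np.exp((returns - returns.max()) / temperature)
        w /= w.sum()
        # Refit the sampling distribution to the weighted candidates.
        mean = np.einsum("n,nhd->hd", w, actions)
        var = np.einsum("n,nhd->hd", w, (actions - mean) ** 2)
        std = np.sqrt(var) + 1e-3  # small floor keeps exploration alive
    # Receding horizon: execute only the first action, then replan.
    return mean[0]

print(mppi_plan(np.array([1.0, -1.0])))
```

Starting from the latent state `[1.0, -1.0]`, the planned first action pushes each coordinate toward the origin, so it is negative in the first dimension and positive in the second.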
Abstract
Model-based reinforcement learning has shown promise for improving sample efficiency and decision-making in complex environments. However, existing methods face challenges in training stability, robustness to noise, and computational efficiency. In this paper, we propose Bisimulation Metric for Model Predictive Control (BS-MPC), a novel approach that incorporates a bisimulation metric loss into its objective function to optimize the encoder directly. This direct, time-step-wise optimization enables the learned encoder to extract intrinsic information from the original state space while discarding irrelevant details, and it prevents gradients and errors from diverging. BS-MPC improves training stability and robustness to input noise, and it improves computational efficiency by reducing training time. We evaluate BS-MPC on both continuous control and image-based tasks from the DeepMind Control Suite and show that it outperforms state-of-the-art baseline methods in both performance and robustness.
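The bisimulation metric loss can be illustrated with a minimal NumPy sketch in the style of Deep Bisimulation for Control (DBC). This is an assumption-laden illustration, not BS-MPC's published loss. It pairs each sample in a batch with a partner chosen by a permutation and measures latent distance with the L1 norm, as DBC does. It also assumes deterministic next-latent predictions, so the Wasserstein term of the bisimulation target reduces to a Euclidean distance. The function name `bisimulation_loss`, the `perm` argument, and the array shapes are hypothetical.

```python
import numpy as np

def bisimulation_loss(z, r, z_next_pred, perm, gamma=0.99):
    """DBC-style bisimulation loss over pairs (i, perm[i]) of a batch.

    z           : (B, D) encoder outputs for the current observations
    r           : (B,)   rewards observed under the current policy
    z_next_pred : (B, D) latent-dynamics predictions of the next latent
    perm        : (B,)   permutation pairing sample i with sample perm[i]
    """
    # Latent distance between paired states (L1, following DBC).
    z_dist = np.abs(z - z[perm]).sum(axis=-1)
    # Reward term of the bisimulation target.
    r_dist = np.abs(r - r[perm])
    # Assumption: point-mass next-latent predictions, so the Wasserstein-2
    # distance between them is just their Euclidean distance.
    next_dist = np.linalg.norm(z_next_pred - z_next_pred[perm], axis=-1)
    # Bootstrapped target. In an autodiff framework this would be wrapped in a
    # stop-gradient so that only z_dist is pulled toward it.
    target = r_dist + gamma * next_dist
    return float(np.mean((z_dist - target) ** 2))

z = np.array([[0.0, 0.0], [1.0, 1.0]])
r = np.array([0.0, 0.5])
z_next = np.array([[0.0, 0.0], [3.0, 4.0]])
print(bisimulation_loss(z, r, z_next, perm=np.array([1, 0]), gamma=0.9))  # → 9.0
```

In this example the encoder places the two states at L1 distance 2, while the target is 0.5 + 0.9 × 5 = 5. Minimizing the loss would therefore push their encodings apart. Applying such a term to the latents at every step of a rollout is one way to read the time-step-wise supervision described in the abstract, though the paper's exact formulation may differ.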
