Trust the Model Where It Trusts Itself -- Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption
Bernd Frauenknecht, Artur Eisele, Devdutt Subhasish, Friedrich Solowjow, Sebastian Trimpe
TL;DR
This work tackles the data inefficiency of model-free RL by introducing MACURA, a model-based actor-critic that adapts the length of model-based rollouts to local model uncertainty. It defines a trustworthy region $\mathcal{E}$ using a geometric Jensen-Shannon (GJS) divergence-based uncertainty measure $u_{GJS}$ and proves a monotonic-improvement bound when rollouts are confined to $\mathcal{E}$. The method uses an adaptive threshold $\kappa$ and a simple rollout horizon mechanism, with environment exploration (notably pink noise) to expand $\mathcal{E}$ over time. Empirical results on MuJoCo show that MACURA delivers superior data efficiency and competitive or superior asymptotic performance compared to MBPO, M2AC, and SAC, while requiring less hyperparameter tuning.
Abstract
Dyna-style model-based reinforcement learning (MBRL) combines model-free agents with predictive transition models through model-based rollouts. This combination raises a critical question: 'When to trust your model?', i.e., which rollout length results in the model providing useful data? Janner et al. (2019) address this question by gradually increasing rollout lengths throughout training. While theoretically tempting, the assumption of uniform model accuracy is a fallacy that breaks down, at the latest, when the model extrapolates. Instead, we propose asking the question 'Where to trust your model?'. Using inherent model uncertainty to consider local accuracy, we obtain the Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption (MACURA) algorithm. We propose an easy-to-tune rollout mechanism and demonstrate substantial improvements in data efficiency and performance compared to state-of-the-art deep MBRL methods on the MuJoCo benchmark.
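To make the 'Where to trust your model?' idea concrete, the following is a minimal sketch of an uncertainty-terminated rollout, not the authors' implementation. It rests on several assumptions: ensemble disagreement (variance of the members' predictions) stands in for the GJS-based measure $u_{GJS}$, the threshold `kappa` is fixed rather than adapted during training, and all names (`disagreement`, `adaptive_rollout`, `make_model`) and the toy dynamics are hypothetical.

```python
import random
from statistics import pvariance


def disagreement(predictions):
    """Sum over state dimensions of the variance across ensemble members.

    Assumption: a simple stand-in for an uncertainty measure such as u_GJS.
    """
    return sum(pvariance(dim_values) for dim_values in zip(*predictions))


def adaptive_rollout(start_state, policy, ensemble, kappa, max_horizon, rng):
    """Roll out the model until the uncertainty exceeds kappa or the horizon ends."""
    state, transitions = list(start_state), []
    for _ in range(max_horizon):
        action = policy(state)
        predictions = [model(state, action) for model in ensemble]
        if disagreement(predictions) > kappa:
            break  # the state left the trusted region: stop this rollout early
        next_state = rng.choice(predictions)  # sample one ensemble member
        transitions.append((state, action, next_state))
        state = next_state
    return transitions


def make_model(c):
    """Hypothetical toy dynamics: members agree near 0 and diverge far from it."""
    return lambda s, a: [x + u + c * x * x for x, u in zip(s, a)]


ensemble = [make_model(c) for c in (-0.1, 0.0, 0.1)]
policy = lambda s: [0.0 for _ in s]  # placeholder policy that always outputs zero
rng = random.Random(0)

near = adaptive_rollout([0.1], policy, ensemble, kappa=1e-3, max_horizon=5, rng=rng)
far = adaptive_rollout([5.0], policy, ensemble, kappa=1e-3, max_horizon=5, rng=rng)
print(len(near), len(far))
```

In this toy setup the rollout started near the origin runs for the full horizon, while the one started at 5.0 is cut off before producing any transition, because the ensemble members already disagree strongly there. The rollout length is thus decided per state rather than by a global schedule.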
