MBDP: A Model-based Approach to Achieve both Robustness and Sample Efficiency via Double Dropout Planning
Wanpeng Zhang, Xi Xiao, Yao Yao, Mingzhe Chen, Dijun Luo
TL;DR
MBDP tackles the robustness-sample efficiency dilemma in model-based reinforcement learning by introducing two complementary dropout mechanisms: rollout-dropout to bias learning toward lower-reward events for robustness, and model-dropout to prune an ensemble by predictive bias for efficiency. The framework establishes theoretical guarantees linking rollout-dropout to CVaR robustness and provides explicit bounds on performance degradation due to dropout, while enabling a tunable trade-off via the parameters $\alpha$ and $\beta$. Empirically, MBDP shows superior sample efficiency and competitive robustness on MuJoCo tasks compared with leading baselines, with ablations confirming the distinct roles of each dropout component. The results demonstrate a flexible, theoretically grounded approach for balancing robustness and efficiency in practical, continuous-control settings.
Abstract
Model-based reinforcement learning is a widely accepted approach to reducing the excessive sample demands of reinforcement learning. However, the predictions of the dynamics models are often not accurate enough, and the resulting bias may lead to catastrophic decisions due to insufficient robustness. It is therefore highly desirable to investigate how to improve the robustness of model-based RL algorithms while maintaining high sample efficiency. In this paper, we propose Model-Based Double-dropout Planning (MBDP) to balance robustness and efficiency. MBDP consists of two kinds of dropout mechanisms: the rollout-dropout aims to improve robustness at a small cost in sample efficiency, while the model-dropout is designed to compensate for the lost efficiency at a slight expense of robustness. By combining them in a complementary way, MBDP provides a flexible control mechanism that meets different demands for robustness and efficiency by tuning the two corresponding dropout ratios. The effectiveness of MBDP is demonstrated both theoretically and experimentally.
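
The two dropout mechanisms can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that rollout-dropout discards the highest-return fraction of simulated rollouts (so that learning leans on the lower-return ones) and that model-dropout discards the ensemble members with the largest validation error. The function names, the `drop_ratio` arguments, and the use of a scalar error per model are hypothetical; which of the two ratios corresponds to $\alpha$ and which to $\beta$ is not specified here.

```python
import numpy as np


def _keep_lowest(scores, drop_ratio):
    """Return sorted indices of the entries kept after dropping the
    highest-scoring fraction `drop_ratio` (always keeps at least one)."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    n_drop = int(np.floor(drop_ratio * n))
    n_keep = max(1, n - n_drop)
    order = np.argsort(scores, kind="stable")  # ascending order
    return np.sort(order[:n_keep])


def rollout_dropout(rollout_returns, drop_ratio):
    """Hypothetical rollout-dropout: keep the lower-return rollouts.

    Assumption: dropping the highest-return rollouts makes the mean of
    the kept returns an empirical lower-tail (CVaR-style) estimate.
    """
    return _keep_lowest(rollout_returns, drop_ratio)


def model_dropout(model_errors, drop_ratio):
    """Hypothetical model-dropout: keep the ensemble members with the
    smallest prediction error (a scalar per model is assumed)."""
    return _keep_lowest(model_errors, drop_ratio)


# Illustrative values only, not results from the paper.
returns = [5.0, 1.0, 3.0, 9.0]
kept_rollouts = rollout_dropout(returns, drop_ratio=0.25)

errors = [0.4, 0.1, 0.9, 0.2]
kept_models = model_dropout(errors, drop_ratio=0.5)
```

In this sketch a larger rollout drop ratio makes the retained data more pessimistic, while a larger model drop ratio keeps only the most accurate models; setting both ratios to zero keeps every rollout and every model.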
