Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization
Igor Kuznetsov
TL;DR
This work addresses the limitations of random exploration in deep deterministic off-policy RL by introducing guided exploration: an Exploratory Module uses an ensemble of Monte Carlo critics to quantify uncertainty and generate directed action corrections. The MOCCO algorithm combines this exploration mechanism with a Monte Carlo-augmented critic loss, using the MC mean to temper Q-value overestimation while an on-policy exploratory correction guides action selection. Empirical results on the DMControl suite show that guided exploration improves over traditional noise-based methods, and that MOCCO consistently outperforms major off-policy baselines (DDPG, TD3, SAC, and TD3-RND), with robust performance across tasks and modest hyperparameter sensitivity. The approach offers a practical, learnable mechanism for dynamic exploration that can improve sample efficiency and performance in continuous control tasks, with potential extensions to model-based or memory-augmented exploration frameworks.
Abstract
Deep deterministic off-policy algorithms are effective at solving challenging continuous control problems. Current approaches commonly use random noise for exploration, which has several drawbacks, including the need for manual tuning for each task and the lack of exploratory calibration during training. We address these challenges by proposing a novel guided exploration method that uses an ensemble of Monte Carlo critics to calculate an exploratory action correction. The proposed method enhances the traditional exploration scheme by dynamically adjusting exploration. We then present a novel algorithm that uses the proposed exploratory module to modify both the policy and the critic. The presented algorithm outperforms modern reinforcement learning algorithms across a variety of problems in the DMControl suite.
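
To make the idea of an ensemble-driven action correction concrete, the following is a minimal sketch and not the paper's implementation. It assumes, purely for illustration, that each critic is a linear function of state and action, that uncertainty is measured as the variance of the ensemble's predictions, and that the correction is a fixed-length step along the gradient of that variance with respect to the action. The names `ensemble_q`, `uncertainty`, `exploratory_correction`, and the `scale` parameter are hypothetical; the summary above does not specify the exact form of the correction.

```python
import numpy as np

def ensemble_q(state, action, U, V, b):
    """Predictions of N linear critics: Q_i(s, a) = U_i . s + V_i . a + b_i.

    Linear critics are an assumption made to keep the sketch short;
    they stand in for learned Monte Carlo critic networks.
    """
    return U @ state + V @ action + b

def uncertainty(state, action, U, V, b):
    """Ensemble disagreement, measured as the variance of the N predictions."""
    return ensemble_q(state, action, U, V, b).var()

def exploratory_correction(state, action, U, V, b, scale=0.1):
    """Step of length `scale` along the gradient of the variance w.r.t. the action.

    For linear critics the gradient has the closed form
    (2 / N) * sum_i (Q_i - mean(Q)) * V_i.
    """
    q = ensemble_q(state, action, U, V, b)
    centered = q - q.mean()
    grad = 2.0 * (centered @ V) / len(q)
    norm = np.linalg.norm(grad)
    if norm < 1e-12:
        # The ensemble agrees locally, so there is no preferred direction.
        return np.zeros_like(action)
    return scale * grad / norm

# Hypothetical dimensions and randomly drawn critic weights.
rng = np.random.default_rng(0)
n_critics, state_dim, action_dim = 5, 3, 2
U = rng.normal(size=(n_critics, state_dim))
V = rng.normal(size=(n_critics, action_dim))
b = rng.normal(size=n_critics)

state = rng.normal(size=state_dim)
action = rng.normal(size=action_dim)  # stands in for the deterministic policy output

correction = exploratory_correction(state, action, U, V, b)
explored_action = action + correction
```

In this toy setting the corrected action moves toward a region where the critics disagree more, which is one plausible reading of "directed" exploration; unlike additive random noise, the direction depends on the current state of the ensemble and therefore changes as the critics are trained.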
