Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Reinforcement Learning
Younggyo Seo, Pieter Abbeel
TL;DR
This work asks whether predicting and optimizing over action sequences can enhance reinforcement learning for robotics. It introduces Coarse-to-fine Q-Network with Action Sequence (CQN-AS), a critic-only algorithm that outputs Q-values for whole action sequences, thereby improving data efficiency on sparse-reward tasks. In experiments on BiGym, RLBench, and HumanoidBench, CQN-AS outperforms strong baselines, and ablations show the importance of RL objectives, sequence length, and temporal ensemble. The findings suggest that action-sequence-based value learning is a practical route to more data-efficient RL in complex robotic domains, with potential extensions to offline, model-based, or vision-enhanced settings.
Abstract
Predicting a sequence of actions has been crucial in the success of recent behavior cloning algorithms in robotics. Can similar ideas improve reinforcement learning (RL)? We answer affirmatively by observing that incorporating action sequences when predicting ground-truth return-to-go leads to lower validation loss. Motivated by this, we introduce Coarse-to-fine Q-Network with Action Sequence (CQN-AS), a novel value-based RL algorithm whose critic network outputs Q-values over a sequence of actions, i.e., the value function is explicitly trained to learn the consequences of executing action sequences. Our experiments show that CQN-AS outperforms several baselines on a variety of sparse-reward humanoid control and tabletop manipulation tasks from BiGym and RLBench.
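To make the idea of a critic that outputs Q-values over a sequence of actions concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single discretization level in which each action dimension is split into equal-width bins, and it omits the coarse-to-fine refinement, the training objective, and the temporal ensemble. The names `toy_critic`, `select_action_sequence`, and the shape symbols `K` (sequence length), `D` (action dimensions), `B` (bins), and `F` (feature size) are hypothetical, and a linear map stands in for the critic network.

```python
import numpy as np


def select_action_sequence(q_values, low=-1.0, high=1.0):
    """Greedy action sequence from per-step, per-dimension Q-values.

    q_values: array of shape (K, D, B), with K sequence steps, D action
    dimensions, and B discretization bins (hypothetical layout).
    Returns an array of shape (K, D) holding bin-center actions in [low, high].
    """
    q_values = np.asarray(q_values, dtype=float)
    num_bins = q_values.shape[-1]
    # Centers of B equal-width bins covering [low, high].
    edges = np.linspace(low, high, num_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Independent argmax over bins for every step and action dimension.
    best_bins = q_values.argmax(axis=-1)
    return centers[best_bins]


def toy_critic(obs, weights):
    """Linear stand-in for a critic network: obs (F,) -> Q-values (K, D, B)."""
    return np.tensordot(obs, weights, axes=([0], [0]))


rng = np.random.default_rng(0)
K, D, B, F = 4, 2, 3, 5  # sequence length, action dims, bins, feature size
weights = rng.normal(size=(F, K, D, B))
obs = rng.normal(size=F)

q = toy_critic(obs, weights)          # shape (K, D, B)
actions = select_action_sequence(q)   # shape (K, D)
```

The point of the sketch is only the output shape: a single forward pass yields Q-values for every step of the sequence, so one greedy readout produces a whole action sequence rather than a single action.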
