Continuous Control with Coarse-to-fine Reinforcement Learning

Younggyo Seo, Jafar Uruç, Stephen James

TL;DR

This work introduces Coarse-to-fine Reinforcement Learning (CRL), a framework that enables stable, sample-efficient value-based learning in fine-grained continuous control by performing multi-level action discretization ($L$ levels, $B$ bins per level) and refining selections across levels. The Coarse-to-fine Q-Network (CQN) implements a hierarchical, factorized critic with level-conditioned inputs and a progressive inference procedure that yields high-precision actions while keeping per-level action spaces small. Across 20 sparsely-rewarded RLBench tasks and real-world UR5 manipulation, CQN outperforms standard actor-critic baselines and competitive BC methods, demonstrating rapid online learning with modest demonstrations and no heavy pretraining. These results suggest CRL as a practical, general approach to making value-based methods viable for real-world, high-precision robotic control, with broad potential applicability beyond manipulation tasks.

Abstract

Despite recent advances in improving the sample-efficiency of reinforcement learning (RL) algorithms, designing an RL algorithm that can be practically deployed in real-world environments remains a challenge. In this paper, we present Coarse-to-fine Reinforcement Learning (CRL), a framework that trains RL agents to zoom into a continuous action space in a coarse-to-fine manner, enabling the use of stable, sample-efficient value-based RL algorithms for fine-grained continuous control tasks. Our key idea is to train agents that output actions by iterating the procedure of (i) discretizing the continuous action space into multiple intervals and (ii) selecting the interval with the highest Q-value to further discretize at the next level. We then introduce a concrete, value-based algorithm within the CRL framework called Coarse-to-fine Q-Network (CQN). Our experiments demonstrate that CQN significantly outperforms RL and behavior cloning baselines on 20 sparsely-rewarded RLBench manipulation tasks with a modest number of environment interactions and expert demonstrations. We also show that CQN robustly learns to solve real-world manipulation tasks within a few minutes of online training.
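The two-step procedure in the abstract (discretize, then select the interval with the highest Q-value and discretize it again) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `q_fn(obs, level, prev_action)` signature, the action range of [-1, 1], the use of the range centre as the "previous action" at the first level, and the toy critic are all assumptions made for illustration. In CQN the critic is a learned network.

```python
import numpy as np


def coarse_to_fine_action(q_fn, obs, action_dim, levels=3, bins=5, low=-1.0, high=1.0):
    """Greedy coarse-to-fine action selection (illustrative sketch).

    q_fn is a hypothetical critic returning Q-values of shape
    (action_dim, bins) for the intervals of the current level.
    """
    lo = np.full(action_dim, low, dtype=np.float64)
    hi = np.full(action_dim, high, dtype=np.float64)
    for level in range(levels):
        # Assumption: the centre of the current range stands in for the
        # previous level's action (the full-range centre at level 0).
        prev_action = (lo + hi) / 2.0
        q_values = np.asarray(q_fn(obs, level, prev_action))
        best = np.argmax(q_values, axis=1)  # best interval per action dimension
        width = (hi - lo) / bins
        lo = lo + best * width              # zoom into the selected interval
        hi = lo + width
    return (lo + hi) / 2.0                  # centroid of the last level's interval


def make_toy_q(target, bins, low=-1.0, high=1.0):
    """Toy stand-in for a learned critic: prefers intervals whose centre is near `target`."""
    def q_fn(obs, level, prev_action):
        span = (high - low) / bins ** level          # width of the range at this level
        range_lo = prev_action - span / 2.0
        centres = range_lo[:, None] + (np.arange(bins) + 0.5) * span / bins
        return -np.abs(centres - target[:, None])
    return q_fn


target = np.array([0.3, -0.7])
action = coarse_to_fine_action(make_toy_q(target, bins=5), obs=None,
                               action_dim=2, levels=3, bins=5)
print(action)
```

With 3 levels of 5 bins each, every level evaluates only 5 Q-values per action dimension, yet the final interval has width 2/125, so the returned centroid lies within 1/125 of the toy target.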

Paper Structure (74 sections, 3 equations, 10 figures, 5 tables, 2 algorithms)

Figures (10)

  • Figure 1: Summary of results. In sparsely-rewarded visual robotic manipulation tasks from RLBench (James et al., 2020) and real-world environments, CQN learns to solve the tasks with a modest number of environment interactions and expert demonstrations, outperforming baselines such as DrQ-v2 (Yarats et al., 2022), its highly optimized variant DrQ-v2+, and ACT (Zhao et al., 2023). Real-world RL videos are available at our webpage.
  • Figure 2: Coarse-to-fine reinforcement learning. (a) We design our RL agent to zoom into the continuous action space in a coarse-to-fine manner by repeating the procedure of (i) discretizing the continuous action space into multiple intervals and (ii) selecting the interval with the highest Q-value to further discretize at the next level. We then use the centroid of the last level's interval as an action. (b) Our coarse-to-fine critic architecture takes input features along with one-hot level indices and actions from the previous level, and then outputs Q-values for different action dimensions. This design enables the critic to know the current level and which part of the continuous action space to zoom into.
  • Figure 3: Examples of coarse-to-fine discretization. With a pre-defined number of levels ($L$) and intervals ($B$), e.g., $L = 3$ and $B = 3$ in this example, we apply discretization to the continuous action space $L$ times with different precisions. We then design our RL agents to learn a critic network with only a few actions at each level, e.g., 3 actions in this example, conditioned on the previous level's actions. This enables us to learn discrete policies that can output high-precision actions while avoiding the difficulty of learning the critic network with a large number of discrete actions.
  • Figure 4: Simulation results on 20 sparsely-rewarded tasks from RLBench (James et al., 2020). All experiments are initialized with 100 expert demonstrations and all RL methods have an auxiliary BC objective. The solid line and shaded regions represent the mean and confidence intervals, respectively, across 3 runs.
  • Figure 5: Real-world tasks used in our real-world experiments (see the appendix on real-world experimental details for more details).
  • ...and 5 more figures
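
The captions of Figures 2 and 3 describe two mechanics that a short worked example may make concrete: how $L$ levels of $B$ intervals compound into a fine resolution, and what the critic receives as input. The sketch below uses the $L = 3$, $B = 3$ setting from Figure 3. The helper names `zoom` and `critic_input`, the action range of [-1, 1], the chosen bin indices, the feature size, and the concatenation order are hypothetical choices for illustration, not details taken from the paper.

```python
import numpy as np


def zoom(lo, hi, bins, index):
    """Split [lo, hi] into `bins` equal intervals and return the one at `index`."""
    width = (hi - lo) / bins
    new_lo = lo + index * width
    return new_lo, new_lo + width


def critic_input(features, level, prev_action, levels):
    """Concatenate features, a one-hot level index, and the previous level's action.

    The ordering of the three parts is an assumption for this sketch.
    """
    level_one_hot = np.eye(levels)[level]
    return np.concatenate([features, level_one_hot, prev_action])


levels, bins = 3, 3                 # L = 3, B = 3 as in the Figure 3 example
lo, hi = -1.0, 1.0                  # assumed normalized action range
for index in (2, 0, 1):             # arbitrary example choices, one per level
    lo, hi = zoom(lo, hi, bins, index)

final_action = (lo + hi) / 2.0      # centroid of the last level's interval
effective_bins = bins ** levels     # resolution equivalent to a flat discretization

# Hypothetical 8-dimensional features and a 2-dimensional previous action.
x = critic_input(np.zeros(8), level=1,
                 prev_action=np.array([0.5, -0.5]), levels=levels)
print(effective_bins, hi - lo, final_action, x.shape)
```

The critic only ever scores 3 intervals per action dimension at a time, while the final interval has width 2/27, the same precision a flat discretization would need 27 bins to reach.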