Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution

Tim Seyde; Peter Werner; Wilko Schwarting; Markus Wulfmeier; Daniela Rus

TL;DR

This paper tackles continuous control where smooth, low-wear actions are desirable but exploration benefits from coarse discretization. It introduces Growing Q-Networks (GQN), a critic-only, decoupled Q-learning approach that adaptively increases action resolution during training through action masking and a shared network torso. By modifying the TD target to operate over an expanding active subspace and providing linear and adaptive growth schedules, GQN achieves strong results on demanding benchmarks, often outperforming stationary discrete baselines and matching or surpassing some continuous actor-critic methods, even under action penalties. The work demonstrates that adaptive control resolution offers a practical, scalable route to bridging coarse exploration with smooth, high-quality policies in continuous control tasks.
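The masking idea summarized above can be illustrated with a minimal sketch. It assumes the per-dimension utilities are combined by their mean (the source only says "linear composition") and that the TD target takes the greedy bin per dimension among the currently active bins. All function names, the example numbers, and the mask layout are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def masked_greedy(utilities: Sequence[Sequence[float]],
                  mask: Sequence[bool]) -> List[int]:
    """For each action dimension, pick the best bin among the active ones."""
    active = [i for i, on in enumerate(mask) if on]
    return [max(active, key=lambda i: row[i]) for row in utilities]


def composed_value(utilities: Sequence[Sequence[float]],
                   bins: Sequence[int]) -> float:
    """Mean of the selected per-dimension utilities (assumed composition)."""
    return sum(row[b] for row, b in zip(utilities, bins)) / len(utilities)


def td_target(reward: float, discount: float,
              next_utilities: Sequence[Sequence[float]],
              mask: Sequence[bool]) -> float:
    """One-step target whose max is restricted to the active subspace."""
    bins = masked_greedy(next_utilities, mask)
    return reward + discount * composed_value(next_utilities, bins)


# Two action dimensions, 5 bins each, 3 bins active (outer bins and center).
mask = [True, False, True, False, True]
next_u = [
    [0.1, 0.9, 0.3, 0.2, 0.0],  # bin 1 has the largest utility but is masked
    [0.5, 0.1, 0.2, 0.8, 0.6],  # bin 3 has the largest utility but is masked
]
greedy = masked_greedy(next_u, mask)
target = td_target(reward=1.0, discount=0.5, next_utilities=next_u, mask=mask)
```

In this toy case the masked bins hold the largest utilities, yet the greedy choice and the target ignore them until the mask is widened, which is the behaviour the summary attributes to the expanding active subspace.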

Abstract

Recent reinforcement learning approaches have shown surprisingly strong capabilities of bang-bang policies for solving continuous control benchmarks. The underlying coarse action space discretizations often yield favourable exploration characteristics, while final performance does not visibly suffer in the absence of action penalization, in line with optimal control theory. In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency, but action costs can be detrimental to exploration during early training. In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution, taking advantage of recent results in decoupled Q-learning to scale our approach to high-dimensional action spaces up to dim(A) = 38. Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that achieve surprisingly strong performance on continuous control tasks.

Paper Structure (12 sections, 4 equations, 7 figures)

Introduction
Related Works
Discretized Control
Scalability
Expanding Action Spaces
Constrained Optimization
Preliminaries
Deep Q-Networks
Decoupled Q-Networks
Growing Q-Networks
Experiments
Conclusion

Figures (7)

Figure 1: Schematic of a GQN agent with decoupled $5$-bin discretization and $3$-bin active subspace. The available actions are highlighted in green while the masked actions are depicted in gray. The predicted state-action values $Q(\bm{s}, a^0, ..., a^M)$ are computed via linear composition of the univariate utilities $Q(\bm{s}, a^j)$ by selecting one action per dimension (red).
Figure 2: State-action values for a pendulum swing-up task over the course of training (top to bottom). The active bins are outlined in green. The value predictions transition from random at initialization to structured upon activation. Inactive bins profit from the emergent structure within the shared network torso to warm-start their optimization.
Figure 3: Performance on tasks from the DeepMind Control Suite with action penalty $-0.1|a|^2$. Our GQN agent grows its action space from a $2$-bin to a $9$-bin discretization, where the linear and adaptive expansion schedules yield similar results. The GQN agent performs competitively with the discrete DecQN as well as the continuous D4PG and DMPO baselines, achieving noticeable improvements on the Humanoid Stand and Walk tasks.
Figure 4: Performance on tasks from the DeepMind Control Suite with action penalty $-0.5|a|^2$. Our GQN agent grows its action space from a $2$-bin to a $9$-bin discretization, where we observe benefits of the adaptive variant over the linear schedule. The GQN agent yields performance improvements over the discrete DecQN as well as the continuous D4PG and DMPO baselines, with particularly strong deltas on the Humanoid and Finger tasks.
Figure 5: Comparison of control smoothness and reward performance, relative to GQN without action penalties. Increasing the action penalty coefficient yields smoother control with only a minor impact on the original task performance as measured by the unconstrained reward $R$. The discrete GQN further improves upon the continuous D4PG agent.
...and 2 more figures
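The captions above mention growth from a 2-bin to a 9-bin discretization and action penalties of the form $-c|a|^2$. A minimal sketch of how such pieces could fit together follows. It assumes nested, evenly spaced grids on $[-1, 1]$ with intermediate resolutions of 3 and 5 bins, a fixed-interval linear schedule, and $|a|^2$ read as the squared Euclidean norm of the action vector. The names, the intermediate levels, and the step counts are hypothetical and not taken from the paper.

```python
from typing import List, Sequence

LEVELS = (2, 3, 5, 9)  # assumed growth sequence; captions only state 2 and 9


def bin_values(total_bins: int) -> List[float]:
    """Evenly spaced action values on [-1, 1]."""
    return [-1.0 + 2.0 * i / (total_bins - 1) for i in range(total_bins)]


def active_mask(total_bins: int, active_bins: int) -> List[bool]:
    """Mask selecting a coarse, evenly spaced subset of the full grid."""
    if (total_bins - 1) % (active_bins - 1) != 0:
        raise ValueError("active grid must nest inside the full grid")
    stride = (total_bins - 1) // (active_bins - 1)
    return [i % stride == 0 for i in range(total_bins)]


def active_bins_at(step: int, steps_per_stage: int,
                   levels: Sequence[int] = LEVELS) -> int:
    """Linear schedule: move to the next resolution every fixed interval."""
    stage = min(step // steps_per_stage, len(levels) - 1)
    return levels[stage]


def penalized_reward(reward: float, action: Sequence[float],
                     coeff: float) -> float:
    """Task reward minus coeff times the squared norm of the action."""
    return reward - coeff * sum(a * a for a in action)


# Active action values on the 9-bin grid at three points in training.
grid = bin_values(9)
for step in (0, 250, 10_000):
    n = active_bins_at(step, steps_per_stage=100)
    mask = active_mask(9, n)
    print(step, n, [v for v, on in zip(grid, mask) if on])
```

With nested grids, every coarse action remains available after each expansion, so widening the mask never invalidates previously learned utilities.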
