Select before Act: Spatially Decoupled Action Repetition for Continuous Control

Buqing Nie, Yangqing Fu, Yue Gao

TL;DR

SDAR introduces a spatially decoupled action repetition framework for continuous control, enabling per-dimension act-or-repeat decisions in a two-stage policy. By decoupling selection and action, SDAR achieves flexible repetition strategies that balance persistence with diversity, improving sample efficiency and reducing action fluctuations relative to existing closed-loop and open-loop repetition methods. The approach is validated across classic control, locomotion, and manipulation tasks, showing higher performance and smoother control while maintaining computational practicality via selective sampling. This work advances temporally extended decision-making in RL by tailoring repetition to individual actuators and paves the way for incorporating inter-dimensional correlations in future extensions.

Abstract

Reinforcement Learning (RL) has achieved remarkable success in various continuous control tasks, such as robot manipulation and locomotion. Unlike mainstream RL, which makes a decision at every individual step, recent studies have incorporated action repetition into RL, achieving enhanced action persistence with improved sample efficiency and superior performance. However, existing methods treat all action dimensions as a whole during repetition, ignoring variations among them. This constraint makes decisions inflexible, which reduces policy agility and effectiveness. In this work, we propose a novel repetition framework called SDAR, which implements Spatially Decoupled Action Repetition by performing closed-loop act-or-repeat selection for each action dimension individually. SDAR achieves more flexible repetition strategies, leading to an improved balance between action persistence and diversity. Compared to existing repetition frameworks, SDAR is more sample efficient, with higher policy performance and reduced action fluctuation. Experiments are conducted on various continuous control scenarios, demonstrating the effectiveness of the spatially decoupled repetition design proposed in this work.

Paper Structure

This paper contains 30 sections, 12 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: Difference between repetition strategies of previous methods (left) and SDAR (right). Our method achieves a more flexible strategy through the spatially decoupled repetition design.
  • Figure 2: The two-stage decision process of the SDAR algorithm. In the first stage (gray region), the selection policy $\beta$ makes an act-or-repeat decision for each action dimension, determining whether the previous action $a^-$ (yellow blocks) should be repeated. In the second stage (blue region), the action policy $\pi$ generates new actions (green blocks) for the dimensions that chose act in the first stage.
  • Figure 3: Learning curves of SDAR (red) in various tasks against baseline methods. Each method is trained with at least 10 random seeds. The lines denote the mean episode return, while shaded regions denote the standard error during training. As shown in the figures, our method generally achieves higher sample efficiency in various tasks compared to previous methods. More learning curves are given in the Appendix.
  • Figure 4: Visualization of the act-or-repeat selections of the SDAR and TAAC algorithms in the LunarLander and Walker2d tasks. The $x$-axis denotes timesteps, and the $y$-axis denotes the action dimensions. Light blue blocks indicate repetition, while dark blue blocks indicate act, i.e., the action in the corresponding dimension is changed.
  • Figure 5: Learning curves of SDAR (red) in additional tasks against baseline methods.
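
The two-stage decision described in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: `sdar_step`, `select_prob`, and `sample_action` are hypothetical names, the selection policy $\beta$ is assumed to return a per-dimension probability of repeating, and the action policy $\pi$ is assumed to receive the selection mask as an input. Both policies are plain callables standing in for learned networks.

```python
import random
from typing import Callable, List, Sequence, Tuple

def sdar_step(
    state: Sequence[float],
    prev_action: Sequence[float],
    select_prob: Callable[[Sequence[float], Sequence[float]], List[float]],
    sample_action: Callable[[Sequence[float], Sequence[float], List[bool]], List[float]],
    rng: random.Random,
) -> Tuple[List[float], List[bool]]:
    """One two-stage decision: select (act or repeat) per dimension, then act."""
    # Stage 1 (assumed form of beta): per-dimension probability of repeating a^-.
    repeat_probs = select_prob(state, prev_action)
    repeat = [rng.random() < p for p in repeat_probs]

    # Stage 2 (assumed form of pi): propose new values; only the "act"
    # dimensions actually use them.
    proposed = sample_action(state, prev_action, repeat)

    # Mix: keep the previous value where repeat is chosen, otherwise take the new one.
    action = [a_prev if r else a_new
              for a_prev, a_new, r in zip(prev_action, proposed, repeat)]
    return action, repeat

# Usage with stand-in policies: dimension 0 always repeats, dimension 1 always acts.
rng = random.Random(0)
action, repeat = sdar_step(
    state=[0.0],
    prev_action=[0.5, -0.5],
    select_prob=lambda s, a: [1.0, 0.0],
    sample_action=lambda s, a, mask: [9.0, 9.0],
    rng=rng,
)
print(action, repeat)
```

Because the mask is computed per dimension, one actuator can hold its previous value while another changes, which is the flexibility that whole-action repetition lacks.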

Theorems & Definitions (1)

  • proof