Demystifying Action Space Design for Robotic Manipulation Policies

Yuchun Feng, Jinliang Zheng, Zhihao Wang, Dongxiu Liu, Jianxiong Li, Jiangmiao Pang, Tai Wang, Xianyuan Zhan

TL;DR

A large-scale, systematic empirical study confirming that the action space has significant and complex impacts on robotic policy learning. The results suggest that properly designing the policy to predict delta actions consistently improves performance, while joint-space and task-space representations offer complementary strengths, favoring control stability and generalization, respectively.

Abstract

The specification of the action space plays a pivotal role in imitation-based robotic manipulation policy learning, fundamentally shaping the optimization landscape of policy learning. While recent advances have focused heavily on scaling training data and model capacity, the choice of action space remains guided by ad-hoc heuristics or legacy designs, leading to an ambiguous understanding of robotic policy design philosophies. To address this ambiguity, we conducted a large-scale and systematic empirical study, confirming that the action space does have significant and complex impacts on robotic policy learning. We dissect the action design space along temporal and spatial axes, facilitating a structured analysis of how these choices govern both policy learnability and control stability. Based on 13,000+ real-world rollouts on a bimanual robot and evaluation on 500+ trained models over four scenarios, we examine the trade-offs between absolute vs. delta representations, and joint-space vs. task-space parameterizations. Our large-scale results suggest that properly designing the policy to predict delta actions consistently improves performance, while joint-space and task-space representations offer complementary strengths, favoring control stability and generalization, respectively.

Paper Structure

This paper contains 42 sections, 2 theorems, 13 equations, 14 figures, 10 tables.

Key Result

Proposition 4.1

Let $\boldsymbol{\epsilon} \in \mathbb{R}^{k}$ be the prediction noise for a chunk of length $k$, with bounded norm $\|\boldsymbol{\epsilon}\|_2 \le \delta$. The cumulative error in the decoded executable actions, denoted as $\mathbf{e}_{a}$, relates to $\boldsymbol{\epsilon}$ via a linear transform
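The statement above is cut off in this listing, so the following is only a minimal sketch of the idea suggested by the proposition's title, not the paper's exact result. It assumes that step-wise deltas are decoded by a cumulative sum (a lower-triangular matrix of ones) and that chunk-wise deltas are each added to a single shared reference state (an identity transform). The function names, the chunk length `k = 8`, and the noise bound `delta = 0.01` are all hypothetical choices made for illustration.

```python
import numpy as np


def stepwise_decode_matrix(k: int) -> np.ndarray:
    """Assumed decoder for step-wise deltas: a cumulative sum,
    i.e. a lower-triangular matrix of ones."""
    return np.tril(np.ones((k, k)))


def chunkwise_decode_matrix(k: int) -> np.ndarray:
    """Assumed decoder for chunk-wise deltas: every delta is added to the
    same reference state, so prediction noise passes through unchanged."""
    return np.eye(k)


def worst_case_gain(matrix: np.ndarray) -> float:
    """Spectral norm: the largest factor by which ||eps||_2 can grow."""
    return float(np.linalg.norm(matrix, 2))


k = 8          # hypothetical chunk length
delta = 0.01   # hypothetical bound on ||eps||_2

# A constant per-step bias, scaled so that ||eps||_2 equals delta.
eps = np.full(k, delta / np.sqrt(k))

# Error in the decoded actions under each assumed linear transform.
err_step = stepwise_decode_matrix(k) @ eps
err_chunk = chunkwise_decode_matrix(k) @ eps

print("step-wise  error norm:", np.linalg.norm(err_step))
print("chunk-wise error norm:", np.linalg.norm(err_chunk))
print("step-wise  worst-case gain:", worst_case_gain(stepwise_decode_matrix(k)))
print("chunk-wise worst-case gain:", worst_case_gain(chunkwise_decode_matrix(k)))
```

Under these assumptions, the same bounded noise yields a larger decoded error when it is integrated step by step, because each later action inherits the errors of all earlier steps in the chunk.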

Figures (14)

  • Figure 1: Overview of our study on action space design. (a) Historical analysis shows the divergent usage of action spaces (Absolute vs. Delta, Joint vs. EEF) in existing literature. (b) Our experimental setup includes an action abstraction taxonomy and a large-scale benchmark on both simulation and real-world platforms. We invest over 13,000 real-world rollouts to quantify the impact of these design choices, revealing significant performance gaps and identifying best practices for robotic manipulation under various scenarios.
  • Figure 1: Quantitative comparison of progress scores and standard errors across embodiments and tasks. The results contrast Regression (ACT) and Flow Matching (DP) under four distinct control interface configurations. Bold and underlined values denote the best and second-best performance for ACT and DP separately.
  • Figure 2: Hierarchy of the action space for robotic manipulation policies and its abstraction taxonomy.
  • Figure 2: Hyperparameters for model training.
  • Figure 3: (a) We verified that chunk-wise delta representations perform better than step-wise delta representations for both EEF and joint spaces. (b) Grid search over execution horizons across four different action spaces.
  • ...and 9 more figures

Theorems & Definitions (3)

  • Proposition 4.1: Noise Amplification in Step-wise Integration
  • Proposition 7.1: Noise Amplification in Step-wise Integration
  • proof