On the Role of the Action Space in Robot Manipulation Learning and Sim-to-Real Transfer

Elie Aljalbout, Felix Frank, Maximilian Karl, Patrick van der Smagt

TL;DR

This work investigates how action-space design influences reinforcement learning for robot manipulation and sim-to-real transfer. It evaluates 13 action spaces (ranging from joint torque and impedance-based configurations to Cartesian, velocity-based, and delta-action variants) on two manipulation tasks (reaching and pushing) with a Franka Panda in simulation and on a real robot, using PPO for policy optimization. The study introduces metrics to quantify training efficiency, constraint violations, tracking feasibility, task accuracy, and the sim-to-real gap (OTE), and finds that velocity-based and high-order-derivative action spaces generally transfer better, with delta-action spaces offering robustness. The findings provide concrete guidelines for action-space design to improve real-world deployment of RL policies in robotics.

Abstract

We study the choice of action space in robot manipulation learning and sim-to-real transfer. We define metrics that assess the performance, and examine the emerging properties in the different action spaces. We train over 250 reinforcement learning (RL) agents in simulated reaching and pushing tasks, using 13 different control spaces. The choice of spaces spans combinations of common action space design characteristics. We evaluate the training performance in simulation and the transfer to a real-world environment. We identify good and bad characteristics of robotic action spaces and make recommendations for future designs. Our findings have important implications for the design of RL algorithms for robot manipulation tasks, and highlight the need for careful consideration of action spaces when training and transferring RL agents for real-world robotics.

Paper Structure (11 sections, 8 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: The policy outputs an action $a$, which is then transformed into joint torques $\tau$ using a selected controller $f_{as}$. The policy and controller receive feedback from the environment. Each action space is defined by the choice of the controller and the way the action is treated in the controller. The policy runs at 60 Hz; the controller runs at 120 Hz in simulation and at 1 kHz in the real world.
  • Figure 2: We show our real-world robot setup for the reaching (left) and the pushing (right) tasks.
  • Figure 3: We show the episodic reward (ER) obtained during training in simulation for all the studied action spaces. The first four columns show the learning curves grouped by action spaces, whereas the last column shows aggregated results comparing joint (J) and Cartesian (C) action spaces as well as position-based (P) and velocity-based (V) action spaces for the pushing task. Delta ($\Delta$) action spaces are labeled as either one-step (OI) or multi-step (MI) target integration. We added joint torque (JT) control to the first column. The shaded region represents the range between the 5th and 95th percentiles.
  • Figure 4: Robustness of delta action spaces to the velocity limit hyperparameter in the pushing task on our real setup. We compare one-step and multi-step integration. The error bars indicate the 5th and 95th percentiles.
  • Figure 5: We compare joint position-based action spaces in the pushing task and show their influence on the normalized tracking error and the resulting effect on success rate in the real robot setup.
  • ...and 1 more figures
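
The two-rate loop described in the Figure 1 caption can be illustrated with a minimal sketch. Only the 60 Hz policy rate and the 120 Hz simulated controller rate come from the caption. Everything else is an assumption made for illustration: the PD-style joint impedance law standing in for $f_{as}$, the gains, the reading of one-step integration as "treat the action as a joint velocity and integrate it once per policy step", the toy unit-inertia joint model, and all function names. It is not the authors' implementation.

```python
from typing import Callable, List, Tuple

POLICY_HZ = 60        # policy rate, from the Figure 1 caption
SIM_CONTROL_HZ = 120  # simulated controller rate, from the Figure 1 caption

Vector = List[float]


def joint_impedance_torque(q_target: Vector, q: Vector, dq: Vector,
                           kp: float = 50.0, kd: float = 5.0) -> Vector:
    """Hypothetical f_as: tau = kp * (q_target - q) - kd * dq (gains are assumed)."""
    return [kp * (qt - qi) - kd * dqi for qt, qi, dqi in zip(q_target, q, dq)]


def one_step_delta_target(action: Vector, q: Vector, dt_policy: float) -> Vector:
    """Assumed one-step integration: integrate the action once per policy step,
    starting from the measured joint position."""
    return [qi + ai * dt_policy for qi, ai in zip(q, action)]


def run_policy_step(policy: Callable[[Vector, Vector], Vector],
                    q: Vector, dq: Vector,
                    control_hz: int = SIM_CONTROL_HZ,
                    policy_hz: int = POLICY_HZ) -> Tuple[Vector, Vector, List[Vector]]:
    """Query the policy once, then run control_hz // policy_hz controller steps."""
    substeps = control_hz // policy_hz
    dt_control = 1.0 / control_hz

    action = policy(q, dq)  # policy sees feedback from the environment
    q_target = one_step_delta_target(action, q, 1.0 / policy_hz)

    torques: List[Vector] = []
    for _ in range(substeps):
        # The controller also sees fresh feedback at every one of its steps.
        tau = joint_impedance_torque(q_target, q, dq)
        torques.append(tau)
        # Toy unit-inertia joints (ddq = tau), semi-implicit Euler update.
        dq = [dqi + ti * dt_control for dqi, ti in zip(dq, tau)]
        q = [qi + dqi * dt_control for qi, dqi in zip(q, dq)]
    return q, dq, torques


if __name__ == "__main__":
    # A constant hypothetical policy commanding 0.6 rad/s on a single joint.
    q, dq, torques = run_policy_step(lambda q, dq: [0.6], [0.0], [0.0])
    print(len(torques), q, dq)
```

With the simulation rates, each policy action is held for two controller steps. The integer division is a simplification that fits the 120 Hz case only; the 1 kHz real-world controller rate is not an integer multiple of 60 Hz, so a real deployment would need a different scheduling scheme than this sketch.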