
Unsupervised Learning of Effective Actions in Robotics

Marko Zaric, Jakob Hollenstein, Justus Piater, Erwan Renaudo

TL;DR

The paper addresses learning effective robot actions by grounding actions in the effects they produce, proposing an unsupervised, effect-centric discretization of a continuous motion space. The method is a three-stage pipeline: motion sampling in the motion space $\mathcal{M}$ to collect motion-effect pairs $(m_{t-1}, e_t)$, effect-region clustering to form effect classes $\mathcal{C}_k$, and action prototype generation using RGNG (Robust Growing Neural Gas), with the per-class prototype count $\xi_k$ computed from class statistics. In the Up The Stairs environment, the effect-centric discretization outperformed uniform and random discretization schemes in convergence speed and maximum reward for discrete RL, while the continuous-action SAC baseline achieved higher final performance but with substantially more parameters. The work suggests that grounding decision-level actions in their physical effects can yield compact, task-agnostic representations suitable for efficient robot learning.
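The first two stages of the pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. `toy_effect` is a hypothetical stand-in for the simulator: it is an invented mapping from a y-z force to a horizontal displacement and a step-height gain. The motion space is assumed to be $[-1, 1]^2$, and plain Lloyd's k-means is used for effect-region clustering. The point to notice is that clustering runs on the effects, and the resulting labels are then used to group the motions into classes.

```python
import numpy as np


def toy_effect(motion, step_height=0.1):
    """Hypothetical stand-in for the simulator: maps a y-z force to an effect.

    The effect is (delta_y, delta_z): horizontal travel plus the height gained
    in whole steps (0-4). This mapping is invented for illustration only.
    """
    f_y, f_z = motion
    steps = 0
    if f_y > 0 and f_z > 0:
        steps = min(4, int(5 * f_y * f_z))  # stronger diagonal push -> more steps
    return np.array([f_y, steps * step_height])


def sample_motions(n, seed=0):
    """Stage 1: sample motions uniformly in M = [-1, 1]^2 and record (m, e) pairs."""
    rng = np.random.default_rng(seed)
    motions = rng.uniform(-1.0, 1.0, size=(n, 2))
    effects = np.stack([toy_effect(m) for m in motions])
    return motions, effects


def kmeans(x, k, iters=50, seed=0):
    """Stage 2: Lloyd's k-means, run on the effects (not on the motions)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels, centers


motions, effects = sample_motions(2000)
labels, centers = kmeans(effects, k=6)
# Each effect class C_k groups the motions whose effects fell into cluster k.
classes = {j: motions[labels == j] for j in range(6)}
```

Because the clusters are defined in effect space, two motions that look different in force space can share a class if they produce the same outcome. Conversely, a small region of force space can be split across classes where the outcome changes abruptly.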

Abstract

Learning actions that are relevant to decision-making and can be executed effectively is a key problem in autonomous robotics. Current state-of-the-art action representations in robotics lack proper effect-driven learning of the robot's actions. Although successful in solving manipulation tasks, deep learning methods also lack this ability, in addition to their high cost in terms of memory or training data. In this paper, we propose an unsupervised algorithm to discretize a continuous motion space and generate "action prototypes", each producing different effects in the environment. After an exploration phase, the algorithm automatically builds a representation of the effects and groups motions into action prototypes, where motions more likely to produce an effect are represented by more prototypes than those that lead to negligible changes. We evaluate our method on a simulated stair-climbing reinforcement learning task, and the preliminary results show that our effect-driven discretization outperforms uniformly and randomly sampled discretizations in convergence speed and maximum reward.
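The idea that effect-producing motions receive more prototypes than motions with negligible effects is governed by the per-class count $\xi_k$. The paper derives $\xi_k$ from class statistics via a variation metric whose exact form is not reproduced here. The sketch below therefore makes a hypothetical assumption: $\xi_k$ is proportional to the population standard deviation of the effects in class $k$, with at least one prototype per class. A fixed total budget is distributed using the largest-remainder method. The function name `prototype_counts` and the example data are invented for illustration.

```python
import math
from statistics import pstdev


def prototype_counts(class_effects, budget, min_per_class=1):
    """Hypothetical xi_k allocation: more prototypes for classes whose effects vary more.

    class_effects maps a class id to a list of scalar effect magnitudes.
    Assumption: the spread (population std) of a class's effects stands in for
    the paper's variation metric.
    """
    spreads = {k: pstdev(v) if len(v) > 1 else 0.0 for k, v in class_effects.items()}
    counts = {k: min_per_class for k in class_effects}
    remaining = budget - min_per_class * len(class_effects)
    if remaining <= 0:
        return counts  # budget only covers the per-class minimum
    total = sum(spreads.values())
    if total == 0:
        spreads = {k: 1.0 for k in spreads}  # no variation anywhere: split evenly
        total = float(len(spreads))
    shares = {k: remaining * s / total for k, s in spreads.items()}
    for k in counts:
        counts[k] += math.floor(shares[k])
    # Largest-remainder rounding so the counts sum exactly to the budget.
    leftover = budget - sum(counts.values())
    order = sorted(shares, key=lambda k: shares[k] - math.floor(shares[k]), reverse=True)
    for k in order[:leftover]:
        counts[k] += 1
    return counts


# Toy effect classes (invented numbers): step-height gains per sampled motion.
example = {
    "no_change": [0.0, 0.0, 0.01, 0.0],
    "one_step": [0.1, 0.12, 0.09, 0.11],
    "two_steps": [0.2, 0.3, 0.25, 0.15],
}
print(prototype_counts(example, budget=10))
# {'no_change': 1, 'one_step': 2, 'two_steps': 7}
```

Under this assumption, the nearly constant "no change" class keeps a single prototype, while the class with the widest effect spread receives most of the budget. This contrasts with assigning a fixed number of prototypes to every class.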

Paper Structure (12 sections, 7 equations, 7 figures, 2 algorithms)


Figures (7)

  • Figure 1: The simulated environment: (a) the robot (the turquoise cube) in its initial position, facing a stairway of 4 steps. It can apply a force in the y-z plane to its center of mass (axes: red x, green y, blue z). (b) The robot in its final position after performing a motion; the red line is the performed trajectory.
  • Figure 2: The k-Means effect-region clustering, visualized in the y-z action space, shows coherent classes even though the clustering was performed in effect space, which reaffirms the correlation between these two spaces. Classes $\mathcal{C}_1$ and $\mathcal{C}_3$ correspond to no strict change in the environment, where the robot does not reach the first step. The other regions in the top half represent a one-, two-, three-, or four-step height gain (from left to right).
  • Figure 3: Action prototypes (visualized as regular polygons) found by (a) Effect Region Clustering with RGNG (Robust Growing Neural Gas) and the variation metric equation for selecting the number of prototypes per class, (b) Effect Region Clustering with RGNG and five prototypes per cluster, (c) random prototypes, and (d) uniform grid prototypes. Methods (a) and (b) use Effect Region Clustering to identify key change areas and generate prototypes across the different effect motions. Method (b) generates excessive prototypes in stable areas, undermining the advantage of effect classes. The random and uniform methods yield many irrelevant prototypes. The colors of the effect classes serve only as visual cues and do not affect methods (c) and (d).
  • Figure 4: During learning of the "Up The Stairs" task, our effect-centric method, using 2000 samples for prototype generation, achieved faster convergence and a higher maximal reward than the random or uniform prototype generation methods. The SAC baseline, representing solvability in a continuous action space, has its maximal reward marked by a green dashed line at 15 in graph (a). Graph (b) compares the learning curves of the SAC baseline and our effect-based discretization with DQN.
  • Figure 5: Figure (a) shows three action prototypes for the action "jump one step". Figure (b) shows three action prototypes for the action "jump two steps".
  • ...and 2 more figures
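For comparison, the random and uniform-grid baselines of Figure 3 (c, d) can be sketched as below, along with how a discrete agent such as DQN would use any prototype set. The agent picks an index, and the corresponding prototype is the force that gets executed. The force range $[-1, 1]$ on each axis and all function names are assumptions for illustration. The paper's actual bounds and prototype counts are not given in this summary.

```python
import itertools
import random


def uniform_grid_prototypes(n_per_axis, low=-1.0, high=1.0):
    """Baseline (d): prototypes on a regular n x n grid over an assumed y-z force range."""
    if n_per_axis < 2:
        raise ValueError("need at least 2 points per axis")
    step = (high - low) / (n_per_axis - 1)
    axis = [low + i * step for i in range(n_per_axis)]
    return list(itertools.product(axis, axis))


def random_prototypes(n, low=-1.0, high=1.0, seed=0):
    """Baseline (c): prototypes drawn uniformly at random from the same assumed range."""
    rng = random.Random(seed)
    return [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(n)]


def action_to_force(action_index, prototypes):
    """Hypothetical glue: a discrete agent outputs an index; the prototype is the executed force."""
    return prototypes[action_index]


grid = uniform_grid_prototypes(4)     # 16 prototypes
rand = random_prototypes(16, seed=1)  # 16 prototypes
force = action_to_force(3, grid)      # the (f_y, f_z) executed for discrete action 3
```

Both baselines place prototypes without looking at the effects. This is why, as the Figure 3 caption notes, many of them end up in regions that produce no useful change.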