Unsupervised Learning of Effective Actions in Robotics
Marko Zaric, Jakob Hollenstein, Justus Piater, Erwan Renaudo
TL;DR
The paper addresses learning effective robot actions by grounding actions in the effects they produce, proposing an unsupervised, effect-centric discretization of a continuous motion space. The method is a three-stage pipeline: motion sampling in $\mathcal{M}$ to collect pairs $(m_{t-1}, e_t)$, effect-region clustering to form classes $\mathcal{C}_k$, and action prototype generation using RGNG, with per-class prototype counts $\xi_k$ computed from class statistics. In the Up The Stairs environment, the effect-centric discretization outperformed uniform and random discretization schemes in convergence speed and maximum reward for discrete RL, while the continuous-action SAC baseline achieved higher final performance but with substantially more parameters. The work suggests that grounding decision-level actions in their physical effects can yield compact, task-agnostic representations suitable for efficient robot learning.
Abstract
Learning actions that are relevant to decision-making and can be executed effectively is a key problem in autonomous robotics. Current state-of-the-art action representations in robotics lack proper effect-driven learning of the robot's actions. Although successful in solving manipulation tasks, deep learning methods also lack this ability, in addition to their high cost in terms of memory or training data. In this paper, we propose an unsupervised algorithm to discretize a continuous motion space and generate "action prototypes", each producing different effects in the environment. After an exploration phase, the algorithm automatically builds a representation of the effects and groups motions into action prototypes, where motions more likely to produce an effect are represented more than those that lead to negligible changes. We evaluate our method on a simulated stair-climbing reinforcement learning task, and the preliminary results show that our effect-driven discretization outperforms uniformly and randomly sampled discretizations in convergence speed and maximum reward.
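To make the idea concrete, the sample-cluster-prototype procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: plain k-means stands in for both the effect clustering and the RGNG prototype step, `toy_effect` is a hypothetical environment in which small motions barely change anything, and the rule that assigns each class a share of the prototype budget proportional to its mean effect magnitude is an assumption made only for this example. All function and parameter names are illustrative.

```python
import numpy as np


def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's k-means on the rows of x; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its old center
                centers[j] = x[labels == j].mean(axis=0)
    return centers, labels


def toy_effect(motions):
    """Hypothetical environment: motions with norm <= 0.5 have a negligible effect."""
    strong = np.linalg.norm(motions, axis=1, keepdims=True) > 0.5
    return np.where(strong, motions, 0.05 * motions)


def build_action_prototypes(n_samples=2000, n_classes=3, budget=12, seed=0):
    rng = np.random.default_rng(seed)

    # Stage 1: motion sampling. Collect (motion, effect) pairs.
    motions = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
    effects = toy_effect(motions)

    # Stage 2: effect-region clustering. Group samples by the effect they produce.
    _, labels = kmeans(effects, n_classes, seed=seed)

    # Assumed allocation rule: a class's share of the budget is proportional
    # to its mean effect magnitude (not taken from the paper).
    magnitude = np.linalg.norm(effects, axis=1)
    weights = np.array([
        magnitude[labels == k].mean() if np.any(labels == k) else 0.0
        for k in range(n_classes)
    ])

    # Stage 3: prototype generation. k-means over each class's motions
    # is a simple stand-in for RGNG.
    prototypes, counts = [], []
    for k in range(n_classes):
        members = motions[labels == k]
        if len(members) == 0:
            counts.append(0)
            continue
        xi_k = max(1, int(np.rint(budget * weights[k] / weights.sum())))
        xi_k = min(xi_k, len(members))
        centers, _ = kmeans(members, xi_k, seed=seed)
        prototypes.append(centers)
        counts.append(xi_k)
    return np.vstack(prototypes), counts


prototypes, counts = build_action_prototypes()
```

Each row of `prototypes` is one discrete action for a downstream discrete RL agent, and `counts` holds the number of prototypes allotted to each effect class. Under the assumed rule, classes whose motions produce larger effects receive more prototypes, which mirrors the abstract's statement that effective motions are represented more than those with negligible effects.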
