Information-driven Affordance Discovery for Efficient Robotic Manipulation
Pietro Mazzaglia, Taco Cohen, Daniel Dijkman
TL;DR
IDA introduces an information-driven framework for visual affordance discovery by casting the problem as a contextual bandit and using a Jensen-Shannon divergence-based information gain to guide exploration. An ensemble of decoders with a shared encoder yields per-pixel affordance maps from a 2D point-cloud input, enabling data-efficient learning of multiple manipulation primitives such as grasping, stacking, and opening. Across ManiSkill2 simulations and real-world grasping with a UFACTORY xArm 6, IDA demonstrates superior data efficiency and robustness, with ablations highlighting the benefits of information-driven sampling and the reward term. The work advances practical robotic manipulation by reducing the interaction burden needed to learn useful affordances and suggests avenues for extending to longer-horizon tasks and hierarchical control.
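To make the exploration signal concrete, the following is a minimal sketch of a Jensen-Shannon information gain computed over an ensemble of per-pixel predictions. It is an illustration under stated assumptions, not the authors' implementation: it assumes each ensemble member outputs a Bernoulli success probability per pixel, uses the uniform-weight generalized Jensen-Shannon divergence (entropy of the mean prediction minus the mean of the member entropies), and combines it with the mean predicted success through a hypothetical weight `beta`. The function names and the additive scoring rule are likewise assumptions made for this example.

```python
import numpy as np


def bernoulli_entropy(p, eps=1e-12):
    """Elementwise entropy (in nats) of a Bernoulli(p) distribution."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def js_information_gain(probs):
    """Generalized Jensen-Shannon divergence across ensemble members.

    probs: array of shape (K, H, W) holding per-pixel success
    probabilities from K ensemble members (assumed Bernoulli outputs).
    Returns an (H, W) map that is ~0 where members agree and larger
    where they disagree.
    """
    mean_p = probs.mean(axis=0)
    return bernoulli_entropy(mean_p) - bernoulli_entropy(probs).mean(axis=0)


def select_pixel(probs, beta=1.0):
    """Pick the pixel maximizing mean success + beta * information gain.

    `beta` is a hypothetical trade-off weight, not a value from the paper.
    """
    score = probs.mean(axis=0) + beta * js_information_gain(probs)
    return np.unravel_index(np.argmax(score), score.shape)


# Toy usage: 5 ensemble members on a 4x4 map that mostly agree,
# except at pixel (2, 3) where their predictions are spread out.
rng = np.random.default_rng(0)
probs = rng.uniform(0.4, 0.6, size=(5, 4, 4))
probs[:, 2, 3] = [0.05, 0.95, 0.1, 0.9, 0.5]
best = select_pixel(probs, beta=5.0)
```

In this toy example the selected pixel is the one where the members disagree, since its information gain dominates the score. With `beta=0` the rule reduces to greedy selection on the mean predicted success.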
Abstract
Robotic affordances, which provide information about what actions can be taken in a given situation, can aid robotic manipulation. However, learning affordances typically requires large, expensive annotated datasets of interactions or demonstrations. In this work, we argue that well-directed interactions with the environment can mitigate this problem, and we propose an information-based measure to augment the agent's objective and accelerate the affordance discovery process. We provide a theoretical justification of our approach and empirically validate it both in simulation and on real-world tasks. Our method, which we dub IDA, enables the efficient discovery of visual affordances for several action primitives, such as grasping, stacking objects, or opening drawers, strongly improving data efficiency in simulation. It also allows us to learn grasping affordances in a small number of interactions on a real-world setup with a UFACTORY xArm 6 robot arm.
