
Information-driven Affordance Discovery for Efficient Robotic Manipulation

Pietro Mazzaglia, Taco Cohen, Daniel Dijkman

TL;DR

This work addresses the data inefficiency of learning visual affordances for robotic manipulation by reframing affordance discovery as a contextual bandit problem and introducing Information-Driven Affordance Discovery (IDA). IDA uses an ensemble of decoders with a shared encoder to output per-pixel affordance probabilities. It guides exploration with an information gain term $I(x,a)$, computed as the Jensen–Shannon divergence (information radius) of the predictions across ensemble members, and combines this term with the reward via a UCB-like strategy. The approach yields higher data efficiency and robust final performance in ManiSkill2 simulation, and it enables fast real-world grasping on a UFACTORY xArm 6 with no prior data, highlighting its practical value for interactive, data-conscious robotics. These contributions show how perception-guided exploration can accelerate the learning of actionable visual affordances while reducing reliance on large annotated or synthetic datasets.
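The exploration score described above can be sketched in a few lines of NumPy. This is a minimal sketch, assuming each of K ensemble decoders outputs a per-pixel Bernoulli success probability for a fixed action primitive, so selection happens over pixels only. The information gain is the information radius (generalized Jensen–Shannon divergence): the entropy of the mean prediction minus the mean entropy of the members. The UCB-like score `mean + beta * info_gain`, the weight `beta`, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


def bernoulli_entropy(p, eps=1e-8):
    """Entropy (in nats) of a Bernoulli distribution with success probability p."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def information_gain(probs):
    """Per-pixel information radius (JSD across ensemble members).

    probs: array of shape (K, H, W) with success probabilities from K decoders.
    Returns an (H, W) map equal to H[mean_k p_k] - mean_k H[p_k], which is >= 0
    up to floating-point error because entropy is concave.
    """
    mean_p = probs.mean(axis=0)
    return bernoulli_entropy(mean_p) - bernoulli_entropy(probs).mean(axis=0)


def select_action_pixel(probs, beta=1.0):
    """Hypothetical UCB-like choice: mean predicted success plus beta * information gain.

    beta is an assumed exploration weight, not a value taken from the paper.
    Returns the (row, col) of the best-scoring pixel and the full score map.
    """
    score = probs.mean(axis=0) + beta * information_gain(probs)
    flat_idx = np.argmax(score)
    return np.unravel_index(flat_idx, score.shape), score


# Usage: 5 decoders predicting a 4x4 affordance map.
rng = np.random.default_rng(0)
probs = rng.uniform(size=(5, 4, 4))
pixel, score = select_action_pixel(probs, beta=1.0)
```

With `beta = 0` the rule is purely greedy on the ensemble mean; increasing `beta` favors pixels where the decoders disagree, which is where an interaction is expected to be most informative.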

Abstract

Robotic affordances, which provide information about what actions can be taken in a given situation, can aid robotic manipulation. However, learning about affordances requires large, expensive annotated datasets of interactions or demonstrations. In this work, we argue that well-directed interactions with the environment can mitigate this problem, and we propose an information-based measure that augments the agent's objective and accelerates the affordance discovery process. We provide a theoretical justification of our approach and validate it empirically both in simulation and on real-world tasks. Our method, which we dub IDA, enables the efficient discovery of visual affordances for several action primitives, such as grasping, stacking objects, or opening drawers, strongly improving data efficiency in simulation. It also allows us to learn grasping affordances in a small number of interactions on a real-world setup with a UFACTORY xArm 6 robot arm.

Paper Structure (11 sections, 13 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Information-driven affordance discovery. The model processes inputs from the environment (2D point cloud) using a single encoder, concatenates the action parameters, and decodes visual affordance maps with multiple decoders (an ensemble). Averaging the outputs of these networks yields reliable affordance maps, thanks to the ensemble diversity. Computing the information radius yields information gain maps about affordances in the scene, which drive informed exploratory interactions. Images show actual model outputs from IDA in the ManiSkill2 Open Drawer environment.
  • Figure 2: ManiSkill2 performance. Affordance success aggregated across ManiSkill2 tasks and runs.
  • Figure 3: Performance over time. The affordance success rate in the evaluation stage increases with the number of interactions, averaged over all tasks (5+ seeds per task).
  • Figure 4: Reward-free ablation. Comparing reward-free affordance discovery methods over time. (5+ seeds).
  • Figure 5: Real-world results and setup. IDA learns to grasp objects faster than other approaches, achieving up to 90% grasping success, on a UFACTORY xArm 6 platform.
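The architecture in Figure 1 can be illustrated with a toy NumPy forward pass. This sketch assumes per-pixel 3-channel inputs (for example, point coordinates per pixel), a single linear layer standing in for the shared encoder, and two-layer MLP decoders; the class name, sizes, and random initialization are hypothetical and not taken from the paper. It shows the shape flow only: encode once, tile the action parameters over every pixel, concatenate, decode with K independently initialized heads, and average.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class EnsembleAffordanceModel:
    """Hypothetical toy model: one shared encoder, K decoder heads, random weights."""

    def __init__(self, in_ch=3, feat_dim=16, action_dim=2, hidden=16, n_decoders=5, seed=0):
        rng = np.random.default_rng(seed)
        # Shared encoder: a per-pixel linear layer (stand-in for a real convolutional encoder).
        self.w_enc = rng.normal(scale=0.5, size=(in_ch, feat_dim))
        # Each decoder gets its own independent weights; this is the source of ensemble diversity.
        self.decoders = [
            (rng.normal(scale=0.5, size=(feat_dim + action_dim, hidden)),
             rng.normal(scale=0.5, size=(hidden, 1)))
            for _ in range(n_decoders)
        ]

    def __call__(self, obs, action):
        """obs: (H, W, in_ch) per-pixel input; action: (action_dim,) primitive parameters.

        Returns (probs, mean_map) with shapes (K, H, W) and (H, W).
        """
        h, w, _ = obs.shape
        feats = relu(obs @ self.w_enc)                             # (H, W, feat_dim), computed once
        tiled = np.broadcast_to(action, (h, w, action.shape[0]))  # same action at every pixel
        x = np.concatenate([feats, tiled], axis=-1)               # (H, W, feat_dim + action_dim)
        # Each head maps every pixel's features to a success probability.
        probs = np.stack([sigmoid(relu(x @ w1) @ w2)[..., 0] for w1, w2 in self.decoders])
        return probs, probs.mean(axis=0)                          # ensemble average = affordance map


# Usage: an 8x8 observation and a 2-D action parameter vector.
model = EnsembleAffordanceModel()
obs = np.random.default_rng(1).normal(size=(8, 8, 3))
probs, mean_map = model(obs, np.array([0.0, 1.0]))
```

Because the encoder is shared, its cost is paid once per observation, while each extra ensemble member adds only a small decoder head. The stacked `probs` array is what a per-pixel information-gain map would be computed from.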