
Information-driven Affordance Discovery for Efficient Robotic Manipulation

Pietro Mazzaglia, Taco Cohen, Daniel Dijkman

TL;DR

IDA introduces an information-driven framework for visual affordance discovery. It casts the problem as a contextual bandit and uses an information gain based on the Jensen-Shannon divergence to guide exploration. An ensemble of decoders with a shared encoder produces per-pixel affordance maps from a 2D point-cloud input, enabling data-efficient learning of multiple manipulation primitives such as grasping, stacking, and opening drawers. In ManiSkill2 simulations and in real-world grasping with a UFACTORY xArm 6, IDA shows better data efficiency and robustness than the compared approaches, and ablations highlight the benefits of information-driven sampling and of the reward term. The work makes robotic manipulation more practical by reducing the number of interactions needed to learn useful affordances, and it suggests extensions to longer-horizon tasks and hierarchical control.
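The information gain mentioned above can be made concrete for per-pixel binary success predictions. The following is a minimal NumPy sketch that assumes each of K decoders outputs a Bernoulli success probability per pixel and that the information radius (generalized Jensen-Shannon divergence) uses uniform weights over ensemble members. In that case the gain is the entropy of the averaged prediction minus the average member entropy, $\mathrm{IG} = H(\bar p) - \frac{1}{K}\sum_k H(p_k)$ with $\bar p = \frac{1}{K}\sum_k p_k$. The function names and the (K, H, W) array layout are illustrative and do not come from the paper's code.

```python
import numpy as np


def bernoulli_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def affordance_and_info_gain(probs):
    """probs: (K, H, W) per-pixel success probabilities from K ensemble decoders.

    Returns the ensemble-mean affordance map and the per-pixel information
    radius (uniform-weight generalized Jensen-Shannon divergence).
    """
    mean_p = probs.mean(axis=0)
    # Entropy of the mixture minus the mean entropy of the individual members.
    info_gain = bernoulli_entropy(mean_p) - bernoulli_entropy(probs).mean(axis=0)
    return mean_p, info_gain


# Three decoders on a 1x2 image: they agree on pixel 0 and disagree on pixel 1.
probs = np.array([
    [[0.9, 0.5]],
    [[0.9, 0.1]],
    [[0.9, 0.9]],
])
mean_p, ig = affordance_and_info_gain(probs)
```

Pixels where all decoders agree get zero gain even when their predicted success is high. Disagreement produces a positive gain, which is why this map is useful for directing exploration.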

Abstract

Robotic affordances, which indicate what actions can be taken in a given situation, can aid robotic manipulation. However, learning affordances requires large, expensive annotated datasets of interactions or demonstrations. In this work, we argue that well-directed interactions with the environment can mitigate this problem, and we propose an information-based measure that augments the agent's objective to accelerate the affordance discovery process. We provide a theoretical justification for our approach and validate it empirically in both simulated and real-world tasks. Our method, which we dub IDA, enables the efficient discovery of visual affordances for several action primitives, such as grasping, stacking objects, or opening drawers, and strongly improves data efficiency in simulation. It also allows us to learn grasping affordances in a small number of interactions on a real-world setup with a UFACTORY xArm 6 robot arm.
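In the contextual-bandit framing, one way to read "augments the agent's objective" is to score each candidate interaction location by its predicted success plus a weighted information gain. The sketch below assumes an additive score with a hypothetical weight `beta` and offers either greedy or softmax-sampled selection. Both the weighting and the `temperature` parameter are illustrative assumptions, and the paper's exact objective and sampling procedure may differ.

```python
import numpy as np


def select_pixel(mean_p, info_gain, beta=1.0, temperature=None, rng=None):
    """Pick an interaction location on an (H, W) map and return (row, col).

    Score = expected success + beta * information gain (hypothetical additive
    weighting). With temperature=None, the highest-scoring pixel is chosen
    greedily. Otherwise, a pixel is sampled from a softmax over the scores.
    """
    score = mean_p + beta * info_gain
    if temperature is None:
        flat_idx = int(np.argmax(score))
    else:
        rng = np.random.default_rng() if rng is None else rng
        logits = score.ravel() / temperature
        logits = logits - logits.max()  # numerical stability before exp
        weights = np.exp(logits)
        flat_idx = int(rng.choice(weights.size, p=weights / weights.sum()))
    row, col = np.unravel_index(flat_idx, score.shape)
    return int(row), int(col)


mean_p = np.array([[0.9, 0.5]])
info_gain = np.array([[0.0, 0.25]])
exploit = select_pixel(mean_p, info_gain, beta=0.0)  # pixel 0 has the higher success
explore = select_pixel(mean_p, info_gain, beta=2.0)  # pixel 1 scores 0.5 + 0.5 = 1.0 > 0.9
```

With `beta=0`, the agent exploits the highest predicted success. Increasing `beta` shifts interactions toward locations where the ensemble disagrees, which is where an interaction is expected to be most informative.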

Paper Structure

This paper contains 11 sections, 13 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Information-driven affordance discovery. The model processes inputs from the environment (a 2D point cloud) with a single encoder, concatenates the action parameters, and decodes visual affordance maps with multiple decoders (an ensemble). Averaging the decoders' outputs yields reliable affordance maps, thanks to the diversity of the ensemble. Computing the information radius across the ensemble yields information gain maps over the affordances in the scene, which drive targeted exploratory interactions. Images show actual model outputs from IDA in the ManiSkill2 Open Drawer environment.
  • Figure 2: ManiSkill2 performance. Affordance success aggregated across ManiSkill2 tasks and runs.
  • Figure 3: Performance over time. The affordance success rate in the evaluation stage increases over the number of interactions, averaged over all tasks (5+ seeds per task).
  • Figure 4: Reward-free ablation. Reward-free affordance discovery methods compared over time (5+ seeds).
  • Figure 5: Real-world results and setup. On a UFACTORY xArm 6 platform, IDA learns to grasp objects faster than other approaches, achieving up to 90% grasping success.
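The pipeline described in the Figure 1 caption (shared encoder, action parameters concatenated to the features, several decoders, averaged outputs) can be sketched as a toy forward pass. This sketch simplifies heavily and rests on several assumptions. The encoder and each decoder are single per-pixel linear layers rather than deeper image networks, the input is an organized (H, W, 3) point cloud, and the class name, layer sizes, and weight initialization are all hypothetical.

```python
import numpy as np


class EnsembleAffordanceModel:
    """Toy model: one shared encoder and K decoder heads (hypothetical sizes).

    Each pixel's input features (e.g. xyz of an organized point cloud) pass
    through a shared linear + ReLU encoder. The action parameters are tiled
    and concatenated to every pixel's embedding, and each of the K decoders
    maps the result to a per-pixel success probability.
    """

    def __init__(self, in_ch=3, act_dim=2, hidden=16, n_decoders=5, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 0.5, (in_ch, hidden))  # shared by all heads
        self.W_dec = rng.normal(0.0, 0.5, (n_decoders, hidden + act_dim))
        self.b_dec = rng.normal(0.0, 0.1, (n_decoders,))

    def __call__(self, obs, action):
        """obs: (H, W, in_ch); action: (act_dim,). Returns (K, H, W) probabilities."""
        h, w, _ = obs.shape
        z = np.maximum(obs @ self.W_enc, 0.0)                    # (H, W, hidden)
        a = np.broadcast_to(action, (h, w, action.shape[0]))     # action tiled per pixel
        feats = np.concatenate([z, a], axis=-1)                  # (H, W, hidden + act_dim)
        logits = np.einsum("hwd,kd->khw", feats, self.W_dec) + self.b_dec[:, None, None]
        return 1.0 / (1.0 + np.exp(-logits))                     # sigmoid for each decoder


model = EnsembleAffordanceModel()
obs = np.random.default_rng(1).normal(size=(4, 6, 3))
probs = model(obs, np.array([0.3, -0.2]))
affordance_map = probs.mean(axis=0)  # ensemble average, shape (4, 6)
```

Because the decoders share the encoder, the ensemble adds only K small heads on top of a single feature extractor. The spread of `probs` along its first axis is what a per-pixel information-radius computation would consume.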