Learning Affordances from Interactive Exploration using an Object-level Map

Paula Wulkop, Halil Umut Özdemir, Antonia Hüfner, Jen Jen Chung, Roland Siegwart, Lionel Ott

TL;DR

This work tackles robot-centric affordance learning in unknown environments by integrating an object-level map into an interactive exploration loop. It combines a reinforcement-learning-driven exploration policy with a TSDF++ object-level map to re-identify object instances and propagate interaction labels across viewpoints, while periodically retraining a U-Net affordance predictor on episode-generated data. A key contribution is the explicit integration of object-level mapping into the exploration loop, which yields higher interaction success rates and faster, more accurate affordance predictions, as evidenced by improved Affordance IoU and object-level accuracy over baselines. The approach offers a data-efficient pathway toward robot-specific affordance understanding in realistic scenes and lays groundwork for real-world transfer and extension to more complex object interactions.
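The summary above reports results in terms of Affordance IoU but does not define it. As a minimal sketch, assuming the metric is the standard intersection over union between a predicted binary affordance mask and an annotated one, it could be computed as follows. The function name `affordance_iou` and the empty-mask convention are illustrative assumptions, not taken from the paper.

```python
def affordance_iou(pred, target):
    """Intersection over union of two binary masks (nested lists of 0/1).

    Assumption: standard IoU; the paper's exact metric may differ.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as perfect agreement.
    return inter / union if union else 1.0


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are marked in at least one mask.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
print(affordance_iou(pred, target))  # 0.5
```

In practice such a score would be computed per affordance class and averaged over a held-out set of images.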

Abstract

Many robotic tasks in real-world environments require physical interactions with an object, such as picking it up or pushing it. For successful interactions, the robot needs to know the object's affordances, which are defined as the potential actions the robot can perform with the object. To learn a robot-specific affordance predictor, we propose an interactive exploration pipeline that allows the robot to collect interaction experiences while exploring an unknown environment. We integrate an object-level map into the exploration pipeline so that the robot can identify different object instances and track objects across diverse viewpoints. This results in denser and more accurate affordance annotations than state-of-the-art methods, which do not incorporate a map. We show that our affordance exploration approach makes exploration more efficient and results in more accurate affordance prediction models compared to baseline methods.

Paper Structure (15 sections, 2 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: An object-level map of a living room scene generated by TSDF++ (Grinvald et al., 2021) with the robotic agent from the iTHOR simulation framework (Kolve et al., 2017).
  • Figure 2: Overview of our method during training. At each time step, an action is executed and the simulator (blue) reports whether the action was successful and outputs the RGB-D image, ground truth instance segmentation mask, and robot pose. The mapping module (red) updates the map with this data, while the affordance module (yellow) predicts the affordances. The RL exploration policy module (green) selects the next action using the current state of the object-level map, the RGB-D image, and the affordance estimation as input. At the end of the episode, the label module (purple) annotates each object instance based on the interaction data and estimations from the current affordance network. Finally, the affordance network is retrained after every episode with the new data.
  • Figure 3: Visual representation of the state space. The elements in the top row do not require a map, while the states in the bottom are obtained through the map.
  • Figure 4: An example image of the annotation approach used by Nagarajan and Grauman (2020) and the No Map + No Seg ablation (middle) compared to our approach which uses object segmentation and is therefore able to annotate the full object (right). Annotation masks are shown in green.
  • Figure 5: The training curves for the pick up affordance show that our approach leads to a higher interaction success rate and a better affordance estimation performance.
  • ...and 2 more figures
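
The captions for Figures 2 and 4 describe annotating whole object instances from interaction outcomes, with the object-level map keeping instance IDs consistent across viewpoints. The following is a minimal sketch of that idea only, under stated assumptions: instance masks are nested lists of integer IDs with 0 as background, and `propagate_labels` and the `-1` "unlabeled" value are hypothetical names and conventions, not the authors' implementation. It omits the part of the label module that uses estimations from the current affordance network.

```python
def propagate_labels(frames, interactions):
    """Turn per-object interaction outcomes into per-pixel labels for every frame.

    frames: list of instance-ID masks (nested lists of ints; 0 = background).
            IDs are assumed consistent across frames, as an object-level map
            would provide.
    interactions: dict mapping object ID -> True/False (interaction succeeded).
    Returns one label mask per frame: 1 = afforded, 0 = not afforded,
    -1 = unlabeled (background or objects never interacted with).
    """
    labeled = []
    for mask in frames:
        labeled.append([
            [int(interactions[obj]) if obj in interactions else -1
             for obj in row]
            for row in mask
        ])
    return labeled


# Two viewpoints of the same scene; objects 7 and 3 appear in both.
frames = [
    [[0, 7, 7],
     [0, 7, 3]],
    [[7, 7, 0],
     [3, 3, 0]],
]
# Hypothetical outcomes: picking up object 7 succeeded, object 3 failed.
interactions = {7: True, 3: False}

labels = propagate_labels(frames, interactions)
for label_mask in labels:
    print(label_mask)
```

Because labels are keyed by object ID rather than by pixel location, a single interaction annotates the full object in every frame where it is visible, which is the source of the denser annotations the abstract refers to.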