Object and Relation Centric Representations for Push Effect Prediction

Ahmet E. Tekden, Aykut Erdem, Erkut Erdem, Tamim Asfour, Emre Ugur

TL;DR

This work tackles push effect prediction in cluttered scenes by using object- and relation-centric graphs to model multi-object dynamics and articulations. It introduces a propagation-based GNN that jointly handles physics prediction and parameter estimation, followed by a belief-regulation module with temporal propagation and weight sharing that refines unknown parameters as the robot observes the scene. The method improves long-horizon prediction over image-based baselines, supports 6D motion prediction in lever-up tasks, and transfers to unseen tools for planning. Validated on both simulated and real-world data, the approach enables interpretable reasoning for non-prehensile manipulation and for tool use with articulated objects. The authors suggest that intelligent exploration and unsupervised object grounding could further broaden its applicability.
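
The belief-regulation idea (refining an unknown scene parameter as more motion is observed, with the same update reused at every time step) can be illustrated with a deliberately simple stand-in. The sketch below is not the paper's learned module: it swaps in a hand-written Bayesian log-odds update for a single hypothetical unknown, namely whether two contacted objects are rigidly connected, as in the situation of Figure 1. The half-normal and exponential likelihood models, their parameters (`sigma`, `scatter_mean`), the prior, and all function names are assumptions made for illustration.

```python
import math


def gap_log_likelihood_ratio(gap, sigma=0.01, scatter_mean=0.05):
    """log p(gap | connected) - log p(gap | independent), under ASSUMED models.

    connected:   gap ~ half-normal(sigma)        (parts move together, small noise)
    independent: gap ~ exponential(scatter_mean) (contacted objects scatter)
    """
    log_connected = math.log(math.sqrt(2.0 / math.pi) / sigma) - gap**2 / (2.0 * sigma**2)
    log_independent = -math.log(scatter_mean) - gap / scatter_mean
    return log_connected - log_independent


def update_belief(log_odds, disp_a, disp_b):
    """One belief-update step; the same function (same parameters) is reused every step."""
    # Gap between the two objects' observed 2D displacements in this step.
    gap = math.hypot(disp_a[0] - disp_b[0], disp_a[1] - disp_b[1])
    return log_odds + gap_log_likelihood_ratio(gap)


def probability(log_odds):
    """Numerically stable logistic function: log-odds -> probability."""
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


def run(observations, prior=0.1):
    """Fold a sequence of (disp_a, disp_b) pairs into P(connected)."""
    log_odds = math.log(prior / (1.0 - prior))
    for disp_a, disp_b in observations:
        log_odds = update_belief(log_odds, disp_a, disp_b)
    return probability(log_odds)


if __name__ == "__main__":
    # The contacted objects move together: belief in a connection rises.
    together = [((0.020, 0.0), (0.021, 0.0))] * 3
    # The contacted objects scatter: belief in a connection falls.
    scattered = [((0.020, 0.0), (0.060, 0.010))] * 3
    print(f"moving together: P(connected) = {run(together):.3f}")
    print(f"scattering:      P(connected) = {run(scattered):.3f}")
```

Reusing one update function across time steps loosely mirrors temporal weight sharing: the estimator keeps a fixed set of parameters no matter how long the robot keeps observing.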

Abstract

Pushing is an essential non-prehensile manipulation skill, used for tasks ranging from pre-grasp manipulation to scene rearrangement and for reasoning about object relations in the scene; consequently, pushing actions have been widely studied in robotics. The effective use of pushing actions often requires an understanding of the dynamics of the manipulated objects and adaptation to discrepancies between prediction and reality. For this reason, effect prediction and parameter estimation with pushing actions have been heavily investigated in the literature. However, current approaches are limited: they either model systems with a fixed number of objects or use image-based representations whose outputs are not very interpretable and quickly accumulate errors. In this paper, we propose a framework based on graph neural networks for effect prediction and parameter estimation of pushing actions that models object relations based on contacts or articulations. Our framework is validated in both real and simulated environments containing multi-part objects of different shapes connected via different types of joints, as well as objects with different masses, and it outperforms image-based representations on physics prediction. Our approach enables the robot to predict and adapt the effect of a pushing action as it observes the scene. It can also be used for tool manipulation with previously unseen tools. Further, we demonstrate 6D effect prediction in the lever-up action in the context of robot-based hard-disk disassembly.
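
To make the object- and relation-centric representation concrete, here is a minimal sketch, assuming a planar scene in which each node carries only a 2D velocity. Nodes stand for objects and the robot end-effector; typed edges come from contacts and joints. The per-relation scalar gains in `RELATION_GAIN` are hypothetical placeholders for learned relation encoders, the velocity-averaging update is a toy rule rather than the paper's dynamics model, and `build_scene_graph` and `propagate_once` are illustrative names, not an API from the paper.

```python
from collections import defaultdict

# HYPOTHETICAL scalar gains, one per relation type; a learned model would
# use a neural relation encoder per type instead.
RELATION_GAIN = {"contact": 0.5, "fixed": 1.0, "revolute": 0.3, "prismatic": 0.3}


def build_scene_graph(nodes, contacts, joints):
    """Return directed, typed edges (sender, receiver, relation).

    nodes:    names of scene entities; the robot end-effector is a node too
    contacts: (a, b) pairs currently in contact
    joints:   (a, b, joint_type) articulation triples
    Each relation is added in both directions so information flows both ways.
    """
    known = set(nodes)
    typed_pairs = [(a, b, "contact") for a, b in contacts] + list(joints)
    edges = []
    for a, b, relation in typed_pairs:
        if a not in known or b not in known:
            raise ValueError(f"edge ({a}, {b}) refers to an unknown node")
        if relation not in RELATION_GAIN:
            raise ValueError(f"unknown relation type: {relation}")
        edges += [(a, b, relation), (b, a, relation)]
    return edges


def propagate_once(velocity, edges, actuated=("robot",)):
    """One synchronous round of relation-centric message passing (toy rule).

    Every edge of the same relation type uses the same gain (weight sharing).
    Actuated nodes (the robot) keep their commanded velocity.
    """
    inbox = defaultdict(lambda: [0.0, 0.0])
    for sender, receiver, relation in edges:
        gain = RELATION_GAIN[relation]
        vs, vr = velocity[sender], velocity[receiver]
        # Edge message: relation-specific pull toward the sender's velocity.
        inbox[receiver][0] += gain * (vs[0] - vr[0])
        inbox[receiver][1] += gain * (vs[1] - vr[1])
    new_velocity = {}
    for name, (vx, vy) in velocity.items():
        if name in actuated:
            new_velocity[name] = (vx, vy)
        else:
            dx, dy = inbox[name]
            new_velocity[name] = (vx + dx, vy + dy)
    return new_velocity


if __name__ == "__main__":
    nodes = ["robot", "A", "B", "C"]
    # The robot touches A; A and B are linked by a revolute joint; C is free.
    edges = build_scene_graph(nodes, contacts=[("robot", "A")], joints=[("A", "B", "revolute")])
    v0 = {"robot": (0.1, 0.0), "A": (0.0, 0.0), "B": (0.0, 0.0), "C": (0.0, 0.0)}
    v1 = propagate_once(v0, edges)
    v2 = propagate_once(v1, edges)
    print(v1)
    print(v2)
```

Because every edge of a given type shares one gain, the same functions apply unchanged to scenes with any number of objects, which is the property the abstract contrasts with fixed-size models.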

Paper Structure

This paper contains 20 sections, 3 equations, 17 figures, 1 table.

Figures (17)

  • Figure 1: Normally, we would expect the robot's action in the left image to scatter the contacted objects. However, on seeing the contacted objects move together, the robot should correct its belief to account for this dynamic.
  • Figure 2: Our framework extracts object- and relation-centric latent representations from the current physical scene. The latent representations are first used to update the unknown parameters of the scene graph; then, together with the planned motor commands, they are used to predict the future motion of the manipulated objects.
  • Figure 3: Comparison of object-centric vs. object- and relation-centric representations. The representation on the left allows the network to capture object details in a more compositional way, allowing it to propagate action effects between objects and to predict the action effects of each object more accurately.
  • Figure 4: This illustration shows how the graph of the scene is constructed and how the force emerging from the robot end-effector's motion is passed to faraway objects. After graph construction, each node holds the state information of its corresponding object, including the robot. Consider how the robot's state information is passed: in the first propagation step, it is passed to the nodes of objects that contact the robot end-effector. In the second propagation step, it is passed through the nodes of the initially contacted objects to the nodes of non-contacted objects.
  • Figure 5: Physics prediction results in articulated-object environments. The error distribution of our network is skewed toward lower errors, while the CNN-based architecture that uses object-centric images has an error distribution skewed toward higher errors.
  • ...and 12 more figures
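
The two-step example in Figure 4 generalizes: with synchronous message passing, an object that is k contacts away from the end-effector first receives the robot's state at propagation step k. The hypothetical helper `steps_to_reach` below computes that arrival step for every node with a breadth-first traversal over undirected contact edges. The scene (objects `A` to `E`) is made up for illustration, and the sketch tracks only reachability, not the learned state updates.

```python
def steps_to_reach(edges, source="robot", max_steps=10):
    """Return {node: first propagation step at which it holds `source`'s state}.

    edges: undirected (a, b) contact pairs. Nodes not reached within
    `max_steps` are omitted. Equivalent to breadth-first-search layers.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    reached = {source: 0}
    frontier = {source}
    for step in range(1, max_steps + 1):
        # Neighbors of the nodes informed in the previous step receive the state now.
        frontier = {n for f in frontier for n in neighbors.get(f, ()) if n not in reached}
        if not frontier:
            break
        for n in frontier:
            reached[n] = step
    return reached


if __name__ == "__main__":
    # Hypothetical scene: the end-effector touches A and B; A touches C and
    # B touches D. E rests elsewhere and touches nothing, so it has no edges.
    contacts = [("robot", "A"), ("robot", "B"), ("A", "C"), ("B", "D")]
    print(steps_to_reach(contacts))               # all contacted chains
    print(steps_to_reach(contacts, max_steps=1))  # only direct contacts
```

In this toy, the number of propagation steps bounds how far along a chain of contacts the push can be felt: a model that runs fewer steps than the longest contact chain cannot, within a single prediction step, account for the farthest objects.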