Leveraging Computation of Expectation Models for Commonsense Affordance Estimation on 3D Scene Graphs

Mario A. V. Saucedo, Nikolaos Stathoulopoulos, Akash Patel, Christoforos Kanellakis, George Nikolakopoulos

TL;DR

The paper tackles the challenge of estimating sub-category-level commonsense affordances for objects in 3D scene graphs (3DSGs) to enable human-like task planning in robots. It introduces CECI, a Graph Convolutional Network (GCN) that learns probability distributions over affordances from correlation information across a sparse 3DSG, rather than predicting fixed labels. To train and validate the model, the authors build an HM3D-based dataset by mapping 1,659 object categories to 45 semantic labels, generating ground-truth affordance distributions through human annotation, and augmenting the graphs with incomplete views. The experimental evaluation covers offline training on the 45 classes, validation with Wasserstein and energy distances, qualitative correlation analyses, and field tests on a Spot robot in indoor environments, which together demonstrate practical viability and close-to-human commonsense affordance estimation. The framework supports proactive, context-aware task planning in robotics and could be extended to multi-agent and broader real-world applications.
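Validation compares each predicted affordance distribution against its human-annotated ground truth using Wasserstein and energy distances. The sketch below shows how both distances can be computed for two discrete distributions. It assumes that the affordance labels are placed on an ordered support 0..K-1 with unit spacing. Affordances are really categorical, and this excerpt does not state which ground metric the paper uses, so treat that ordering as an illustrative assumption. The function names are hypothetical. Under the same assumption, the results should agree with `scipy.stats.wasserstein_distance` and `scipy.stats.energy_distance` called with `range(K)` as values and the probabilities as weights.

```python
import math


def _cdf(p):
    """Running cumulative sum of a discrete distribution given as a list of probabilities."""
    total, out = 0.0, []
    for x in p:
        total += x
        out.append(total)
    return out


def wasserstein_1d(p, q):
    """1-Wasserstein distance between two distributions on the ordered support 0..K-1.

    With unit spacing between consecutive (assumed ordered) affordance indices,
    W1 equals the sum of absolute differences between the two CDFs.
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    return sum(abs(a - b) for a, b in zip(_cdf(p), _cdf(q)))


def energy_distance_1d(p, q):
    """Energy distance on the same ordered support: sqrt(2 * sum_k (P(k) - Q(k))**2)."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    return math.sqrt(2.0 * sum((a - b) ** 2 for a, b in zip(_cdf(p), _cdf(q))))


# Hypothetical predicted vs. human-annotated distributions over three affordances.
predicted = [0.7, 0.2, 0.1]
ground_truth = [0.1, 0.2, 0.7]
print(wasserstein_1d(predicted, ground_truth))      # approximately 1.2
print(energy_distance_1d(predicted, ground_truth))  # approximately 1.2
```

Both metrics are zero for identical distributions and grow as probability mass shifts between labels. Unlike a plain label-accuracy score, they reward a prediction that is "close" to the human distribution.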

Abstract

This article studies the concept of commonsense object affordance for enabling close-to-human task planning and task optimization by embodied robotic agents in urban environments. Here, object affordance focuses on reasoning about how to effectively identify an object's inherent utility during task execution. This work enables that reasoning by analyzing the contextual relations within the sparse information of 3D scene graphs. The proposed framework develops a Computation of Expectation based on Correlation Information (CECI) model that learns probability distributions using a Graph Convolutional Network, allowing it to extract the commonsense affordance for individual members of a semantic class. The overall framework was experimentally validated in a real-world indoor environment, showing that the method performs on par with human commonsense. A video showcasing the experimental demonstration is available at: https://youtu.be/BDCMVx2GiQE
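To make the idea of a GCN that outputs a probability distribution per node concrete, here is a minimal plain-Python sketch. It uses the standard Kipf and Welling propagation rule H' = ReLU(Â H W) with Â = D^-1/2 (A + I) D^-1/2 and applies a row-wise softmax so that every node gets a distribution over affordance labels. The excerpt does not describe CECI's actual layer count, node features, or training loss. The two-layer depth, one-hot node-type features, random untrained weights, three affordances, and all function names below are hypothetical. A trained model would instead fit the weights so that each output row approaches the human-annotated affordance distribution.

```python
import math
import random


def normalized_adjacency(n, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph with n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A + I (self-loops)
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python product of an (m x k) and a (k x n) list-of-lists matrix."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def softmax_rows(m):
    """Turn each row of logits into a probability distribution (numerically stable)."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_affordance_forward(a_hat, features, w1, w2):
    """Two GCN layers, then a per-node softmax over (hypothetical) affordance labels."""
    hidden = relu(matmul(a_hat, matmul(features, w1)))
    logits = matmul(a_hat, matmul(hidden, w2))
    return softmax_rows(logits)


# Toy scene graph: building (0) - room (1) - {chair (2), desk (3)}.
rng = random.Random(0)
edges = [(0, 1), (1, 2), (1, 3)]
features = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # one-hot node types
w1 = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]  # untrained weights
w2 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]  # 3 hypothetical affordances
a_hat = normalized_adjacency(4, edges)
dists = gcn_affordance_forward(a_hat, features, w1, w2)
print(dists[2])  # affordance distribution for the chair node
```

Because each node aggregates its neighbours' features, the chair's output depends on which room and objects surround it. This is the mechanism that lets two members of the same semantic class receive different affordance distributions.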

Paper Structure (13 sections, 2 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Depiction of the proposed affordance estimation method based on 3D scene graphs. The environment is first abstracted into a 3D scene graph representing the building, rooms, and objects, which is then input to the CECI model to determine the set of commonsense affordances $\hat{A}_{\mathcal{V}}$ for individual members of a semantic class.
  • Figure 2: Computed correlations among the 45 semantic class labels in the generated dataset and the predicted commonsense affordances for 4 entry-level categories. The correlations of the ground-truth data are shown at the bottom for comparison.
  • Figure 3: Depiction of the proposed affordance estimation method, where the environment is represented by a 3D scene graph with building, rooms, and objects and is used to determine the set of commonsense affordances $\hat{A}$ for individual members of the same semantic class (in this case, chairs).