LLM-enhanced Scene Graph Learning for Household Rearrangement

Wenhao Li, Zhiyuan Yu, Qijin She, Zhinan Yu, Yuqing Lan, Chenyang Zhu, Ruizhen Hu, Kai Xu

TL;DR

The paper proposes LLM-enhanced scene graph learning, which transforms the input scene graph into an affordance-enhanced graph (AEG) with information-enhanced nodes and newly discovered edges (relations). The method achieves state-of-the-art performance in misplacement detection and the subsequent rearrangement planning.

Abstract

The household rearrangement task involves spotting misplaced objects in a scene and moving them to proper places. It depends both on common-sense knowledge on the objective side and on the user's preferences on the subjective side. To accomplish this task, we propose to mine object functionality, aligned with user preference, directly from the scene itself, without relying on human intervention. To do so, we work with a scene graph representation and propose LLM-enhanced scene graph learning, which transforms the input scene graph into an affordance-enhanced graph (AEG) with information-enhanced nodes and newly discovered edges (relations). In the AEG, the nodes corresponding to receptacle objects are augmented with a context-induced affordance that encodes what kinds of carriable objects can be placed on them. New edges are added for newly discovered non-local relations. With the AEG, we perform task planning for scene rearrangement by detecting misplaced carriables and determining a proper placement for each of them. We test our method by implementing a tidying robot in a simulator and evaluating it on a new benchmark that we built. Extensive evaluations demonstrate that our method achieves state-of-the-art performance on misplacement detection and the subsequent rearrangement planning.
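
To make the graph transformation in the abstract concrete, here is a minimal Python sketch of a scene graph whose nodes receive affordance text and which gains extra edges. It is an illustration under stated assumptions, not the authors' implementation: the names `Node`, `SceneGraph`, and `enhance`, and the plain-string affordance format, are all hypothetical, and the affordances and relations that an LLM would produce are passed in as ready-made inputs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One object in the scene graph (all field names are illustrative)."""
    name: str
    kind: str             # "receptacle" or "carriable"
    affordance: str = ""  # context-induced affordance text, empty until enhanced


@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)  # name -> Node
    edges: set = field(default_factory=set)    # (source, relation, target)

    def add_node(self, node):
        self.nodes[node.name] = node

    def add_edge(self, source, relation, target):
        self.edges.add((source, relation, target))


def enhance(graph, affordances, new_relations):
    """Return an AEG-like copy of `graph`, leaving the input untouched.

    `affordances` maps node names to affordance text and `new_relations`
    lists extra (source, relation, target) edges. In the paper both come
    from LLM analysis; here they are assumed to be given.
    """
    aeg = SceneGraph()
    for name, node in graph.nodes.items():
        text = affordances.get(name, node.affordance)
        aeg.add_node(Node(name, node.kind, text))
    aeg.edges = set(graph.edges)
    for source, relation, target in new_relations:
        aeg.add_edge(source, relation, target)
    return aeg
```

Returning a copy rather than mutating the input keeps the original scene graph available for comparison, which is a design choice of this sketch rather than something the abstract specifies.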

Paper Structure (46 sections, 2 equations, 18 figures, 4 tables, 2 algorithms)

This paper contains 46 sections, 2 equations, 18 figures, 4 tables, 2 algorithms.

Figures (18)

  • Figure 1: Given a scene graph (SG) with key frames, we use a large language model (LLM) to perform context-induced affordance analysis for all objects in the scene. These affordances are incorporated into the SG, updating the nodes and edges to construct an affordance-enhanced graph (AEG). We then evaluate the appropriateness of the current placements of all carriable objects in the AEG based on the affordance information of the objects and their receptacles, identifying misplaced items. For each misplaced carriable, we rate the suitability of each receptacle in the AEG as a placement target. The top k most suitable receptacles are selected as candidates, and their affordances are retrieved as prompts to the LLM to generate the placement decision.
  • Figure 2: Context-induced affordance analysis for objects in the scene graph. (a) For each object, we conduct a local analysis based on its key frame image and on the textual content describing its neighborhood in the graph, storing the results in the nodes as local affordances. (b) We construct object-area-room hierarchies and aggregate the local affordances of all objects within an area. Using the LLM, we summarize the content and functionality of the entire area based on the aggregated information, obtaining descriptions for all the areas. (c) We use the descriptions of all areas in a room as the global context and assign them to all nodes within the room, allowing the LLM to identify meaningful semantic context information for each object. With this context, we construct new semantic edges between objects in the same room and update the affordances, completing the context-induced affordance analysis of the scene graph.
  • Figure 3: Local affordance analysis of an example receptacle. The LLM analyzes the contextual details from the scene graph and the visual image to determine the potential functionality of the receptacle, assigning it a more specific category.
  • Figure 4: An example of placement scoring with the LLM. We set a fixed standard in the prompt for the LLM to refer to when rating: 100 for perfect placement, 0 for wrong placement, and 50 for placements difficult to judge.
  • Figure 5: We use a score-based retrieval-augmented method for placement decision generation. First, we construct a database of context-induced affordances for all receptacles in the AEG. For each carriable object to be rearranged, we treat the object as a query and evaluate the suitability of each receptacle in the database as a placement target using the LLM scorer. The k receptacles with the highest scores are selected as candidates. We retrieve their context-induced affordances from the database and use them, along with the query, as prompt input to the LLM to generate the best placement plan.
  • ...and 13 more figures
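
The score-based retrieval described in the Figure 4 and Figure 5 captions can be sketched roughly as follows. This is a hedged illustration, not the paper's code: `toy_scorer` is a keyword-matching stand-in for the LLM scorer, and the function names, the prompt wording, and the default `k=3` are assumptions made for the example. Only the rating anchors (100, 0, 50) and the top-k selection come from the captions.

```python
def toy_scorer(carriable, affordance):
    """Stand-in for the LLM scorer (an assumption, not the paper's prompt).

    Follows the rating anchors in the Figure 4 caption: 100 for a perfect
    placement, 0 for a wrong one, 50 when the placement is hard to judge.
    """
    if not affordance:
        return 50  # no affordance text, so the placement is hard to judge
    return 100 if carriable in affordance else 0


def top_k_receptacles(carriable, affordance_db, scorer, k=3):
    """Score every receptacle for `carriable` and keep the k best names.

    `affordance_db` maps receptacle names to affordance text. Ties keep the
    database's insertion order because Python's sort is stable.
    """
    scored = [(scorer(carriable, text), name) for name, text in affordance_db.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


def build_prompt(carriable, candidates, affordance_db):
    """Assemble the retrieved affordances into a prompt (wording is hypothetical)."""
    lines = [f"Object to place: {carriable}", "Candidate receptacles:"]
    for name in candidates:
        lines.append(f"- {name}: {affordance_db[name]}")
    lines.append("Choose the best receptacle for the object.")
    return "\n".join(lines)


# Hypothetical affordance database for four receptacles.
affordance_db = {
    "bookshelf": "holds book and magazine",
    "fridge": "holds milk and fruit",
    "desk": "holds book, pen and laptop",
    "box": "",
}
candidates = top_k_receptacles("book", affordance_db, toy_scorer, k=2)
prompt = build_prompt("book", candidates, affordance_db)
```

Restricting the final prompt to the k highest-scoring receptacles keeps it short while still giving the LLM the affordance text it needs to choose among plausible targets.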