SG-CoT: An Ambiguity-Aware Robotic Planning Framework using Scene Graph Representations

Akshat Rana, Peeyush Agarwal, K. P. S. Rana, Amarjit Malhotra

Abstract

Ambiguity poses a major challenge to large language models (LLMs) used as robotic planners. In this letter, we present Scene Graph-Chain-of-Thought (SG-CoT), a two-stage framework in which LLMs iteratively query a scene graph representation of the environment to detect and clarify ambiguities. First, a structured scene graph representation of the environment is constructed from input observations, capturing objects, their attributes, and their relationships with other objects. Second, the LLM is equipped with retrieval functions to query the portions of the scene graph that are relevant to the provided instruction. This grounds the reasoning process of the LLM in the observation, increasing the reliability of robotic planners in ambiguous situations. SG-CoT also allows the LLM to identify the source of ambiguity and pose a relevant disambiguation question to the user or another robot. Extensive experimentation demonstrates that SG-CoT consistently outperforms prior methods, improving question accuracy by at least 10% and success rate by at least 4% in single-agent and 15% in multi-agent environments, validating its effectiveness for more generalizable robot planning.

Paper Structure (11 sections, 2 figures, 4 tables, 1 algorithm)

This paper contains 11 sections, 2 figures, 4 tables, 1 algorithm.

Figures (2)

  • Figure 1: Possible configurations of the defined ambiguity types. For each category, the image on the left shows the global perspective and the human instruction, and the image on the right shows the egocentric view of the robot and its clarification question.
  • Figure 2: SG-CoT. The robot receives an instruction from the user and an observation from the environment. The observation is converted into a scene graph representation. During its reasoning process, the robot queries the scene graph via API calls to fetch information about the environment. After multiple rounds of thinking and retrieval, the LLM either performs a set of actions in the environment or asks the user a clarification question.
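
The query-then-act-or-ask loop described for Figure 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `SceneGraph` class, the retrieval functions `find_objects` and `get_relations`, and the `plan_or_ask` helper are all hypothetical names, and a simple rule (count the matching objects) stands in for the LLM's reasoning over the retrieved results.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy scene graph: objects with attributes, plus (subject, relation, object) edges."""
    attributes: dict = field(default_factory=dict)  # object id -> {attribute: value}
    relations: list = field(default_factory=list)   # (subject id, relation, object id)

    def add_object(self, obj_id, **attrs):
        self.attributes[obj_id] = attrs

    def add_relation(self, subject, relation, obj):
        self.relations.append((subject, relation, obj))

    # Hypothetical retrieval functions that a planner could call during reasoning.
    def find_objects(self, **constraints):
        """Return the sorted ids of objects whose attributes match every constraint."""
        return sorted(
            obj_id
            for obj_id, attrs in self.attributes.items()
            if all(attrs.get(key) == value for key, value in constraints.items())
        )

    def get_relations(self, obj_id):
        """Return every edge that mentions obj_id as subject or object."""
        return [edge for edge in self.relations if obj_id in (edge[0], edge[2])]


def plan_or_ask(graph, category, action="pick"):
    """Rule-based stand-in for the LLM: act on a unique match, otherwise ask."""
    candidates = graph.find_objects(category=category)
    if not candidates:
        return ("ask", f"I cannot see any {category}. Where is it?")
    if len(candidates) == 1:
        return ("act", f"{action}({candidates[0]})")
    # Several objects match, so the instruction is ambiguous: name the options.
    described = [
        f"the {graph.attributes[c].get('color', 'unknown')} one" for c in candidates
    ]
    return ("ask", f"Which {category} do you mean: " + " or ".join(described) + "?")


# Example scene (assumed for illustration): two cups and one bowl on a table.
graph = SceneGraph()
graph.add_object("table_1", category="table")
graph.add_object("cup_1", category="cup", color="red")
graph.add_object("cup_2", category="cup", color="blue")
graph.add_object("bowl_1", category="bowl", color="white")
graph.add_relation("cup_1", "on", "table_1")
graph.add_relation("cup_2", "on", "table_1")

ambiguous = plan_or_ask(graph, "cup")     # two cups match, so a question is returned
unambiguous = plan_or_ask(graph, "bowl")  # one bowl matches, so an action is returned
```

In this sketch the ambiguity check is a plain count of matching objects; in the framework described above, the LLM would instead decide over several rounds of retrieval whether the fetched portion of the scene graph resolves the instruction.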