
Physical Reasoning and Object Planning for Household Embodied Agents

Ayush Agrawal, Raghav Prabhakar, Anirudh Goyal, Dianbo Liu

TL;DR

This paper introduces the CommonSense Object Affordance Task (COAT), a novel framework for analyzing reasoning capabilities in commonsense scenarios. The work advances the understanding of physical commonsense reasoning in language models and paves the way for future improvements in household agent intelligence.

Abstract

In this study, we explore task planning for robust household embodied agents, with a particular emphasis on the challenging task of selecting substitute objects. We introduce the CommonSense Object Affordance Task (COAT), a novel framework designed to analyze reasoning capabilities in commonsense scenarios. This approach centers on how such agents can identify and use alternative objects when executing household tasks, offering insight into the complexities of practical decision-making in real-world environments. Drawing inspiration from factors that affect human decision-making, we examine how large language models tackle this challenge through four carefully constructed commonsense question-and-answer datasets featuring refined rules and human annotations. Our evaluation of state-of-the-art language models on these datasets sheds light on three key considerations: 1) aligning an object's inherent utility with the task at hand, 2) navigating contextual dependencies (societal norms, safety, appropriateness, and efficiency), and 3) accounting for the current physical state of the object. To maintain accessibility, we introduce five abstract variables reflecting an object's physical condition, modulated by human insights, to simulate diverse household scenarios. Our contributions include human preference mappings for all three factors and four QA datasets (2K, 15K, 60K, and 70K questions) probing utility dependencies, contextual dependencies, and object physical states. The datasets, along with our findings, are available at https://github.com/Ayush8120/COAT. This research advances our understanding of physical commonsense reasoning in language models and paves the way for future improvements in household agent intelligence.
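The three considerations can be read as a staged filter over candidate objects, applied in the order Figure 1 describes: utility first, then contextual appropriateness, then physical state. The Python sketch below (3.9+) illustrates only that ordering. The `Candidate` class, the `select_substitutes` and `usable_state` helpers, the state flags such as `dirty`, and the cake-knife scene are all hypothetical and are not taken from the COAT repository; in COAT these judgments come from human annotations and are posed to language models as QA, not hard-coded as rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A hypothetical candidate object; every field is an illustrative assumption."""
    name: str
    utilities: set[str]  # tasks the object can serve at all
    # Tasks ruled out by norms, safety, appropriateness, or efficiency.
    inappropriate_for: set[str] = field(default_factory=set)
    # Hypothetical physical-state flags; True means the condition is a problem.
    state: dict[str, bool] = field(default_factory=dict)


def usable_state(state: dict[str, bool]) -> bool:
    # Accept an object only if none of its (hypothetical) state flags is raised.
    return not any(state.values())


def select_substitutes(task: str, candidates: list[Candidate]) -> list[str]:
    # Object level, step 1: keep objects whose utility matches the task.
    useful = [c for c in candidates if task in c.utilities]
    # Object level, step 2: drop objects that are contextually inappropriate.
    appropriate = [c for c in useful if task not in c.inappropriate_for]
    # Physical-state level: drop objects whose current condition rules them out.
    return [c.name for c in appropriate if usable_state(c.state)]


# Hypothetical scene: the ideal cake knife is missing.
CANDIDATES = [
    Candidate("kitchen knife", {"cut cake"}),
    Candidate("scissors", {"cut cake"}, inappropriate_for={"cut cake"}),
    Candidate("bread knife", {"cut cake"}, state={"dirty": True}),
    Candidate("spoon", {"eat soup"}),
]

print(select_substitutes("cut cake", CANDIDATES))
```

Here the spoon fails on utility, the scissors fail on contextual appropriateness, and the bread knife fails on physical state, leaving the kitchen knife. Keeping the stages as separate comprehensions makes it easy to see at which stage each option is pruned.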

Paper Structure

This paper contains 59 sections, 15 figures, 13 tables.

Figures (15)

  • Figure 1: We divide the decision-making process into two broad phases: options are pruned first at the object level and then by physical state. The object level is further divided into two sub-steps: utility and contextual appropriateness. We highlight this method's ability to compare appropriateness across an array of factors and arrive at a substitute object even in the absence of the ideal object [Cake Knife]. Our work provides QA datasets about this type of commonsense reasoning.
  • Figure 2: Average accuracy of various models on Task 0 as the number of options increases.
  • Figure 3: Comparative plot showing how Task 1 performance varies as the object diversity in the options increases from left to right.
  • Figure 4: Model accuracy on Task 0.
  • Figure 4: Comparative plot showing how Task 2 performance varies as the count of bad configurations in the options increases from left to right.
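Figures 2 to 4 plot accuracy against a swept quantity (number of options, object diversity, count of bad configurations). A minimal Python sketch of how such a curve could be computed from per-question predictions follows. The `accuracy_by_group` helper and its `(group_key, predicted, gold)` record format are hypothetical assumptions, not the authors' evaluation script, and the sample records are invented for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute per-group accuracy from (group_key, predicted, gold) triples.

    group_key is whatever the sweep varies, e.g. the number of answer
    options in a question; this record format is a hypothetical assumption.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for key, predicted, gold in records:
        total[key] += 1
        correct[key] += int(predicted == gold)
    # Sort keys so the result reads left to right like the plotted sweep.
    return {key: correct[key] / total[key] for key in sorted(total)}


# Invented predictions for questions with 2 and 5 answer options.
records = [
    (2, "A", "A"), (2, "B", "A"),
    (5, "C", "C"), (5, "A", "C"), (5, "D", "D"),
]
acc = accuracy_by_group(records)
for n, a in acc.items():
    # A uniform random guesser scores 1/n on an n-option question.
    print(f"{n} options: accuracy={a:.2f}, chance={1 / n:.2f}")
```

Printing the 1/n chance level next to each accuracy helps separate genuine degradation from the drop expected simply because more options make guessing harder. The same grouping works unchanged if the key is object diversity or the count of bad configurations.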