AdaptBot: Combining LLM with Knowledge Graphs and Human Input for Generic-to-Specific Task Decomposition and Knowledge Refinement

Shivam Singh, Karthik Swaminathan, Nabanita Dash, Ramandeep Singh, Snehasis Banerjee, Mohan Sridharan, Madhava Krishna

TL;DR

This work tackles the challenge of performing unseen tasks with limited labeled data by combining LLM-driven generic task decomposition with a domain-specific Knowledge Graph (KG) and human-in-the-loop (HITL) refinement. The framework uses two RDF-based graphs, G_s (state) and G_k (attributes), to check the feasibility of LLM-predicted sub-tasks via SPARQL queries and to refine the LLM output when it does not match the KG; HITL updates expand the KG when a mismatch cannot be resolved automatically. Experimental results in simulated cooking and cleaning tasks show that merging LLM predictions with KG knowledge and selective human feedback yields substantial performance gains over using the LLM alone or the LLM with the KG, and supports incremental adaptation to new task classes without heavy tuning. By drawing on the complementary strengths of LLMs, structured domain knowledge, and user input, the approach enables faster, more reliable deployment of embodied agents in open-set domains, with potential extensions to real robots and broader domains.
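To make the feasibility check concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the paper stores G_s and G_k as RDF graphs and queries them with SPARQL, whereas this sketch stands in sets of (subject, predicate, object) tuples. The predicate names (`canBe`, `locatedIn`), the example triples, and the function `check_step` are all hypothetical and chosen only for illustration.

```python
# Simplified stand-in for the KG feasibility check.
# Assumption: triples are plain tuples; the paper uses RDF graphs and SPARQL.
Triple = tuple[str, str, str]

# Hypothetical attribute graph G_k: what can be done to each object.
G_K: set[Triple] = {
    ("onion", "canBe", "chop"),
    ("egg", "canBe", "crack"),
    ("egg", "canBe", "whisk"),
}

# Hypothetical state graph G_s: where each object currently is.
G_S: set[Triple] = {
    ("onion", "locatedIn", "fridge"),
    ("egg", "locatedIn", "fridge"),
}


def check_step(action: str, obj: str, g_k: set[Triple], g_s: set[Triple]) -> tuple[bool, str]:
    """Return (feasible, reason) for one LLM-predicted sub-task."""
    # The object must be present in the current state graph.
    if not any(subject == obj for subject, _, _ in g_s):
        return False, f"unknown object: {obj}"
    # The action must be listed as applicable to the object in the attribute graph.
    if (obj, "canBe", action) not in g_k:
        return False, f"unsupported action: {action} {obj}"
    return True, "ok"


if __name__ == "__main__":
    for step in [("chop", "onion"), ("whisk", "onion"), ("chop", "tomato")]:
        print(step, check_step(*step, G_K, G_S))
```

A step that fails either test is what the summary calls a mismatch: it would be passed on for refinement rather than executed.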

Abstract

An embodied agent assisting humans is often asked to complete new tasks, and there may not be sufficient time or labeled examples to train the agent to perform these new tasks. Large Language Models (LLMs) trained on considerable knowledge across many domains can be used to predict a sequence of abstract actions for completing such tasks, although the agent may not be able to execute this sequence due to task-, agent-, or domain-specific constraints. Our framework addresses these challenges by leveraging the generic predictions provided by an LLM and the prior domain knowledge encoded in a Knowledge Graph (KG), enabling an agent to quickly adapt to new tasks. The robot also solicits and uses human input as needed to refine its existing knowledge. Based on experimental evaluation in the context of cooking and cleaning tasks in simulation domains, we demonstrate that the interplay between LLM, KG, and human input leads to substantial performance gains compared with just using the LLM. Project website: https://sssshivvvv.github.io/adaptbot/

Paper Structure

This paper contains 16 sections, 1 equation, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: For any given task, an LLM provides a generic sequence of abstract actions that is refined using the domain-specific knowledge in a KG. If the sequence refers to objects, attributes, or actions that cannot be resolved using the KG, or leads to unexpected outcomes, human input helps refine or expand the KG.
  • Figure 2: Framework overview for cooking tasks: (a) The input Chain-of-Thought (COT) prompt contains the target dish, the available ingredients, and an example input and output action sequence (for the task of making coffee), and is used to obtain an output action sequence; (b) Any mismatches (e.g., in object classes or actions) between the LLM output and the KG are identified, and the action sequence is revised if possible; (c) The agent attempts to resolve any remaining errors or unexpected outcomes by re-prompting the LLM, with errors that persist being addressed by soliciting human input and updating the KG; (d) The revised/corrected action sequence is executed.
  • Figure 3: Example of a node onion in $\mathbf{G_k}$ (top) and $\mathbf{G_s}$ (bottom).
  • Figure 4: Progress line [sakib2024cooking] showing use of each ingredient when preparing an omelette.
  • Figure 5: 12 variants of tasks that involve the agent assisting with cleaning different objects and surfaces, or clearing objects to achieve the desired object configuration.
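
The control flow described in the Figure 2 caption (check against the KG, re-prompt the LLM, fall back to human input and update the KG, then execute) can be sketched roughly as follows. This is an illustrative simplification, not the paper's algorithm: the KG is reduced to a dictionary from object to allowed actions, KG-based revision in step (b) is omitted, and `is_feasible`, `refine_plan`, `reprompt`, `ask_human`, and `max_reprompts` are hypothetical names standing in for the LLM and human interfaces.

```python
from typing import Callable

Step = tuple[str, str]  # (action, object)
KG = dict[str, set[str]]  # assumption: object -> actions it supports


def is_feasible(step: Step, kg: KG) -> bool:
    """A step is feasible if the KG lists the action for the object."""
    action, obj = step
    return action in kg.get(obj, set())


def refine_plan(
    plan: list[Step],
    kg: KG,
    reprompt: Callable[[Step], Step],
    ask_human: Callable[[Step], Step],
    max_reprompts: int = 1,
) -> list[Step]:
    """Return a revised plan in which every step is consistent with the KG."""
    revised: list[Step] = []
    for step in plan:
        attempts = 0
        # Try to repair an infeasible step by re-prompting (stand-in for the LLM).
        while not is_feasible(step, kg) and attempts < max_reprompts:
            step = reprompt(step)
            attempts += 1
        # If the error persists, solicit human input and expand the KG with it.
        if not is_feasible(step, kg):
            step = ask_human(step)
            action, obj = step
            kg.setdefault(obj, set()).add(action)
        revised.append(step)
    return revised


if __name__ == "__main__":
    kg: KG = {"egg": {"crack"}}
    plan = [("crack", "egg"), ("dice", "onion")]
    result = refine_plan(
        plan,
        kg,
        reprompt=lambda s: s,  # stub: the "LLM" repeats its answer
        ask_human=lambda s: ("chop", s[1]),  # stub: the "human" supplies a correction
    )
    print(result, kg)
```

In this sketch the human correction is written back into the dictionary, so a later plan that uses the same object and action would pass the check without further input.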