Neurosymbolic AI for Enhancing Instructability in Generative AI

Amit Sheth, Vishal Pallagani, Kaushik Roy

TL;DR

Problem: LLM-based instruction following is unreliable for complex, multi-step tasks due to weak grounding and limited generalization. Approach: A neurosymbolic architecture combining a Symbolic Task Planner, Neural Semantic Parser, and Neurosymbolic Executor, augmented by process knowledge graphs. Contributions: Demonstrates improved task decomposition, grounding, and execution reliability, with dynamic adaptation to changing contexts. Impact: Enables more trustworthy, context-aware, and robust AI assistants across domains by providing structured, grounded, and real-time instruction execution.

Abstract

Generative AI, especially via Large Language Models (LLMs), has transformed content creation across text, images, and music, showcasing capabilities in following instructions through prompting, largely facilitated by instruction tuning. Instruction tuning is a supervised fine-tuning method in which LLMs are trained on datasets formatted with specific tasks and corresponding instructions. This method systematically enhances the model's ability to comprehend and execute the provided directives. Despite these advancements, LLMs still face challenges in consistently interpreting complex, multi-step instructions and generalizing them to novel tasks, both of which are essential for broader applicability in real-world scenarios. This article explores why neurosymbolic AI offers a better path to enhance the instructability of LLMs. We explore the use of a symbolic task planner to decompose high-level instructions into structured tasks, a neural semantic parser to ground these tasks into executable actions, and a neurosymbolic executor to implement these actions while dynamically maintaining an explicit representation of state. We also seek to show that a neurosymbolic approach enhances the reliability and context-awareness of task execution, enabling LLMs to dynamically interpret and respond to a wider range of instructional contexts with greater precision and flexibility.
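The three-stage flow described in the abstract (planner, parser, executor with explicit state) can be illustrated with a minimal sketch. This is not the authors' implementation: `TASK_LIBRARY`, `Action`, `State`, and the functions `plan`, `parse`, `execute`, and `run` are hypothetical names chosen for illustration, and the "neural" parser is replaced by a trivial string split where a real system would call a learned model.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """An executable action produced by grounding a subtask."""
    name: str
    args: dict


@dataclass
class State:
    """Explicit state kept by the executor between actions."""
    facts: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


# Hypothetical task library; a real planner would use a richer symbolic model.
TASK_LIBRARY = {
    "plan trip": ["book flight", "book hotel"],
}


def plan(instruction: str) -> list[str]:
    """Symbolic task planner: decompose an instruction into ordered subtasks."""
    return TASK_LIBRARY.get(instruction, [instruction])


def parse(task: str) -> Action:
    """Stand-in for the neural semantic parser (assumption: 'verb object' tasks)."""
    verb, _, obj = task.partition(" ")
    return Action(name=verb, args={"object": obj})


def execute(actions: list[Action], state: State) -> State:
    """Executor: apply each action and record its effect in the explicit state."""
    for action in actions:
        key = action.args["object"]
        if state.facts.get(key) == "done":
            # The state lets the executor skip work that is already complete.
            state.log.append(f"skip {action.name} {key}")
            continue
        state.facts[key] = "done"
        state.log.append(f"{action.name} {key}")
    return state


def run(instruction: str) -> State:
    """Chain the three stages: plan, then parse each subtask, then execute."""
    actions = [parse(task) for task in plan(instruction)]
    return execute(actions, State())
```

Calling `run("plan trip")` decomposes the instruction into two subtasks, grounds each into an `Action`, and returns a `State` whose `facts` and `log` record what was executed. Keeping the state as an explicit object, rather than implicit in generated text, is what allows the executor to check preconditions before acting.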

Paper Structure (10 sections, 9 figures)


Figures (9)

  • Figure 1: A sample instruction from the TravelPlanner dataset, illustrating a complex, multi-step user instruction that must be decomposed into executable actions. State-of-the-art LLMs could not handle such instructions: GPT-4 produced a plan meeting all constraints for only 0.6% of instructions, while all other LLMs failed to complete any tasks [xie2024travelplanner].
  • Figure 2: Notable differences between fine-tuning, prompting, and instruction tuning for LLMs.
  • Figure 3: Illustrations of hierarchical task ordering, capturing subtasks and their integration into higher-level tasks within a process knowledge graph. The first panel demonstrates a robotic gripper's fetch-object task, while the second panel shows the use of process knowledge to enhance safety in conversational agents.
  • Figure 4: Decomposition of a high-level instruction by the Symbolic Task Planner
  • Figure 5: Processing of one of the decomposed tasks obtained from the Symbolic Task Planner by the Neural Semantic Parser
  • ...and 4 more figures
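
The hierarchical task ordering mentioned in the Figure 3 caption can be sketched as a small graph in which each higher-level task maps to its ordered subtasks. The subtask names below are assumptions invented for illustration (the caption names only the fetch-object task), and `PROCESS_GRAPH` and `expand` are hypothetical names, not part of the paper.

```python
# Hypothetical process knowledge graph: each task maps to its ordered subtasks.
# Tasks that do not appear as keys are treated as primitive actions.
PROCESS_GRAPH = {
    "fetch object": ["reach object", "return object"],
    "reach object": ["locate object", "move to object"],
    "return object": ["grasp object", "carry object"],
}


def expand(task: str, graph: dict) -> list[str]:
    """Depth-first expansion of a task into its ordered primitive actions."""
    subtasks = graph.get(task)
    if not subtasks:
        return [task]  # primitive action: nothing further to decompose
    ordered = []
    for sub in subtasks:
        ordered.extend(expand(sub, graph))
    return ordered
```

Here `expand("fetch object", PROCESS_GRAPH)` walks the hierarchy top-down and returns the primitive actions in execution order, which is one simple way a planner could read an ordering out of such a graph.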