Neurosymbolic AI for Enhancing Instructability in Generative AI
Amit Sheth, Vishal Pallagani, Kaushik Roy
TL;DR
Problem: LLM-based instruction following is unreliable for complex, multi-step tasks due to weak grounding and limited generalization.
Approach: A neurosymbolic architecture combining a Symbolic Task Planner, Neural Semantic Parser, and Neurosymbolic Executor, augmented by process knowledge graphs.
Contributions: Demonstrates improved task decomposition, grounding, and execution reliability, with dynamic adaptation to changing contexts.
Impact: Enables more trustworthy, context-aware, and robust AI assistants across domains by providing structured, grounded, and real-time instruction execution.
Abstract
Generative AI, especially via Large Language Models (LLMs), has transformed content creation across text, images, and music, showcasing capabilities in following instructions through prompting, largely facilitated by instruction tuning. Instruction tuning is a supervised fine-tuning method in which LLMs are trained on datasets formatted with specific tasks and corresponding instructions. This method systematically enhances the model's ability to comprehend and execute the provided directives. Despite these advancements, LLMs still face challenges in consistently interpreting complex, multi-step instructions and generalizing them to novel tasks, which are essential for broader applicability in real-world scenarios. This article explores why neurosymbolic AI offers a better path to enhance the instructability of LLMs. We explore the use of a symbolic task planner to decompose high-level instructions into structured tasks, a neural semantic parser to ground these tasks into executable actions, and a neurosymbolic executor to implement these actions while dynamically maintaining an explicit representation of state. We also seek to show that a neurosymbolic approach enhances the reliability and context-awareness of task execution, enabling LLMs to dynamically interpret and respond to a wider range of instructional contexts with greater precision and flexibility.
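To make the three-stage pipeline described above concrete, the following is a minimal toy sketch in Python, not the authors' implementation. Every name in it (`PROCESS_KNOWLEDGE`, `ACTION_LEXICON`, `PRECONDITIONS`, `State`, `plan`, `parse`, `execute`, `run`) is a hypothetical stand-in: the planner is a dictionary lookup that plays the role of process knowledge, the "neural" semantic parser is replaced by a fixed lookup table, and the executor tracks state as a set of facts and checks simple preconditions before each action.

```python
from dataclasses import dataclass, field

# Hypothetical process knowledge: a high-level goal mapped to ordered steps.
PROCESS_KNOWLEDGE = {
    "make tea": ["boil water", "add tea bag", "pour water"],
}

# Hypothetical grounding table. A real system would use a learned semantic
# parser here; a lookup table keeps the sketch self-contained.
ACTION_LEXICON = {
    "boil water": ("heat", "water"),
    "add tea bag": ("place", "tea_bag"),
    "pour water": ("pour", "water"),
}

# Hypothetical symbolic preconditions: facts that must hold before an action.
PRECONDITIONS = {
    ("pour", "water"): {"heat:water"},
}


@dataclass
class State:
    """Explicit state representation maintained by the executor."""
    facts: set = field(default_factory=set)
    log: list = field(default_factory=list)


def plan(instruction):
    """Symbolic task planner: decompose an instruction into ordered steps."""
    key = instruction.strip().lower()
    if key not in PROCESS_KNOWLEDGE:
        raise KeyError(f"no process knowledge for: {instruction!r}")
    return list(PROCESS_KNOWLEDGE[key])


def parse(step):
    """Stand-in for the neural semantic parser: ground a step as (verb, object)."""
    return ACTION_LEXICON[step]


def execute(actions, state):
    """Executor: check preconditions, apply each action, and update the state."""
    for verb, obj in actions:
        missing = PRECONDITIONS.get((verb, obj), set()) - state.facts
        if missing:
            raise RuntimeError(f"cannot {verb} {obj}: missing {sorted(missing)}")
        state.facts.add(f"{verb}:{obj}")
        state.log.append((verb, obj))
    return state


def run(instruction):
    """Run the full pipeline: plan, then ground, then execute."""
    steps = plan(instruction)
    actions = [parse(step) for step in steps]
    return execute(actions, State())


if __name__ == "__main__":
    final_state = run("Make tea")
    print(final_state.log)
```

Because the state is explicit, the executor can reject an action whose preconditions are unmet (for example, pouring water that has not been heated) instead of silently proceeding, which is the kind of reliability gain the abstract attributes to keeping a symbolic state representation.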
