Neuro-Symbolic Control with Large Language Models for Language-Guided Spatial Tasks
Momina Liaqat Ali, Muhammad Abid
TL;DR
The paper tackles instability and hallucinated behavior in language-guided continuous control by introducing a neuro-symbolic framework that confines high-level reasoning to local LLMs while delegating low-level execution to a neural delta controller. Evaluated across multiple local LLMs (Mistral, Phi, LLaMA-3.2) on planar relational tasks, the approach consistently improves success rates and reduces control steps relative to LLM-only baselines, achieving speedups of up to 8.83x and average step reductions exceeding 70%. The method enhances stability and interpretability without reinforcement learning or costly rollouts, and the results are robust to language model quality. This work demonstrates a scalable, reproducible path for integrating language understanding with continuous control in resource-constrained robotic systems.
Abstract
Although large language models (LLMs) have recently become effective tools for language-conditioned control in embodied systems, instability, slow convergence, and hallucinated actions continue to limit their direct application to continuous control. This work proposes a modular neuro-symbolic control framework that cleanly separates low-level motion execution from high-level semantic reasoning. A locally deployed LLM interprets the task and produces symbolic outputs, while a lightweight neural delta controller performs bounded, incremental actions in continuous space. We assess the proposed method in a planar manipulation setting in which spatial relations between objects are specified by language. Extensive experiments across multiple tasks and local language models, including Mistral, Phi, and LLaMA-3.2, compare LLM-only control, neural-only control, and the proposed LLM+DL framework. Compared with LLM-only baselines, the neuro-symbolic integration consistently increases both success rate and efficiency, achieving average step reductions exceeding 70% and speedups of up to 8.83x while remaining robust to language model quality. By restricting the LLM to symbolic outputs and delegating uninterpreted execution to a neural controller trained on synthetic geometric data, the proposed framework enhances interpretability, stability, and generalization without requiring reinforcement learning or costly rollouts. These results show empirically that neuro-symbolic decomposition offers a scalable and principled way to integrate language understanding with continuous control. The approach supports the development of dependable and efficient language-guided embodied systems.
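The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword-matching `stub_llm_parse` stands in for the locally deployed LLM, the hand-written `delta_step` stands in for the trained neural delta controller, and the relation vocabulary, the `margin` offset, and the `max_step` bound are all hypothetical values chosen for illustration. The sketch only shows the division of labor: the language side emits a symbolic goal drawn from a fixed vocabulary, and the control side takes bounded incremental steps toward the geometric target derived from that goal.

```python
import math
from dataclasses import dataclass

# Hypothetical symbolic vocabulary: unit offsets for each allowed relation.
RELATION_OFFSETS = {
    "left_of": (-1.0, 0.0),
    "right_of": (1.0, 0.0),
    "above": (0.0, 1.0),
    "below": (0.0, -1.0),
}
PHRASES = (("left of", "left_of"), ("right of", "right_of"),
           ("above", "above"), ("below", "below"))


@dataclass
class SymbolicGoal:
    relation: str   # must be a key of RELATION_OFFSETS
    reference: str  # name of the reference object


def stub_llm_parse(instruction: str) -> SymbolicGoal:
    """Stand-in for the LLM: map an instruction to a symbolic goal."""
    text = instruction.lower()
    for phrase, relation in PHRASES:
        if phrase in text:
            words = text.split(phrase, 1)[1].strip(" .").split()
            if words:
                return SymbolicGoal(relation, words[-1])
    raise ValueError("instruction does not map to a known relation")


def goal_to_target(goal, objects, margin=0.2):
    """Ground a symbolic goal in coordinates; reject unknown symbols."""
    if goal.relation not in RELATION_OFFSETS or goal.reference not in objects:
        raise ValueError("symbolic output outside the allowed vocabulary")
    ox, oy = RELATION_OFFSETS[goal.relation]
    rx, ry = objects[goal.reference]
    return (rx + margin * ox, ry + margin * oy)


def delta_step(pos, target, max_step=0.05):
    """Stand-in for the delta controller: one bounded incremental move."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return target
    scale = max_step / dist
    return (pos[0] + dx * scale, pos[1] + dy * scale)


def run_episode(instruction, start, objects, tol=1e-3, max_steps=200):
    """Parse once, then iterate bounded steps until the target is reached."""
    target = goal_to_target(stub_llm_parse(instruction), objects)
    pos = start
    for step in range(max_steps):
        if math.dist(pos, target) <= tol:
            return pos, step, True
        pos = delta_step(pos, target)
    return pos, max_steps, math.dist(pos, target) <= tol
```

For example, `run_episode("Move the block to the left of the cup", (0.0, 0.0), {"cup": (1.0, 0.5)})` parses the instruction once and then moves toward the grounded target in steps no longer than `max_step`. Because the language component is queried only for the symbolic goal, a malformed or out-of-vocabulary output is rejected before any motion is executed.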
