Neuro-Symbolic Control with Large Language Models for Language-Guided Spatial Tasks

Momina Liaqat Ali, Muhammad Abid

TL;DR

The paper tackles instability and hallucinated behavior in language-guided continuous control by introducing a neuro-symbolic framework that confines high-level reasoning to a local LLM and delegates low-level execution to a neural delta controller. Evaluated across multiple local LLMs (Mistral, Phi, LLaMA-3.2) on planar relational tasks, the approach consistently improves success rates and reduces control steps, with speedups of up to 8.83x and average step reductions of over 70%. The method improves stability and interpretability without reinforcement learning or costly rollouts, and the results remain robust to language model quality. The work presents a scalable, reproducible path for integrating language understanding with continuous control in resource-constrained robotic systems.

Abstract

Although large language models (LLMs) have recently become effective tools for language-conditioned control in embodied systems, instability, slow convergence, and hallucinated actions continue to limit their direct application to continuous control. This work proposes a modular neuro-symbolic control framework that cleanly separates high-level semantic reasoning from low-level motion execution. A locally deployed LLM interprets the task symbolically, while a lightweight neural delta controller performs bounded, incremental actions in continuous space. We evaluate the method in a planar manipulation setting in which spatial relations between objects are specified in language. Extensive experiments across multiple tasks and local language models, including Mistral, Phi, and LLaMA-3.2, compare LLM-only control, neural-only control, and the proposed LLM+DL framework. Relative to LLM-only baselines, the neuro-symbolic integration consistently improves both success rate and efficiency, achieving average step reductions exceeding 70% and speedups of up to 8.83x while remaining robust to language model quality. By constraining the LLM to symbolic outputs and delegating continuous execution to a neural controller trained on synthetic geometric data, the framework improves interpretability, stability, and generalization without reinforcement learning or costly rollouts. These results show empirically that neuro-symbolic decomposition offers a scalable and principled way to integrate language understanding with continuous control, supporting the development of reliable and efficient language-guided embodied systems.
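The two efficiency figures quoted in the abstract (step reduction and speedup) can be related with simple arithmetic. The sketch below assumes, as a labeled assumption not confirmed by this summary, that speedup is measured as the ratio of baseline steps to method steps; the function names are illustrative only. Under that assumption a 70% step reduction corresponds to roughly a 3.33x speedup, and an 8.83x speedup corresponds to roughly an 88.7% step reduction, which is consistent with one figure being an average and the other a maximum.

```python
def step_reduction(baseline_steps: float, method_steps: float) -> float:
    """Fraction of control steps saved relative to the baseline."""
    return 1.0 - method_steps / baseline_steps


def speedup(baseline_steps: float, method_steps: float) -> float:
    """Assumed definition: baseline steps divided by method steps."""
    return baseline_steps / method_steps


def reduction_from_speedup(s: float) -> float:
    """Step reduction implied by a given speedup under the assumed definition."""
    return 1.0 - 1.0 / s


def speedup_from_reduction(r: float) -> float:
    """Speedup implied by a given step reduction under the assumed definition."""
    return 1.0 / (1.0 - r)


if __name__ == "__main__":
    print(f"8.83x speedup -> {reduction_from_speedup(8.83):.1%} fewer steps")
    print(f"70% fewer steps -> {speedup_from_reduction(0.70):.2f}x speedup")
```

If the paper measures speedup in wall-clock time rather than steps, the conversion above would not apply directly.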

Paper Structure

This paper contains 53 sections, 38 equations, 11 figures, 2 tables, 1 algorithm.

Figures (11)

  • Figure 1: Motivation for neuro-symbolic control. End-to-end LLM-based control directly predicts continuous actions, leading to unstable motion and slow convergence. Our approach decouples symbolic reasoning from continuous execution, combining LLM-based semantic understanding with a neural delta controller for stable and efficient closed-loop control.
  • Figure 2: Overview of the proposed neuro-symbolic control framework. A local large language model performs symbolic reasoning over language instructions and environment state, producing a discrete task label. This symbolic output conditions a neural delta controller that executes bounded continuous actions in a closed-loop environment, enabling stable, efficient, and interpretable control.
  • Figure 3: Success rate aggregated across all language models for each spatial task. The proposed LLM+DL framework consistently outperforms LLM-only and DL-only baselines, demonstrating the effectiveness of neuro-symbolic integration.
  • Figure 4: Average number of control steps aggregated across all language models. Compared to LLM-only control, the LLM+DL framework converges far more quickly, with a step reduction of more than 70% across all tasks.
  • Figure 5: Normalized distance-to-goal over time for the right_of task. LLM+DL exhibits fast, monotonic convergence with low variance, whereas LLM-only control shows slower and less stable behavior.
  • ...and 6 more figures
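
The pipeline described in Figures 1 and 2 (symbolic label from the LLM, then bounded incremental actions in a closed loop) can be illustrated with a minimal sketch. Everything below is a stand-in rather than the authors' implementation: `parse_label` is a keyword matcher replacing the local LLM, `delta_step` is a hand-written clipped step replacing the trained neural delta controller, and the label set, `margin`, `max_step`, and `tol` values are hypothetical (only the `right_of` task name appears in the captions above).

```python
import math

# Hypothetical mapping from symbolic task label to a unit offset of the
# moved object relative to the reference object.
RELATION_OFFSETS = {
    "left_of": (-1.0, 0.0),
    "right_of": (1.0, 0.0),
    "above": (0.0, 1.0),
    "below": (0.0, -1.0),
}


def parse_label(instruction: str) -> str:
    """Stand-in for the LLM: map an instruction to one discrete label."""
    text = instruction.lower().replace(" ", "_")
    for label in RELATION_OFFSETS:
        if label in text:
            return label
    raise ValueError(f"no known spatial relation in: {instruction!r}")


def delta_step(pos, goal, max_step=0.05):
    """Stand-in for the delta controller: a step toward the goal whose
    length never exceeds max_step."""
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    norm = math.hypot(dx, dy)
    if norm <= max_step:
        return dx, dy
    scale = max_step / norm
    return dx * scale, dy * scale


def run_episode(instruction, obj, ref, margin=0.2, tol=1e-3, max_steps=500):
    """Closed loop: label once, then apply bounded deltas until within tol."""
    label = parse_label(instruction)
    ox, oy = RELATION_OFFSETS[label]
    goal = (ref[0] + margin * ox, ref[1] + margin * oy)
    pos = obj
    for step in range(max_steps):
        if math.hypot(goal[0] - pos[0], goal[1] - pos[1]) <= tol:
            return label, step, pos
        dx, dy = delta_step(pos, goal, max_step=0.05)
        pos = (pos[0] + dx, pos[1] + dy)
    return label, max_steps, pos


if __name__ == "__main__":
    label, steps, final = run_episode(
        "place the block right of the cup", obj=(0.0, 0.0), ref=(1.0, 0.0)
    )
    print(label, steps, final)
```

The design point the sketch tries to capture is that the language component only ever emits a label from a fixed set, so a malformed or hallucinated output is rejected before any motion occurs, while every executed action is bounded in magnitude by construction.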