LightPlanner: Unleashing the Reasoning Capabilities of Lightweight Large Language Models in Task Planning

Weijie Zhou, Manli Tao, Chaoyang Zhao, Honghui Dong, Ming Tang, Jinqiao Wang

TL;DR

LightPlanner tackles the bottleneck of using lightweight LLMs for complex robotic task planning by introducing dynamic parameterized skill control, a three-level hierarchical reasoning framework, and a memory module. It is trained on the LightPlan-40k dataset to yield LightPlanner-1.5B, which achieves high task success rates with far fewer parameters than large LLMs and excels in spatial semantic reasoning (e.g., determining the largest block). The approach demonstrates strong real-world performance on edge devices and in generalization to unseen tasks, outperforming template-based and reasoning-only baselines. The work offers practical implications for edge robotics, enabling capable, resource-efficient autonomous planning in dynamic environments, with future directions including reinforcement learning and multi-agent coordination.

Abstract

In recent years, lightweight large language models (LLMs) have garnered significant attention in the robotics field due to their low computational resource requirements and suitability for edge deployment. However, in task planning -- particularly for complex tasks that involve dynamic semantic logic reasoning -- lightweight LLMs have underperformed. To address this limitation, we propose a novel task planner, LightPlanner, which enhances the performance of lightweight LLMs in complex task planning by fully leveraging their reasoning capabilities. Unlike conventional planners that use fixed skill templates, LightPlanner controls robot actions via parameterized function calls, dynamically generating parameter values. This approach allows for fine-grained skill control and improves task planning success rates in complex scenarios. Furthermore, we introduce hierarchical deep reasoning. Before generating each action decision step, LightPlanner thoroughly considers three levels: action execution (feedback verification), semantic parsing (goal consistency verification), and parameter generation (parameter validity verification). This ensures the correctness of subsequent action controls. Additionally, we incorporate a memory module to store historical actions, thereby reducing context length and enhancing planning efficiency for long-term tasks. We train the LightPlanner-1.5B model on our LightPlan-40k dataset, which comprises 40,000 action controls across tasks with 2 to 13 action steps. Experiments demonstrate that our model achieves the highest task success rate despite having the smallest number of parameters. In tasks involving spatial semantic reasoning, the success rate exceeds that of ReAct by 14.9 percent. Moreover, we demonstrate LightPlanner's potential to operate on edge devices.
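
The three verification levels and the memory module described in the abstract can be pictured with a small Python sketch. It is only an illustration under stated assumptions: in LightPlanner these checks are reasoning performed by the LLM itself, whereas here they are rule-based stand-ins. The names `Memory`, `check_step`, the skill table, and the "keep the last N summaries" policy are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical memory: one short summary line per executed action."""
    history: list[str] = field(default_factory=list)

    def record(self, skill: str, args: dict, ok: bool) -> None:
        self.history.append(f"{skill}({args}) -> {'ok' if ok else 'failed'}")

    def as_context(self, last_n: int = 5) -> str:
        # Assumption: only recent summaries are fed back, keeping the prompt short.
        return "\n".join(self.history[-last_n:])


def check_step(feedback_ok: bool, goal_objects: set[str], call: dict,
               skills: dict[str, set[str]]) -> list[str]:
    """Return a list of problems; an empty list means the call may be issued."""
    problems = []
    # Level 1, action execution: did the previous action report success?
    if not feedback_ok:
        problems.append("previous action failed")
    # Level 2, semantic parsing: does the call act on an object named in the goal?
    if call["args"].get("object") not in goal_objects:
        problems.append("target not in goal")
    # Level 3, parameter generation: known skill with exactly the expected parameters?
    expected = skills.get(call["skill"])
    if expected is None:
        problems.append("unknown skill")
    elif set(call["args"]) != expected:
        problems.append("invalid parameters")
    return problems


# Hypothetical skill table: skill name -> required parameter names.
skills = {"pick": {"object"}, "place": {"object", "location"}}
memory = Memory()

good = {"skill": "pick", "args": {"object": "blue block"}}
good_problems = check_step(True, {"blue block"}, good, skills)
memory.record(good["skill"], good["args"], ok=True)  # pretend the robot reported success

bad = {"skill": "place", "args": {"object": "red block"}}
bad_problems = check_step(True, {"blue block"}, bad, skills)

print(good_problems, bad_problems)
print(memory.as_context())
```

The second call is rejected at two levels: its target is not part of the goal, and the `location` parameter is missing.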

Paper Structure

This paper contains 16 sections, 4 figures, 3 tables, 2 algorithms.

Figures (4)

  • Figure 1: Mainstream LLM planning methods (left) rely on predefined templates that struggle to flexibly interpret semantics such as "largest." In contrast, LightPlanner (right) enables LLMs to actively generate dynamic parameters and skill controls. The key innovation lies in decomposing high-level instructions into parameterized skill chains (detect $\rightarrow$ reason $\rightarrow$ pick), where the LLM proactively parses the "largest" attribute through bounding box area calculations, ultimately enabling precise grasping.
  • Figure 2: Architecture of LightPlanner: Generate Hierarchical Deep Reasoning and Dynamic Skill Control.
  • Figure 3: A Complete Example. LightPlanner's efficient performance on Jetson Xavier Orin.
  • Figure 4: Average reasoning latency and VRAM consumption per round of a skill for tasks ranging from simple to complex. Here, L denotes the length of the task's action chain.
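
The detect, reason, pick chain from the Figure 1 caption can be sketched in a few lines of Python, resolving "largest" as the maximum bounding-box area. This is a minimal illustration, not the paper's implementation: the `Detection` type, the `detect`, `select_largest`, and `pick` functions, the pixel coordinates, and the returned call format are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detector output: a label plus an axis-aligned box in pixels."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp at zero so a degenerate box never yields a negative area.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def detect(scene: list[Detection], category: str) -> list[Detection]:
    """Stand-in for a perception skill: keep detections whose label contains the category."""
    return [d for d in scene if category in d.label]


def select_largest(candidates: list[Detection]) -> Detection:
    """Reasoning step: interpret 'largest' as the maximum bounding-box area."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    return max(candidates, key=Detection.area)


def pick(target: Detection) -> dict:
    """Stand-in for a grasp skill: return the parameterized call instead of moving a robot."""
    cx = (target.x_min + target.x_max) / 2
    cy = (target.y_min + target.y_max) / 2
    return {"skill": "pick", "object": target.label, "center": (cx, cy)}


scene = [
    Detection("red block", 10, 10, 50, 50),    # 40 x 40 box
    Detection("blue block", 60, 20, 140, 80),  # 80 x 60 box
    Detection("green bowl", 0, 0, 200, 200),   # large, but not a block
]
call = pick(select_largest(detect(scene, "block")))
print(call)
```

The bowl has the largest box in the scene but is filtered out by the detect step, so the chain selects the blue block. The parameters of the final call are computed at run time rather than read from a fixed template.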