Tree-of-Code: A Hybrid Approach for Robust Complex Task Planning and Execution

Ziyi Ni, Yifan Li, Daxiang Dong

TL;DR

This work addresses instability and hallucinations in LLM-driven task planning that relies on stepwise code generation (CodeAct). It introduces Tree-of-Code (ToC), a hybrid framework that unifies tree-structured thought exploration with end-to-end code generation and execution: code-execution nodes are expanded in a BFS-like manner, and majority voting selects the most robust outcome. By formalizing the thought-to-code transformation with an llm-function and treating code as reasoning, ToC improves accuracy and reduces the number of interaction steps on a complex multi-scene benchmark (M3ToolEval) without requiring model fine-tuning. The results demonstrate stronger robustness on complex reasoning tasks and highlight the practical potential of integrating diverse LLMs through structured reflection and execution feedback. Future work targets few-shot supervised fine-tuning and real-world deployment.
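
As a minimal sketch of the voting step, assuming each successfully executed node yields its final answer as a string, the most frequent answer can be picked with Python's `collections.Counter`. The function name `majority_vote` and the whitespace normalization are illustrative assumptions, not the paper's implementation; the Figure 1 caption indicates that ToC's vote is cast by an LLM, which a plain frequency count only approximates.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(outputs: Sequence[str]) -> Optional[str]:
    """Return the most frequent final output among successful nodes.

    Hypothetical helper: ties go to the answer seen first, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not outputs:
        return None  # no successful node, nothing to vote on
    # Strip surrounding whitespace so trivially different strings count as one answer.
    normalized = [o.strip() for o in outputs]
    answer, _count = Counter(normalized).most_common(1)[0]
    return answer


# Hypothetical outputs collected from three successfully executed code nodes.
print(majority_vote(["42", "41", " 42 "]))  # 42
```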

Abstract

The exceptional capabilities of large language models (LLMs) have substantially accelerated the rise and widespread adoption of agents. Recent studies have demonstrated that generating Python code to consolidate LLM-based agents' actions into a unified action space (CodeAct) is a promising approach for developing real-world LLM agents. However, this step-by-step code generation approach often lacks consistency and robustness, leading to instability in agent applications, particularly for complex reasoning and out-of-domain tasks. In this paper, we propose a novel approach called Tree-of-Code (ToC) that tackles complex problem planning and execution with an end-to-end mechanism. By integrating key ideas from both Tree-of-Thought and CodeAct, ToC combines their strengths to enhance solution exploration. In our framework, each final code execution result is treated as a node in a decision tree, and a breadth-first search strategy is employed to explore potential solutions. The final outcome is determined by a voting mechanism over the outputs of the nodes.
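
The breadth-first exploration described in the abstract can be sketched as follows, under loudly stated assumptions: `generate` stands in for an LLM call that writes one complete program for the task (optionally conditioned on an error message), `execute` stands in for running that program, and each failed node spawns exactly one reflected child. All names (`Node`, `tree_of_code_bfs`, `width`, `max_depth`) are hypothetical, and the toy stubs replace real LLM calls; this is not the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not the paper's API):
#   generate(task, feedback) -> source code of one complete program
#   execute(code) -> (ok, output_or_error_message)
Generate = Callable[[str, Optional[str]], str]
Execute = Callable[[str], Tuple[bool, str]]


@dataclass
class Node:
    code: str    # the whole program generated for this node
    ok: bool     # whether execution succeeded
    output: str  # final output on success, error message on failure
    depth: int   # tree layer the node belongs to


def tree_of_code_bfs(task: str, generate: Generate, execute: Execute,
                     width: int = 3, max_depth: int = 3) -> List[Node]:
    """Expand the tree layer by layer; return the successful nodes for voting."""
    candidates: List[Node] = []
    # Root layer: `width` independent end-to-end attempts with no feedback yet.
    frontier: List[Optional[str]] = [None] * width
    for depth in range(max_depth):
        next_frontier: List[Optional[str]] = []
        for feedback in frontier:
            code = generate(task, feedback)
            ok, output = execute(code)
            if ok:
                candidates.append(Node(code, ok, output, depth))  # done: voting pool
            else:
                next_frontier.append(output)  # failed: reflect on this error next layer
        if not next_frontier:
            break  # every branch has terminated
        frontier = next_frontier
    return candidates


# Toy stubs: a first attempt always fails; a retry that sees an error succeeds.
def fake_generate(task: str, feedback: Optional[str]) -> str:
    return "retry" if feedback else "first"


def fake_execute(code: str) -> Tuple[bool, str]:
    return (True, "42") if code == "retry" else (False, "NameError: x")


nodes = tree_of_code_bfs("demo", fake_generate, fake_execute, width=2)
print([n.output for n in nodes])  # ['42', '42']
```

Feeding only the error string back is a simplification; a fuller version would likely also pass the failed program so the model can reflect on both.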

Paper Structure

This paper contains 9 sections, 1 equation, 1 figure, and 1 table.

Figures (1)

  • Figure 1: Overview of our method, ToC, compared with CodeAct. (a) CodeAct receives the input and performs a cycle of execution and correction, carried out iteratively, round by round. (b) ToC applies execution-level reflection within a decision-tree structure. At each layer, different nodes are executed in parallel: nodes that execute correctly are stored in the candidate pool for voting, while nodes that fail undergo further reflection. Yellow blocks indicate continued reflection. Red and green blocks are both finished: red blocks are discarded by LLM voting, while green blocks are collected and accepted.
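
One layer of the tree in Figure 1(b) can be sketched as running every node's program concurrently and splitting the outcomes into a candidate pool and a reflection queue. This assumes, hypothetically, that each generated program stores its final answer in a variable named `result`; `run_program` and `execute_layer` are illustrative names. `exec` with a fresh globals dictionary only isolates names and is not a security sandbox, and threads are used for brevity (CPU-bound programs would need processes to run truly in parallel).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_program(code: str) -> Tuple[bool, str]:
    """Execute one generated program; return (ok, result or error message)."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # separate globals per node; NOT a sandbox
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    # Assumed convention: the program must store its answer in `result`.
    if "result" not in namespace:
        return False, "NoResult: program did not set `result`"
    return True, str(namespace["result"])


def execute_layer(programs: List[str], max_workers: int = 4):
    """Run all nodes of one layer concurrently, then partition them into
    a candidate pool (succeeded) and a reflection queue (failed)."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        outcomes = list(executor.map(run_program, programs))  # keeps input order
    accepted = [(p, out) for p, (ok, out) in zip(programs, outcomes) if ok]
    to_reflect = [(p, err) for p, (ok, err) in zip(programs, outcomes) if not ok]
    return accepted, to_reflect


layer = ["result = sum(range(5))", "result = 1 / 0", "x = 3"]
accepted, retry = execute_layer(layer)
print([out for _, out in accepted])  # ['10']
print([err for _, err in retry])
```

In this sketch the failed entries keep both the program and its error message, which is the information a reflection step would need to produce the next layer's nodes.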