Behavior Trees Enable Structured Programming of Language Model Agents

Richard Kelley

TL;DR

Language-model agents are powerful but brittle in real-world deployments; this work proposes behavior trees (BTs) as a unifying, modular framework to structure and compose language-enabled agents. It introduces Dendron, a Python library that integrates causal language model and vision-language model actions with a blackboard for data sharing, enabling safe, interpretable, and edge-friendly agent architectures. Through three case studies (a chat agent, robot visual inspection, and a safety-focused BT defense against prompt-based attacks), the paper demonstrates the modularity, reusability, and practical safety guarantees afforded by behavior-tree orchestration. The findings indicate that structured programming with BTs can harness modern transformers while mitigating hallucinations, multimodal integration challenges, and information leakage, supporting scalable, trustworthy language-model agents in dynamic environments.

Abstract

Language models trained on internet-scale data sets have shown an impressive ability to solve problems in Natural Language Processing and Computer Vision. However, experience is showing that these models are frequently brittle in unexpected ways, and require significant scaffolding to ensure that they operate correctly in the larger systems that comprise "language-model agents." In this paper, we argue that behavior trees provide a unifying framework for combining language models with classical AI and traditional programming. We introduce Dendron, a Python library for programming language model agents using behavior trees. We demonstrate the approach embodied by Dendron in three case studies: building a chat agent, a camera-based infrastructure inspection agent for use on a mobile robot or vehicle, and an agent that has been built to satisfy safety constraints that it did not receive through instruction tuning or RLHF.

Paper Structure (35 sections, 20 figures, 4 tables)

Figures (20)

  • Figure 1: The two primary control nodes in a standard behavior tree are the sequence node and the fallback node. The sequence node, shown in Figure \ref{fig:sequence}, executes its children one at a time from left to right as long as each child returns success, itself returning success if all its children succeed. The fallback node, shown in Figure \ref{fig:fallback}, executes its children one at a time from left to right as long as each child returns failure, returning failure if all of its children fail. In diagrams, the sequence node is typically denoted by a "$\rightarrow$" and the fallback node is typically denoted by a "$?$", as in this figure.
  • Figure 2: An example behavior tree for a chat agent. Best viewed on a screen with the ability to zoom.
  • Figure 3: Subtree responsible for generating "thoughts."
  • Figure 4: Subtree responsible for generating speech.
  • Figure 5: Subtree combining the thought sequence and speech sequence.
  • ...and 15 more figures
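
The sequence and fallback semantics described in the Figure 1 caption can be illustrated with a minimal sketch. This is not Dendron's API: the names `Status`, `Action`, `Sequence`, `Fallback`, and `make_leaf` are hypothetical and chosen only for illustration, and the sketch assumes every node returns either success or failure, leaving out the "running" status that full behavior-tree implementations commonly include.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Action:
    """Hypothetical leaf node wrapping a zero-argument callable that returns a bool."""

    def __init__(self, name: str, fn: Callable[[], bool]) -> None:
        self.name = name
        self.fn = fn

    def tick(self) -> Status:
        return Status.SUCCESS if self.fn() else Status.FAILURE


class Sequence:
    """Ticks children left to right; stops at the first failure."""

    def __init__(self, children: List) -> None:
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            if child.tick() is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS  # every child succeeded


class Fallback:
    """Ticks children left to right; stops at the first success."""

    def __init__(self, children: List) -> None:
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            if child.tick() is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE  # every child failed


# Record which leaves are actually ticked, to make the short-circuiting visible.
log: List[str] = []


def make_leaf(name: str, result: bool) -> Action:
    def fn() -> bool:
        log.append(name)
        return result

    return Action(name, fn)


seq = Sequence([make_leaf("a", True), make_leaf("b", False), make_leaf("c", True)])
fb = Fallback([make_leaf("x", False), make_leaf("y", True), make_leaf("z", True)])

seq_status = seq.tick()  # "c" is never ticked because "b" fails
fb_status = fb.tick()    # "z" is never ticked because "y" succeeds
```

In this sketch the sequence node behaves like a short-circuiting logical AND over its children and the fallback node like a short-circuiting OR, which is what lets a tree express "try this, otherwise try that" without explicit branching code in the leaves.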