Tapas Are Free! Training-Free Adaptation of Programmatic Agents via LLM-Guided Program Synthesis in Dynamic Environments

Jinwei Hu, Yi Dong, Youcheng Sun, Xiaowei Huang

TL;DR

This work tackles the challenge of adapting autonomous agents in safety-critical, dynamic environments without retraining. It introduces TAPA, a training-free framework that uses LLMs to moderate the symbolic action space by operating over interpretable logical primitives and synthesizing modular programs for each primitive. Through a meta-agent, multi-scenario program pools, shadow validation, and a provenance chain stored in a RAG knowledge base, TAPA achieves rapid, interpretable adaptation in cyber defense and swarm formation control, demonstrated by high uptime and robust consensus under adversarial and environmental disturbances. The results suggest a paradigm shift from policy retraining to dynamic action adaptation, enabling scalable reliability and safety in evolving operational contexts.

Abstract

Autonomous agents in safety-critical applications must continuously adapt to dynamic conditions without compromising performance and reliability. This work introduces TAPA (Training-free Adaptation of Programmatic Agents), a novel framework that positions large language models (LLMs) as intelligent moderators of the symbolic action space. Unlike prior programmatic agents, which typically generate a monolithic policy program or rely on fixed symbolic action sets, TAPA synthesizes and adapts modular programs for individual high-level actions, referred to as logical primitives. By decoupling strategic intent from execution, TAPA enables meta-agents to operate over an abstract, interpretable action space while the LLM dynamically generates, composes, and refines symbolic programs tailored to each primitive. Extensive experiments across cybersecurity and swarm intelligence domains validate TAPA's effectiveness. In autonomous DDoS defense scenarios, TAPA achieves 77.7% network uptime while maintaining near-perfect detection accuracy in unknown dynamic environments. In swarm intelligence formation control under environmental and adversarial disturbances, TAPA consistently preserves consensus at runtime where baseline methods fail. This work promotes a paradigm shift for autonomous system design in evolving environments, from policy adaptation to dynamic action adaptation.
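The decoupling of strategic intent from execution described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the primitive names (`RATE_LIMIT`, `MONITOR`), the `ActionSpace` class, the threshold in `meta_policy`, and both example programs are hypothetical, chosen only to show how the program bound to a primitive could be swapped while the meta-agent's policy stays untouched.

```python
from typing import Callable, Dict

# Hypothetical types: an observation is a dict of metrics, and a program
# maps an observation to a concrete action string.
Observation = Dict[str, float]
Program = Callable[[Observation], str]


def rate_limit_v1(obs: Observation) -> str:
    """Initial program for the RATE_LIMIT primitive: a fixed limit."""
    return "limit:100rps"


def rate_limit_v2(obs: Observation) -> str:
    """Adapted program: the limit scales with the observed traffic."""
    limit = max(10, int(obs["traffic_rps"]) // 10)
    return f"limit:{limit}rps"


def monitor(obs: Observation) -> str:
    """Program for the MONITOR primitive: take no defensive action."""
    return "noop"


class ActionSpace:
    """Maps each logical primitive to the program currently bound to it."""

    def __init__(self) -> None:
        self._programs: Dict[str, Program] = {}

    def bind(self, primitive: str, program: Program) -> None:
        # Rebinding replaces the execution logic without touching the policy.
        self._programs[primitive] = program

    def execute(self, primitive: str, obs: Observation) -> str:
        return self._programs[primitive](obs)


def meta_policy(obs: Observation) -> str:
    """Stand-in meta-agent: selects a primitive, never a concrete program."""
    return "RATE_LIMIT" if obs["traffic_rps"] > 500 else "MONITOR"


if __name__ == "__main__":
    space = ActionSpace()
    space.bind("RATE_LIMIT", rate_limit_v1)
    space.bind("MONITOR", monitor)

    obs = {"traffic_rps": 2000.0}
    print(space.execute(meta_policy(obs), obs))

    # Action-level adaptation: same policy, same primitive, new program.
    space.bind("RATE_LIMIT", rate_limit_v2)
    print(space.execute(meta_policy(obs), obs))
```

In this sketch, adaptation amounts to a single `bind` call; in TAPA the replacement program would come from LLM-guided synthesis rather than a hand-written function.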

Paper Structure

This paper contains 26 sections, 5 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Policy‑Level Retraining (left) vs. Action‑Level Synthesis and Adaptation (right)
  • Figure 2: TAPA (Training-free Adaptation of Programmatic Agents) Framework. (a) Design-time workflow. TAPA enables autonomous agents to adapt to evolving environments without retraining through LLM-guided symbolic program synthesis: ➀ Logical primitive design. High-level symbolic operations are defined from expert knowledge as interpretable strategic intent. ➁ Decision agent initialization. A meta-agent is instantiated to select logical primitives based on environmental conditions. ➂ LLM-guided program pool construction. The LLM generates diverse symbolic programs across multiple simulated scenarios for each primitive. ➃ Action adaptation and validation. When performance degrades, the LLM synthesizes candidate programs for action adaptation and validates them through shadow simulation before replacement. ➄ Provenance chain construction. Execution traces and adaptation experiences are stored in a Retrieval-Augmented Generation (RAG) system for future program synthesis. (b) Deployment-time use case for cyber defense. The TAPA-enabled agent monitors network performance, detects degradation, and retrieves or synthesizes validated programs as adaptive defensive operations in dynamic environments.
  • Figure 3: Provenance chain example for DDoS attack.
  • Figure 4: Temporal evolution of defensive patterns under static vs. adaptive action spaces. Blue regions indicate normal state; red regions indicate attack periods.
  • Figure 5: Formation control performance across changing scenarios. TAPA consistently recovers performance in both (a) adversarial environments with malicious interference and (b) severe weather conditions, maintaining near-baseline formation quality.
  • ...and 3 more figures
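
Steps ➃ and ➄ in the Figure 2 caption (validating candidates in shadow simulation before replacement, then recording the outcome) can be sketched as follows. This is an illustrative assumption, not the paper's procedure: the scoring convention (a program returns a simulated performance score per scenario), the mean-score acceptance rule, the `margin` parameter, and the plain Python list standing in for the RAG-backed provenance chain are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical types: a scenario is a dict of simulation parameters, and a
# program returns a performance score (for example, simulated uptime).
Scenario = Dict[str, float]
Program = Callable[[Scenario], float]


def mean_score(program: Program, scenarios: List[Scenario]) -> float:
    """Average score of a program across the shadow scenarios."""
    return sum(program(s) for s in scenarios) / len(scenarios)


def shadow_validate(incumbent: Program, candidate: Program,
                    scenarios: List[Scenario], margin: float = 0.0) -> bool:
    """Accept the candidate only if it beats the incumbent in simulation."""
    return mean_score(candidate, scenarios) > mean_score(incumbent, scenarios) + margin


def adapt(incumbent: Program, candidates: List[Program],
          scenarios: List[Scenario], provenance: List[dict]) -> Program:
    """Try each candidate off-line; replace the deployed program only on success."""
    best = incumbent
    for candidate in candidates:
        accepted = shadow_validate(best, candidate, scenarios)
        # Record every attempt, accepted or not, so later synthesis could
        # retrieve it. A real system would write to a retrieval store.
        provenance.append({"candidate": candidate.__name__, "accepted": accepted})
        if accepted:
            best = candidate
    return best


def incumbent_program(s: Scenario) -> float:
    return 0.5


def worse_candidate(s: Scenario) -> float:
    return 0.25


def better_candidate(s: Scenario) -> float:
    return 0.75


if __name__ == "__main__":
    log: List[dict] = []
    shadow_scenarios = [{"load": 1.0}, {"load": 2.0}]
    chosen = adapt(incumbent_program, [worse_candidate, better_candidate],
                   shadow_scenarios, log)
    print(chosen.__name__)
    print(log)
```

The point of the sketch is that the deployed program changes only after a candidate has been evaluated off-line, and that rejected candidates are logged as well as accepted ones.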

Theorems & Definitions (7)

  • Definition 1: Logical Primitive
  • Example 1: Logical Primitives in Cyber Defense
  • Definition 2: Meta-Agent and Policy
  • Example 2: Meta-Agent in Cyber Defense
  • Example 3: Program Pool in Cyber Defense
  • Example 4: Action Adaptation in Cyber Defense
  • Example 5: Provenance chain example for cyber defense