ALAS: Transactional and Dynamic Multi-Agent LLM Planning

Longling Geng, Edward Y. Chang

TL;DR

ALAS addresses core fragilities in large-language-model planning by decoupling planning from verification, grounding checks in a versioned execution log, and enabling localized repairs that preserve ongoing work. It introduces a five-layer architecture and a canonical, engine-agnostic workflow IR that maps to ASL and Argo, paired with a Localized Cascading Repair Protocol to bound disruption. Across Job Shop Scheduling benchmarks with runtime perturbations, ALAS achieves high feasibility, strong efficiency, and significant reductions in token usage, outperforming single-agent baselines and many multi-agent systems. The combination of validator isolation, persistent state, and localized repair demonstrates practical reliability, scalability, and portability for grounded multi-agent LLM planning, with code and seeds to be released.

Abstract

Large language models enable flexible multi-agent planning but remain fragile in practice: verification is often circular, state changes are not tracked for repair, and small faults trigger costly global recomputation. We present ALAS, a stateful, disruption-aware framework that separates planning from non-circular validation, records a versioned execution log for grounded checks and restore points, and performs localized repair that preserves work in progress. The validator operates independently of the planning LLM with fresh, bounded context, avoiding self-check loops and mid-context attrition. The repair protocol edits only the minimal affected region under explicit policies (retry, catch, timeout, backoff, idempotency keys, compensation, loop guards) defined in a canonical workflow IR that maps to Amazon States Language and Argo Workflows. On job-shop scheduling suites (DMU, TA) across five classical benchmarks, ALAS matches or exceeds strong single-LLM and multi-agent baselines, achieving 83.7% success, reducing token usage by 60%, and running $1.82\times$ faster under comparable settings. A minimal reliability study shows that the validator detects injected structural faults with low overhead, and that localized repair contains runtime perturbations with a bounded edit radius and less makespan degradation than global recompute. Results indicate that the combination of validator isolation, versioned execution logs, and localized repair provides measurable efficiency, feasibility, and scalability for multi-agent LLM planning. Code and seeds will be released.
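To make the idea of non-circular validation over a versioned log concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `Op`, `validate`, and `ExecutionLog` are hypothetical, and the checks cover only the two standard job-shop constraints (job precedence and one operation per machine at a time). The validator is plain code that sees only the committed schedule, never the planner's context.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Op:
    """One scheduled operation (hypothetical record layout)."""
    job: int
    step: int      # position in the job's operation sequence
    machine: int
    start: int
    duration: int

    @property
    def end(self) -> int:
        return self.start + self.duration


def validate(schedule: List[Op]) -> List[str]:
    """Return all constraint violations; an empty list means feasible."""
    errors: List[str] = []

    # Precedence: within a job, step k+1 may not start before step k ends.
    by_job: Dict[int, List[Op]] = {}
    for op in schedule:
        by_job.setdefault(op.job, []).append(op)
    for job, ops in by_job.items():
        ops.sort(key=lambda o: o.step)
        for prev, nxt in zip(ops, ops[1:]):
            if nxt.start < prev.end:
                errors.append(f"precedence: J{job} step {nxt.step}")

    # Capacity: a machine runs at most one operation at a time.
    by_machine: Dict[int, List[Op]] = {}
    for op in schedule:
        by_machine.setdefault(op.machine, []).append(op)
    for machine, ops in by_machine.items():
        ops.sort(key=lambda o: o.start)
        for a, b in zip(ops, ops[1:]):
            if b.start < a.end:
                errors.append(f"overlap: M{machine} at t={b.start}")
    return errors


class ExecutionLog:
    """Append-only list of schedule versions that doubles as restore points."""

    def __init__(self) -> None:
        self._versions: List[Tuple[Op, ...]] = []

    def commit(self, schedule: List[Op]) -> int:
        self._versions.append(tuple(schedule))
        return len(self._versions) - 1

    def restore(self, version: int) -> List[Op]:
        return list(self._versions[version])
```

In this sketch a planner would commit each candidate schedule, run `validate` on it, and call `restore` on the last clean version when violations are reported. How ALAS actually structures its log entries and validator prompts is not specified in this summary.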

Paper Structure

This paper contains 204 sections, 3 theorems, 14 equations, 8 figures, 35 tables, 4 algorithms.

Key Result

Lemma 1

For a system with: The worst-case time complexity is:

Figures (8)

  • Figure 1: $\mathsf{ALAS}$ overview. Left shows the architecture layers and build path for reliability and portability. Right shows the operational loop that uses those policies and logs to contain faults and preserve feasibility.
  • Figure 2: LCRP Phase #1 Local Compensation (makespan = 22): (a) Static baseline schedule; (b) $M_1$ failure between $t = 5$–$8$; (c) $M_1$ notifies $M_2$ to delay $J3(2)$; (d) $M_2$ informs $M_0$ to push $J3(3)$ back.
  • Figure 3: LCRP Phase #2 Queue Reordering (makespan = 22): (a) Safe moves: moving last operations down, first operations forward with potential penalty; (b) Resolving remaining operations.
  • Figure 4: Gantt charts of optimized schedules produced by $\mathsf{ALAS}$ for four representative JSSP benchmark instances with varying job and machine counts. These visualizations demonstrate $\mathsf{ALAS}$'s ability to efficiently allocate resources and minimize makespan across different problem scales. The larger instance TA72 (J=100, M=20) is available in the supplementary materials.
  • Figure 5: Error rate for each repair iteration.
  • ...and 3 more figures
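
The local-compensation cascade described in the Figure 2 caption (a machine outage delays one operation, which in turn delays its successors on other machines) can be illustrated with a simple right-shift pass. This is a hedged sketch rather than the paper's LCRP: the function name `right_shift_repair`, the record layout, and the toy numbers in the usage note are all assumptions, and the sketch assumes the outage is known before the affected operations begin and that operations are non-preemptive.

```python
from typing import Dict, List, NamedTuple


class Op(NamedTuple):
    """One scheduled operation (hypothetical record layout)."""
    job: int
    step: int
    machine: int
    start: int
    duration: int


def right_shift_repair(schedule: List[Op], machine: int,
                       down_start: int, down_end: int) -> List[Op]:
    """Delay only what the outage forces, keeping the original ordering.

    Operations are visited in their original start order, so every job
    predecessor and machine predecessor is placed before its successor.
    """
    job_ready: Dict[int, int] = {}       # earliest next start per job
    machine_ready: Dict[int, int] = {}   # earliest next start per machine
    repaired: List[Op] = []
    for op in sorted(schedule, key=lambda o: (o.start, o.job, o.step)):
        start = max(op.start,
                    job_ready.get(op.job, 0),
                    machine_ready.get(op.machine, 0))
        # An operation on the failed machine that would intersect the
        # outage window is pushed to the end of the window.
        if (op.machine == machine
                and start < down_end
                and start + op.duration > down_start):
            start = down_end
        end = start + op.duration
        job_ready[op.job] = end
        machine_ready[op.machine] = end
        repaired.append(op._replace(start=start))
    return repaired


def edit_radius(before: List[Op], after: List[Op]) -> int:
    """Count operations whose start time changed."""
    old = {(o.job, o.step): o.start for o in before}
    return sum(1 for o in after if old[(o.job, o.step)] != o.start)
```

For a toy instance with an outage on machine 1 during $t = 5$–$8$, an operation of job 3 on machine 1 is pushed past the outage, and its two successors on machines 2 and 0 shift accordingly, while unrelated operations keep their start times. The number of shifted operations is one simple way to measure an edit radius; the paper's own definition may differ.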

Theorems & Definitions (8)

  • Lemma 1: Generalized LCRP Complexity
  • proof
  • Corollary 1: Special Cases
  • Definition 1: LCRP-Repair Decision Problem
  • Theorem 1: LCRP is NP-hard (in fact, strongly NP-hard)
  • proof: Proof sketch
  • Remark D.1: Bounded-edit variants remain NP-hard
  • Remark D.2: Relation to the complexity bound