ALAS: Transactional and Dynamic Multi-Agent LLM Planning
Longling Geng, Edward Y. Chang
TL;DR
ALAS addresses core fragilities in large-language-model planning by decoupling planning from verification, grounding checks in a versioned execution log, and enabling localized repairs that preserve ongoing work. It introduces a five-layer architecture and a canonical, engine-agnostic workflow IR that maps to ASL and Argo, paired with a Localized Cascading Repair Protocol to bound disruption. Across job-shop scheduling benchmarks with runtime perturbations, ALAS achieves high feasibility (83.7% success), 60% lower token usage, and 1.82× faster runs, outperforming single-agent baselines and many multi-agent systems. The combination of validator isolation, persistent state, and localized repair demonstrates practical reliability, scalability, and portability for grounded multi-agent LLM planning, with code and seeds to be released.
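To illustrate what an engine-agnostic IR node with explicit policies might look like when lowered to Amazon States Language, here is a minimal sketch under stated assumptions. `IRNode`, `RetryPolicy`, `to_asl_state`, and `to_asl` are hypothetical names, not the published ALAS IR, and the Lambda ARNs are placeholders. Only the emitted keys (`StartAt`, `States`, `Type`, `Resource`, `TimeoutSeconds`, `Retry`, `Catch`, `Next`, `End`) are standard ASL fields. The sketch covers retry, backoff, timeout, and catch; idempotency keys and loop guards have no direct ASL field and are left out.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RetryPolicy:
    # Hypothetical IR field names, chosen for illustration only.
    errors: list[str]
    interval_s: int = 1
    max_attempts: int = 3
    backoff_rate: float = 2.0  # multiplier applied to the wait between attempts


@dataclass
class IRNode:
    name: str
    resource: str                      # e.g. a Lambda ARN (placeholder here)
    next_state: Optional[str] = None   # None marks a terminal node
    timeout_s: Optional[int] = None
    retry: list[RetryPolicy] = field(default_factory=list)
    catch_to: Optional[str] = None     # compensation node entered on any error


def to_asl_state(node: IRNode) -> dict:
    """Render one IR node as an Amazon States Language Task state."""
    state: dict = {"Type": "Task", "Resource": node.resource}
    if node.timeout_s is not None:
        state["TimeoutSeconds"] = node.timeout_s
    if node.retry:
        state["Retry"] = [
            {"ErrorEquals": r.errors, "IntervalSeconds": r.interval_s,
             "MaxAttempts": r.max_attempts, "BackoffRate": r.backoff_rate}
            for r in node.retry
        ]
    if node.catch_to is not None:
        state["Catch"] = [{"ErrorEquals": ["States.ALL"], "Next": node.catch_to}]
    if node.next_state is None:
        state["End"] = True
    else:
        state["Next"] = node.next_state
    return state


def to_asl(start: str, nodes: list[IRNode]) -> dict:
    """Assemble a complete state machine definition from IR nodes."""
    return {"StartAt": start, "States": {n.name: to_asl_state(n) for n in nodes}}


if __name__ == "__main__":
    arn = "arn:aws:lambda:us-east-1:123456789012:function:{}"  # placeholder ARNs
    nodes = [
        IRNode("Schedule", arn.format("planner"), next_state="Validate", timeout_s=60,
               retry=[RetryPolicy(["States.Timeout"])], catch_to="Compensate"),
        IRNode("Validate", arn.format("validator")),
        IRNode("Compensate", arn.format("compensate")),
    ]
    print(json.dumps(to_asl("Schedule", nodes), indent=2))
```

Because the policy lives in the IR rather than in engine-specific syntax, an Argo Workflows backend could render the same node into a template's `retryStrategy` (`limit`, `backoff`) and `activeDeadlineSeconds` instead; that second renderer is not shown here.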
Abstract
Large language models enable flexible multi-agent planning but remain fragile in practice: verification is often circular, state changes are not tracked for repair, and small faults trigger costly global recomputation. We present ALAS, a stateful, disruption-aware framework that separates planning from non-circular validation, records a versioned execution log for grounded checks and restore points, and performs localized repair that preserves work in progress. The validator operates independently of the planning LLM with fresh, bounded context, avoiding self-check loops and mid-context attrition. The repair protocol edits only the minimal affected region under explicit policies (retry, catch, timeout, backoff, idempotency keys, compensation, loop guards) defined in a canonical workflow IR that maps to Amazon States Language and Argo Workflows. On job-shop scheduling suites (DMU, TA) across five classical benchmarks, ALAS matches or exceeds strong single-LLM and multi-agent baselines, achieving 83.7% success, reducing token usage by 60%, and running 1.82× faster under comparable settings. A minimal reliability study shows that the validator detects injected structural faults with low overhead, and that localized repair contains runtime perturbations with a bounded edit radius and less makespan degradation than global recompute. Results indicate that the combination of validator isolation, versioned execution logs, and localized repair provides measurable efficiency, feasibility, and scalability for multi-agent LLM planning. Code and seeds will be released.
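To make the validator, versioned log, and localized repair concrete on job-shop scheduling, here is a minimal sketch under stated assumptions. A deterministic right-shift repair stands in for ALAS's LLM-driven Localized Cascading Repair Protocol, and `Op`, `validate`, `local_repair`, and `VersionedLog` are hypothetical names. The validator is a pure function of the schedule (it never sees planner context), the repair moves only operations downstream of a perturbed one while keeping each machine's sequence fixed, and the log keeps every committed version as a restore point. The returned `touched` set is one way to measure the edit radius.

```python
from collections import deque
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Op:
    """The k-th operation of a job, run on one machine (hypothetical model)."""
    job: int
    k: int
    machine: int
    start: int
    dur: int

    @property
    def end(self) -> int:
        return self.start + self.dur


Schedule = dict[tuple[int, int], Op]  # keyed by (job, k)


def makespan(s: Schedule) -> int:
    return max(op.end for op in s.values())


def machine_order(s: Schedule) -> dict[int, list[tuple[int, int]]]:
    """Operation keys on each machine, ordered by start time."""
    seq: dict[int, list[tuple[int, int]]] = {}
    for key, op in sorted(s.items(), key=lambda kv: (kv[1].start, kv[0])):
        seq.setdefault(op.machine, []).append(key)
    return seq


def validate(s: Schedule) -> list[str]:
    """Planner-independent check of job precedence and machine capacity."""
    errors = []
    for (j, k), op in s.items():
        prev = s.get((j, k - 1))
        if prev is not None and op.start < prev.end:
            errors.append(f"precedence: ({j}, {k}) starts before ({j}, {k - 1}) ends")
    for m, keys in machine_order(s).items():
        # Starts are sorted, so any overlap appears between neighbors.
        for a, b in zip(keys, keys[1:]):
            if s[b].start < s[a].end:
                errors.append(f"overlap on machine {m}: {a} and {b}")
    return errors


def local_repair(s: Schedule, key: tuple[int, int], new_dur: int):
    """Change the duration of `key`, then right-shift only affected successors."""
    order = machine_order(s)  # each machine keeps its original sequence
    mach_next = {a: b for keys in order.values() for a, b in zip(keys, keys[1:])}
    mach_prev = {b: a for a, b in mach_next.items()}
    new = dict(s)
    new[key] = replace(s[key], dur=new_dur)
    touched = {key}

    def successors(cur):
        j, k = cur
        out = [(j, k + 1)] if (j, k + 1) in new else []
        if cur in mach_next:
            out.append(mach_next[cur])
        return out

    queue = deque(successors(key))
    while queue:
        cur = queue.popleft()
        j, k = cur
        ready = new[(j, k - 1)].end if (j, k - 1) in new else 0
        if cur in mach_prev:
            ready = max(ready, new[mach_prev[cur]].end)
        if ready > new[cur].start:  # a predecessor now ends later
            new[cur] = replace(new[cur], start=ready)
            touched.add(cur)
            queue.extend(successors(cur))
    return new, touched  # touched = operations whose timing changed


class VersionedLog:
    """Append-only schedule versions; any committed version can be restored."""

    def __init__(self, initial: Schedule):
        self.versions = [dict(initial)]

    def commit(self, s: Schedule) -> int:
        self.versions.append(dict(s))
        return len(self.versions) - 1

    def restore(self, version: int) -> Schedule:
        return dict(self.versions[version])


if __name__ == "__main__":
    s0 = {(0, 0): Op(0, 0, 0, 0, 3), (0, 1): Op(0, 1, 1, 3, 2),
          (1, 0): Op(1, 0, 1, 0, 2), (1, 1): Op(1, 1, 0, 3, 4)}
    log = VersionedLog(s0)
    s1, touched = local_repair(s0, (0, 0), 5)  # (0, 0) now takes 5 units, not 3
    log.commit(s1)
    print(makespan(s0), makespan(s1), sorted(touched))
```

On this two-job, two-machine instance the demo prints `7 9 [(0, 0), (0, 1), (1, 1)]`: lengthening operation (0, 0) shifts its two downstream operations, leaves (1, 0) untouched, and raises the makespan from 7 to 9, while version 0 stays restorable from the log. A global recompute would instead re-plan every operation.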
