Alignment in Time: Peak-Aware Orchestration for Long-Horizon Agentic Systems

Hanjing Shi, Dominic DiFranzo

TL;DR

The paper tackles sustained reliability in long-horizon AI workflows by reframing alignment as a trajectory-level control problem. It introduces APEMO, an orthogonal runtime overlay that reallocates compute to peak and ending segments under a fixed budget, guided by temporal-affective signals rather than by changes to model weights. Across ABM, single-agent LLM, and multi-agent evaluations, it reports consistent improvements in trajectory-level quality and reuse, with quantified trade-offs in coordination cost. The work presents trajectory-level control as a practical, scalable pathway to robust agentic systems and invites further study of human–AI interaction, adaptive scheduling, and scaling to larger models.

Abstract

Traditional AI alignment primarily focuses on individual model outputs; however, autonomous agents in long-horizon workflows require sustained reliability across entire interaction trajectories. We introduce APEMO (Affect-aware Peak-End Modulation for Orchestration), a runtime scheduling layer that optimizes computational allocation under fixed budgets by operationalizing temporal-affective signals. Instead of modifying model weights, APEMO detects trajectory instability through behavioral proxies and targets repairs at critical segments, such as peak moments and endings. Evaluation across multi-agent simulations and LLM-based planner-executor flows demonstrates that APEMO consistently enhances trajectory-level quality and reuse probability over structural orchestrators. Our results reframe alignment as a temporal control problem, offering a resilient engineering pathway for the development of long-horizon agentic systems.

Paper Structure (29 sections, 5 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: APEMO as a temporal-affective overlay for long-horizon workflows. Uniform allocation optimizes mean-step performance but allows negative peaks to persist. APEMO reallocates effort toward peak repair and ending stabilization under fixed compute budgets, improving trajectory-level robustness.
  • Figure 2: APEMO internal control loop. At each turn $t$, a frustration signal $S_f(t)$ is computed from behavioral proxies. If a negative peak is detected, a precision repair module reallocates compute toward peak and ending turns. Coordination cost accumulates under a fixed budget constraint $C \le C_{max}$.
  • Figure 3: Forest-style effect plot for key deltas (APEMO minus baseline) with 95% bootstrap CIs.
  • Figure 4: Trap recovery dynamics in the $n=20$ LLM trap block, derived from episode-level trap drop/rebound metrics. Left: quality change around trap; right: frustration change around trap. Shaded areas are 95% bootstrap CIs.
  • Figure 5: Coordination frontier (quality gain vs cost increase). Each point is one large-$n$ block/comparison.
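
The control loop described in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: the proxy fields (`retries`, `error_flag`), the signal formula, the threshold, and the boost factor are all hypothetical choices made for illustration, and the allocation is computed offline over a known trajectory rather than turn by turn at runtime. It only shows the general idea of computing a frustration signal $S_f(t)$ from behavioral proxies, flagging negative peaks, and shifting a fixed budget $C_{max}$ toward peak and ending turns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    index: int
    retries: int       # hypothetical behavioral proxy: retries at this turn
    error_flag: bool   # hypothetical behavioral proxy: an error was observed


def frustration_signal(turn: Turn) -> float:
    """Toy stand-in for S_f(t): a bounded combination of proxies in [0, 1]."""
    score = 0.25 * turn.retries + (0.5 if turn.error_flag else 0.0)
    return min(1.0, score)


def allocate_compute(
    turns: List[Turn],
    c_max: float,
    peak_threshold: float = 0.6,  # assumed cutoff for a "negative peak"
    boost: float = 2.0,           # assumed extra weight for peak/ending turns
) -> List[float]:
    """Split a fixed budget c_max across turns, favoring peaks and the ending.

    The per-turn allocations always sum to c_max, so the constraint
    C <= C_max holds by construction.
    """
    if not turns:
        return []
    last = len(turns) - 1
    weights = []
    for i, turn in enumerate(turns):
        is_peak = frustration_signal(turn) >= peak_threshold
        is_ending = i == last
        weights.append(boost if (is_peak or is_ending) else 1.0)
    total = sum(weights)
    return [c_max * w / total for w in weights]


# Example trajectory: turn 1 is a negative peak, turn 3 is the ending.
example_turns = [
    Turn(0, retries=0, error_flag=False),
    Turn(1, retries=3, error_flag=True),
    Turn(2, retries=0, error_flag=False),
    Turn(3, retries=0, error_flag=False),
]
example_allocation = allocate_compute(example_turns, c_max=12.0)
```

In this toy trajectory, the peak turn and the final turn each receive twice the compute of an ordinary turn while the total stays at the budget. A uniform baseline would instead give every turn `c_max / len(turns)`.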