DoubleAgents: Interactive Simulations for Alignment in Agentic AI

Tao Long, Xuanming Zhang, Sitong Wang, Zhou Yu, Lydia B Chilton

TL;DR

DoubleAgents addresses the core challenge of aligning agentic AI with user goals by embedding interactive simulation, policy-driven planning, and transparent oversight into the agentic workflow. The system combines a ReAct-based coordination loop, a simulated respondent module, an edge-case detector, and a visualization dashboard to enable safe, iterative alignment before live deployment. Technical and user studies show that simulation helps users calibrate autonomy, craft reusable alignment artifacts (policies, templates, stop hooks), and gradually increase delegation while maintaining control. Deployment studies demonstrate real-world relevance, with organizers recognizing the approach as a practical path to bring agentic AI into complex, high-stakes coordination tasks. Together, the work offers a scalable, human-centered blueprint for aligning agentic AI through ongoing interaction, co-configuration, and explainable decision-making.

Abstract

Agentic workflows promise efficiency, but adoption hinges on whether people can align systems that act on their behalf with their goals, values, and situational expectations. We present DoubleAgents, an agentic planning tool that embeds transparency and control through user intervention, value-reflecting policies, rich state visualizations, and uncertainty flagging for human coordination tasks. A built-in respondent simulation generates realistic scenarios, allowing users to rehearse and refine policies and calibrate their use of agentic behavior before live deployment. We evaluate DoubleAgents in a two-day lab study (n = 10), three deployment studies, and a technical evaluation. Results show that participants initially hesitated to delegate but used simulation to probe system behavior and adjust policies, gradually increasing delegation as agent actions became better aligned with their intentions and context. Deployment results demonstrate DoubleAgents' real-world relevance and usefulness, showing that simulation helps users effectively manage real-world tasks with higher complexity and uncertainty. We contribute interactive simulation as a practical pathway for users to iteratively align and calibrate agentic systems.

Paper Structure

This paper contains 78 sections, 1 equation, 7 figures, 2 tables.

Figures (7)

  • Figure 1: System diagram of DoubleAgents illustrating a day-by-day ReAct workflow that couples policy-guided planning with human oversight and LLM simulation. Inputs (top left) include the organizer’s goal, seminar slots, speaker personas, and a toolbox of callable functions. The coordination agent (center) iteratively: summarizes state; selects applicable policies; proposes a plan and action that the user can Regenerate or Approve; executes by drafting and sending emails, or waiting for responses. If a reply falls outside policy coverage, issueFlag escalates an edge case to the user for clarification before proceeding. The context management (right) maintains time information, past plans/actions, email logs, user preferences, and persona data, continuously updating to ground subsequent steps. Simulated respondent agents (bottom right) generate realistic, persona-consistent behaviors and replies that drive the loop across days. For a detailed walkthrough, please refer to Section 4.
  • Figure 2: A screenshot of DoubleAgents coordinating the assignment of four speakers to four seminar slots. (A) Policy Panel – Displays coordination policies that reflect user values and preferences. Users can add, edit, or delete policies, which guide the reasoning and planning of the coordination agent. (B) Interactive Chat Interface – Shows the full history of plans and actions, allowing users to monitor planning and execution, confirm or modify agent suggestions, and intervene when necessary. (C) Plan and Action Pop-up – A focused view within the chat history where users can review, edit, regenerate, or approve individual plans and actions. (D) Assignment Tracker – Visualizes tentative speaker-to-slot assignments and indicates each speaker’s availability. Dark purple denotes confirmed availability; light purple signals ambiguous availability requiring user confirmation. (E) Communication History Viewer – Displays the messaging history between the coordination agent and each speaker, organized by date and interaction. (F) Calendar View – Provides temporal context by showing the current simulation and upcoming seminar dates, color-coded for clarity.
  • Figure 3: The landing page for specifying the user's goals, the seminar details, and the speakers' persona information.
  • Figure 4: Examples of plan and action generation.
  • Figure 5: An example email sent by a simulated respondent, and the system message shown when an edge case is detected.
  • ...and 2 more figures
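
The day-by-day loop described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Policy`, `Context`, `issue_flag`, and `run_day` names, the keyword-based coverage check, and the `propose`/`approve`/`respond` callables are all hypothetical stand-ins for the LLM planner, the user's Approve/Regenerate choice, and the simulated respondent. Policy selection is folded into `propose` for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Policy:
    """A user-authored rule. Keyword matching is a hypothetical stand-in
    for whatever coverage check the real system performs."""
    text: str
    keywords: Tuple[str, ...]

    def covers(self, reply: str) -> bool:
        lowered = reply.lower()
        return any(keyword in lowered for keyword in self.keywords)


@dataclass
class Context:
    """Running state that grounds each step (time, past plans, email log)."""
    day: int = 1
    history: List[str] = field(default_factory=list)
    email_log: List[str] = field(default_factory=list)
    flags: List[str] = field(default_factory=list)


def issue_flag(ctx: Context, reply: str) -> None:
    """Escalate a reply that no policy covers so the user can clarify."""
    ctx.flags.append(reply)


def run_day(
    ctx: Context,
    policies: List[Policy],
    propose: Callable[[str, List[Policy]], str],  # stand-in for the planner
    approve: Callable[[str], bool],               # stand-in for the user
    respond: Callable[[str], str],                # stand-in for the respondent
    max_regenerations: int = 3,
) -> None:
    # 1. Summarize the current state.
    summary = f"day {ctx.day}: {len(ctx.email_log)} emails logged"

    # 2. Propose a plan; the user may Regenerate until they Approve.
    plan = None
    for _ in range(max_regenerations + 1):
        candidate = propose(summary, policies)
        if approve(candidate):
            plan = candidate
            break

    # 3. Execute the approved plan and collect the (simulated) reply.
    if plan is not None:
        ctx.history.append(plan)
        ctx.email_log.append(f"sent: {plan}")
        reply = respond(plan)
        ctx.email_log.append(f"received: {reply}")

        # 4. Flag the reply if it falls outside policy coverage.
        if not any(policy.covers(reply) for policy in policies):
            issue_flag(ctx, reply)

    # 5. Advance simulated time.
    ctx.day += 1
```

Swapping `respond` between a simulated respondent and a real inbox is what would let the same loop serve both rehearsal and live deployment in a design like this one; the flagged replies are the natural place for a user to add or edit a policy before the next day runs.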