ALLOY: Generating Reusable Agent Workflows from User Demonstration

Jiawen Li, Zheng Ning, Yuan Tian, Toby Jia-jun Li

TL;DR

Alloy tackles the limitations of prompt-driven LLM web agents by enabling users to demonstrate their procedural knowledge directly in a browser, which is transformed into editable, graph-structured workflows. These task-level workflows, executed by a network of LLM-powered subtasks, are visualized for transparency and can be saved and generalized to related tasks via natural-language prompts. In a within-subjects study with 12 participants, Alloy outperformed prompt-based and manual baselines in capturing user intent, reducing cognitive load, and enabling generalization, particularly for more complex tasks. The findings suggest demonstration-based interaction complements traditional prompts and points toward personalized, evolvable agent systems guided by user demonstrations.

Abstract

Large language models (LLMs) enable end-users to delegate complex tasks to autonomous agents through natural language. However, prompt-based interaction faces critical limitations: users often struggle to specify procedural requirements for tasks, especially those that don't have a factually correct solution but instead rely on personal preferences, such as posting social media content or planning a trip. Additionally, a "successful" prompt for one task may not be reusable or generalizable across similar tasks. We present ALLOY, a system inspired by classical HCI theories on Programming by Demonstration (PBD), but extended to enhance adaptability in creating LLM-based web agents. ALLOY enables users to express procedural preferences through natural demonstrations rather than prompts, while making these procedures transparent and editable through visualized workflows that can be generalized across task variations. In a study with 12 participants, ALLOY's demonstration-based approach outperformed prompt-based agents and manual workflows in capturing user intent and procedural preferences in complex web tasks. Insights from the study also show how demonstration-based interaction complements the traditional prompt-based approach.

Paper Structure

This paper contains 57 sections, 7 figures, 2 tables.

Figures (7)

  • Figure 1: System overview of Alloy. User demonstrations (e.g. looking for tourist attractions, restaurants, and flight tickets) are collected as input for the multi-agent system, which dynamically generates task workflows that can be reused and generalized through natural language instruction.
  • Figure 2: UI overview of Alloy. Users demonstrate a task by performing it directly in the browser (A1-A2), from which Alloy automatically generates a structured workflow (B1). Users can refine the workflow through visual editing (B2), execute it via the control panel (B3), and review execution outcomes (B4). To adapt the workflow to new scenarios, users provide natural language instructions (C1), then execute the generalized workflow (C2) with real-time monitoring (C3) to obtain new results (C4). The status panel (C5) provides continuous feedback on system state throughout the interaction.
  • Figure 3: Alloy's technical pipeline. Alloy collects user demonstrations as input for a multi-agent system that generates the corresponding workflow. The workflow can be further adapted according to a natural-language prompt via a two-agent pipeline.
  • Figure 4: Overall comparison of user ratings on the NASA-TLX survey for Alloy under three conditions.
  • Figure 5: Comparison of user ratings on the NASA-TLX survey for Alloy under three conditions for medium-hard tasks (Task 2 and Task 3).
  • ...and 2 more figures