Magentic-UI: Towards Human-in-the-loop Agentic Systems

Hussein Mozannar, Gagan Bansal, Cheng Tan, Adam Fourney, Victor Dibia, Jingya Chen, Jack Gerrits, Tyler Payne, Matheus Kunzler Maldaner, Madeleine Grunde-McLaughlin, Eric Zhu, Griffin Bassman, Jacob Alber, Peter Chang, Ricky Loynd, Friederike Niedtner, Ece Kamar, Maya Murad, Rafah Hosn, Saleema Amershi

TL;DR

Magentic-UI addresses the safety and productivity gaps of autonomous LLM agents by providing an open-source, human-in-the-loop interface built on a flexible multi-agent architecture. It introduces six interaction mechanisms (co-planning, co-tasking, action guards, verification, memory, and multi-tasking) and demonstrates how they enable safe, low-cost collaboration between humans and agents on web browsing, coding, and file tasks. Through autonomous benchmark runs, simulated-user experiments, and qualitative studies with real users, it shows potential to improve task success and user oversight, while highlighting remaining challenges in latency, plan predictability, and safety. The work offers a practical platform for researching, comparing, and extending human-agent collaboration strategies in realistic computer-use workflows.

Abstract

AI agents powered by large language models are increasingly capable of autonomously completing complex, multi-step tasks using external tools. Yet they still fall short of human-level performance in most domains, including computer use, software development, and research. Their growing autonomy and ability to interact with the outside world also introduce safety and security risks, including potentially misaligned actions and adversarial manipulation. We argue that human-in-the-loop agentic systems offer a promising path forward, combining human oversight and control with AI efficiency to unlock productivity from imperfect systems. We introduce Magentic-UI, an open-source web interface for developing and studying human-agent interaction. Built on a flexible multi-agent architecture, Magentic-UI supports web browsing, code execution, and file manipulation, and can be extended with diverse tools via the Model Context Protocol (MCP). Moreover, Magentic-UI presents six interaction mechanisms for enabling effective, low-cost human involvement: co-planning, co-tasking, multi-tasking, action guards, verification, and long-term memory. We evaluate Magentic-UI across four dimensions: autonomous task completion on agentic benchmarks, simulated user testing of its interaction capabilities, qualitative studies with real users, and targeted safety assessments. Our findings highlight Magentic-UI's potential to advance safe and efficient human-agent collaboration.
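Action guards are only named in the abstract. As a rough illustration of the general idea (a human approval gate placed in front of actions that may be irreversible), here is a minimal Python sketch under stated assumptions: the `SENSITIVE_ACTIONS` set, the `Action` type, and `guarded_execute` are hypothetical and do not reflect Magentic-UI's actual implementation, which may decide what needs approval differently (for example, by asking a model to judge each action).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical set of action names treated as irreversible and therefore
# requiring human approval; an assumption made for illustration only.
SENSITIVE_ACTIONS = {"submit_form", "make_purchase", "delete_file", "send_email"}


@dataclass
class Action:
    name: str    # e.g. "click", "make_purchase"
    target: str  # e.g. a CSS selector, URL, or file path


def guarded_execute(action: Action,
                    execute: Callable[[Action], str],
                    ask_user: Callable[[Action], bool]) -> str:
    """Run `action`, pausing for explicit human approval if it is sensitive."""
    if action.name in SENSITIVE_ACTIONS and not ask_user(action):
        # The human declined: skip the action and report that back to the agent.
        return f"blocked: {action.name} on {action.target}"
    return execute(action)


# Usage with stand-in callbacks: a fake executor and a user who always declines.
def fake_execute(action: Action) -> str:
    return f"done: {action.name} on {action.target}"


def always_deny(action: Action) -> bool:
    return False


print(guarded_execute(Action("click", "#search"), fake_execute, always_deny))
# prints: done: click on #search
print(guarded_execute(Action("make_purchase", "cart"), fake_execute, always_deny))
# prints: blocked: make_purchase on cart
```

In this sketch, non-sensitive actions pass through without interrupting the human, so oversight is only requested when an action crosses the approval threshold. That keeps the cost of human involvement low, which is the trade-off the abstract emphasizes.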

Paper Structure

This paper contains 78 sections, 2 equations, 15 figures, 1 table, and 1 algorithm.

Figures (15)

  • Figure 1: Magentic-UI is an open-source research prototype of a human-centered agent that is meant to help researchers study open questions on human-in-the-loop approaches and oversight mechanisms for AI agents.
  • Figure 2: The Magentic-UI interface during a task in progress. The left side panel is the session selector, which lets users create and monitor multiple sessions. To its right, the active session is split into two halves: the left half shows the agent's text updates and the user's input box, and the right half shows the browser being controlled by the agent.
  • Figure 3: The plan editor component in Magentic-UI, showing the plan generated in response to a user request. The user can edit the plan directly or request changes through the input box, then press "Accept Plan" to start execution.
  • Figure 4: Screenshots of the Magentic-UI interface showing (a) the user interrupting the system to take control of the browser, with the UI indicating that the user is in control and should notify the agent of any changes; (b) Magentic-UI interrupting the user to ask a clarifying question; and (c) the final answer produced by the system.
  • Figure 5: The saved plans view in Magentic-UI, showing plans that users have learned, created, or imported. Users can edit each plan entry or rerun the task.
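The captions for Figures 3 and 5 describe a co-planning loop: the agent proposes a plan, the user edits it or requests changes, and execution starts only after the user presses "Accept Plan"; saved plans can later be edited or rerun. The Python sketch below models that flow under stated assumptions. `Plan`, `execute`, `save_plan`, and `load_plan` are hypothetical names, plans are assumed to be ordered lists of natural-language steps, and the in-memory dictionary stands in for whatever storage Magentic-UI actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Plan:
    """Hypothetical plan: a task plus an ordered list of natural-language steps."""
    task: str
    steps: List[str]
    accepted: bool = False

    def edit_step(self, index: int, text: str) -> None:
        self._check_editable()
        self.steps[index] = text

    def insert_step(self, index: int, text: str) -> None:
        self._check_editable()
        self.steps.insert(index, text)

    def remove_step(self, index: int) -> None:
        self._check_editable()
        del self.steps[index]

    def accept(self) -> None:
        # Mirrors pressing "Accept Plan": freezes the plan and unblocks execution.
        self.accepted = True

    def _check_editable(self) -> None:
        if self.accepted:
            raise RuntimeError("plan already accepted; start a new planning round to change it")


def execute(plan: Plan, run_step: Callable[[str], str]) -> List[str]:
    """Run steps in order; refuse to start before the user has accepted the plan."""
    if not plan.accepted:
        raise RuntimeError("plan must be accepted before execution")
    return [run_step(step) for step in plan.steps]


# Hypothetical in-memory saved-plans store keyed by task text.
saved_plans: Dict[str, List[str]] = {}


def save_plan(plan: Plan) -> None:
    saved_plans[plan.task] = list(plan.steps)  # store a copy of the steps


def load_plan(task: str) -> Optional[Plan]:
    steps = saved_plans.get(task)
    # A reloaded plan starts unaccepted, so the user reviews it before a rerun.
    return Plan(task, list(steps)) if steps is not None else None


# Usage: the user inserts a step, accepts the plan, then it runs and is saved.
plan = Plan("Find a vegetarian restaurant", ["Search the web", "Summarize results"])
plan.insert_step(1, "Filter results to vegetarian options")
plan.accept()
results = execute(plan, lambda step: f"ok: {step}")
save_plan(plan)
rerun = load_plan(plan.task)
```

Because a reloaded plan starts unaccepted, a rerun goes back through the editor instead of executing silently. This is a design choice of the sketch; the figure captions do not specify how reruns are gated.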