Human-Centered LLM-Agent User Interface: A Position Paper

Daniel Chin, Yuxuan Wang, Gus Xia

TL;DR

The paper argues for a human-centered LAUI in which the LLM agent actively learns user goals and system capabilities to propose emergent interaction workflows, rather than merely executing user commands. It uses Flute X GPT as a concrete instance, illustrating how an LLM agent, a prompt manager, and a multi-modal software-hardware system can provide real-time feedback and adaptively guide complex tasks. The work surveys related tool-using LLMs, GUI-control LLMs, and user-centric agents, and articulates a layered abstraction model (API, GUI, LAUI) along with emergent workflows and three interface roles. Overall, it advocates research on designing proactive, personalized LAUIs that let novices use sophisticated multi-modal systems with minimal prior knowledge, with potential impact on human-computer interaction and education technologies.

Abstract

Large Language Model (LLM)-in-the-loop applications have been shown to effectively interpret the human user's commands, make plans, and operate external tools/systems accordingly. Still, the operation scope of the LLM agent is limited to passively following the user, requiring the user to frame his/her needs with regard to the underlying tools/systems. We note that the potential of an LLM-Agent User Interface (LAUI) is much greater. A user mostly ignorant of the underlying tools/systems should be able to work with a LAUI to discover an emergent workflow. Contrary to the conventional way of designing an explorable GUI to teach the user a predefined set of ways to use the system, in the ideal LAUI, the LLM agent is initialized to be proficient with the system, proactively studies the user and his/her needs, and proposes new interaction schemes to the user. To illustrate LAUI, we present Flute X GPT, a concrete example using an LLM agent, a prompt manager, and a flute-tutoring multi-modal software-hardware system to facilitate the complex, real-time user experience of learning to play the flute.

Paper Structure (19 sections, 5 figures, 2 tables)

Figures (5)

  • Figure 1: The LLM agent serves as the interface between the underlying system and the user. The LLM agent together with the system forms the application. Direct communication between the user and the system is available and is configured by the LLM agent.
  • Figure 2: Interaction excerpts from the video demos.
  • Figure 3: Flute X GPT with LLM in the loop. Music X Machine is the underlying software-hardware system providing multi-modal interaction with the user. The robot chats with the user and plays the piano according to MIDI control. The rule-based manager plays the agent that chats with the LLM, relaying external events to the LLM and resolving responses from the LLM.
  • Figure 4: Three layers of abstraction on top of the underlying system. From API, to GUI, and to LAUI, each layer provides a friendlier abstraction. Parts of the LAUI have to skip the GUI and tap into the API because a GUI typically exposes only part of the system's functionality.
  • Figure 5: The workflow is jointly decided by the user's needs and the system's capabilities. Conventionally, the user has to learn the system to devise workflows. In contrast, LAUI can learn the user and propose workflows.
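
The relay-and-resolve role that the Figure 3 caption gives the rule-based manager can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the class name `RuleBasedManager`, the `CALL name arg` reply convention, the `set_mode` action, and the scripted stand-in for the LLM are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout; the paper does not specify this API.


@dataclass
class RuleBasedManager:
    llm: Callable[[List[str]], str]            # stand-in for an LLM call
    actions: Dict[str, Callable[[str], None]]  # system functions the agent may invoke
    say: Callable[[str], None]                 # delivers chat text to the user
    history: List[str] = field(default_factory=list)

    def on_event(self, event: str) -> None:
        """Relay one external event to the LLM, then resolve its reply."""
        self.history.append(f"EVENT: {event}")
        reply = self.llm(self.history)
        self.history.append(f"AGENT: {reply}")
        self.resolve(reply)

    def resolve(self, reply: str) -> None:
        """Route a reply: 'CALL name arg' runs a system action, anything else is chat."""
        if reply.startswith("CALL "):
            parts = reply.split(" ", 2)
            name = parts[1] if len(parts) > 1 else ""
            arg = parts[2] if len(parts) > 2 else ""
            action = self.actions.get(name)
            if action is None:
                # Record the failure so the next LLM call can see it.
                self.history.append(f"ERROR: unknown action {name}")
            else:
                action(arg)
        else:
            self.say(reply)


def scripted_llm(history: List[str]) -> str:
    # Toy stand-in for the LLM: reconfigures the system after a mistake, otherwise chats.
    if "mistake" in history[-1]:
        return "CALL set_mode slow_practice"
    return "Nice playing, keep going."


log = []
manager = RuleBasedManager(
    llm=scripted_llm,
    actions={"set_mode": lambda arg: log.append(("set_mode", arg))},
    say=lambda text: log.append(("say", text)),
)
manager.on_event("user played a mistake in bar 3")
manager.on_event("user finished the song")
```

In this sketch the manager is the only component that talks to the LLM: external events enter through `on_event`, and each reply is routed either to a system action or to the user as chat. A real system would replace `scripted_llm` with an actual model call and a more robust reply format.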