Generative Interfaces for Language Models

Jiaqi Chen, Yanzhe Zhang, Yutong Zhang, Yijia Shao, Diyi Yang

TL;DR

The paper tackles the limitation of linear conversational UIs by introducing Generative Interfaces, where LLMs proactively generate adaptive, task-driven user interfaces. It formalizes a two-layer representation, consisting of a directed interaction-flow graph $\mathcal{G}=(\mathcal{V},\mathcal{T})$ and per-component finite state machines $\mathcal{M}=(\mathcal{S},\mathcal{E},\delta,s_0)$, together with a generation pipeline that maps queries to requirements, then to structured representations, and finally to executable UI code, followed by iterative refinement with adaptive rewards. An evaluation framework (UIX) combines a diverse prompt suite with multi-dimensional human studies across functional, interactive, and emotional dimensions, including 100 synthetic queries and real-user evaluations. Results show GenUI consistently outperforms traditional conversational interfaces, with significant gains in user preference and perceived usability, credibility, and engagement; ablations confirm the critical role of structure, iteration, and adaptive reward design. The work highlights when generative interfaces are most beneficial and points to practical trade-offs and future improvements, such as broader modality support and real-world deployment considerations.
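To make the two-layer representation concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the nodes in $\mathcal{V}$ are interface views and the edges in $\mathcal{T}$ are navigation transitions between them; the class names, the flashcard example, and the rule that undefined transitions leave the state unchanged are all hypothetical choices made for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class ComponentFSM:
    """One component's state machine M = (S, E, delta, s0)."""
    states: set[str]
    events: set[str]
    delta: dict[tuple[str, str], str]  # (state, event) -> next state
    s0: str

    def step(self, state: str, event: str) -> str:
        # Assumption: an undefined (state, event) pair leaves the state unchanged.
        return self.delta.get((state, event), state)

    def run(self, events: list[str]) -> str:
        """Replay a sequence of events starting from the initial state s0."""
        state = self.s0
        for event in events:
            state = self.step(state, event)
        return state


@dataclass
class InteractionFlow:
    """Directed interaction-flow graph G = (V, T)."""
    views: set[str]                     # V: assumed here to be interface views
    transitions: set[tuple[str, str]]   # T: directed edges (source, target)

    def successors(self, view: str) -> list[str]:
        """Views reachable from `view` in one transition, sorted for stability."""
        return sorted(dst for src, dst in self.transitions if src == view)


# Hypothetical example: a flashcard study tool.
card = ComponentFSM(
    states={"front", "back"},
    events={"flip"},
    delta={("front", "flip"): "back", ("back", "flip"): "front"},
    s0="front",
)
flow = InteractionFlow(
    views={"deck", "card", "summary"},
    transitions={("deck", "card"), ("card", "summary"), ("card", "deck")},
)

final_state = card.run(["flip", "flip", "flip"])
next_views = flow.successors("card")
```

Keeping the graph and the per-component machines separate mirrors the split described above: the graph captures how a user moves between parts of the interface, while each state machine captures how a single component reacts to events.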

Abstract

Large language models (LLMs) are increasingly seen as assistants, copilots, and consultants, capable of supporting a wide range of tasks through natural conversation. However, most systems remain constrained by a linear request-response format that often makes interactions inefficient in multi-turn, information-dense, and exploratory tasks. To address these limitations, we propose Generative Interfaces for Language Models, a paradigm in which LLMs respond to user queries by proactively generating user interfaces (UIs) that enable more adaptive and interactive engagement. Our framework leverages structured interface-specific representations and iterative refinements to translate user queries into task-specific UIs. For systematic evaluation, we introduce a multidimensional assessment framework that compares generative interfaces with traditional chat-based ones across diverse tasks, interaction patterns, and query types, capturing functional, interactive, and emotional aspects of user experience. Results show that generative interfaces consistently outperform conversational ones, with up to a 72% improvement in human preference. These findings clarify when and why users favor generative interfaces, paving the way for future advancements in human-AI interaction.

Paper Structure

This paper contains 25 sections, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Generative Interfaces compared to conversational interfaces. (a) Conceptual framework showing how Generative Interfaces create structured, interactive experiences rather than static text responses, evaluated along functional, interactive, and emotional dimensions. (b–c) Example queries illustrate how Generative Interfaces transform user input into adaptive tools—such as interactive learning aids or multistep workflows—providing clearer organization and richer interactivity than conversational responses.
  • Figure 2: Generative Interfaces infrastructure: (a) User queries are first converted into (b) structured interface-specific representations that model interaction flows and component dependencies. This structured representation guides the generation of (c) functional code and user interfaces. The system employs (d) iterative refinement with (e) adaptive reward functions containing query-specific evaluation rubrics.
  • Figure 3: Human preference across 10 query topics (Tamkin et al., 2024).
  • Figure 4: Human evaluation results comparing GenUIs and ConvUIs. (a) User preference breakdown by query type and detail level. (b) Performance improvement across iterative interactions.
  • Figure 5: Human comment distribution. (a) Distribution of high-level concepts extracted from the valid user comments using the pipeline described in the paper's human comments analysis section. Comments without clear evaluative content were excluded. (b) For each concept in (a), the chart shows the percentage of users who preferred GenUIs or ConvUIs.
  • ...and 7 more figures
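
The iterative refinement step in Figure 2 (d, e) can be sketched as a loop that scores each candidate interface against a query-specific rubric and keeps the best one. This is a hedged illustration under stated assumptions, not the paper's pipeline: the function names (`generate`, `revise`, `build_rubric`, `judge`), the weighted-average scoring, the round limit, and the stopping threshold are all hypothetical, and in a real system the generator and judge would presumably be LLM calls.

```python
from typing import Callable, Dict, Tuple

Rubric = Dict[str, float]  # criterion name -> weight (hypothetical format)


def score(candidate: str, rubric: Rubric,
          judge: Callable[[str, str], float]) -> float:
    """Weighted average of per-criterion judge scores, each assumed in [0, 1]."""
    total_weight = sum(rubric.values())
    return sum(w * judge(candidate, c) for c, w in rubric.items()) / total_weight


def refine(query: str,
           generate: Callable[[str], str],
           revise: Callable[[str, Rubric], str],
           build_rubric: Callable[[str], Rubric],
           judge: Callable[[str, str], float],
           max_rounds: int = 3,
           target: float = 0.9) -> Tuple[str, float]:
    """Generate a UI candidate, then revise it until it meets the target score."""
    rubric = build_rubric(query)  # "adaptive": the rubric depends on the query
    best = generate(query)
    best_score = score(best, rubric, judge)
    for _ in range(max_rounds):
        if best_score >= target:
            break
        candidate = revise(best, rubric)
        candidate_score = score(candidate, rubric, judge)
        if candidate_score > best_score:  # keep a revision only if it improves
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-ins so the sketch runs without any model calls.
result, result_score = refine(
    "explain binary search",
    generate=lambda q: "ui",
    revise=lambda c, r: c + "!",
    build_rubric=lambda q: {"clarity": 1.0, "interactivity": 1.0},
    judge=lambda c, criterion: min(1.0, len(c) / 4),
)
```

Building the rubric from the query, rather than fixing it in advance, is what makes the reward adaptive in this sketch; the keep-only-if-better rule is one simple selection policy among many.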