Generative Interfaces for Language Models
Jiaqi Chen, Yanzhe Zhang, Yutong Zhang, Yijia Shao, Diyi Yang
TL;DR
The paper tackles the limitation of linear conversational UIs by introducing Generative Interfaces, where LLMs proactively generate adaptive, task-driven user interfaces. It formalizes a two-layer representation, consisting of a directed interaction-flow graph $\mathcal{G}=(\mathcal{V},\mathcal{T})$ and per-component finite state machines $\mathcal{M}=(\mathcal{S},\mathcal{E},\delta,s_0)$, and a generation pipeline that maps queries to requirements, then to structured representations, and finally to executable UI code, followed by iterative refinement with adaptive rewards. An evaluation framework (UIX) combines a diverse prompt suite with multi-dimensional human studies across functional, interactive, and emotional dimensions, including 100 synthetic queries and real-user evaluations. Results show GenUI consistently outperforms traditional conversational interfaces, with significant gains in user preference and perceived usability, credibility, and engagement; ablations confirm the critical role of structure, iteration, and adaptive reward design. The work highlights when generative interfaces are most beneficial and points to practical trade-offs and future improvements, such as broader modality support and real-world deployment considerations.
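To make the two-layer representation concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the nodes of $\mathcal{G}$ are interface views and that its edges are transitions triggered by user actions, and it models each component as a machine $\mathcal{M}=(\mathcal{S},\mathcal{E},\delta,s_0)$ with a dictionary-backed transition function. The class names `ComponentFSM` and `InteractionFlow`, the flashcard example, and the choice to ignore undefined events are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ComponentFSM:
    """Hypothetical per-component machine M = (S, E, delta, s0)."""

    states: set[str]                      # S
    events: set[str]                      # E
    delta: dict[tuple[str, str], str]     # delta: (state, event) -> next state
    initial: str                          # s0
    current: str = field(init=False)

    def __post_init__(self) -> None:
        if self.initial not in self.states:
            raise ValueError("initial state must be a member of states")
        self.current = self.initial

    def fire(self, event: str) -> str:
        # Assumption: an event with no defined transition leaves the state unchanged.
        self.current = self.delta.get((self.current, event), self.current)
        return self.current


@dataclass
class InteractionFlow:
    """Hypothetical interaction-flow graph G = (V, T)."""

    views: set[str]                            # V (assumed to be interface views)
    transitions: dict[tuple[str, str], str]    # T: (view, user action) -> next view

    def next_view(self, view: str, action: str) -> str:
        # Assumption: an action with no outgoing edge keeps the user on the same view.
        return self.transitions.get((view, action), view)


# Illustrative example: a flashcard study interface (not taken from the paper).
card = ComponentFSM(
    states={"front", "back"},
    events={"flip"},
    delta={("front", "flip"): "back", ("back", "flip"): "front"},
    initial="front",
)
flow = InteractionFlow(
    views={"deck", "card", "summary"},
    transitions={("deck", "open"): "card", ("card", "finish"): "summary"},
)
```

In this sketch the graph decides which view the user sees next, while each component's state machine tracks local behavior inside a view, so the two layers can be generated and checked separately.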
Abstract
Large language models (LLMs) are increasingly seen as assistants, copilots, and consultants, capable of supporting a wide range of tasks through natural conversation. However, most systems remain constrained by a linear request-response format that often makes interactions inefficient in multi-turn, information-dense, and exploratory tasks. To address these limitations, we propose Generative Interfaces for Language Models, a paradigm in which LLMs respond to user queries by proactively generating user interfaces (UIs) that enable more adaptive and interactive engagement. Our framework leverages structured interface-specific representations and iterative refinements to translate user queries into task-specific UIs. For systematic evaluation, we introduce a multidimensional assessment framework that compares generative interfaces with traditional chat-based ones across diverse tasks, interaction patterns, and query types, capturing functional, interactive, and emotional aspects of user experience. Results show that generative interfaces consistently outperform conversational ones, with up to a 72% improvement in human preference. These findings clarify when and why users favor generative interfaces, paving the way for future advancements in human-AI interaction.
