Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMs

Syed Mekael Wasti; Ken Q. Pu; Ali Neshati

TL;DR

This paper introduces the LM-UI framework to convert static user interfaces into dynamic, cognitively aware systems by modeling UI components with textual annotations and guiding user prompts through an agent-based backend. It presents a two-component architecture combining an annotation-tree of UI elements with a multimodal extraction engine to classify inputs and extract actionable parameters in real time, using a Redux-based central store for stateful UI updates. The approach enables natural-language or speech-based control, supported by a pipeline that maps inputs to applications and components, extracts needed parameters, and executes actions via front-end methods. Evaluations compare GPT-baseline prompts against a custom-trained, T5-based engine, showing strong performance in classification and parameter extraction and highlighting areas where further data and model refinement are beneficial for robust, scalable integration.
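The pipeline summarized above (classify the input, extract parameters, then execute an action against a central store) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `Action`, `Store`, `timer_reducer`, `classify`, `extract_minutes`, and `handle` names are hypothetical, the store only imitates the Redux reducer pattern in Python, and a keyword rule with a regular expression stands in for the LLM-based classification and extraction engine.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]


@dataclass(frozen=True)
class Action:
    """Hypothetical action shape, loosely modeled on Redux actions."""
    type: str
    payload: Dict[str, Any] = field(default_factory=dict)


def timer_reducer(state: State, action: Action) -> State:
    """Pure reducer: returns a new state rather than mutating the old one."""
    if action.type == "timer/start":
        timer = {"running": True, "minutes": action.payload["minutes"]}
        return {**state, "timer": timer}
    return state  # unknown actions leave the state unchanged


class Store:
    """Tiny central store: all state changes go through dispatch()."""

    def __init__(self, reducer: Callable[[State, Action], State], initial: State):
        self._reducer = reducer
        self._state = initial

    @property
    def state(self) -> State:
        return self._state

    def dispatch(self, action: Action) -> None:
        self._state = self._reducer(self._state, action)


def classify(text: str) -> str:
    # Assumption: a keyword rule replaces the model-based classifier.
    return "timer" if "timer" in text.lower() else "unknown"


def extract_minutes(text: str) -> Optional[int]:
    # Assumption: a regex replaces the model-based parameter extraction.
    match = re.search(r"(\d+)\s*minute", text.lower())
    return int(match.group(1)) if match else None


def handle(store: Store, text: str) -> bool:
    """Map input to an application, extract parameters, dispatch the action."""
    if classify(text) != "timer":
        return False
    minutes = extract_minutes(text)
    if minutes is None:
        return False
    store.dispatch(Action("timer/start", {"minutes": minutes}))
    return True


store = Store(timer_reducer, {"timer": {"running": False, "minutes": 0}})
handled = handle(store, "Set a timer for 5 minutes")
```

Routing every change through a single reducer keeps the interface state in one place, which is the property the summary attributes to the central store.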

Abstract

The evolution of Large Language Models (LLMs) has showcased remarkable capacities for logical reasoning and natural language comprehension. These capabilities can be leveraged in solutions that model complex problems semantically and textually. In this paper, we present our efforts toward constructing a framework that serves as an intermediary between a user and their user interface (UI), enabling dynamic, real-time interactions. Our system is built on textual semantic mappings of UI components, expressed as annotations. These mappings are stored, parsed, and scaled in a custom data structure that supplements an agent-based prompting backend engine. Textual semantic mappings allow each component not only to explain its role to the engine but also to state its expectations. By comprehending the needs of both the user and the components, our LLM engine can classify the most appropriate application, extract relevant parameters, and then predict and execute the user's expected actions. Such an integration evolves static user interfaces into highly dynamic and adaptable solutions, introducing a new frontier of intelligent and responsive user experiences.
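To make the idea of annotated components concrete, the following sketch stores each component's role and expectations in a small tree and picks the component whose description best matches an utterance. It is only an illustration under stated assumptions: `AnnotationNode`, `leaves`, `tokens`, and `best_match` are hypothetical names, the two example applications are invented, and simple word overlap stands in for the LLM engine described in the abstract.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Set


@dataclass
class AnnotationNode:
    """One UI component with its textual annotation (hypothetical layout)."""
    name: str
    description: str  # the role the component explains to the engine
    expects: Dict[str, str] = field(default_factory=dict)  # parameter -> meaning
    children: List["AnnotationNode"] = field(default_factory=list)


def leaves(node: AnnotationNode) -> Iterator[AnnotationNode]:
    """Depth-first traversal yielding only leaf components."""
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)


def tokens(text: str) -> Set[str]:
    """Lowercased alphabetic words of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_match(root: AnnotationNode, utterance: str) -> Optional[AnnotationNode]:
    # Assumption: word overlap replaces the model-based classification.
    words = tokens(utterance)
    best, best_score = None, 0
    for node in leaves(root):
        score = len(words & tokens(node.description))
        if score > best_score:
            best, best_score = node, score
    return best  # None when no description shares a word with the utterance


# Invented example tree with two applications.
root = AnnotationNode("root", "All applications", children=[
    AnnotationNode("weather", "Show the weather forecast for a city",
                   {"city": "name of the city"}),
    AnnotationNode("timer", "Start a countdown timer for a number of minutes",
                   {"minutes": "duration in minutes"}),
])

match = best_match(root, "What is the weather forecast in Toronto")
```

The `expects` mapping on the matched node is what a later extraction step would consult to learn which parameters the component needs.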

Paper Structure (29 sections, 7 equations, 6 figures, 2 tables)

Introduction
Related Work
Challenges
Integration Complexity
Annotation Data Structure Selection & Traversal
Model Selection & Training
Training Data Generation & Wrangling
Proposed LM-UI Framework Architecture & Design
Describing Applications as Events & States
Central Store & Reducers
Case Study: Application/Task Library
Modelling User Interface Components
Annotation Tree & Meta Descriptions
Parsing & Mapping User Input
Mapping via Classification
...and 14 more sections

Figures (6)

Figure 1: Visual representation of tree data structure used to store application meta descriptions
Figure 2: Two-Component Framework
Figure 3: Application State Flow
Figure 4: Each application is semantically mapped into the tree structure. Each node holds detailed annotations
Figure 5: Entity Extraction & Parameterization
...and 1 more figure
