Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMs

Syed Mekael Wasti, Ken Q. Pu, Ali Neshati

TL;DR

This paper introduces the LM-UI framework to convert static user interfaces into dynamic, cognitively aware systems by modeling UI components with textual annotations and guiding user prompts through an agent-based backend. It presents a two-component architecture combining an annotation-tree of UI elements with a multimodal extraction engine to classify inputs and extract actionable parameters in real time, using a Redux-based central store for stateful UI updates. The approach enables natural-language or speech-based control, supported by a pipeline that maps inputs to applications and components, extracts needed parameters, and executes actions via front-end methods. Evaluations compare GPT-baseline prompts against a custom-trained, T5-based engine, showing strong performance in classification and parameter extraction and highlighting areas where further data and model refinement are beneficial for robust, scalable integration.
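The central-store idea mentioned above can be illustrated with a minimal Redux-style sketch in Python. This is not the paper's implementation (which targets a front-end Redux store); the `SET_FIELD` action type, the `ui_reducer` function, and the `Store` class are hypothetical names chosen for illustration. It only shows the pattern: extracted parameters are dispatched as actions, a pure reducer produces the next state, and subscribed UI components are notified.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Action = Dict[str, Any]


def ui_reducer(state: State, action: Action) -> State:
    """Pure reducer: returns a new state and never mutates the old one."""
    if action["type"] == "SET_FIELD":  # hypothetical action type
        app = action["app"]
        updated_app = {**state.get(app, {}), action["field"]: action["value"]}
        return {**state, app: updated_app}
    return state  # unknown actions leave the state unchanged


@dataclass
class Store:
    """Single source of truth for UI state, in the style of a Redux store."""
    reducer: Callable[[State, Action], State]
    state: State = field(default_factory=dict)
    listeners: List[Callable[[State], None]] = field(default_factory=list)

    def subscribe(self, listener: Callable[[State], None]) -> None:
        self.listeners.append(listener)

    def dispatch(self, action: Action) -> None:
        self.state = self.reducer(self.state, action)
        for listener in self.listeners:
            listener(self.state)


store = Store(ui_reducer)
seen_states: List[State] = []
store.subscribe(seen_states.append)

# A parameter extracted from a user prompt is pushed into the store as an action.
store.dispatch(
    {"type": "SET_FIELD", "app": "calendar", "field": "date", "value": "2024-05-01"}
)
```

Keeping the reducer pure means the language-model backend only has to emit actions; it never touches component state directly.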

Abstract

The evolution of Large Language Models (LLMs) has showcased remarkable capacities for logical reasoning and natural language comprehension. These capabilities can be leveraged in solutions that semantically and textually model complex problems. In this paper, we present our efforts toward constructing a framework that can serve as an intermediary between a user and their user interface (UI), enabling dynamic and real-time interactions. We employ a system that stands upon textual semantic mappings of UI components, in the form of annotations. These mappings are stored, parsed, and scaled in a custom data structure, supplementary to an agent-based prompting backend engine. Employing textual semantic mappings allows each component to not only explain its role to the engine but also provide expectations. By comprehending the needs of both the user and the components, our LLM engine can classify the most appropriate application, extract relevant parameters, and subsequently execute precise predictions of the user's expected actions. Such an integration evolves static user interfaces into highly dynamic and adaptable solutions, introducing a new frontier of intelligent and responsive user experiences.
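As a rough illustration of the annotation idea described in the abstract, the sketch below stores textual descriptions of applications in a small tree, picks the best-matching application for a prompt, and extracts its parameters. Everything here is an assumption made for illustration: the `AnnotationNode` fields, the word-overlap scoring, and the regex-based extraction stand in for the LLM-based classification and extraction engine, whose details are not given in this summary.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class AnnotationNode:
    """One node of the annotation tree (hypothetical layout)."""
    name: str
    description: str  # textual annotation explaining the component's role
    params: Dict[str, str] = field(default_factory=dict)  # param name -> toy regex
    children: List["AnnotationNode"] = field(default_factory=list)


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(root: AnnotationNode, prompt: str) -> AnnotationNode:
    """Pick the child whose annotation shares the most words with the prompt.

    A stand-in for the LLM classifier, not the paper's method.
    """
    words = _tokens(prompt)
    return max(root.children, key=lambda n: len(words & _tokens(n.description)))


def extract_parameters(node: AnnotationNode, prompt: str) -> Dict[str, Optional[str]]:
    """Fill each expected parameter with the first regex match, or None."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in node.params.items():
        match = re.search(pattern, prompt)
        result[name] = match.group(1) if match else None
    return result


root = AnnotationNode(
    name="root",
    description="all applications",
    children=[
        AnnotationNode(
            name="weather",
            description="show the weather forecast for a city",
            params={"city": r"\bin (\w+)"},
        ),
        AnnotationNode(
            name="music",
            description="play a song or music track",
            params={"title": r"\bplay (\w+)"},
        ),
    ],
)

prompt = "What is the weather forecast in Toronto"
app = classify(root, prompt)
params = extract_parameters(app, prompt)
```

Because each node carries both a description and its expected parameters, the same tree serves classification (which application?) and extraction (which values does it need?).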

Paper Structure

This paper contains 29 sections, 7 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Visual representation of tree data structure used to store application meta descriptions
  • Figure 2: Two-Component Framework
  • Figure 3: Application State Flow
  • Figure 4: Each application is semantically mapped into the tree structure. Each node holds detailed annotations
  • Figure 5: Entity Extraction & Parameterization
  • ...and 1 more figure