
ReactGenie: A Development Framework for Complex Multimodal Interactions Using Large Language Models

Jackie Junrui Yang, Yingtian Shi, Yuhan Zhang, Karina Li, Daniel Wan Rosli, Anisha Jain, Shuning Zhang, Tianshi Li, James A. Landay, Monica S. Lam

TL;DR

This paper presents ReactGenie, a programming framework that better separates multimodal input from the computational model, enabling developers to create efficient and capable multimodal interfaces with ease.

Abstract

By combining voice and touch interactions, multimodal interfaces can surpass the efficiency of either modality alone. Traditional multimodal frameworks require laborious developer work to support rich multimodal commands, because a single multimodal command may involve an exponential number of combinations of actions and function invocations. This paper presents ReactGenie, a programming framework that better separates multimodal input from the computational model to enable developers to create efficient and capable multimodal interfaces with ease. ReactGenie translates multimodal user commands into NLPL (Natural Language Programming Language), a programming language we created, using a neural semantic parser based on large language models. The ReactGenie runtime interprets the parsed NLPL and composes primitives in the computational model to implement complex user commands. As a result, ReactGenie allows easy implementation and unprecedented richness in commands for end-users of multimodal apps. Our evaluation showed that 12 developers can learn and build a nontrivial ReactGenie application in under 2.5 hours on average. In addition, compared with a traditional GUI, end-users can complete tasks faster and with less task load using ReactGenie apps.
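The abstract describes a pipeline in which an LLM-based parser turns a user command into NLPL and the runtime then composes primitives from the app's state classes. The TypeScript sketch that follows illustrates only that composition step, under several loudly stated assumptions: the `Expr` tree is a hypothetical stand-in and not NLPL's actual syntax or ReactGenie's parser output; `Restaurant`, `Order`, `latestFrom`, `reorder`, and `Restaurant.touched` are invented names for a food-ordering example; and the LLM parsing step is skipped by writing the parsed tree by hand.

```typescript
// HYPOTHETICAL app state for a food-ordering example (not ReactGenie's API).
class Restaurant {
  readonly name: string;
  // Stand-in for "the restaurant the user is touching"; a real app would get this from the UI.
  static touched: Restaurant | null = null;
  constructor(name: string) {
    this.name = name;
  }
  static current(): Restaurant {
    if (!Restaurant.touched) throw new Error("no restaurant selected");
    return Restaurant.touched;
  }
}

class Order {
  static history: Order[] = [];
  readonly restaurant: Restaurant;
  readonly items: string[];
  constructor(restaurant: Restaurant, items: string[]) {
    this.restaurant = restaurant;
    this.items = items;
  }
  // Most recent order placed at the given restaurant.
  static latestFrom(r: Restaurant): Order {
    const matches = Order.history.filter((o) => o.restaurant === r);
    if (matches.length === 0) throw new Error(`no orders from ${r.name}`);
    return matches[matches.length - 1];
  }
  // Places a copy of this order and returns the new order.
  reorder(): Order {
    const copy = new Order(this.restaurant, [...this.items]);
    Order.history.push(copy);
    return copy;
  }
}

// HYPOTHETICAL parsed-command tree: a static call on a class, or a method call on a value.
type Expr =
  | { kind: "static"; cls: string; method: string; args: Expr[] }
  | { kind: "call"; target: Expr; method: string; args: Expr[] };

const classes: Record<string, unknown> = { Restaurant, Order };

// Recursively evaluates arguments and the receiver, then invokes the named method.
function evaluate(e: Expr): unknown {
  const args = e.args.map(evaluate);
  const receiver = e.kind === "static" ? classes[e.cls] : evaluate(e.target);
  if (receiver === undefined || receiver === null) throw new Error("unknown receiver");
  const fn = (receiver as Record<string, unknown>)[e.method];
  if (typeof fn !== "function") throw new Error(`unknown method ${e.method}`);
  return fn.apply(receiver, args);
}

// Usage: "Reorder my last order from this restaurant" while touching a restaurant.
const sushi = new Restaurant("Sushi Place");
Order.history.push(new Order(sushi, ["salmon roll"]));
Restaurant.touched = sushi; // simulate the touch input
const cmd: Expr = {
  kind: "call",
  method: "reorder",
  args: [],
  target: {
    kind: "static",
    cls: "Order",
    method: "latestFrom",
    args: [{ kind: "static", cls: "Restaurant", method: "current", args: [] }],
  },
};
const result = evaluate(cmd) as Order;
console.log(result.items, Order.history.length); // [ 'salmon roll' ] 2
```

In this sketch the touch reference is resolved by an ordinary method call (`Restaurant.current()`), so voice and touch inputs combine through the same typed methods a GUI would call, with no per-command handler.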

Paper Structure

This paper contains 56 sections, 1 equation, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: ReactGenie allows developers to easily build multimodal applications by better separating interfaces (UI components) from computational models (object-oriented state). The demo (two screenshots) shows the user issuing a multimodal (speech + touch) command (left screenshot) and the system executing it by parsing the voice input, resolving the touch reference, and presenting the appropriate UI and text feedback (right screenshot). (Left) ReactGenie provides a new yet familiar interface for creating a GUI by defining states (data and logic) and UI components (transformations from data to UI representations). (Right) ReactGenie automatically generates a neural semantic parser from developer-defined states and generates input and output UI mappings from developer-defined UI components. ReactGenie can then execute rich multimodal commands by composing the methods and properties of states and presenting the results using existing UI components.
  • Figure 2: ReactGenie's targeted interaction scenarios.
  • Figure 3: A comparison between state code in React-Redux and in ReactGenie. (Left) Part of example state code in Redux. Data is stored in the state variable, and the state can be mutated by the defined actions. These actions (stored in a Reducer) do not have explicit types, and they directly manipulate the state, so no return values are defined. Note that the return value of each case statement in a Reducer is the new state after the change; the actions themselves do not return values. Because of this monolithic design, it is hard to compose functions to implement some multimodal actions. (Right) Part of example state code in ReactGenie. The state, managed automatically by ReactGenie, consists of all instantiated DataClass objects. Actions are defined as methods of the class, and every method has explicit parameter types and return types. These methods can be composed to implement multimodal actions.
  • Figure 4: Overview of the ReactGenie system: (Left) Developers write object-oriented state code for programming content and actions and define the UI as cascading components. (Right top) ReactGenie operates at transpilation and initialization time to generate runtime modules. (Right bottom) Developer modules, generated modules, and ReactGenie modules come together to process rich multimodal commands from the user. This workflow is similar to regular GUI development, maximizes code reuse, and allows full control of the app behavior.
  • Figure 5: Example apps built with ReactGenie. Left: ReactGenieFoodOrdering, a food ordering app. Middle: ReactGenieSocial, a social networking app. Right: ReactGenieSign, a business app for distributing and collecting signed NDAs.
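The contrast drawn in Figure 3, between a Redux reducer whose actions return only the next state and class-based state whose typed methods return values, can be illustrated with a small TypeScript sketch. This is not code from the paper and does not use ReactGenie's actual API (no ReactGenie decorators or DataClass base class appear); `cartReducer`, `AnyAction`, `Food`, `Menu`, `Cart`, `cheapest`, `addFood`, and `total` are hypothetical names chosen for a food-ordering example.

```typescript
// --- Redux style (in the spirit of the left side of Figure 3) ---
type CartState = { items: string[] };
// Loosely typed action, as in plain-JavaScript Redux code.
type AnyAction = { type: string; [key: string]: unknown };

function cartReducer(state: CartState = { items: [] }, action: AnyAction): CartState {
  switch (action.type) {
    case "cart/addItem":
      // The case's return value is the next state, not a result a caller could chain.
      return { items: [...state.items, String(action["item"])] };
    case "cart/clear":
      return { items: [] };
    default:
      return state;
  }
}

// --- Class-based style (in the spirit of the right side of Figure 3) ---
class Food {
  readonly name: string;
  readonly price: number;
  constructor(name: string, price: number) {
    this.name = name;
    this.price = price;
  }
}

class Menu {
  readonly foods: Food[];
  constructor(foods: Food[]) {
    this.foods = foods;
  }
  // Explicit return type: the result can be passed straight into another method.
  cheapest(): Food {
    if (this.foods.length === 0) throw new Error("empty menu");
    return this.foods.reduce((a, b) => (b.price < a.price ? b : a));
  }
}

class Cart {
  private items: Food[] = [];
  // Returns the cart itself so further calls can be chained.
  addFood(food: Food): Cart {
    this.items.push(food);
    return this;
  }
  total(): number {
    return this.items.reduce((sum, f) => sum + f.price, 0);
  }
}

// "Add the cheapest item on the menu to my cart. What's my total?"
const menu = new Menu([new Food("Latte", 4.5), new Food("Tea", 3), new Food("Bagel", 3.5)]);
const cart = new Cart();
const total = cart.addFood(menu.cheapest()).total();
console.log(total); // 3
```

Because `cheapest`, `addFood`, and `total` declare their parameter and return types, one spoken request maps onto a single composed expression; the reducer offers no per-action result to chain, so a combined command would need its own dedicated action.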