Exploring Mobile Touch Interaction with Large Language Models

Tim Zindulka, Jannek Sekowski, Florian Lehmann, Daniel Buschek

TL;DR

This work proposes a four-dimensional design space for mobile touch interaction with large language models and demonstrates two continuous gestures, spread-to-generate and pinch-to-shorten, that control text generation directly within the editor. A novel visual feedback loop based on word bubbles accommodates latency-prone streaming and closes the control loop, which improves speed and perceived usability and reduces cognitive load. In two within-subject experiments, the gestural interface with Bubble feedback outperformed both line-based feedback and a no-feedback baseline, and it significantly surpassed a ChatGPT-like chatbot UI in efficiency and user experience. The study shows that gesture-based, continuous interaction with LLMs on mobile devices is both feasible and desirable, and it establishes a foundation for future gesture-based AI writing tools.
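To make the idea of a continuous, closed-loop gesture control more concrete, the sketch below maps the change in distance between two fingers to a target word count. Spreading raises the target (spread-to-generate), and pinching lowers it (pinch-to-shorten). A control step then compares the number of words streamed so far with that target. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `PinchSpreadState`, `target_word_count`, `control_step`, and the constant `WORDS_PER_PIXEL` are all hypothetical, and the linear distance-to-words mapping is an assumed design choice.

```python
from dataclasses import dataclass

# Hypothetical tuning constant: words gained or lost per pixel of finger-distance change.
WORDS_PER_PIXEL = 0.1


@dataclass
class PinchSpreadState:
    base_words: int        # word count of the touched text when the gesture starts
    start_distance: float  # distance between the two fingers at gesture start (px)


def target_word_count(state: PinchSpreadState, current_distance: float) -> int:
    """Map the current finger distance to a target text length (assumed linear mapping).

    Spreading (distance grows) raises the target; pinching (distance shrinks) lowers it.
    """
    delta = current_distance - state.start_distance
    target = state.base_words + round(delta * WORDS_PER_PIXEL)
    return max(1, target)  # never shorten below a single word


def control_step(streamed_words: int, target: int) -> str:
    """One iteration of a hypothetical closed control loop.

    Compares how many words the LLM has streamed so far with the user's current target.
    """
    if streamed_words < target:
        return "generate"  # text too short: keep streaming more words
    if streamed_words > target:
        return "trim"      # text too long: shorten
    return "hold"          # target reached


# Usage: a gesture that starts on a 20-word sentence with fingers 100 px apart.
state = PinchSpreadState(base_words=20, start_distance=100.0)
print(target_word_count(state, 200.0))  # spread by 100 px
print(target_word_count(state, 50.0))   # pinch by 50 px
```

Re-evaluating `control_step` on every gesture update and every streamed token is what makes the loop "closed": the user sees the current length, adjusts the finger distance, and the system moves the text toward the new target.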

Abstract

Interacting with Large Language Models (LLMs) for text editing on mobile devices currently requires users to break out of their writing environment and switch to a conversational AI interface. In this paper, we propose to control the LLM via touch gestures performed directly on the text. We first chart a design space that covers fundamental touch input and text transformations. Within this space, we then explore two concrete control mappings, spread-to-generate and pinch-to-shorten, together with visual feedback loops. We evaluate this concept in a user study (N=14) that compares three feedback designs: no visualisation, a text length indicator, and a length + word indicator. The results demonstrate that touch-based control of LLMs is both feasible and user-friendly, with the length + word indicator proving most effective for managing text generation. This work lays the foundation for further research into gesture-based interaction with LLMs on touch devices.
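The three feedback designs differ in how much information they expose while text is being generated. The sketch below renders each one as a plain string so the difference is easy to see: no visualisation shows nothing, the text length indicator shows a bar of filled and empty slots, and the length + word indicator shows the streamed words as bubbles followed by empty placeholders for words still pending. This is an illustrative assumption about how such indicators could be computed, not the paper's actual rendering; the names `Feedback` and `render_feedback` and the text-based bar and bubble glyphs are hypothetical.

```python
from enum import Enum
from typing import List


class Feedback(Enum):
    NO_VIS = "none"          # no visualisation
    LINES = "length"         # text length indicator
    BUBBLES = "length+word"  # length + word indicator


def render_feedback(mode: Feedback, streamed: List[str], target: int) -> str:
    """Render a text-only stand-in for each feedback design (hypothetical glyphs)."""
    if mode is Feedback.NO_VIS:
        return ""  # the user sees only the text itself
    if mode is Feedback.LINES:
        # One slot per target word: '=' for words already present, '-' for missing ones.
        filled = min(len(streamed), target)
        return "=" * filled + "-" * (target - filled)
    # BUBBLES: show the streamed words themselves, then empty bubbles for pending words.
    words = [f"({w})" for w in streamed[:target]]
    pending = ["( )"] * max(0, target - len(streamed))
    return " ".join(words + pending)


# Usage: two words have streamed so far, and the user's gesture asks for three.
print(render_feedback(Feedback.LINES, ["hello", "there"], 3))
print(render_feedback(Feedback.BUBBLES, ["hello", "there"], 3))
```

The sketch highlights the key difference between the designs: the length indicator tells users how far the text is from the target, while the word-level indicator also shows which content has already arrived.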

Paper Structure

This paper contains 111 sections, 15 figures, 3 tables.

Figures (15)

  • Figure 1: Overview of our design space for mobile touch interaction with generative AI for text, with four dimensions (left to right) and subdimensions (values at vertical lines). Coloured streams indicate the design choices for the two concrete touch gesture controls that we designed, implemented, and evaluated in this paper.
  • Figure 2: Our frontend enables gesture interaction with text on mobile devices. We employ our Bubbles visual feedback design to communicate essential information to the user.
  • Figure 3: Our long-press feature allows users to request synonyms on a word level (a) or tone adjustments on a sentence level (b). When words or sentences are replaced by selecting an alternative, they swap places to allow users to revert their action.
  • Figure 4: Overview of our user study design and procedure.
  • Figure 5: Examples of the UI in Experiment 1: In the Lines condition (left) coloured lines provide visual feedback on the change of text length. The design of the NoVis condition (right) offered no visual feedback beyond the text itself.
  • ...and 10 more figures
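Figure 3 describes a revert mechanism for the long-press feature: when the user picks an alternative word or sentence, the original and the alternative swap places, so choosing the original again undoes the edit. A minimal sketch of that swap logic follows, assuming the alternatives are held as a simple list. The function name `swap_alternative` is hypothetical, and the sketch covers only the swap itself, not how the synonyms or tone variants are generated.

```python
from typing import List, Tuple


def swap_alternative(current: str, alternatives: List[str], choice: int) -> Tuple[str, List[str]]:
    """Replace `current` with alternatives[choice] (hypothetical helper).

    The replaced text takes over the chosen alternative's slot, so selecting that
    slot again swaps the two back and reverts the edit.
    """
    chosen = alternatives[choice]
    updated = list(alternatives)  # copy so the caller's list is not mutated
    updated[choice] = current
    return chosen, updated


# Usage: the user long-presses "big" and picks "large"; picking slot 0 again reverts.
word, options = swap_alternative("big", ["large", "huge"], 0)
print(word, options)
word, options = swap_alternative(word, options, 0)
print(word, options)
```

Swapping in place keeps the list length constant and makes every replacement its own undo, which matches the behaviour described in the Figure 3 caption without requiring a separate undo history.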