SummAct: Uncovering User Intentions Through Interactive Behaviour Summarisation

Guanhua Zhang, Mohamed Ahmed, Zhiming Hu, Andreas Bulling

TL;DR

This work proposes interactive behaviour summarisation as a novel computational task and demonstrates its usefulness for automatically uncovering users' latent goals as they interact with graphical user interfaces. To tackle this task, it introduces SummAct, a novel hierarchical method that summarises low-level input actions into high-level goals.

Abstract

Recent work has highlighted the potential of modelling interactive behaviour analogously to natural language. We propose interactive behaviour summarisation as a novel computational task and demonstrate its usefulness for automatically uncovering users' latent intentions as they interact with graphical user interfaces. To tackle this task, we introduce SummAct, a novel hierarchical method that summarises low-level input actions into high-level intentions. SummAct first identifies sub-goals from user actions using a large language model and in-context learning. High-level intentions are then obtained by fine-tuning the model with a novel UI element attention that preserves the detailed context information embedded within UI elements during summarisation. Through a series of evaluations, we demonstrate that SummAct significantly outperforms baselines across desktop and mobile interfaces as well as interactive tasks by up to 21.9%. We further show three interactive applications that benefit from SummAct: interactive behaviour forecasting, automatic behaviour synonym identification, and language-based behaviour retrieval.
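The first step described above (inferring sub-goals from user actions via in-context learning) can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `Action` class, the string format for UI elements and operations, the helper names, and the instruction wording are all hypothetical and are not the prompt or data format used by SummAct. The sketch only assembles a few-shot prompt; it does not call any language model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Action:
    """One low-level user action (hypothetical representation)."""
    element: str    # the interacted UI element, e.g. "[button] Add to cart"
    operation: str  # the user's operation on it, e.g. "click"


def format_actions(actions: List[Action]) -> str:
    """Render an action sequence as numbered lines."""
    return "\n".join(
        f"{i}. {a.operation} on {a.element}"
        for i, a in enumerate(actions, start=1)
    )


def build_subgoal_prompt(
    examples: List[Tuple[List[Action], List[str]]],
    query: List[Action],
) -> str:
    """Assemble a few-shot prompt: worked examples, then the query actions.

    The instruction text is a placeholder, not the wording from the paper.
    """
    parts = ["Summarise the user actions into a list of sub-goals."]
    for actions, subgoals in examples:
        parts.append("Actions:\n" + format_actions(actions))
        parts.append("Sub-goals:\n" + "\n".join(f"- {g}" for g in subgoals))
    # The query ends with an open "Sub-goals:" slot for the model to complete.
    parts.append("Actions:\n" + format_actions(query))
    parts.append("Sub-goals:")
    return "\n\n".join(parts)


# One made-up demonstration and one made-up query sequence.
demo = [(
    [Action("[textbox] Search", "type: red scarf"),
     Action("[button] Search", "click")],
    ["Search for a red scarf"],
)]
query = [Action("[link] Wool scarf", "click"),
         Action("[button] Add to cart", "click")]

prompt = build_subgoal_prompt(demo, query)
print(prompt)
```

In a setup like this, the returned string would be passed to a frozen, pretrained LLM, and the generated sub-goals would then feed the fine-tuned second stage that produces the overall intention.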

Paper Structure

This paper contains 35 sections, 4 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Overview of SummAct for uncovering user intentions during user interface interactions through interactive behaviour summarisation. SummAct employs a hierarchical process that first generates sub-goals and then produces the overall intention in natural language. The input is a sequence of user actions, each comprising the interacted UI element and the user's operation on this element. SummAct uses in-context learning to infer an arbitrary number of sub-goals using a pretrained, frozen LLM (Step 1) and then fine-tunes the LLM while introducing a UI element attention (Step 2) to keep the detailed context embedded in UI element contents, as highlighted in bold. Actions in the same colour are summarised into the same sub-goal and then into a phrase in the overall intention. The output summary reflects the latent intentions that underlie these actions.
  • Figure 2: Two examples showing the input user actions, their underlying ground-truth intentions and those summarised by the full version of SummAct and its ablation removing sub-goals.
  • Figure 3: An example of using synonyms to compare UI usability for the task of adding $N$ items into the shopping cart. The Uniqlo website (left) allows users to add multiple items with just three clicks, while the Macy's website (right) requires one click per item, leading to more effort and less usability as $N$ increases.
  • Figure 4: An example of using synonyms to compare UI usability for the intention of searching for jobs in city A. On the upper interface, users can finish the task with only three actions; on the lower interface, they must perform six actions, indicating worse usability.
  • Figure 5: Prompt used to generate sub-goals using in-context learning.
  • ...and 2 more figures