Sequential Decision-Making for Inline Text Autocomplete

Rohan Chitnis, Shentao Yang, Alborz Geramifard

TL;DR

This work reframes inline text autocomplete as a sequential decision-making problem and uses reinforcement learning (RL) to account explicitly for cognitive load via a reward tied to text-entry speed. While theoretical analysis identifies conditions under which farsighted policies can surpass myopic ones, empirical results with idealized users show limited speed gains over a fixed-threshold baseline, and a real-user study finds that cognitive load depends significantly on suggestion correctness but not on suggestion length. The findings suggest that RL-based improvements may require more realistic user behavior (stochastic acceptance, typos, semantic matching) and a broader objective beyond raw speed, such as user satisfaction and convenience. The work lays out a framework for future exploration of RL in inline autocompletion with realistic user modeling and richer reward signals that reflect actual user experience.

Abstract

Autocomplete suggestions are fundamental to modern text entry systems, with applications in domains such as messaging and email composition. Typically, autocomplete suggestions are generated from a language model with a confidence threshold. However, this threshold does not directly take into account the cognitive load imposed on the user by surfacing suggestions, such as the effort to switch contexts from typing to reading the suggestion, and the time to decide whether to accept the suggestion. In this paper, we study the problem of improving inline autocomplete suggestions in text entry systems via a sequential decision-making formulation, and use reinforcement learning to learn suggestion policies through repeated interactions with a target user over time. This formulation allows us to factor cognitive load into the objective of training an autocomplete model, through a reward function based on text entry speed. We acquired theoretical and experimental evidence that, under certain objectives, the sequential decision-making formulation of the autocomplete problem provides a better suggestion policy than myopic single-step reasoning. However, aligning these objectives with real users requires further exploration. In particular, we hypothesize that the objectives under which sequential decision-making can improve autocomplete systems are not tailored solely to text entry speed, but more broadly to metrics such as user satisfaction and convenience.
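
To make the confidence-threshold baseline mentioned in the abstract concrete, the following is a minimal sketch of one way such a rule could look. It is not the authors' implementation: the function name `threshold_suggest`, the `(text, confidence)` candidate format, the default threshold value, and the example confidences are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical candidate format: (completion text, LM confidence in [0, 1]).
Candidate = Tuple[str, float]


def threshold_suggest(
    candidates: Sequence[Candidate],
    threshold: float = 0.7,  # illustrative value, would be tuned in practice
) -> Optional[str]:
    """Return the most confident completion if it clears the threshold.

    Returns None when no suggestion should be shown. The rule looks only at
    the current step (it is myopic) and ignores the cost to the user of
    reading and evaluating a suggestion.
    """
    if not candidates:
        return None
    text, confidence = max(candidates, key=lambda c: c[1])
    return text if confidence >= threshold else None


# The user has typed "sorry, i'll ca"; the confidences below are made up.
shown = threshold_suggest([("ll", 0.82), ("n't", 0.10)])
hidden = threshold_suggest([("ll", 0.55), ("n't", 0.30)])
```

Under these assumed confidences, `shown` is the completion `"ll"` and `hidden` is `None`. The sequential formulation studied in the paper replaces this fixed rule with a learned policy whose reward is based on text entry speed.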

Paper Structure (20 sections, 7 equations, 7 figures)

Figures (7)

  • Figure 1: Left: An overview of the workflow of our RL agent for inline text autocomplete. A language model (LM) generates $k$ candidate completions of the current text. The RL agent decides which, if any, to give the user as an inline suggestion. The agent is rewarded based on the user's text entry speed, which takes into account the cognitive load of showing suggestions. Right: The interface for our user study (Section \ref{subsec:user_study}). In this example, the user was asked to type the sentence "sorry, i'll call later" on a keyboard. Currently, they have typed "sorry, i'll ca", and a suggestion was made that completes the last word as "call", which the user can accept by pressing the tab key.
  • Figure 2: Number of states where the optimal farsighted policy ($\gamma=1$) and the optimal myopic policy ($\gamma=0$) disagree, for various values of $\alpha$.
  • Figure 3: Average return and number of characters saved, over 5 independent runs (including training, for RL agents). Error bars depict 95% confidence interval. Left two: SMS dataset, 50 sentences in evaluation set. Right two: Reddit Webis-TLDR-17 dataset, 400 sentences in evaluation set.
  • Figure 4: Left two plots: Analysis of how varying $\gamma$ affects RL agents. Right two plots: Results of rerunning experiments with $\alpha=0.4$, as suggested by our theory (Section \ref{sec:theory}). All plots use the Reddit Webis-TLDR-17 dataset and show the same metrics (y-axis) as in Figure \ref{fig:comp_baseline}. In the right two plots, DQN results are omitted because they were significantly worse than PPO and IQL, and we changed the threshold-based agent to use threshold 0.7 based on re-tuning with the updated $\alpha$.
  • Figure 5: User study results (Section \ref{subsec:user_study}). Error bars depict $95\%$ confidence interval. Left: Average cognitive load versus suggestion length. The cognitive load did not grow significantly with suggestion length. Right: Average cognitive load versus suggestion correctness. There is a significant difference between the user's cognitive load when considering correct versus incorrect suggestions.
  • ...and 2 more figures