
Modulating Language Model Experiences through Frictions

Katherine M. Collins, Valerie Chen, Ilia Sucholutsky, Hannah Rose Kirk, Malak Sadek, Holli Sargeant, Ameet Talwalkar, Adrian Weller, Umang Bhatt

TL;DR

This work proposes selective frictions for language model experiences, inspired by behavioral science interventions, to dampen misuse and motivate further study of human-AI behavioral interaction to inform more effective and appropriate LLM use.

Abstract

Language models are transforming the ways that their users engage with the world. Despite impressive capabilities, over-consumption of language model outputs risks propagating unchecked errors in the short term and damaging human capabilities for critical thinking in the long term. How can we develop scaffolding around language models to curate more appropriate use? We propose selective frictions for language model experiences, inspired by behavioral science interventions, to dampen misuse. Frictions involve small modifications to a user's experience, e.g., the addition of a button that impedes model access and reminds a user of their expertise relative to the model. Through a user study with human participants, we observe shifts in user behavior when a friction is imposed over LLMs in a multi-topic question-answering task, chosen as representative of tasks for which people may use LLMs, e.g., in education and information retrieval. We find that frictions modulate over-reliance by driving down users' click rates while minimally affecting accuracy for those topics. Yet, frictions may have unintended effects. We find marked differences in users' click behaviors even on topics where frictions were not provisioned. Our contributions motivate further study of human-AI behavioral interaction to inform more effective and appropriate LLM use.

Paper Structure (18 sections, 1 equation, 9 figures, 2 tables)


Figures (9)

  • Figure 1: Frictions permit continued model access, but require more effort to procure access. Left: unrestricted access; middle: restricted access; right: frictioned access. We explore the use of selective frictions with respect to user expertise as a way to modulate the ease of model access across task instances.
  • Figure 2: Frictioning reduces clicks to see LLM predictions. We measure the click rate for each user across topics. We find that, for all topics, click rates are statistically significantly reduced ($p < 0.05$) in the selective friction condition. Error bars indicate standard error over participants.
  • Figure 3: Interface for selective friction; here the user scored higher than the model in their pre-quiz on mathematics. If the user presses the "Show AI Prediction" button (first button), they are then presented with the red block (second button), which they must click if they still want to see the prediction.
  • Figure 4: Comparing per-topic click rates for people within the friction condition ($N=53$) who did or did not see a friction. No one was frictioned for biology.
  • Figure 5: Example interface that the user is presented with for each MMLU question, where they have the option to click a button to query the AI.
  • ...and 4 more figures
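
The selective friction described in the captions above (a second confirmation step shown only on topics where the user out-scored the model on a pre-quiz) and the per-topic click rate can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `needs_friction`, `frictioned_topics`, `click_rate`, and `Click` are hypothetical, the strict greater-than rule and the handling of ties are assumptions, and the scores in the usage note are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Click:
    """One question shown to a user (hypothetical record layout)."""
    topic: str
    clicked: bool  # True if the user ended up viewing the AI prediction


def needs_friction(user_score: float, model_score: float) -> bool:
    # Assumed rule: add friction only when the user strictly out-scored
    # the model on the pre-quiz for this topic. Ties get no friction here.
    return user_score > model_score


def frictioned_topics(user_scores: dict, model_scores: dict) -> set:
    # Topics on which this user would see the extra confirmation button.
    return {
        topic
        for topic, score in user_scores.items()
        if needs_friction(score, model_scores[topic])
    }


def click_rate(events: list, topic: str):
    # Fraction of questions on `topic` where the user viewed the prediction.
    # Returns None when the user saw no questions on that topic.
    relevant = [e for e in events if e.topic == topic]
    if not relevant:
        return None
    return sum(e.clicked for e in relevant) / len(relevant)
```

For example, with illustrative pre-quiz scores `{"mathematics": 0.8, "biology": 0.4}` for the user and `{"mathematics": 0.6, "biology": 0.9}` for the model, `frictioned_topics` selects only mathematics, so the user would see the extra button on mathematics questions and unrestricted access on biology.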