Measuring User Experience Inclusivity in Human-AI Interaction via Five User Problem-Solving Styles

Andrew Anderson, Jimena Noa Guevara, Fatima Moussaoui, Tianyi Li, Mihaela Vorvoreanu, Margaret Burnett

TL;DR

This study addresses AI UX inclusivity by examining who benefits from guideline-based improvements in AI products. Using GenderMag's five problem-solving styles, it analyzes 1,016 participants across 16 vignette-based experiments, each comparing a guideline-violating and a guideline-applying version of an AI feature in a productivity tool. Inclusivity is measured across participants' attitudes toward risk and four other problem-solving styles, as well as across gender and age. The findings show that applying HAI guidelines generally yields inclusivity gains for diverse problem-solvers, and many of the patterns are actionable for practitioners, for example increasing user control and transparency. Some cases, however, reveal risks: the gains can be limited for risk-averse users. The work advances practical methods for assessing equity alongside inclusivity in HAI-UX and demonstrates systematic links between problem-solving diversity and demographic diversity, offering design guidance for creating AI experiences that are more inclusive across user groups.

Abstract

Motivations: Recent research has examined how to improve AI product user experiences in general, but relatively little is known about an AI product's inclusivity. For example, what kinds of users does it support well, and who does it leave out? And what changes to the product would make it more inclusive? Objectives: Our overall objective is to help fill this gap by investigating what kinds of diverse users an AI product leaves out, and how to act upon that knowledge. To make our findings actionable, we focus on the diversity of users' problem-solving attributes. Thus, our specific objectives were: (1) to reveal whether participants with diverse problem-solving styles were left behind in a set of AI products; and (2) to relate participants' problem-solving diversity to their demographic diversity, specifically gender and age. Methods: We performed 18 experiments, discarding two that failed manipulation checks. Each experiment was a 2×2 factorial experiment with online participants and compared two AI products: one deliberately violating an HAI guideline and the other applying it. For our first objective, we analyzed how much each AI product gained or lost inclusivity compared to its counterpart, where inclusivity was defined as supportiveness to participants with particular problem-solving styles. For our second objective, we analyzed how participants' problem-solving styles aligned with their demographics, namely their genders and ages. Results & Implications: Participants' diverse problem-solving styles revealed six types of inclusivity results: (1) the AI products that followed an HAI guideline were almost always more inclusive across diversity of problem-solving styles than the products that did not follow that guideline, but the "who" that got most of the inclusivity varied widely by guideline and by problem-solving style...
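The Methods describe inclusivity as how much one product version gained or lost in supportiveness toward participants with a particular problem-solving style, relative to its counterpart. As a rough illustration only, the Python sketch below computes that gain as a difference of group means, assuming each participant has a single style label and a single numeric outcome score. `Response`, `gain_by_style`, and the sample values are hypothetical; the paper's actual statistical tests and effect-size calculations are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Response:
    style: str      # hypothetical problem-solving style label, e.g. "risk-averse"
    applied: bool   # True = guideline-applying version, False = guideline-violating
    score: float    # hypothetical outcome measure, e.g. a rating on one UX variable


def gain_by_style(responses):
    """Per style group: mean(applying) - mean(violating).

    A positive value means the guideline-applying version supported that
    group better; groups missing either version are skipped.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for r in responses:
        groups[r.style][r.applied].append(r.score)
    gains = {}
    for style, by_version in groups.items():
        if by_version[True] and by_version[False]:
            gains[style] = mean(by_version[True]) - mean(by_version[False])
    return gains


# Hypothetical example data, not from the study.
data = [
    Response("risk-averse", False, 3.0),
    Response("risk-averse", True, 4.0),
    Response("risk-tolerant", False, 4.0),
    Response("risk-tolerant", True, 6.0),
]
print(gain_by_style(data))  # {'risk-averse': 1.0, 'risk-tolerant': 2.0}
```

Comparing the per-group gains (here, a smaller gain for the hypothetical risk-averse group) is what surfaces *who* benefits most from applying a guideline.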

Paper Structure

This paper contains 29 sections, 10 figures, 14 tables.

Figures (10)

  • Figure 1: An outcome of an experiment comparing two versions of "G3" AI products [li-MSR-work]. The x-axis shows the resulting amount of improvement for each of the 13 variables (not labeled here) on the y-axis. *: difference was statistically significant.
  • Figure 2: Amershi et al.'s 18 guidelines for human-AI interaction [amershi2019guidelines]. The guidelines are grouped into 4 phases (left column); each guideline has a number, title, and brief description. Following Li et al. [li-MSR-work], our analyses exclude the experiments for the two guidelines (Guidelines 2 & 16, greyed out) that did not pass the manipulation check.
  • Figure 3: The two vignettes for Guideline 1 ("make clear what the system can do"). Each vignette had three components: (1) a product and feature introduction, describing what the product was and what it did; (2) a description of the manipulated AI feature's behavior, which differentiated the guideline's violation from its application; and (3) a description of the AI's performance. Participants were never exposed to the concept of guideline violations or applications; instead, they saw only generic names (Ione & Kelso).
  • Figure 4: Thumbnails of Investigation One's results for each guideline's experiment. More color indicates larger effect sizes. *: difference was statistically significant. See Li et al. [li-MSR-work] for full details.
  • Figure 5: Participants' risk scores (y-axis) for each experiment (x-axis). "x"s mark the means, horizontal lines mark the medians. Participants above the median are more risk-averse than their peers below the median.
  • ...and 5 more figures
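Figure 5 describes splitting each experiment's participants at the median risk score, with those above the median treated as more risk-averse than their peers below it. A minimal Python sketch of that split follows, assuming risk scores are already computed per participant and that higher scores mean more risk-averse, as the caption states. `split_by_median`, the participant IDs, and the separate "at median" label for ties are hypothetical choices, not the authors' documented procedure.

```python
from statistics import median


def split_by_median(scores):
    """Label each participant relative to the experiment's median risk score.

    Assumption: a higher score means more risk-averse. Ties exactly at the
    median get their own label here; how the study handled ties is not stated.
    """
    m = median(scores.values())
    labels = {}
    for pid, s in scores.items():
        if s > m:
            labels[pid] = "more risk-averse"
        elif s < m:
            labels[pid] = "more risk-tolerant"
        else:
            labels[pid] = "at median"
    return labels


# Hypothetical scores for one experiment, not from the study.
labels = split_by_median({"p1": 10, "p2": 20, "p3": 30})
print(labels)  # {'p1': 'more risk-tolerant', 'p2': 'at median', 'p3': 'more risk-averse'}
```

Because the median is taken per experiment, "more risk-averse" is relative to the other participants in the same experiment rather than an absolute threshold.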