"It Became My Buddy, But I'm Not Afraid to Disagree": A Multi-Session Study of UX Evaluators Collaborating with Conversational AI Assistants

Emily Kuang, Ehsan Jahangirzadeh Soure, Luyao Shen, Nitesh Goyal, Mingming Fan, Kristen Shinohara

Abstract

AI-assisted usability analysis can potentially reduce the time and effort of finding usability problems, yet little is known about how AI's perceived expertise influences evaluators' analytic strategies and perceptions over time. We ran a within-subjects, five-session study (six hours per participant) with 12 professional UX evaluators who worked with two conversational assistants (CAs) designed to appear novice- or expert-like (differing in suggestion quantity and response accuracy). We logged behavioral measures (number of passes, suggestion acceptance rate), collected subjective ratings (trust, perceived efficiency), and conducted semi-structured interviews. Participants experienced an initial novelty effect and a subsequent dip in trust that recovered over time. Their efficiency improved as they shifted from a two-pass to a one-pass video inspection approach. Evaluators ultimately rated the experienced CA as significantly more efficient, trustworthy, and comprehensive, despite not perceiving expertise differences early on. We conclude with design implications for adapting AI expertise to enable calibrated human-AI collaboration.

Paper Structure (51 sections, 14 figures, 8 tables)

Figures (14)

  • Figure 1: User interface of the usability analysis tool in the conditions with CAs, containing: A) a video player, B) a table of usability problem descriptions, causes, and redesign recommendations with the corresponding timestamps, and C) a chat thread.
  • Figure 2: Bar charts showing participants' familiarity with usability analysis, frequency of AI tool usage, and familiarity with how AI works.
  • Figure 3: Flowchart of the longitudinal study containing five sessions in total. Each session included an introduction or recap, an analysis of three videos, and a debrief or semi-structured interview.
  • Figure 4: Timeline plots of analysis behaviors with session time on the x-axis and video time on the y-axis. The first column shows the three conditions (no CA, novice CA, and experienced CA) during session 1, while the second column presents these same conditions during session 5. These video timelines were chosen to illustrate the five main analysis strategies visually.
  • Figure 5: Stacked bar chart showing the distribution of five video analysis strategies by session and condition. Two-pass strategies were more frequent in the first two sessions, while the One-pass, No-Pause-Write strategy emerged in session 3 and was used by two-thirds of participants by session 5.
  • ...and 9 more figures