Impact of Voice Fidelity on Decision Making: A Potential Dark Pattern?

Mateusz Dubiel, Anastasia Sergeeva, Luis A. Leiva

TL;DR

The paper investigates whether voice fidelity and prosody in synthetic speech can subtly bias decision making, framing this as a potential dark pattern in voice interfaces. It employs a two-stage design with $N=50$ for a voice-perception study and $N=101$ for a decision-making task, comparing Standard TTS to neural voices. Findings show neural, high-fidelity voices are rated more favorably and bias choices toward their presented options, while participants underestimate the influence of voice on their decisions. The authors discuss ethical guidelines, user customization, and domain-tailored interaction to mitigate manipulation, highlighting regulatory and practical implications for the design of voice-based agents.

Abstract

Manipulative design in user interfaces (conceptualized as dark patterns) has emerged as a significant impediment to the ethical design of technology and a threat to user agency and freedom of choice. While previous research focused on exploring these patterns in the context of graphical user interfaces, the impact of speech has largely been overlooked. We conducted a listening test (N = 50) to elicit participants' preferences regarding different synthetic voices that varied in terms of synthesis method (concatenative vs. neural) and prosodic qualities (speech pace and pitch variance), and then evaluated their impact in an online decision-making study (N = 101). Our results indicate a significant effect of voice qualities on participants' choices, independently of the content of the available options. Our results also indicate that the voice's perceived engagement, ease of understanding, and domain fit directly translate to its impact on participants' behaviour in decision-making tasks.

Paper Structure

This paper contains 34 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Overview of Experimental Stages.
  • Figure 2: Prosodic qualities of selected voices: Standard TTS (baseline) and two Neural TTS voices. Error bars denote the standard deviation of pitch, based on acoustic analysis.
  • Figure 3: Independent-Samples Kruskal-Wallis ranks for Standard TTS, Neural TTS1, and Neural TTS2 in terms of ease of understanding, listening enjoyment, and domain suitability. Note: '***' indicates p < .001.
  • Figure 4: Proportion of selected options when presented by Neural TTS and Standard TTS. For example, the first purple bar on the left means that there were 2 participants who selected options presented by Neural TTS 0 out of 4 times.
  • Figure 5: Related-Samples Friedman's Two-Way Analysis of Variance by Ranks in experimental conditions.