Anonymization of Voices in Spaces for Civic Dialogue: Measuring Impact on Empathy, Trust, and Feeling Heard

Wonjune Kang; Margaret A. Hughes; Deb Roy

TL;DR

This paper examines how to anonymize spoken voices in civic dialogue without eroding the expressive richness that empathetic engagement depends on. It compares two anonymization strategies across two studies in a technology-enhanced civic dialogue network: voice conversion (VC), which replaces the speaker's vocal identity while retaining the prosody of the original recording, and text-to-speech (TTS), which resynthesizes the speech from its text and so discards the original delivery. Results show that VC largely preserves listener empathy and trust and enhances speakers' sense of being heard, while TTS substantially reduces perceived emotion and authenticity unless listeners are aware of the modification. The findings support deploying VC in voice-based civic platforms to balance anonymity with emotionally resonant storytelling, and they point to implications for design, policy, and future research on anonymous spoken discourse in democratic processes.

Abstract

Anonymity is a powerful component of many participatory media platforms that can afford people greater freedom of expression and protection from external coercion and interference. However, it can be difficult to effectively implement on platforms that leverage spoken language due to distinct biomarkers present in the human voice. In this work, we explore the use of voice anonymization methods within the context of a technology-enhanced civic dialogue network based in the United States, whose purpose is to increase feelings of agency and being heard within civic processes. Specifically, we investigate the use of two different speech transformation and synthesis methods for anonymization: voice conversion (VC) and text-to-speech (TTS). Through a series of two studies, we examine the impact that each method has on 1) the empathy and trust that listeners feel towards a person sharing a personal story, and 2) a speaker's own perception of being heard, finding that voice conversion is an especially suitable method for our purposes. Our findings open up interesting potential research directions related to anonymous spoken discourse, as well as additional ways of engaging with voice-based civic technologies.

Paper Structure (27 sections, 3 figures, 3 tables)

Introduction
Context and Related Work
The Role of Vocal Qualities in Spoken Language
Voice, Participation, and Anonymity in Democracy
Application Setting: Technology-Enhanced Civic Dialogue Network
Methods
Study 1: Perceptions of the Listener
Procedure
Survey design
Participants
Study 2: Perceptions of the Speaker
Participants and procedure
Survey design
Data Analysis Methods
Likert scale questions
...and 12 more sections

Figures (3)

Figure 1: Average scores for the 20 Likert scale questions in the survey for Study 1 when listeners were (a) unaware and (b) aware of anonymization. Error bars represent 95% confidence intervals. *, **, or *** on top of bars for VC and TTS denote statistically significant differences at $p < 0.05$, $p < 0.01$, or $p < 0.001$, respectively, for two-sided $t$-tests comparing against scores for Original.
Figure 2: Average scores for the 20 Likert scale questions in the survey for Study 1 comparing the unaware (blue) and aware (red) conditions for (a) Original, (b) VC, and (c) TTS voices. Error bars represent 95% confidence intervals. *, **, or *** on top of bars for Aware denote statistically significant differences at $p < 0.05$, $p < 0.01$, or $p < 0.001$, respectively, for two-sided $t$-tests comparing against scores for Unaware.
Figure 3: Average scores for the 5 Likert scale questions in the survey for Study 2 comparing the perceptions of speakers towards their own voices that were anonymized using VC (blue) and TTS (red). Error bars represent 95% confidence intervals. *, **, or *** on top of bars for TTS denote statistically significant differences at $p < 0.05$, $p < 0.01$, or $p < 0.001$, respectively, for two-sided $t$-tests comparing against scores for VC.
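
The captions above mark significance with stars derived from two-sided t-tests. As a rough illustration of that convention only, the following Python sketch computes a two-sided p-value and maps it to the star notation. It assumes Welch's unequal-variance t-test, which the captions do not specify, and the Likert scores and function names (`welch_t`, `two_sided_p`, `significance_stars`) are hypothetical and not taken from the paper.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Assumption: the captions only say "two-sided t-tests"; Welch's variant
    is chosen here for illustration.
    """
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value under Student's t, via Simpson's rule (steps even)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * area)


def significance_stars(p_value):
    """Map a p-value to the *, **, *** notation used in the captions."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# Hypothetical 5-point Likert responses for one survey question.
original_scores = [5, 4, 5, 4, 5, 4, 5, 5]
tts_scores = [3, 2, 4, 3, 2, 3, 3, 4]

t_stat, dof = welch_t(tts_scores, original_scores)
p = two_sided_p(t_stat, dof)
print(f"t = {t_stat:.2f}, df = {dof:.1f}, p = {p:.4g} {significance_stars(p)}")
```

In practice a statistics library would supply the p-value directly; the numerical integration here only keeps the sketch free of external dependencies.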
