Speech as Interactive Design Material (SIDM): How to design and evaluate task-tailored synthetic voices?
Mateusz Dubiel, Matthew Aylett, Anuschka Schmitt, Zilin Ma, Gary Hsieh, Thiemo Wambsganss
TL;DR
This paper outlines a workshop framework, Speech as Interactive Design Material (SIDM), aimed at improving the design and evaluation of task-tailored synthetic voices. It addresses the gap in ecological validity by uniting experts across audio engineering, speech perception, and UX design to develop standardized design practices and evaluation criteria for interactive TTS interfaces. Through demonstrations, prototyping sessions, and cross-disciplinary discussions, the workshop seeks to build a community that advances expressive, domain-specific TTS and yields more reliable user-centered outcomes. The practical impact is a more robust, transferable methodology for evaluating voice interfaces in real-world contexts, with clearer guidelines for tailoring voice prosody and interaction patterns to tasks.
Abstract
The aim of this workshop is twofold. First, it aims to establish a research community focused on the design and evaluation of synthetic speech (text-to-speech, TTS) interfaces that are tailored not only to goal-oriented tasks (e.g., food ordering, online shopping) but also to applications that promote personal growth and resilience (e.g., coaching, mindful reflection, and tutoring). Second, it aims, through discussion and collaborative effort, to establish a set of practices and standards that will help improve the ecological validity of TTS evaluation. In particular, the workshop will explore topics such as the interaction design of voice-based conversational interfaces and the interplay among prosodic aspects of TTS (e.g., pitch variance, loudness, jitter) and its impact on voice perception. The workshop will serve as a platform for building a community that is better equipped to tackle the dynamic field of interactive TTS interfaces, which remains understudied yet is increasingly pertinent to users' everyday lives.
