Building for Speech: Designing the Next Generation of Social Robots for Audio Interaction
Angus Addlesee, Ioannis Papaioannou, Oliver Lemon
TL;DR
The paper addresses the gap between recent advances in spoken dialogue systems (SDSs) and the continued absence of social robots from public spaces. It combines the authors' experience running experiments with social robots and the surrounding literature to identify recurring real-world challenges at the hardware-software interface, and argues for early-stage collaboration between SDS researchers and robot designers. Key recommendations include: multi-directional, high-volume speakers; larger microphone arrays with beamforming and low latency; quieter actuators and fans to reduce ego-noise; and more robust robot joints to limit potential harm to older adults and other vulnerable users. These insights aim to shift development toward hardware-software co-design, enabling safe, accessible, and effective social robots in everyday public environments such as train stations, shopping malls, and hospital waiting rooms.
Abstract
There have been incredible advancements in robotics and spoken dialogue systems (SDSs) over the past few years, yet we still don't find social robots in public spaces like train stations, shopping malls, or hospital waiting rooms. In this paper, we argue that early-stage collaboration between robot designers and SDS researchers is crucial to create social robots that can legitimately be used in real-world environments. We draw from our experiences running experiments with social robots, and the surrounding literature, to highlight recurring issues. Robots need more speakers, more microphones, quieter motors, and quieter fans to enable human-robot spoken interaction in the wild and improve accessibility. More robust robot joints are also needed to limit potential harm to older adults and other more vulnerable groups.
