The Un-Kidnappable Robot: Acoustic Localization of Sneaking People
Mengyu Yang, Patrick Grady, Samarth Brahmbhatt, Arun Balajee Vasudevan, Charles C. Kemp, James Hays
TL;DR
The study tackles the safety-critical problem of detecting and localizing people around a robot using only the incidental, passive sounds they produce as they move. It introduces the Robot Kidnapper dataset, a synchronized collection of 4-channel audio and 360° RGB video, and trains a multi-task model that, from audio alone, detects whether a moving person is present and estimates their azimuth and radial distance. Key contributions include a public, diverse dataset; an audio-only localization model that outperforms acoustic baselines; and a real-robot demonstration on a Stretch RE-1 showing real-time awareness of nearby people without active sensing. The work shows that passive audio sensing is a viable route to human awareness in robotics, offering a fallback when visual or other sensors fail and enabling safer human-robot interaction in everyday environments.
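The TL;DR describes one model trained on three tasks at once. The summary does not specify the output encoding or loss, so the following is only a minimal NumPy sketch, under stated assumptions, of how such a multi-task objective could be combined: a presence term (binary cross-entropy), an azimuth term that respects angular wrap-around, and a distance term, with the localization terms masked out when nobody is present. The function name `multitask_loss`, the (cos, sin) azimuth encoding, the L1 losses, and the weights `w_az` and `w_dist` are all hypothetical choices, not the authors' design.

```python
import numpy as np


def multitask_loss(presence_logit, az_pred, dist_pred,
                   present, az_true, dist_true,
                   w_az=1.0, w_dist=1.0):
    """Hypothetical multi-task loss: presence + azimuth + distance.

    presence_logit: (B,) raw scores for "a moving person is nearby"
    az_pred:        (B, 2) predicted (cos, sin) of the azimuth (assumed encoding)
    dist_pred:      (B,) predicted radial distance (assumed meters)
    present:        (B,) 0/1 presence labels
    az_true:        (B,) ground-truth azimuth in radians
    dist_true:      (B,) ground-truth radial distance
    """
    presence_logit = np.asarray(presence_logit, dtype=float)
    present = np.asarray(present, dtype=float)

    # Binary cross-entropy from logits: log(1 + e^z) - y*z, computed stably.
    bce = np.mean(np.logaddexp(0.0, presence_logit) - present * presence_logit)

    # Decode the predicted angle from its (cos, sin) encoding.
    az_pred = np.asarray(az_pred, dtype=float)
    az_hat = np.arctan2(az_pred[:, 1], az_pred[:, 0])
    # Wrap the angular difference into [-pi, pi] via the complex phase.
    diff = np.angle(np.exp(1j * (az_hat - np.asarray(az_true, dtype=float))))

    # Localization terms only apply to samples where a person is present.
    mask = present > 0.5
    if mask.any():
        az_loss = np.mean(np.abs(diff[mask]))
        dist_err = np.asarray(dist_pred, dtype=float) - np.asarray(dist_true, dtype=float)
        dist_loss = np.mean(np.abs(dist_err[mask]))
    else:
        az_loss = dist_loss = 0.0

    total = bce + w_az * az_loss + w_dist * dist_loss
    return total, {"presence": bce, "azimuth": az_loss, "distance": dist_loss}
```

Encoding azimuth as (cos, sin) and wrapping the error means a prediction of 179° against a label of -179° counts as a 2° mistake rather than a 358° one, and masking keeps empty-room clips from producing meaningless localization targets.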
Abstract
How easy is it to sneak up on a robot? We examine whether we can detect people using only the incidental sounds they produce as they move, even when they try to be quiet. We collect a robotic dataset of high-quality 4-channel audio paired with 360-degree RGB data of people moving in different indoor settings. We train models that predict, using only audio, whether there is a moving person nearby and where that person is. We implement our method on a robot, allowing it to track a single person moving quietly with only passive audio sensing. For demonstration videos, see our project page: https://sites.google.com/view/unkidnappable-robot
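Tracking a person on the robot ultimately means turning per-frame (azimuth, distance) estimates into something the robot can act on. As an illustration only, the sketch below converts one estimate into robot-frame coordinates and smooths a stream of azimuth estimates. The frame convention (azimuth in radians, counterclockwise from the robot's forward +x axis), the helper `to_robot_xy`, the class `AzimuthSmoother`, and the smoothing factor `alpha` are all hypothetical; nothing here is taken from the authors' Stretch RE-1 implementation.

```python
import math


def to_robot_xy(azimuth, distance):
    """Polar (azimuth, distance) -> Cartesian (x, y) in an assumed robot frame.

    Assumes azimuth is in radians, measured counterclockwise from the
    robot's forward (+x) axis; the paper's convention may differ.
    """
    return distance * math.cos(azimuth), distance * math.sin(azimuth)


class AzimuthSmoother:
    """Hypothetical exponential moving average over angles.

    Averages unit vectors (cos, sin) instead of raw angles, so estimates
    that straddle the +/-pi boundary are not averaged toward 0.
    """

    def __init__(self, alpha=0.3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._c = None
        self._s = None

    def update(self, azimuth):
        """Fold one new azimuth estimate into the average; return the smoothed angle."""
        c, s = math.cos(azimuth), math.sin(azimuth)
        if self._c is None:
            # First estimate initializes the state directly.
            self._c, self._s = c, s
        else:
            self._c = (1.0 - self.alpha) * self._c + self.alpha * c
            self._s = (1.0 - self.alpha) * self._s + self.alpha * s
        return math.atan2(self._s, self._c)
```

Averaging sin and cos rather than raw angles matters here: for two estimates just either side of 180°, a naive mean of the raw angles gives roughly 0°, pointing the robot directly away from the person, while the vector mean stays near 180°.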
