The Un-Kidnappable Robot: Acoustic Localization of Sneaking People

Mengyu Yang; Patrick Grady; Samarth Brahmbhatt; Arun Balajee Vasudevan; Charles C. Kemp; James Hays

TL;DR

The study tackles the safety-critical problem of detecting and localizing people around robots using only incidental, passive sounds produced by moving individuals. It introduces the Robot Kidnapper dataset, a synchronized collection of 4-channel audio and 360° RGB video, and trains a multi-task model to simultaneously estimate azimuth and radial distance while detecting moving presence, all from audio alone. Key contributions include a public, diverse dataset, a robust audio-only localization model outperforming acoustic baselines, and a real-robot demonstration on a Stretch RE-1 showing real-time robotic awareness without active sensing. The work demonstrates the viability of passive audio sensing for robust human awareness in robotics, offering a fallback mechanism when visual or other sensors fail and enabling safer human-robot interaction in everyday environments.

Abstract

How easy is it to sneak up on a robot? We examine whether we can detect people using only the incidental sounds they produce as they move, even when they try to be quiet. We collect a robotic dataset of high-quality 4-channel audio paired with 360 degree RGB data of people moving in different indoor settings. We train models that predict if there is a moving person nearby and their location using only audio. We implement our method on a robot, allowing it to track a single person moving quietly with only passive audio sensing. For demonstration videos, see our project page: https://sites.google.com/view/unkidnappable-robot
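The abstract describes predicting a person's location from audio, and the TL;DR specifies that the location is expressed as an azimuth and a radial distance. As a minimal sketch of how such a prediction might be consumed downstream, the snippet below converts a polar estimate to planar coordinates and computes a wrap-around angular error. The function names and the frame convention (azimuth in degrees, measured counter-clockwise from the robot's forward x-axis) are assumptions made for illustration and do not come from the paper.

```python
import math


def polar_to_xy(azimuth_deg, distance_m):
    """Convert an (azimuth, radial distance) estimate to x/y coordinates.

    Assumed convention (hypothetical, not from the paper): azimuth is in
    degrees, counter-clockwise from the robot's forward x-axis.
    """
    theta = math.radians(azimuth_deg)
    return distance_m * math.cos(theta), distance_m * math.sin(theta)


def angular_error_deg(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in [0, 180] degrees."""
    diff = (pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)
```

The wrap-around handling matters for a 360° setting: a prediction of 350° against a ground truth of 10° is 20° off, not 340°.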

Paper Structure (21 sections, 1 equation, 6 figures, 2 tables)

INTRODUCTION
RELATED WORK
Human Detection with Visual Perception
Audio-Based Perception for Robots
Dataset
Human Presence Recordings
Empty Room Recordings
Person Location Labels
Hardware
Methodology
Background Subtraction
Empty Room Augmentation
Models
Experiments
Model Comparisons
...and 6 more sections

Figures (6)

Figure 1: Can we detect where people are based only on the subtle sounds they incidentally produce when they move, even when they try to be quiet? We collect a dataset of high-quality audio paired with 360° RGB data with different participants in multiple indoor scenes. We train models to localize a moving person based on audio only and implement it on a robot.
Figure 2: Frames from the Robot Kidnapper dataset (static robot). The participant wears a hat with ArUco markers [garrido2014automatic], which are used to calculate the ground-truth radial distance. The RGB frames are used to calculate the ground-truth centroid of the person using DeepLabv3+ [chen2018encoder]. Only the audio is used during training. The vertical red lines are the angles predicted by our model in an unseen room. The participant is walking normally in these frames.
Figure 3: (a) Dataset capture setup. (b) Distribution of radial distances between the robot and person in the dataset.
Figure 4: Diagram of our model architecture. We perform background subtraction (see the Background Subtraction section) on input spectrograms before passing them through a spectrogram encoder with shared weights. The resulting features are concatenated and passed through the feature encoder based on the ASPP module [chen2018encoder]. The output is fed to 4 linear-layer heads for the prediction tasks.
Figure 5: Log spectrograms for all categories along with regular talking. No talking is used in our work, but we show the spectrogram as reference for a common sound source used in localization. All recordings were taken in the same room during the same recording session.
...and 1 more figure
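The Figure 4 caption mentions background subtraction on input spectrograms, and the outline lists empty-room recordings, but this page does not say how the subtraction is computed. The sketch below shows one common form, offered purely as an assumption: average an empty-room log spectrogram over time to get a per-frequency background, then subtract that from each frame of the input. The function names and the list-of-frames layout are hypothetical, and the paper's actual procedure may differ.

```python
def mean_spectrum(spectrogram):
    """Per-frequency mean over time.

    `spectrogram` is a list of frames; each frame is a list of
    log-magnitude values, one per frequency bin (assumed layout).
    """
    n_frames = len(spectrogram)
    n_bins = len(spectrogram[0])
    return [sum(frame[k] for frame in spectrogram) / n_frames
            for k in range(n_bins)]


def subtract_background(spectrogram, empty_room):
    """Subtract the empty-room mean spectrum from every input frame.

    This is an illustrative stand-in, not the paper's confirmed method.
    """
    background = mean_spectrum(empty_room)
    return [[value - bg for value, bg in zip(frame, background)]
            for frame in spectrogram]
```

Subtracting in the log domain is equivalent to dividing magnitudes, so stationary room noise is flattened toward zero while transient sounds such as footsteps remain as positive residuals.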
