SonicBoom: Contact Localization Using Array of Microphones
Moonyoung Lee, Uksang Yoo, Jean Oh, Jeffrey Ichnowski, George Kantor, Oliver Kroemer
TL;DR
SonicBoom addresses contact localization during collisions in occluded, cluttered environments by embedding six contact microphones along a robot end-effector and learning a mapping from vibrotactile signals and proprioception to the contact location on a cylindrical surface. The approach fuses mel spectrograms, GCC-PHAT features, and trajectory data through a multi-modal transformer, trained on a dataset of $108{,}000$ audio files from $18{,}000$ interactions. It achieves an MED ranging from $0.42\,\mathrm{cm}$ for in-distribution interactions to $2.22\,\mathrm{cm}$ under out-of-distribution conditions, and demonstrates practical haptic mapping in mock canopies, including zero-shot transfer with audio-only inputs. The work highlights the viability and generalization of acoustic sensing for tactile localization in visually challenging outdoor robotics tasks, and suggests continuous tracking and multi-contact estimation as avenues for future work.
Abstract
In cluttered environments where visual sensors encounter heavy occlusion, such as in agricultural settings, tactile signals can provide crucial spatial information for the robot to locate rigid objects and maneuver around them. We introduce SonicBoom, a holistic hardware and learning pipeline that enables contact localization through an array of contact microphones. While conventional sound source localization methods effectively triangulate sources in air, localization through solid media with irregular geometry and structure presents challenges that are difficult to model analytically. We address this challenge through a feature-engineering and learning-based approach, autonomously collecting 18,000 robot interaction-sound pairs to learn a mapping between acoustic signals and collision locations on the robot end-effector link. By leveraging relative features between microphones, SonicBoom achieves localization errors of 0.42 cm for in-distribution interactions and maintains a robust 2.22 cm error even with novel objects and contact conditions. We demonstrate the system's practical utility through haptic mapping of occluded branches in mock canopy settings, showing that acoustic-based sensing can enable reliable robot navigation in visually challenging environments.
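
To make the notion of "relative features between microphones" concrete, the following is a minimal sketch of GCC-PHAT (generalized cross-correlation with phase transform), the pairwise feature named in the summary above. It is a textbook formulation written with NumPy, not the authors' implementation; the function name `gcc_phat`, the sampling rate `fs`, and the optional `max_tau` window are assumptions introduced for illustration.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` with GCC-PHAT.

    Illustrative sketch only: `fs` (sampling rate in Hz) and `max_tau`
    (largest delay to search, in seconds) are assumed parameters.
    Returns (tau_seconds, cross_correlation_over_lags).
    """
    sig = np.asarray(sig, dtype=float)
    ref = np.asarray(ref, dtype=float)

    # Zero-pad so the circular correlation behaves like a linear one.
    n = sig.shape[0] + ref.shape[0]
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum, whitened so only phase (timing) information remains.
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12  # small constant avoids division by zero

    cc = np.fft.irfft(R, n=n)

    # Restrict the search to physically plausible lags if a bound is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so the array runs from lag -max_shift to lag +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(fs), cc
```

With six microphones there are 15 unordered pairs, so one option is to stack the per-pair correlation curves into a fixed-size feature alongside the mel spectrograms. How the pairs are selected and combined in SonicBoom is not specified in this section.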
