ANAVI: Audio Noise Awareness using Visuals of Indoor environments for NAVIgation

Vidhi Jain, Rishi Veerapaneni, Yonatan Bisk

TL;DR

This work generates data on how loud an 'impulse' sounds at different listener locations in simulated homes and uses it to train the Acoustic Noise Predictor (ANP), which estimates loudness from visual observations of indoor environments. Combining ANP with action acoustics lets the robot passively perceive how loud its actions are at a listener's location.

Abstract

We propose Audio Noise Awareness using Visuals of Indoor environments for NAVIgation (ANAVI) for quieter robot path planning. While humans are naturally aware of the noise they make and its impact on those around them, robots currently lack this awareness. A key challenge in achieving audio awareness for robots is estimating how loud the robot's actions will be at a listener's location. Since sound depends upon the geometry and material composition of rooms, we train the robot to passively perceive loudness using visual observations of indoor environments. To this end, we generate data on how loud an 'impulse' sounds at different listener locations in simulated homes, and train our Acoustic Noise Predictor (ANP). Next, we collect acoustic profiles corresponding to different actions for navigation. Unifying ANP with action acoustics, we demonstrate experiments with wheeled (Hello Robot Stretch) and legged (Unitree Go2) robots so that these robots adhere to the noise constraints of the environment. See code and data at https://anavi-corl24.github.io/

Paper Structure

This paper contains 19 sections, 1 equation, 15 figures, 2 tables.

Figures (15)

  • Figure 1: We generate data for acoustic impulse response in simulated 3D scenes from real-world environment scans. The x-axis shows the relative distance of the listener from the source agent; the y-axis shows the max decibels of a simulated Impulse Response (IR) at the listener. The box area on the panorama shows the listener's direction relative to the agent. Note the complex response pattern as materials, objects, and geometry have non-linear interactions with the sound.
  • Figure 2: Histogram of max decibel values from simulated data
  • Figure 3: Architecture. Our Acoustic Noise Prediction (ANP) model consists of Image encoder, Direction-Distance encoder, and Predictor modules. The inputs are the 360° RGB panorama view at the robot's location and the relative polar coordinates of the listener. The square box on the panorama highlights the robot's current facing direction and is drawn for illustration purposes only. The output is the max decibel (dB) of the Room Impulse Response (RIR) at the listener's location.
  • Figure 4: Main Results. Predicted Data Distribution plots for (left to right) Heuristic, DisLinReg, DirDisMLP, Ours(Pano)-VisDirDis.
  • Figure 5: $\epsilon$-threshold accuracy for Heuristic, DisLinReg, DirDisMLP, EgoVisDirDis, and PanoVisDirDis
  • ...and 10 more figures
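
Figure 5 reports an $\epsilon$-threshold accuracy for each model. The listing above does not define the metric, so the following is only a minimal sketch of one plausible reading: the fraction of predictions whose absolute error, in dB, is at most $\epsilon$. The function name `epsilon_threshold_accuracy` and the sample values are hypothetical and are not taken from the paper or its code.

```python
def epsilon_threshold_accuracy(predicted_db, target_db, epsilon):
    """Fraction of predictions within `epsilon` dB of the target.

    Assumed definition (not confirmed by the source): a prediction counts
    as correct when |predicted - target| <= epsilon.
    """
    if len(predicted_db) != len(target_db):
        raise ValueError("predicted_db and target_db must have the same length")
    if not predicted_db:
        raise ValueError("at least one sample is required")
    hits = sum(
        1 for p, t in zip(predicted_db, target_db) if abs(p - t) <= epsilon
    )
    return hits / len(predicted_db)


# Made-up max-dB values for illustration only; not results from the paper.
predicted = [60.0, 72.5, 55.0, 80.0]
target = [62.0, 70.0, 61.0, 80.5]

# Absolute errors are 2.0, 2.5, 6.0, and 0.5 dB, so three of the four
# predictions fall within a 3 dB threshold.
accuracy = epsilon_threshold_accuracy(predicted, target, epsilon=3.0)
```

Under this reading, sweeping `epsilon` over a range of values would trace out a curve per model: a model whose curve rises faster is more often close to the simulated loudness.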