HomeEmergency -- Using Audio to Find and Respond to Emergencies in the Home

James F. Mullen, Dhruva Kumar, Xuewei Qi, Rajasimman Madhivanan, Arnie Sen, Dinesh Manocha, Richard Kim

TL;DR

HomeEmergency tackles the challenge of enabling home robots to detect and respond to emergencies using audible cues in multi-room environments. It proposes a modular pipeline built around a probabilistic dynamic scene graph (P-DSG), audio perception, Bayesian inference, and LLM-guided emergency identification, validated on a new ThreeDWorld-based dataset and demonstrated in real-world robot experiments. Key contributions include the P-DSG representation with probabilistic edges, the Bayesian inference framework that fuses audio signals with scene priors, and a sim-to-real demonstration showing practical transfer. The work yields substantial improvements over strong baselines in emergency localization and detection, with clear implications for rapid, in-home emergency responses.

Abstract

In the United States alone, accidental home deaths exceed 128,000 per year. Our work aims to enable home robots that respond to emergency scenarios in the home, preventing injuries and deaths. We introduce a new dataset of household emergencies built in the ThreeDWorld simulator. Each scenario in our dataset begins with an instantaneous or periodic sound, which may or may not indicate an emergency. The agent must navigate the multi-room home scene using prior observations, alongside audio signals and images from the simulator, to determine whether there is an emergency. In addition to our new dataset, we present a modular approach for localizing and identifying potential home emergencies. Underpinning our approach is a novel probabilistic dynamic scene graph (P-DSG), where our key insight is that graph nodes corresponding to agents can be represented with a probabilistic edge. This edge, when refined using Bayesian inference, enables efficient and effective localization of agents in the scene. We also use multi-modal vision-language models (VLMs) as a component in our approach, determining object traits (e.g., flammability) and identifying emergencies. We present a demonstration of our method completing a real-world version of our task on a consumer robot, showing the transferability of both our task and our method. Our dataset will be released to the public upon this paper's publication.
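To make the idea of refining a probabilistic edge with Bayesian inference concrete, here is a minimal sketch. It is not the authors' implementation: the room names, prior values, bearings, the von Mises-shaped likelihood, and the function names `bearing_likelihood` and `bayes_update` are all assumptions chosen for illustration. The prior stands in for the P-DSG edge (where the person usually is), and the likelihood scores how well each room's bearing from the robot agrees with the estimated audio direction.

```python
import math


def bearing_likelihood(room_bearing_deg, heard_bearing_deg, kappa=2.0):
    """Hypothetical likelihood: a von Mises-shaped score that is highest when
    the room's bearing from the robot matches the estimated audio direction."""
    diff = math.radians(room_bearing_deg - heard_bearing_deg)
    return math.exp(kappa * math.cos(diff))


def bayes_update(prior, likelihood):
    """Posterior over rooms: prior times likelihood, renormalized to sum to 1."""
    unnormalized = {room: prior[room] * likelihood[room] for room in prior}
    total = sum(unnormalized.values())
    return {room: value / total for room, value in unnormalized.items()}


# Assumed prior from the probabilistic edge: where the person tends to be.
prior = {"office": 0.5, "kitchen": 0.3, "bedroom": 0.2}
# Assumed bearing of each room relative to the robot, in degrees.
room_bearings = {"office": 90.0, "kitchen": 0.0, "bedroom": 180.0}
# Assumed output of an audio direction estimator, in degrees.
heard_bearing = 80.0

likelihood = {
    room: bearing_likelihood(bearing, heard_bearing)
    for room, bearing in room_bearings.items()
}
posterior = bayes_update(prior, likelihood)
best_room = max(posterior, key=posterior.get)
print(best_room, posterior)
```

In this toy setup the room that is both likely a priori and aligned with the heard direction ends up with the highest posterior, so it would be the first room to check.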

Paper Structure

This paper contains 21 sections, 3 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: An overview of our method. Our agent hears a "thud" and determines that a fall may have occurred. It then leverages the probabilistic edges of our probabilistic dynamic scene graph (P-DSG), representing a heatmap of agent activity, and audio direction to update our P-DSG and produce a hypothesized source location. In this example, the P-DSG shows a high probability of the user, and thus any fall, being in the office (the red box). The agent checks the office and detects a fallen person. It calls emergency services.
  • Figure 2: A sample of images from the simulator showing simulated fall emergencies (left) and fire emergencies (middle). A 2D occupancy map illustrating the complexity of the overall environment is on the right. The map also shows our method proceeding toward a fire (faded green dot) along an efficient path (bright green).
  • Figure 3: An overview of our modular method and our mapping representation, the P-DSG. Our method takes audio and images as inputs. The method begins with the Mapping module, which creates a probabilistic dynamic scene graph (P-DSG) (right). The audio is run through our Audio Perception module, which outputs a label for the audio and an estimate of the direction it comes from. In our Inference module, we use this perception information and information from our P-DSG to determine the room most likely to be the source of the audio. We then go to this room, check for emergencies, and update our probabilities accordingly. The method continues until an emergency is found, the house is cleared, or, in simulation, we run out of steps. In the P-DSG on the right, objects and agents are connected to their parent place and room with edges (places are left out for simplicity). Note the probabilistic edges for the dynamic agents.
  • Figure 4: Results for the 'Falls' class of the HomeEmergency task.
  • Figure 5: Results for the 'Fires' class of the HomeEmergency task.
  • ...and 3 more figures
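
The search loop described in the Figure 3 caption (go to the most likely room, check it, update the probabilities, and stop when an emergency is found, the house is cleared, or the step budget runs out) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's code: `AgentNode`, `search`, the `check_room` callback, and the rule of zeroing a checked room are all hypothetical stand-ins for the P-DSG agent node, the method's loop, and its emergency check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class AgentNode:
    """Hypothetical agent node: its edge to a parent room is a probability
    distribution over rooms rather than a single fixed link."""
    name: str
    room_probs: Dict[str, float] = field(default_factory=dict)

    def most_likely_room(self) -> Optional[str]:
        # Only rooms that still carry probability mass are candidates.
        candidates = {r: p for r, p in self.room_probs.items() if p > 0.0}
        return max(candidates, key=candidates.get) if candidates else None

    def clear_room(self, room: str) -> None:
        # Assumed update: a checked room with no emergency gets zero mass,
        # and the remaining rooms are renormalized.
        self.room_probs[room] = 0.0
        total = sum(self.room_probs.values())
        if total > 0.0:
            for r in self.room_probs:
                self.room_probs[r] /= total


def search(
    agent: AgentNode,
    check_room: Callable[[str], bool],
    max_steps: int = 10,
) -> Tuple[str, List[str]]:
    """Visit rooms in order of belief until one of the three stop conditions."""
    visited: List[str] = []
    for _ in range(max_steps):
        room = agent.most_likely_room()
        if room is None:
            return "cleared", visited
        visited.append(room)
        if check_room(room):  # stand-in for the emergency check in that room
            return "emergency", visited
        agent.clear_room(room)
    return "out_of_steps", visited


user = AgentNode("user", {"office": 0.6, "kitchen": 0.3, "bedroom": 0.1})
outcome, path = search(user, check_room=lambda room: room == "kitchen")
print(outcome, path)
```

Here each loop iteration counts as one step; in the simulator, navigation between rooms would consume many steps, so a fuller version would charge the path cost against the budget.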