"Don't forget to put the milk back!" Dataset for Enabling Embodied Agents to Detect Anomalous Situations

James F. Mullen, Prasoon Goyal, Robinson Piramuthu, Michael Johnston, Dinesh Manocha, Reza Ghanadan

TL;DR

This work moves toward more helpful home robots by enabling them to inform their users of dangerous or unsanitary anomalies in the home, and pursues a classification technique where object relations from the scene graph are classified as normal, dangerous, unsanitary, or dangerous for children.

Abstract

Home robots are intended to make their users' lives easier. Our work assists in this goal by enabling robots to inform their users of dangerous or unsanitary anomalies in their home. Examples of these anomalies include the user leaving their milk out, forgetting to turn off the stove, or leaving poison accessible to children. To move towards enabling home robots with these abilities, we have created a new dataset, which we call SafetyDetect. The SafetyDetect dataset consists of 1000 anomalous home scenes, each of which contains unsafe or unsanitary situations for an agent to detect. Our approach utilizes large language models (LLMs) alongside both a graph representation of the scene and the relationships between the objects in the scene. Our key insight is that this connected scene graph and the object relationships it encodes enable the LLM to better reason about the scene, especially as it relates to detecting dangerous or unsanitary situations. Our most promising approach utilizes GPT-4 and pursues a categorization technique where object relations from the scene graph are classified as normal, dangerous, unsanitary, or dangerous for children. This method is able to correctly identify over 90% of anomalous scenarios in the SafetyDetect Dataset. Additionally, we conduct real-world experiments on a ClearPath TurtleBot where we generate a scene graph from visuals of the real-world scene, and run our approach with no modification. This setup resulted in little performance loss. The SafetyDetect Dataset and code will be released to the public upon this paper's publication.
Paper Structure (13 sections, 4 figures, 4 tables)

This paper contains 13 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: In this work we aim to enable embodied agents to detect unsafe and unsanitary conditions in the home. For this, we first create a new, unique dataset with unsafe and unsanitary conditions to detect. We then hypothesize that Large Language Models (LLMs) contain the knowledge needed to logically operate on these conditions. Our approach creates LLM prompts that leverage object relationships in the scene from a scene graph (like those in the right image, with objects being nodes and relationships being edges), and classifies them. In addition to testing on our dataset, we tested in the real world using the ClearPath TurtleBot in scenarios like that shown here.
  • Figure 2: A sampling of images from the SafetyDetect dataset showing unsafe conditions. In one of the images, medication and alcohol are on the floor and dangerous for children. In another, a pile of clothes is in the doorway, a tripping hazard for users.
  • Figure 3: The flow of our method is depicted here. We first get the scene graph before using it to formulate a prompt that asks the LLM to categorize the object relations. The model must then use commonsense reasoning to categorize these object relations effectively.
  • Figure 4: We deploy a ClearPath TurtleBot in the real world. We use existing methods to generate an effective scene graph before utilizing our method to detect anomalies in the scene. In this example, medication and cleaning products are on the table. This is captured in the scene graph and detected by the model as unsafe for children.
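
The flow described in the abstract and in Figure 3 (scene graph, then a prompt, then a category per object relation) can be sketched as below. This is only an illustrative sketch, not the authors' implementation: the `Relation` schema, the prompt wording, the `<number>: <label>` reply format, and the `ask_llm` callable are all assumptions introduced here. Only the four category names come from the text. The LLM is passed in as a plain function so that no particular client API is implied.

```python
from dataclasses import dataclass

# Category names are taken from the abstract; everything else is hypothetical.
CATEGORIES = ("normal", "dangerous", "unsanitary", "dangerous for children")


@dataclass(frozen=True)
class Relation:
    """One scene-graph edge: subject --predicate--> target (assumed schema)."""
    subject: str
    predicate: str
    target: str

    def as_text(self) -> str:
        return f"{self.subject} {self.predicate} {self.target}"


def build_prompt(relations):
    """Format the relations as a numbered list and ask for one label per line."""
    lines = [f"{i}. {r.as_text()}" for i, r in enumerate(relations, start=1)]
    return (
        "Classify each object relation in a home as one of: "
        + ", ".join(CATEGORIES)
        + ".\nAnswer with one line per relation, formatted as '<number>: <label>'.\n\n"
        + "\n".join(lines)
    )


def parse_labels(reply, relations):
    """Map each relation to its label; missing or unrecognized answers stay None."""
    labels = {r: None for r in relations}
    for line in reply.splitlines():
        number, sep, label = line.partition(":")
        number, label = number.strip(), label.strip().lower()
        if not sep or not number.isdecimal() or label not in CATEGORIES:
            continue  # skip lines that do not follow the requested format
        index = int(number) - 1
        if 0 <= index < len(relations):
            labels[relations[index]] = label
    return labels


def find_anomalies(relations, ask_llm):
    """Return the relations labeled as anything other than 'normal'."""
    labels = parse_labels(ask_llm(build_prompt(relations)), relations)
    return {r: lab for r, lab in labels.items() if lab not in (None, "normal")}


if __name__ == "__main__":
    scene = [
        Relation("milk", "on", "kitchen counter"),
        Relation("medication", "on", "floor"),
        Relation("book", "on", "shelf"),
    ]

    # Canned stand-in for a real model call, used only to exercise the parsing.
    def fake_llm(prompt):
        return "1: unsanitary\n2: dangerous for children\n3: normal"

    for relation, label in find_anomalies(scene, fake_llm).items():
        print(f"{relation.as_text()}: {label}")
```

Classifying individual relations, rather than asking about the scene as a whole, keeps each decision tied to a specific graph edge, so a flagged anomaly can be reported back to the user as a concrete pair of objects.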