
Human-centered In-building Embodied Delivery Benchmark

Zhuoqun Xu, Yang Liu, Xiaoqi Li, Jiyao Zhang, Hao Dong

TL;DR

This work proposes a specific commercial scenario simulation, human-centered in-building embodied delivery, and develops a brand-new virtual environment system from scratch, constructing a multi-level connected building space modeled after a polar research station.

Abstract

Recently, the concept of embodied intelligence has been widely accepted and popularized, leading people to naturally consider the potential for commercialization in this field. In this work, we propose a specific commercial scenario simulation, human-centered in-building embodied delivery. For this scenario, we have developed a brand-new virtual environment system from scratch, constructing a multi-level connected building space modeled after a polar research station. This environment also includes autonomous human characters and robots with grasping and mobility capabilities, as well as a large number of interactive items. Based on this environment, we have built a delivery dataset containing 13k language instructions to guide robots in providing services. We simulate human behavior through human characters and sample their various needs in daily life. Finally, we propose a method centered around a large multimodal model to serve as the baseline system for this dataset. Compared to past embodied data work, our work focuses on a virtual environment centered around human-robot interaction for commercial scenarios. We believe this will bring new perspectives and exploration angles to the embodied community.
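The abstract mentions a dataset of 13k language instructions that guide robots through delivery tasks. As a hedged illustration only, a single task record might carry the fields below; the `DeliveryTask` class and all of its field names are hypothetical, not the paper's actual schema:

```python
from dataclasses import dataclass

@dataclass
class DeliveryTask:
    """One entry in a delivery dataset of this kind (hypothetical schema)."""
    instruction: str    # natural-language request from the user
    target_object: str  # item the robot must locate and grasp
    source_area: str    # room/area where the item is expected to be
    recipient: str      # person or location the item is delivered to
    floor: int          # floor of the multi-level building

# Example record, built from the request used in Figure 1's caption
task = DeliveryTask(
    instruction="Grasp a water bottle from the kitchen and bring it to me.",
    target_object="water bottle",
    source_area="kitchen",
    recipient="user",
    floor=1,
)
print(task.target_object)  # water bottle
```

A structured record like this makes it easy to pair each language instruction with ground-truth fields for evaluating a robot's grounding and delivery success.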


Paper Structure

This paper contains 26 sections, 12 figures, and 4 tables.

Figures (12)

  • Figure 1: Human-centered in-building embodied delivery describes a task that originates from a real commercial delivery scenario. It refers to precise delivery service, performed by embodied robots, for users in private spaces that external delivery services cannot reach. The task typically requires the robot to locate the target item based on the user's needs (e.g., "grasp a water bottle from the kitchen and bring it to me") across multiple rooms of the three-story building (a polar research station building; see the thumbnail in the top right corner) and ultimately deliver it to the designated location or person. The robot must also consider the user's context (behavior or schedule), as the user moves around the building according to their own goals during the delivery.
  • Figure 2: The available information in the task.
  • Figure 3: The environment includes a three-story building, items, human characters, and a robot.
  • Figure 4: A data-generation instance. Based on the environment settings combined with large models, we generate human activities, target objects, robot positions, task instructions, and a complete robot execution process.
  • Figure 5: Modular method for the robot delivery task with LLM and LMM.
  • ...and 7 more figures
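The modular design sketched in Figure 5, an LLM for language understanding combined with an LMM for perception, can be illustrated with a toy pipeline. Everything below is a hypothetical sketch, not the authors' implementation: `parse_instruction`, `locate_object`, and `delivery_pipeline` are invented names, and the keyword matching merely stands in for actual model calls.

```python
from typing import Optional

def parse_instruction(instruction: str) -> dict:
    """LLM stage (stubbed): turn a natural-language request into a structured goal.

    A real system would prompt a large language model here; a toy
    keyword match stands in for it.
    """
    target = "water bottle" if "water bottle" in instruction else "unknown"
    return {"target_object": target, "deliver_to": "user"}

def locate_object(observation: str, target: str) -> bool:
    """LMM stage (stubbed): decide whether the target is visible in a view.

    A real system would feed the camera image and the target description
    to a large multimodal model.
    """
    return target in observation

def delivery_pipeline(instruction: str, room_views: list) -> Optional[str]:
    """Chain the modules: parse the instruction, search rooms, plan handover."""
    goal = parse_instruction(instruction)
    for room_id, view in enumerate(room_views):
        if locate_object(view, goal["target_object"]):
            return "grasp {} in room {}, deliver to {}".format(
                goal["target_object"], room_id, goal["deliver_to"])
    return None  # target not found in any observed room

print(delivery_pipeline(
    "Grasp a water bottle from the kitchen and bring it to me.",
    ["bedroom with a desk", "kitchen with a water bottle"],
))  # grasp water bottle in room 1, deliver to user
```

The value of the modular split is that each stage can be swapped independently: a stronger LLM improves instruction parsing while the perception module is untouched, and vice versa.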