Visual Room Rearrangement

Luca Weihs; Matt Deitke; Aniruddha Kembhavi; Roozbeh Mottaghi

TL;DR

RoomR introduces Room Rearrangement, a two-stage interactive task: an agent first observes a room in its goal (initial) configuration and, after some objects have been moved or had their state changed (e.g., opened or closed), must restore that configuration. The RoomR dataset (6,000 rearrangement settings across 120 AI2-THOR scenes and 72 object types) supports two task variants, 1-Phase and 2-Phase. The accompanying baselines use a dual-policy model with non-parametric and semantic mapping components and are trained with DD-PPO and imitation learning. Semantic mapping provides notable gains, but performance remains far from perfect, which points to the need for new architectures for comparative mapping and long-horizon visual reasoning. Overall, RoomR is a testbed for navigation, manipulation, and planning in visually complex, partially observable environments, and it goes beyond static-perception benchmarks.

Abstract

There has been significant recent progress in the field of Embodied AI, with researchers developing models and algorithms that enable embodied agents to navigate and interact within completely unseen environments. In this paper, we propose a new dataset and baseline models for the task of Rearrangement. We focus in particular on Room Rearrangement: an agent begins by exploring a room and recording the objects' initial configurations. We then remove the agent and change the poses and states (e.g., open/closed) of some objects in the room. The agent must restore the initial configurations of all objects in the room. Our dataset, named RoomR, includes 6,000 distinct rearrangement settings involving 72 different object types in 120 scenes. Our experiments show that solving this challenging interactive task, which involves both navigation and object interaction, is beyond the capabilities of current state-of-the-art techniques for embodied tasks, and that we are still very far from achieving perfect performance on tasks of this type. The code and the dataset are available at: https://ai2thor.allenai.org/rearrangement
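The task described above comes down to comparing a recorded goal configuration with the shuffled room and deciding which objects still need fixing. The following is a minimal toy sketch of that comparison, not the RoomR code or its evaluation metrics: the `ObjectState` class, the `misplaced` function, the 5 cm position tolerance, and the example objects are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ObjectState:
    """Hypothetical, simplified object state: a 3D position plus an open/closed flag."""
    position: Tuple[float, float, float]  # metres, illustrative units
    is_open: bool = False


def misplaced(goal: Dict[str, ObjectState],
              current: Dict[str, ObjectState],
              pos_tol: float = 0.05) -> List[str]:
    """Return the sorted names of objects whose position or open/closed state
    differs from the goal. The tolerance is an assumed value, not one from the paper."""
    out = []
    for name, g in goal.items():
        c = current[name]
        moved = dist(g.position, c.position) > pos_tol
        state_changed = g.is_open != c.is_open
        if moved or state_changed:
            out.append(name)
    return sorted(out)


# Stage 1: the agent records the goal configuration while exploring.
goal = {
    "mug": ObjectState((1.0, 0.9, 2.0)),
    "drawer": ObjectState((0.5, 0.4, 1.0), is_open=False),
    "lamp": ObjectState((2.0, 1.1, 0.3)),
}

# Stage 2: some objects are moved or have their state changed.
shuffled = {
    "mug": ObjectState((3.0, 0.9, 0.5)),                     # moved
    "drawer": ObjectState((0.5, 0.4, 1.0), is_open=True),    # opened
    "lamp": ObjectState((2.0, 1.1, 0.3)),                    # untouched
}

print(misplaced(goal, shuffled))
```

In the actual task the agent has no access to such ground-truth state; it must infer the differences from egocentric visual observations, which is what makes the problem hard.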
