
Task Planning for Object Rearrangement in Multi-room Environments

Karan Mirakhor, Sourav Ghosh, Dipanjan Das, Brojeshwar Bhowmick

TL;DR

This work tackles efficient task planning for object rearrangement in multi-room environments under partial observability by introducing a hierarchical planner that interleaves unseen object discovery and rearrangement. The framework comprises an Unseen Object Discovery Method (UODM) that combines commonsense knowledge from LLMs with RoBERTa embeddings to predict probable room-receptacle placements, a Cross-Entropy Method (CEM) based collision-resolution module that predicts buffer locations for blocked-goal and swap cases, a Directed Spatial Graph (DSG) for scalable state representation, and a deep RL planner trained with Conservative Q-Learning to minimize agent traversal and the number of steps. It also introduces MoPOR, a benchmark dataset for evaluating multi-room rearrangement with partial observability, blocked-goal, and swap scenarios across many objects and receptacles. Empirical results show that the proposed method outperforms baselines on key metrics such as Success Rate Normalized by Number of Steps (SRN), Unseen Object Discovery efficiency (EOD), and Total Traversal Length (TTL), demonstrating improved planning efficiency and robust handling of unseen objects and swaps. Overall, the approach advances practical robotic tidying in complex, partially observable environments by integrating language-based commonsense, geometry-aware collision handling, and data-driven planning in a unified pipeline.
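The planner is trained with Conservative Q-Learning. As a rough illustration of what that objective looks like for a discrete action space, the sketch below computes the standard CQL(H) loss: a squared TD error plus a penalty that pushes down a soft maximum of Q over all actions while pushing up Q on the action recorded in the offline data. It is a generic sketch, not the authors' training code; the function names, the `alpha` weight, and the assumption that Bellman targets are precomputed are all hypothetical.

```python
import math
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_loss(q_values: List[List[float]], actions: List[int],
             td_targets: List[float], alpha: float = 1.0) -> float:
    """Batch-averaged discrete-action CQL(H) objective (sketch).

    q_values[i][a] : Q(s_i, a) for every discrete action a
    actions[i]     : action recorded in the offline dataset for state s_i
    td_targets[i]  : precomputed target r_i + gamma * max_a' Q_target(s'_i, a')
    alpha          : weight of the conservative penalty (hypothetical default)
    """
    total = 0.0
    for q_s, a, y in zip(q_values, actions, td_targets):
        q_sa = q_s[a]
        td_error = 0.5 * (q_sa - y) ** 2
        # Lower Q on all actions (soft maximum) while raising Q on the dataset action.
        conservative = logsumexp(q_s) - q_sa
        total += alpha * conservative + td_error
    return total / len(actions)
```

With `alpha = 0` the objective reduces to ordinary Q-learning regression; a larger `alpha` keeps Q-values low for actions that never appear in the offline plans, which is the property that makes CQL suitable for learning from a fixed dataset.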

Abstract

A planner for object rearrangement in a multi-room setup should produce a reasonable plan that reduces the agent's overall travel and the number of steps. Recent state-of-the-art methods fail to produce such plans because they rely on explicit exploration to discover objects left unseen due to partial observability, and on a heuristic planner to sequence the rearrangement actions. This paper proposes a novel hierarchical task planner to efficiently plan a sequence of actions to discover unseen objects and rearrange misplaced objects within an untidy house to achieve a desired tidy state. The proposed method introduces several novel techniques, including (i) a method for discovering unseen objects using commonsense knowledge from large language models, (ii) a collision resolution and buffer prediction method based on the Cross-Entropy Method to handle blocked goal and swap cases, (iii) a directed spatial graph-based state space for scalability, and (iv) deep reinforcement learning (RL) for producing an efficient planner. The planner interleaves the discovery of unseen objects and rearrangement to minimize the number of steps taken and the overall traversal of the agent. The paper also presents new metrics and a benchmark dataset called MoPOR to evaluate the effectiveness of rearrangement planning in a multi-room setting. The experimental results demonstrate that the proposed method effectively addresses the multi-room rearrangement problem.
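Buffer prediction for blocked-goal and swap cases is based on the Cross-Entropy Method. The sketch below shows generic CEM on a toy 2D version of that problem: sample candidate buffer positions from a Gaussian, keep the lowest-cost elites, refit the Gaussian to them, and repeat. The cost function (detour length from an object's position via the buffer to its goal, plus a large penalty for buffers outside the room or inside circular obstacles), the room bounds, and every name here are hypothetical and not taken from the paper.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Obstacle = Tuple[Point, float]  # (centre, radius), hypothetical circular footprint
Bounds = Tuple[float, float, float, float]  # (xmin, xmax, ymin, ymax)


def buffer_cost(p: Point, start: Point, goal: Point,
                obstacles: List[Obstacle], bounds: Bounds) -> float:
    """Detour length via buffer p, plus a large penalty if p is infeasible."""
    xmin, xmax, ymin, ymax = bounds
    cost = math.dist(start, p) + math.dist(p, goal)
    if not (xmin <= p[0] <= xmax and ymin <= p[1] <= ymax):
        cost += 1e3  # buffer outside the room
    for centre, radius in obstacles:
        if math.dist(p, centre) < radius:
            cost += 1e3  # buffer collides with an obstacle
    return cost


def cem_buffer(start: Point, goal: Point, obstacles: List[Obstacle],
               bounds: Bounds, iters: int = 30, samples: int = 200,
               n_elite: int = 20, seed: int = 0) -> Point:
    """Cross-Entropy Method search for a low-cost, collision-free buffer."""
    rng = random.Random(seed)
    xmin, xmax, ymin, ymax = bounds
    mean = [(xmin + xmax) / 2, (ymin + ymax) / 2]
    std = [(xmax - xmin) / 2, (ymax - ymin) / 2]

    def cost(p: Point) -> float:
        return buffer_cost(p, start, goal, obstacles, bounds)

    best: Optional[Point] = None
    for _ in range(iters):
        pts = [(rng.gauss(mean[0], std[0]), rng.gauss(mean[1], std[1]))
               for _ in range(samples)]
        pts.sort(key=cost)
        if best is None or cost(pts[0]) < cost(best):
            best = pts[0]  # keep the best sample ever seen, not just the mean
        elites = pts[:n_elite]
        for d in range(2):
            vals = [e[d] for e in elites]
            mean[d] = sum(vals) / n_elite
            var = sum((v - mean[d]) ** 2 for v in vals) / n_elite
            std[d] = math.sqrt(var) + 1e-3  # small floor avoids collapse
    return best
```

Returning the best sample rather than the final mean matters here: the elite mean can fall inside an obstacle when good samples lie on both sides of it, whereas every returned sample has actually been checked for feasibility.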

Paper Structure (21 sections, 5 equations, 3 figures, 1 table, 1 algorithm)

Figures (3)

  • Figure 1: The graph shows the agent traversal of existing methods [tidee, trabucco, Ghosh] vs. ours as the rearrangement area increases, highlighting the need for efficient planning. The error bars show the standard deviation of the average traversal.
  • Figure 2: (a) shows the top-down view of our rearrangement task and (b) is the agent's initial egocentric view in the untidy current state for the same setup. The solid 3D bounding boxes indicate the desired goal state for all objects, while the dashed ones show the initial positions of visible objects in the untidy current state. The dotted 3D bounding boxes represent the initial positions of unseen objects in the untidy current state. The apple (yellow), an unseen object, is inside the kitchen-fridge, while the vase (blue), pillow (pastel cyan) and sponge (magenta) are on the living-table, living-sofa and bedroom-chair, respectively. There are two scenarios: a blocked-goal case with the vase (blue) and pillow (pastel cyan), and a swap case between the bowl (green) and kettle (red).
  • Figure 3: Overall hierarchical pipeline of our proposed method.
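Figure 2 distinguishes a blocked-goal case from a swap case. One way to tell them apart, sketched below under stated assumptions, is to link each object to the object currently occupying its goal slot: a cycle in these links is a swap, because no ordering of direct moves frees the goals without a temporary buffer, while the remaining links are blocked-goal cases that reordering can resolve once any cycle they lead into is broken. The function name, the slot identifiers, and which object blocks which in the usage example are hypothetical and not taken from MoPOR.

```python
from typing import Dict, List, Optional, Tuple


def classify_conflicts(goal_of: Dict[str, str],
                       occupant_of: Dict[str, Optional[str]]
                       ) -> Tuple[List[List[str]], List[str]]:
    """Split goal conflicts into swap cycles and blocked-goal cases.

    goal_of[obj]      : slot where obj must end up
    occupant_of[slot] : object currently in that slot (None if free)
    """
    # Edge obj -> blocker when obj's goal slot is held by a different object.
    blocked_by: Dict[str, str] = {}
    for obj, slot in goal_of.items():
        occ = occupant_of.get(slot)
        if occ is not None and occ != obj:
            blocked_by[obj] = occ

    swaps: List[List[str]] = []
    in_cycle = set()
    for obj in blocked_by:
        # Each object has at most one blocker, so follow the chain until it
        # ends at an unblocked object or revisits a node (a cycle).
        path: List[str] = []
        cur = obj
        while cur in blocked_by and cur not in path:
            path.append(cur)
            cur = blocked_by[cur]
        if cur in path:
            cycle = path[path.index(cur):]
            if not in_cycle.intersection(cycle):  # record each cycle once
                swaps.append(cycle)
                in_cycle.update(cycle)

    blocked_goal = [o for o in blocked_by if o not in in_cycle]
    return swaps, blocked_goal


# Hypothetical layout echoing Figure 2's objects.
goal_of = {"bowl": "slot_k", "kettle": "slot_b", "vase": "slot_p", "pillow": "sofa"}
occupant_of = {"slot_b": "bowl", "slot_k": "kettle", "slot_p": "pillow", "sofa": None}
swaps, blocked = classify_conflicts(goal_of, occupant_of)
```

Here the bowl and kettle each hold the other's goal and form a swap cycle, while the vase is only blocked by the pillow, whose own goal is free, so moving the pillow first clears the way.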