LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement

Haonan Chang; Kai Gao; Kowndinya Boyalakuntla; Alex Lee; Baichuan Huang; Harish Udhaya Kumar; Jinjin Yu; Abdeslam Boularias

TL;DR

LGMCTS is a framework that combines language guidance with geometrically informed sampling distributions to rearrange objects according to geometric patterns specified in natural language.

Abstract

We introduce a novel approach to the executable semantic object rearrangement problem. In this problem, a robot seeks to create an actionable plan that rearranges objects within a scene according to a pattern dictated by a natural language description. Unlike existing methods such as StructFormer and StructDiffusion, which tackle the problem in two steps by first generating poses and then using a task planner to formulate an action plan, our method addresses pose generation and action planning concurrently. We achieve this integration using a Language-Guided Monte-Carlo Tree Search (LGMCTS). We provide quantitative evaluations on two simulation datasets, complemented by qualitative tests with a real robot.

Paper Structure (17 sections, 1 theorem, 3 equations, 6 figures, 2 tables, 1 algorithm)

Introduction
Related Works
Learning-based Semantic Rearrangement
LLM-driven Task And Motion Planning
Preliminaries
Problem Formulation
Monte Carlo Tree Search (MCTS)
Method
Language Parsing & Object Selection
Parametric Geometric Prior
Monte-Carlo Tree Search (MCTS) for TAMP
Experiments
Baselines
StructFormer Dataset
ELGR-Benchmark
...and 2 more sections

Key Result

Proposition IV.1

MCTS-Planner is probabilistically complete.
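
For readers unfamiliar with the term, the usual textbook notion of probabilistic completeness for sampling-based planners can be written as below. This is a generic definition given for orientation only, not the paper's own statement or proof; the iteration count $n$ and the event wording are notation introduced here.

```latex
% Generic definition (assumed notation): whenever a feasible plan exists,
% the probability that the planner has found one within n iterations tends to 1.
\lim_{n \to \infty} \Pr\bigl[\text{a feasible plan is found within } n \text{ iterations}\bigr] = 1
```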

Figures (6)

Figure 1: Robotic Setup: a UR5e robot equipped with a RealSense D455 camera. The task is to re-arrange the objects, which are unknown to the robot, according to a natural language instruction.
Figure 2: An example of language parsing. We use GPT-4 in this work.
Figure 3: Visualization of the $(x,y)$ prior for the 'line' pattern. From left to right: $K=0$, $K=1$, $K=2$, $K=3$, where $K=|O_{R}^{sampled}|$ is the number of sampled object poses. White stars mark sampled poses. When $K=0$, the pose can be sampled anywhere. When $K=1$, it must be sampled outside a circular region. After that, all poses are sampled along the line defined by the first two poses.
Figure 4: A minimal example of the MCTS-Planner arranging a table. The language description provided is: "Can you please put the apple behind the spoon? And I also want the cup at the right of the apple." The top row displays the current scene arrangement, while the bottom row shows $f_{prior}$ and $f_{free}$ for the object being manipulated, where $f=f_{prior} \times f_{free}$. In the spatial distribution figures, black represents probability 0 and white represents probability 1.
Figure 5: Real-world demonstration with a UR5e robot. The language instructions for the five scenes are: (a) "Move all blocks into a circle; while put the white bottle behind one block;" (b) "Put all boxes into a rectangle; and move the white bottle to the right of one box;" (c) "Move bottles into a line; and formulate all phones into another line;" (d) "Formulate all yellow objects into a line;" (e) "Set all phones into a line;". The top row shows the initial scenes and the bottom row shows the results of using LGMCTS on the UR5e. Dotted lines indicate a shape pattern and red arrows indicate a spatial pattern (left, right, front, back). These real-robot experiments show that LGMCTS can parse complex language instructions and can also handle infeasible start configurations as well as pattern composition.
...and 1 more figure
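
The sampling idea described in the captions of Figures 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `line_prior` and `sample_pose`, the binary 0/1 masks, the grid resolution, the parameters `min_dist` and `tol`, and the choice to center the $K=1$ exclusion circle on the first sampled pose are all hypothetical. The only elements taken from the captions are the three cases for $K$ and the product $f=f_{prior} \times f_{free}$.

```python
import math
import random

def line_prior(cell, sampled, min_dist=0.1, tol=0.02):
    """Hypothetical 'line' prior value (0 or 1) at one (x, y) grid cell."""
    k = len(sampled)
    if k == 0:
        return 1.0  # K=0: the pose can be sampled anywhere
    x0, y0 = sampled[0]
    if k == 1:
        # K=1: stay outside a circle (assumed to be centered on the first pose)
        far_enough = math.hypot(cell[0] - x0, cell[1] - y0) >= min_dist
        return 1.0 if far_enough else 0.0
    # K>=2: stay within `tol` of the line through the first two poses
    dx, dy = sampled[1][0] - x0, sampled[1][1] - y0
    norm = math.hypot(dx, dy)  # nonzero because the second pose is >= min_dist away
    perp = abs((cell[0] - x0) * dy - (cell[1] - y0) * dx) / norm
    return 1.0 if perp <= tol else 0.0

def sample_pose(cells, f_prior, f_free, rng):
    """Draw one cell with probability proportional to f_prior * f_free."""
    weights = [p * q for p, q in zip(f_prior, f_free)]
    if sum(weights) == 0:
        return None  # no feasible cell under this sketch
    return rng.choices(cells, weights=weights, k=1)[0]

# Toy scene: a 51 x 51 grid over the unit square with one circular obstacle.
cells = [(i / 50, j / 50) for i in range(51) for j in range(51)]
f_free = [0.0 if math.hypot(x - 0.5, y - 0.5) < 0.15 else 1.0 for x, y in cells]

rng = random.Random(0)
poses = []
for _ in range(4):
    f_prior = [line_prior(c, poses) for c in cells]
    poses.append(sample_pose(cells, f_prior, f_free, rng))
```

In this sketch `f_free` stays fixed, so later poses may coincide with earlier ones; a fuller version would presumably also mark the footprint of each placed object as occupied before the next draw.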

Theorems & Definitions (2)

Proposition IV.1
proof
