ConPoSe: LLM-Guided Contact Point Selection for Scalable Cooperative Object Pushing

Noah Steinkrüger, Nisarga Nilavadi, Wolfram Burgard, Tanja Katharina Kaiser

TL;DR

This paper tackles scalable cooperative object pushing by multiple nonholonomic robots with limited object knowledge. It introduces ConPoSe, an LLM-guided local search framework that selects contact points by prompting an LLM about the target pushing direction and refining the result through neighborhood search. In simulation, its selection time scales well with the number of robots, and it achieves high success rates across various object shapes and robot counts, outperforming a purely LLM-based method and matching or exceeding analytical baselines in many settings. A key finding is that contact-point switching is the main bottleneck, which points future work toward more robust switching strategies and real-world validation.
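The two-stage idea in the TL;DR (an LLM proposes, local search refines) can be illustrated with a small sketch. Everything in it is an assumption for illustration, not the authors' implementation: the rectangle candidate generator `rectangle_candidates`, the quasi-static cost `push_cost` (reward net push along the target direction, penalise net torque), the "swap one contact point" neighbourhood, and the fixed `seed` that stands in for the LLM's answer are all hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Vec = Tuple[float, float]
Candidate = Tuple[Vec, Vec]  # (contact position, inward surface normal)


def rectangle_candidates(width: float, height: float, m: int) -> List[Candidate]:
    """Hypothetical: place m candidate contact points evenly on a rectangle centred at the origin."""
    hw, hh = width / 2, height / 2
    # Walk the contour counter-clockwise: (start, direction, edge length, inward normal).
    edges = [
        ((-hw, -hh), (1.0, 0.0), width, (0.0, 1.0)),
        ((hw, -hh), (0.0, 1.0), height, (-1.0, 0.0)),
        ((hw, hh), (-1.0, 0.0), width, (0.0, -1.0)),
        ((-hw, hh), (0.0, -1.0), height, (1.0, 0.0)),
    ]
    perimeter = 2 * (width + height)
    candidates: List[Candidate] = []
    for k in range(m):
        s = perimeter * (k + 0.5) / m  # arc length, offset by half a step to avoid corners
        for (x0, y0), (dx, dy), length, normal in edges:
            if s < length:
                candidates.append(((x0 + dx * s, y0 + dy * s), normal))
                break
            s -= length
    return candidates


def push_cost(chosen: FrozenSet[int], candidates: Sequence[Candidate],
              direction: Vec, torque_weight: float = 0.5) -> float:
    """Hypothetical cost: each robot pushes with unit force along the inward normal."""
    fx = fy = torque = 0.0
    for i in chosen:
        (px, py), (nx, ny) = candidates[i]
        fx += nx
        fy += ny
        torque += px * ny - py * nx  # z-component of the cross product p x n
    return -(fx * direction[0] + fy * direction[1]) + torque_weight * abs(torque)


def local_search(seed: Sequence[int], num_candidates: int,
                 cost: Callable[[FrozenSet[int]], float],
                 max_iters: int = 100) -> Tuple[List[int], float]:
    """Best-improvement hill climbing over 'swap one contact point' neighbourhoods."""
    current = frozenset(seed)
    current_cost = cost(current)
    for _ in range(max_iters):
        best, best_cost = current, current_cost
        for out_idx in current:
            for in_idx in range(num_candidates):
                if in_idx in current:
                    continue
                neighbour = (current - {out_idx}) | {in_idx}
                c = cost(neighbour)
                if c < best_cost:
                    best, best_cost = neighbour, c
        if best == current:  # local optimum: no single swap improves the cost
            break
        current, current_cost = best, best_cost
    return sorted(current), current_cost


# Usage: two robots push a 2 x 1 box in +x; `seed` stands in for the LLM's proposal.
cands = rectangle_candidates(2.0, 1.0, m=12)
direction = (1.0, 0.0)
seed = [0, 6]  # hypothetical LLM answer: one point on the bottom edge, one on the top edge
chosen, best_cost = local_search(seed, len(cands), lambda s: push_cost(s, cands, direction))
print(chosen, best_cost)  # [10, 11] -2.0 (both points on the left edge, zero net torque)
```

Starting from a poor seed, two swaps move both robots to the left edge, where they push straight along +x without rotating the box. A real system would replace the toy cost with a proper pushing model and feasibility checks.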

Abstract

Object transportation in cluttered environments is a fundamental task in various domains, including domestic service and warehouse logistics. In cooperative object transport, multiple robots must coordinate to move objects that are too large for a single robot. One transport strategy is pushing, which only requires simple robots. However, careful selection of robot-object contact points is necessary to push the object along a preplanned path. Although this selection can be solved analytically, the solution space grows combinatorially with the number of robots and object size, limiting scalability. Inspired by how humans rely on common-sense reasoning for cooperative transport, we propose combining the reasoning capabilities of Large Language Models with local search to select suitable contact points. Our LLM-guided local search method for contact point selection, ConPoSe, successfully selects contact points for a variety of shapes, including cuboids, cylinders, and T-shapes. We demonstrate that ConPoSe scales better with the number of robots and object size than the analytical approach, and also outperforms pure LLM-based selection.
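To make the combinatorial growth mentioned in the abstract concrete: if selecting contact points means choosing N distinct points out of M candidates, there are C(M, N) possible configurations. The sketch below assumes, purely for illustration, that the number of candidates grows with object size as M = 4N; the helper name `num_contact_configurations` is hypothetical, and the paper's actual candidate counts are not given in this summary.

```python
from math import comb


def num_contact_configurations(m_candidates: int, n_robots: int) -> int:
    """Number of ways to choose n_robots distinct contact points from m_candidates."""
    return comb(m_candidates, n_robots)


# Illustrative assumption: four candidate points per robot (M = 4N).
counts = {n: num_contact_configurations(4 * n, n) for n in (3, 5, 7, 10)}
for n, count in counts.items():
    print(f"N={n:2d}, M={4 * n:2d}: {count:,} configurations")
# N= 3, M=12: 220 configurations
# N= 5, M=20: 15,504 configurations
# N= 7, M=28: 1,184,040 configurations
# N=10, M=40: 847,660,528 configurations
```

If robots are treated as distinguishable (ordered assignments), the count grows even faster, as M!/(M-N)!.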

Paper Structure

This paper contains 24 sections, 9 equations, 4 figures, 3 tables, 2 algorithms.

Figures (4)

  • Figure 1: Visualization of the object pushing task: a cuboid is pushed from its initial position in the front-left to the goal in the back-right corner.
  • Figure 2: Overview of our approach. At the start of each experiment, we generate a global object path and key waypoints (see the object path planning section), along with a set of $M$ candidate contact points on the object contour (see the contact point generation section). Object pushing then proceeds in a closed loop. Contact points are selected either with our proposed LLM-guided local search approach, ConPoSe, or with one of our two baselines: naive LLM-based selection or analytical computation of the contact points (see the push configuration selection section). The robots then switch to their assigned contact points (see the contact point switching section), and the object is pushed (see the pushing section) until either the goal is reached or re-selection becomes necessary, e.g., due to deviations from the planned path (see the online adaptation section).
  • Figure 3: Examples of four of our five scenes (the remaining scene is shown in the overview figure). For illustration purposes, each image is annotated with the planned object path $S_O$ (green dots), its simplified representation $S_O^\text{simplified}$ (green diamonds), the goal position, the object center (white plus sign), and the $M$ candidate contact points (numbered circles).
  • Figure 4: Scalability analysis: comparison of selection time $T_\text{sel}$, success rate (SR), and execution time $T_\text{exe}$ for $N \in \{3, 5, 7, 10, 12, 15\}$. Data for the baseline is limited to $N < 12$ due to excessively long execution times.
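The closed loop described in the Figure 2 caption can be read as a small state machine: select contact points, switch to them, push, and return to selection when the object deviates from the planned path. The sketch below is a hypothetical illustration; the phase names, the `run_pushing_loop` function, and its callback signatures are assumptions, and the toy callbacks only simulate progress along the path.

```python
from enum import Enum, auto
from typing import Callable, List


class Phase(Enum):
    SELECT = auto()  # choose contact points (ConPoSe or a baseline)
    SWITCH = auto()  # robots move to their assigned contact points
    PUSH = auto()    # robots push the object along the planned path
    DONE = auto()    # goal reached


def run_pushing_loop(
    select: Callable[[], None],
    switch: Callable[[], None],
    push_step: Callable[[], None],
    goal_reached: Callable[[], bool],
    needs_reselection: Callable[[], bool],
    max_steps: int = 1000,
) -> List[Phase]:
    """Run the hypothetical select -> switch -> push loop and return the phase trace."""
    phase = Phase.SELECT
    trace: List[Phase] = []
    for _ in range(max_steps):
        trace.append(phase)
        if phase is Phase.SELECT:
            select()
            phase = Phase.SWITCH
        elif phase is Phase.SWITCH:
            switch()
            phase = Phase.PUSH
        elif phase is Phase.PUSH:
            push_step()
            if goal_reached():
                phase = Phase.DONE
            elif needs_reselection():  # e.g. the object drifted off the planned path
                phase = Phase.SELECT
        else:
            break  # Phase.DONE: stop the loop
    return trace


# Toy usage: the goal lies 4 push steps away; one deviation occurs after step 2.
state = {"progress": 0, "selections": 0}
trace = run_pushing_loop(
    select=lambda: state.update(selections=state["selections"] + 1),
    switch=lambda: None,
    push_step=lambda: state.update(progress=state["progress"] + 1),
    goal_reached=lambda: state["progress"] >= 4,
    needs_reselection=lambda: state["progress"] == 2 and state["selections"] == 1,
)
print([p.name for p in trace])
# ['SELECT', 'SWITCH', 'PUSH', 'PUSH', 'SELECT', 'SWITCH', 'PUSH', 'PUSH', 'DONE']
```

Keeping selection and switching as separate phases makes it easy to time them independently, which matters given the finding that contact-point switching is the main bottleneck.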