Combining Planning and Diffusion for Mobility with Unknown Dynamics

Yajvan Ravan, Zhutian Yang, Tao Chen, Tomás Lozano-Pérez, Leslie Pack Kaelbling

TL;DR

PoPi addresses long-horizon mobile manipulation with unknown dynamics by pairing a high-level A* roadmap planner, which outputs intermediate waypoints, with a low-level short-horizon diffusion policy, which executes relative motions toward each waypoint. The approach uses data-efficient imitation learning for local control and relies on planning to handle obstacle-rich, long-horizon navigation, enabling zero-shot generalization to new chairs, grasps, and flooring. Empirical results on a Spot robot pushing a five-wheeled chair show that PoPi outperforms pure-diffusion and pure-planning baselines, with long-horizon success up to about $80\%$ in the training environment and around $70\%$ in unseen layouts.
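The high-level half of this decomposition can be illustrated with a minimal sketch. It is not the authors' planner: it assumes a 4-connected occupancy grid in place of the roadmap mentioned above, and the names `astar`, `to_waypoints`, and the `spacing` parameter are hypothetical, chosen only for illustration.

```python
import heapq


def astar(grid, start, goal):
    """A* over a 4-connected occupancy grid (1 = obstacle).

    Assumption: a grid stands in for the roadmap used in the paper.
    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible and consistent for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]
    parent = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if g > cost[cell]:
            continue  # stale queue entry; a cheaper route to this cell was found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if inside and grid[nxt[0]][nxt[1]] == 0:
                new_g = g + 1
                if new_g < cost.get(nxt, float("inf")):
                    cost[nxt] = new_g
                    parent[nxt] = cell
                    heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt))
    return None


def to_waypoints(path, spacing):
    """Keep every `spacing`-th cell of the path, always ending at the goal."""
    waypoints = path[spacing::spacing]
    if not waypoints or waypoints[-1] != path[-1]:
        waypoints.append(path[-1])
    return waypoints


# Hypothetical map: a wall across the middle row with a gap on the right.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
waypoints = to_waypoints(path, spacing=3)
```

The spacing between waypoints is the design knob here: each waypoint has to stay within the range that a short-horizon policy can reach reliably.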

Abstract

Manipulation of large objects over long horizons (such as carts in a warehouse) is an essential skill for deployable robotic systems. Large objects require mobile manipulation, which involves simultaneous manipulation, navigation, and movement with the object in tow. In many real-world situations, object dynamics are highly complex, such as the interaction of an office chair (with a rotating base and five caster wheels) with the ground. We present a hierarchical algorithm for long-horizon robot manipulation problems in which the dynamics are partially unknown. We observe that diffusion-based behavior cloning is highly effective for short-horizon problems with unknown dynamics, so we decompose the problem into an abstract high-level, obstacle-aware motion-planning problem that produces a waypoint sequence. We use a short-horizon, relative-motion diffusion policy to achieve the waypoints in sequence. We train mobile manipulation policies on a Spot robot that has to push and pull an office chair. Our hierarchical manipulation policy performs consistently better, especially as the horizon increases, than a diffusion policy trained on long-horizon demonstrations or motion planning that assumes a rigidly attached object (8 successes out of 10 runs, versus 0 and 5, respectively). Importantly, our learned policy generalizes to new layouts, grasps, chairs, and flooring that induces more friction, without any further training, showing promise for other complex mobile manipulation problems. Project Page: https://yravan.github.io/plannerorderedpolicy/
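The phrase "relative-motion policy to achieve the waypoints in sequence" can be made concrete with a small sketch. It assumes a planar pose `(x, y, theta)` and replaces the learned diffusion policy with a hand-written stub; `relative_goal`, `follow_waypoints`, `stub_policy`, `stub_step`, and the tolerance and step-size values are all hypothetical and are not taken from the paper.

```python
import math


def relative_goal(pose, waypoint):
    """Express a world-frame waypoint (x, y) in the frame of pose (x, y, theta)."""
    x, y, theta = pose
    dx, dy = waypoint[0] - x, waypoint[1] - y
    c, s = math.cos(theta), math.sin(theta)
    return (c * dx + s * dy, -s * dx + c * dy)


def follow_waypoints(pose, waypoints, policy, step, tol=0.1, max_steps=500):
    """Hand each waypoint to a short-horizon policy as a relative goal.

    Returns (final_pose, success). `policy` maps a relative goal to an action;
    `step` applies an action to a pose. Both are supplied by the caller.
    """
    for waypoint in waypoints:
        for _ in range(max_steps):
            rel = relative_goal(pose, waypoint)
            if math.hypot(*rel) <= tol:
                break  # waypoint reached; move on to the next one
            pose = step(pose, policy(rel))
        else:
            return pose, False  # step budget exhausted before reaching the waypoint
    return pose, True


def stub_policy(rel, max_step=0.05):
    """Stand-in for the learned policy: a capped step straight toward the goal."""
    dist = math.hypot(*rel)
    scale = min(1.0, max_step / dist) if dist > 0 else 0.0
    return (rel[0] * scale, rel[1] * scale)


def stub_step(pose, action):
    """Stand-in for the real system: apply a body-frame displacement exactly."""
    x, y, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    return (x + c * action[0] - s * action[1],
            y + s * action[0] + c * action[1],
            theta)


start = (0.0, 0.0, math.pi / 2)
final_pose, reached = follow_waypoints(start, [(1.0, 0.0), (1.0, 1.0)],
                                       stub_policy, stub_step)
```

Because the policy only ever sees the goal relative to the current pose, the same short-horizon behavior can be reused at any location on the map, which is what lets the planner chain it over long distances.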

Paper Structure

This paper contains 16 sections, 1 equation, 6 figures, 2 tables, 2 algorithms.

Figures (6)

  • Figure 1: The Spot robot moving a chair to a target location while navigating among obstacles. The top environment is where training demonstrations were collected. Our hierarchical policy PoPi achieves an 80-100% success rate on tests in this environment. The bottom is an unseen testing environment with higher-friction carpet and narrower pathways. PoPi generalizes zero-shot with 70% success.
  • Figure 2: Planner-Ordered Policy
  • Figure 3: Experimental Setup. Left is the robot and chair. An AprilTag for localization is also shown in the background. The right shows the AprilTag setup affixed to the chair.
  • Figure 4: Map of the training environment (in grayscale) with demonstration trajectories overlaid. Starting points are shown as red circles, roughly drawn from the respective dashed regions. Endpoints are shown as green squares, roughly drawn from the respective dashed regions.
  • Figure 5: Trajectory to test long-horizon success. Three long-horizon goals are given at 2 m, 6 m, and 10 m. An example execution of PoPi is shown here in blue. The light blue shape corresponds to the robot/chair system as described in the motion-planning experiments section.
  • ...and 1 more figure