Combining Planning and Diffusion for Mobility with Unknown Dynamics
Yajvan Ravan, Zhutian Yang, Tao Chen, Tomás Lozano-Pérez, Leslie Pack Kaelbling
TL;DR
PoPi addresses long-horizon mobile manipulation with unknown dynamics by combining a high-level A* roadmap planner, which outputs intermediate waypoints, with a low-level, short-horizon diffusion policy that executes relative motions toward each waypoint. Data-efficient imitation learning handles local control, while planning handles obstacle-rich, long-horizon navigation, which enables zero-shot generalization to new chairs, grasps, and flooring. In experiments with a Spot robot pushing a five-wheeled office chair, PoPi outperforms both a pure diffusion baseline and a pure planning baseline, with long-horizon success of up to about $80\%$ on layouts seen in training and around $70\%$ on unseen layouts, which suggests the approach is practical for deployable mobile manipulation under unknown dynamics.
Abstract
Manipulation of large objects over long horizons (such as carts in a warehouse) is an essential skill for deployable robotic systems. Large objects require mobile manipulation, which involves simultaneous manipulation, navigation, and movement with the object in tow. In many real-world situations, object dynamics are highly complex, such as the interaction between the ground and an office chair with a rotating base and five caster wheels. We present a hierarchical algorithm for long-horizon robot manipulation problems in which the dynamics are partially unknown. We observe that diffusion-based behavior cloning is highly effective for short-horizon problems with unknown dynamics, so we decompose the problem into two levels: an abstract, high-level, obstacle-aware motion-planning problem that produces a waypoint sequence, and a short-horizon, relative-motion diffusion policy that achieves the waypoints in sequence. We train mobile manipulation policies on a Spot robot that has to push and pull an office chair. Our hierarchical manipulation policy performs consistently better, especially as the horizon increases, than either a diffusion policy trained on long-horizon demonstrations or motion planning that assumes a rigidly attached object: it succeeds in 8 out of 10 runs, versus 0 and 5 respectively. Importantly, our learned policy generalizes without any further training to new layouts, grasps, chairs, and flooring that induces more friction, showing promise for other complex mobile manipulation problems. Project Page: https://yravan.github.io/plannerorderedpolicy/
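The two-level decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the occupancy grid, the 4-connected A* search, the `stride` used to pick waypoints, and the function names are all assumptions made for illustration, and `stub_policy` is a hand-written placeholder standing in for the learned short-horizon diffusion policy. The sketch only shows the control flow: plan a path around obstacles, subsample it into waypoints, and repeatedly hand the low-level policy the next waypoint expressed relative to the current pose.

```python
import heapq
import math


def astar(grid, start, goal):
    """4-connected A* on an occupancy grid (1 = obstacle). Returns a cell path or None."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    best_cost = {start: 0}
    came_from = {}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if cost > best_cost[cur]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, math.inf):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cur
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


def subsample(path, stride):
    """Keep every `stride`-th cell of the path, plus the final goal, as waypoints."""
    waypoints = path[stride::stride]
    if not waypoints or waypoints[-1] != path[-1]:
        waypoints.append(path[-1])
    return waypoints


def relative_goal(pose, waypoint):
    """Express a world-frame waypoint in the frame of pose = (x, y, theta)."""
    x, y, theta = pose
    dx, dy = waypoint[0] - x, waypoint[1] - y
    c, s = math.cos(theta), math.sin(theta)
    return (c * dx + s * dy, -s * dx + c * dy)


def stub_policy(rel_goal, max_step=0.5):
    """Placeholder for the learned policy: a bounded step straight at the relative goal."""
    dist = math.hypot(*rel_goal)
    if dist <= max_step:
        return rel_goal
    return (rel_goal[0] * max_step / dist, rel_goal[1] * max_step / dist)


def execute(waypoints, pose, policy, tol=0.05, max_steps=1000):
    """Chase each waypoint in order; returns (success, final_pose)."""
    x, y, theta = pose
    for wp in waypoints:
        for _ in range(max_steps):
            rel = relative_goal((x, y, theta), wp)
            if math.hypot(*rel) <= tol:
                break
            ax, ay = policy(rel)
            # Rotate the body-frame step back into the world frame (idealized motion).
            x += math.cos(theta) * ax - math.sin(theta) * ay
            y += math.sin(theta) * ax + math.cos(theta) * ay
        else:
            return False, (x, y, theta)  # waypoint not reached within max_steps
    return True, (x, y, theta)
```

In this toy version the motion model is idealized, so the stub always reaches its waypoint. The point of the split is that the low-level policy only ever sees a nearby relative goal, so it never needs to reason about the full layout; obstacle avoidance lives entirely in the planner.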
