
Partial Motion Imitation for Learning Cart Pushing with Legged Manipulators

Mili Das, Morgan Byrd, Donghoon Baek, Sehoon Ha

Abstract

Loco-manipulation is a key capability for legged robots to perform practical mobile manipulation tasks, such as transporting and pushing objects, in real-world environments. However, learning robust loco-manipulation skills remains challenging due to the difficulty of maintaining stable locomotion while simultaneously performing precise manipulation behaviors. This work proposes a partial imitation learning approach that transfers the locomotion style learned from a locomotion task to cart loco-manipulation. A robust locomotion policy is first trained with extensive domain and terrain randomization, and a loco-manipulation policy is then learned by imitating only lower-body motions using a partial adversarial motion prior. We conduct experiments demonstrating that the learned policy successfully pushes a cart along diverse trajectories in IsaacLab and transfers effectively to MuJoCo. We also compare our method to several baselines and show that the proposed approach achieves more stable and accurate loco-manipulation behaviors.


Paper Structure

This paper contains 16 sections, 11 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Cart loco-manipulation along a predefined path (green dotted curve). The robot initiates contact at (1) and performs straight pushing and coordinated turning maneuvers (2–4), continuously adjusting whole-body coordination to maintain stable contact and balance while tracking the target trajectory.
  • Figure 2: Architecture of the proposed framework. Training is performed in two stages. In stage 1, a robust locomotion reference policy $\pi_{\text{ref}}$ is trained on rough terrain. The resulting policy generates projected state transitions $\mathcal{D}_{\text{ref}}=\{s_t, s_{t+1}\}$ that encode stable locomotion behaviors. In stage 2, we train the loco-manipulation policy using partial adversarial motion priors, where $\mathcal{D}_{\text{ref}}$ is used as the reference data for the discriminator to provide a style reward. By imitating only the lower-body motions, the policy maintains stable locomotion while allowing flexible manipulation for cart pushing.
  • Figure 3: Sim-to-sim qualitative comparison. The left shows the policy deployed in the source simulator, IsaacSim; the right shows the target simulator, MuJoCo. Our policy, partial AMP, remains robust when transferred, exhibiting steady locomotion and end-effector stability. In contrast, the baselines are shown at their most common failure points. The No AMP baseline suffers foot slip, causing the head to collide with the ground. The Full AMP baseline overfits to the reference manipulator posture rather than tracking the end effector, which causes the end effector to slip off the handle. The hierarchical AMP baseline adopts a conservative stance to avoid falling, resulting in high velocity-tracking errors because the cart is not pushed.
  • Figure 4: Robustness evaluation across environmental variations. The shaded regions represent the parameter space where each policy satisfies the success criteria (survival rate $\ge 30\%$ and linear velocity error $\le 0.5$).
  • Figure 5: Individual runs of path-following performance on three representative trajectories. The mean cross-track error (averaged over five runs) is 0.1238 m for BeatPath, 0.0112 m for RiverPath, and 0.0885 m for SincPath, demonstrating accurate path tracking across different trajectory shapes. Blue arrows indicate the instantaneous velocity direction along the motion.
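To make the partial-imitation idea in the abstract and Figure 2 concrete, the sketch below shows how a partial adversarial motion prior can compute a style reward from lower-body state transitions only. This is a minimal illustration, not the authors' implementation: the joint indices, the one-layer discriminator, and all function names (`project_lower_body`, `style_reward`) are hypothetical stand-ins, and the reward uses the common least-squares AMP form $r = \max(0,\, 1 - 0.25\,(d - 1)^2)$.

```python
import numpy as np

# Hypothetical sketch of a partial-AMP style reward: the discriminator sees
# only lower-body dimensions of each state transition, so the policy is
# rewarded for locomotion style while the arm remains free for manipulation.

LOWER_BODY_IDX = np.arange(0, 12)  # assumed indices of the 12 leg joints

def project_lower_body(s_t, s_t1):
    """Project a full-body state transition onto lower-body dimensions."""
    return np.concatenate([s_t[LOWER_BODY_IDX], s_t1[LOWER_BODY_IDX]])

class TinyDiscriminator:
    """One-layer linear scorer standing in for the AMP discriminator network."""
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=dim)
        self.b = 0.0

    def score(self, x):
        # Unbounded logit d(s_t, s_{t+1}); trained elsewhere to score
        # reference transitions near 1 and policy transitions near -1.
        return float(self.w @ x + self.b)

def style_reward(d_score):
    # Least-squares AMP reward: r = max(0, 1 - 0.25 * (d - 1)^2),
    # equal to 1 when d = 1 (indistinguishable from reference motion).
    return max(0.0, 1.0 - 0.25 * (d_score - 1.0) ** 2)

# Usage with placeholder full-body states (dimensions are illustrative):
s_t, s_t1 = np.zeros(19), np.ones(19)
x = project_lower_body(s_t, s_t1)
disc = TinyDiscriminator(dim=x.size)
r = style_reward(disc.score(x))
assert 0.0 <= r <= 1.0
```

The key design choice mirrored here is the projection step: because only lower-body coordinates enter the discriminator, the style reward constrains gait quality without penalizing whatever arm motion the task reward demands for cart pushing.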