STDArm: Transferring Visuomotor Policies From Static Data Training to Dynamic Robot Manipulation

Yifan Duan, Heng Li, Yilong Wu, Wenhao Yu, Xinran Zhang, Yedong Shen, Jianmin Ji, Yanyong Zhang

TL;DR

STDArm tackles the challenge of deploying visuomotor policies trained on static data to dynamic mobile robots by introducing a real-time action correction pipeline. The system combines an action manager for high-frequency control, a lightweight pose-prediction stabilizer, and online latency estimation to compensate for platform motion and perception-action delays. Across multiple arms, platforms, and tasks, STDArm achieves centimeter-level precision and substantial performance gains in dynamic environments, validating its plug-and-play potential and edge-computing feasibility. This work enables cost-effective migration of static-trained policies to diverse mobile robots, enhancing robustness in real-world manipulation under motion disturbances.

Abstract

Recent advances in mobile robotic platforms like quadruped robots and drones have spurred a demand for deploying visuomotor policies in increasingly dynamic environments. However, the collection of high-quality training data, the impact of platform motion and processing delays, and limited onboard computing resources pose significant barriers to existing solutions. In this work, we present STDArm, a system that directly transfers policies trained under static conditions to dynamic platforms without extensive modifications. The core of STDArm is a real-time action correction framework consisting of: (1) an action manager to boost control frequency and maintain temporal consistency, (2) a stabilizer with a lightweight prediction network to compensate for motion disturbances, and (3) an online latency estimation module for calibrating system parameters. In this way, STDArm achieves centimeter-level precision in mobile manipulation tasks. We conduct comprehensive evaluations of STDArm on two types of robotic arms, four types of mobile platforms, and three tasks. Experimental results indicate that STDArm compensates for platform motion disturbances in real time while preserving the original policy's manipulation capabilities, maintaining centimeter-level operational precision during robot motion.
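As a rough illustration of what an action manager of this kind might do, the sketch below blends overlapping predictions for one timestep and then upsamples a low-rate action sequence by linear interpolation. It is not the authors' implementation: the function names `temporal_ensemble` and `upsample`, the exponential weighting, and the `decay` and `factor` parameters are all assumptions made for the example.

```python
import math


def temporal_ensemble(predictions, decay=0.1):
    """Blend several predictions of the same timestep (oldest first).

    Assumed weighting: w_i = exp(-decay * i), so older predictions
    weigh more. The source does not specify the weighting scheme.
    """
    weights = [math.exp(-decay * i) for i in range(len(predictions))]
    total = sum(weights)
    dim = len(predictions[0])
    return [
        sum(w * p[d] for w, p in zip(weights, predictions)) / total
        for d in range(dim)
    ]


def upsample(actions, factor):
    """Insert linearly interpolated actions between consecutive
    low-rate actions, raising the command rate by `factor`."""
    out = []
    for a, b in zip(actions, actions[1:]):
        for k in range(factor):
            t = k / factor
            out.append([x + t * (y - x) for x, y in zip(a, b)])
    out.append(list(actions[-1]))  # keep the final action as-is
    return out


if __name__ == "__main__":
    low_rate = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.2]]  # hypothetical targets
    for action in upsample(low_rate, factor=4):
        print(action)
```

Linear interpolation is only one option; a real controller might interpolate orientations separately or use a smoother spline.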

Paper Structure

This paper contains 20 sections, 9 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: When deploying a policy trained on static data on computationally constrained edge devices, action drift may occur due to delays like inference and execution latency. Our system, STDArm, addresses this by direct rectification of actions, enabling reliable task completion without policy modifications.
  • Figure 2: Pipeline of STDArm. Given images and joint state observations, our system first generates a low-frequency action sequence via a policy network. These actions are then passed to the action manager, which maintains an action buffer using temporal ensembling and generates high-frequency actions through interpolation. Subsequently, a stabilizer refines each action based on the estimated latency and real-time pose predictions to compensate for the motion of the robot platform. Finally, the stabilized action is sent to the robotic arm for precise and stable manipulation.
  • Figure 3: By utilizing the pose at action generation, the predicted pose, and the extrinsic parameters between the robotic arm and the pose estimator as inputs, the stabilizer compensates for each action output by the action manager, whether derived from the policy network or interpolation, thereby counteracting platform motion.
  • Figure 4: The four experimental configurations we designed. Each is equipped with two cameras for observation inputs and a T265 camera for pose estimation. The robotic arms operate independently of the robot platforms, with MT-3DoF, LS-3DoF and UAV-3DoF sharing the same arm.
  • Figure 5: We design three tasks of varying difficulty levels and conduct experiments across three experimental configurations.
  • ...and 3 more figures
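
The compensation described in the Figure 3 caption can be sketched with homogeneous transforms. This is a minimal sketch under stated assumptions, not the paper's method: it assumes each action is an end-effector target expressed in the arm base frame, that poses are rigid transforms of the pose estimator in a fixed world frame, and that the extrinsic maps the arm base frame into the pose estimator frame. The names `se2`, `matmul`, `rigid_inverse`, and `stabilize` are hypothetical, and the example uses planar (x, y, yaw) poses for brevity, although the same functions accept 4x4 matrices.

```python
import math


def se2(x, y, yaw):
    """Planar rigid transform as a 3x3 homogeneous matrix."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def rigid_inverse(t):
    """Inverse of a rigid transform: [R^T, -R^T p; 0, 1]."""
    n = len(t) - 1
    rot_t = [[t[j][i] for j in range(n)] for i in range(n)]
    trans = [-sum(rot_t[i][k] * t[k][n] for k in range(n)) for i in range(n)]
    return [rot_t[i] + [trans[i]] for i in range(n)] + [[0.0] * n + [1.0]]


def stabilize(action, pose_at_generation, predicted_pose, extrinsic):
    """Re-express an action so it reaches the same world-frame target
    after the platform has moved to `predicted_pose`."""
    base_then = matmul(pose_at_generation, extrinsic)   # base when action was generated
    base_later = matmul(predicted_pose, extrinsic)      # base when action will execute
    target_world = matmul(base_then, action)            # fixed target in the world
    return matmul(rigid_inverse(base_later), target_world)


if __name__ == "__main__":
    # Hypothetical numbers: the platform drifts 5 cm forward between
    # action generation and execution, so the target moves 5 cm closer.
    corrected = stabilize(
        action=se2(0.30, 0.0, 0.0),
        pose_at_generation=se2(0.0, 0.0, 0.0),
        predicted_pose=se2(0.05, 0.0, 0.0),
        extrinsic=se2(0.0, 0.0, 0.0),
    )
    print(corrected[0][2], corrected[1][2])
```

In this sketch the predicted pose would come from the pose-prediction network evaluated at the estimated latency; here it is simply passed in as a known value.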