Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking

Xi Wang, Tianxing Chen, Qiaojun Yu, Tianling Xu, Zanxin Chen, Yiting Fu, Ziqi He, Cewu Lu, Yao Mu, Ping Luo

TL;DR

This paper tackles articulated object manipulation by combining online axis estimation with SAM2-based tracking. It proposes a closed-loop pipeline that uses interactive perception to induce motion, SAM2 to segment the scene, and online axis estimation from the point clouds of the moving part to guide manipulation. The method handles two joint types (prismatic and revolute) and refines the axis estimate over time with a sliding window, improving precision and robustness over open-loop baselines. Experiments in simulation and a real-world deployment show significant improvements on axis-based manipulation tasks such as door and drawer opening, indicating practical applicability and generalization. The approach tightens the coupling between perception and action for articulated object manipulation.

Abstract

Articulated object manipulation requires precise object interaction, in which the object's axis must be carefully considered. Previous research has employed interactive perception for manipulating articulated objects, but these approaches are typically open-loop and often overlook the interaction dynamics. To address this limitation, we present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds. Our method can build on any interactive perception technique, using it to induce slight object movement and generate point cloud frames of the evolving dynamic scene. These point clouds are then segmented using Segment Anything Model 2 (SAM2), after which the moving part of the object is masked for accurate online estimation of the motion axis, which guides subsequent robotic actions. Our approach significantly enhances the precision and efficiency of manipulation tasks involving articulated objects. Experiments in simulated environments demonstrate that our method outperforms baseline approaches, especially in tasks that demand precise axis-based control. Project Page: https://hytidel.github.io/video-tracking-for-axis-estimation/.
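To make the idea of online axis estimation from moving-part point clouds concrete, the following is a minimal sketch for the prismatic (drawer-like) case. It is not the authors' estimator: it assumes the moving part has already been masked in each frame, summarizes each frame by its centroid, and fits a line to the centroids in a sliding window. The function name `estimate_prismatic_axis` and the `window` parameter are hypothetical and chosen for illustration only.

```python
import numpy as np


def estimate_prismatic_axis(frames, window=5):
    """Estimate a translation axis from recent moving-part point clouds.

    frames: list of (N_i, 3) arrays, oldest first, each holding the masked
            points of the moving part in one frame (assumed input format).
    window: number of most recent frames to use (hypothetical parameter).
    Returns a unit vector pointing along the direction of motion.
    """
    recent = frames[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two frames to estimate motion")

    # One centroid per frame; for a prismatic joint these lie on a line.
    centroids = np.stack([f.mean(axis=0) for f in recent])
    centered = centroids - centroids.mean(axis=0)

    # The first right-singular vector is the best-fit line direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]

    # Resolve the sign ambiguity using the net displacement in the window.
    if np.dot(direction, centroids[-1] - centroids[0]) < 0:
        direction = -direction
    return direction / np.linalg.norm(direction)


# Synthetic drawer front sliding along +x by 1 cm per frame, with sensor noise.
rng = np.random.default_rng(0)
panel = rng.uniform(-0.2, 0.2, size=(200, 3))
true_axis = np.array([1.0, 0.0, 0.0])
frames = [
    panel + 0.01 * k * true_axis + rng.normal(0.0, 1e-3, panel.shape)
    for k in range(8)
]
axis = estimate_prismatic_axis(frames, window=5)
```

Restricting the fit to the most recent frames lets the estimate follow the object as the manipulation proceeds, which is the role the sliding window plays in a closed-loop setting. The centroid shortcut only holds if the mask covers roughly the same part of the object in every frame.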

Paper Structure (18 sections, 2 equations, 5 figures, 3 tables)

This paper contains 18 sections, 2 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: We introduce a closed-loop pipeline integrating SAM2 to track the articulated object throughout the manipulation process captured by an RGB-D camera. The masks are subsequently used to segment out the point cloud of the articulated object, after which the joint axis is explicitly estimated using the oriented bounding boxes (OBBs) of the moving part derived from the point cloud. The estimated motion axis then guides the robot's action for further manipulation in the next loop.
  • Figure 2: In our pipeline, an RGB-D camera captures the dynamic scene, which is induced by the slight movement from the Interactive Perception & Init-Manipulation Module. The captured scene is then processed by the Tracking & Segmentation Module, which tracks and segments the moving part of the articulated object at a 3D level. This segmented data is subsequently passed to the Axis Estimation & Manipulation Module. Here, the motion axis is explicitly calculated, providing informed guidance for the robot's manipulation policy.
  • Figure 3: Visualization of axis estimation for real-world "Open Drawer" and "Open Door" tasks. The initial moment and the three manipulation moments are shown, with visualization of the RGB tracking obtained from SAM2 (background), the reconstructed point cloud of the target object (bottom-left corner), the OBBs (green dashed-line boxes at $t_0$) and the axis (red arrows) estimated with our method.
  • Figure 4: Success rates on the more challenging door-opening tasks.
  • Figure 5: Success rates on the more challenging drawer-opening tasks.
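
The Figure 1 caption describes estimating the joint axis from the OBBs of the moving part. As a rough illustration for the revolute (door-like) case, the sketch below assumes that the eight OBB corners are available in a consistent order in two frames, which is an assumption of this example and not something the captions state. It fits a rigid transform with the Kabsch algorithm and reads the hinge direction and a point on the hinge from that transform. The function names are hypothetical, and the authors' actual OBB-based computation may differ.

```python
import numpy as np


def fit_rigid_transform(src, dst):
    """Kabsch algorithm: find R, t such that dst ≈ src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding points (assumed correspondence).
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def revolute_axis_from_motion(R, t, min_angle=1e-3):
    """Return (unit direction, point on axis) for a pure rotation x -> R x + t."""
    # The skew-symmetric part of R equals 2*sin(angle) times the axis.
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    norm = np.linalg.norm(w)
    if norm < 2.0 * np.sin(min_angle):
        raise ValueError("rotation too small to determine an axis")
    direction = w / norm
    # Points on the axis are fixed: (I - R) p = t. The system is rank 2, so
    # lstsq returns the minimum-norm solution, the axis point nearest the origin.
    point, *_ = np.linalg.lstsq(np.eye(3) - R, t, rcond=1e-8)
    return direction, point


# Synthetic door panel rotating 0.3 rad about a vertical hinge through (0.5, 0.2).
hinge = np.array([0.5, 0.2, 0.0])
corners = hinge + np.array(
    [[x, y, z] for x in (0.0, 0.6) for y in (0.0, 0.02) for z in (0.0, 1.0)]
)
theta = 0.3
Rz = np.array(
    [[np.cos(theta), -np.sin(theta), 0.0],
     [np.sin(theta), np.cos(theta), 0.0],
     [0.0, 0.0, 1.0]]
)
moved = (corners - hinge) @ Rz.T + hinge
R, t = fit_rigid_transform(corners, moved)
direction, point = revolute_axis_from_motion(R, t)
```

The `min_angle` check reflects a practical constraint: when the induced motion is very small the rotation axis is poorly determined, which is one reason to refine the estimate as more motion accumulates.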