UniStateDLO: Unified Generative State Estimation and Tracking of Deformable Linear Objects Under Occlusion for Constrained Manipulation

Kangchen Lv; Mingrui Yu; Shihefeng Wang; Xiangyang Ji; Xiang Li

TL;DR

Deformable linear objects present severe perception challenges under occlusion, hindering reliable manipulation. UniStateDLO addresses this with a unified diffusion-based framework that performs both single-frame state estimation and cross-frame tracking, using a two-branch architecture to fuse global robustness with local geometric precision. Trained purely on synthetic data, it generalizes zero-shot to real DLOs and delivers real-time, temporally coherent reconstructions that enable stable closed-loop manipulation in constrained environments. Extensive simulations and real-world experiments demonstrate superior occlusion robustness and tracking stability compared with state-of-the-art baselines, establishing UniStateDLO as a strong front-end perception module for DLO manipulation.

Abstract

Perception of deformable linear objects (DLOs), such as cables, ropes, and wires, is the cornerstone of successful downstream manipulation. Although vision-based methods have been extensively explored, they remain highly vulnerable to occlusions that commonly arise in constrained manipulation environments due to surrounding obstacles, large and varying deformations, and limited viewpoints. Moreover, the high dimensionality of the state space, the lack of distinctive visual features, and the presence of sensor noise further compound the challenges of reliable DLO perception. To address these open issues, this paper presents UniStateDLO, the first complete deep-learning-based DLO perception pipeline that achieves robust performance under severe occlusion, covering both single-frame state estimation and cross-frame state tracking from partial point clouds. Both tasks are formulated as conditional generative problems, leveraging the strong capability of diffusion models to capture the complex mapping between highly partial observations and high-dimensional DLO states. UniStateDLO effectively handles a wide range of occlusion patterns, including initial occlusion, self-occlusion, and occlusion caused by multiple objects. In addition, it exhibits strong data efficiency, as the entire network is trained solely on a large-scale synthetic dataset, enabling zero-shot sim-to-real generalization without any real-world training data. Comprehensive simulation and real-world experiments demonstrate that UniStateDLO outperforms all state-of-the-art baselines in both estimation and tracking, producing globally smooth yet locally precise DLO state predictions in real time, even under substantial occlusions. Its integration as the front-end module in a closed-loop DLO manipulation system further demonstrates its ability to support stable feedback control in complex, constrained 3-D environments.
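To make the phrase "conditional generative problem" concrete, the following is a minimal sketch of DDPM-style reverse sampling of DLO node positions conditioned on a partial point cloud. It is not the paper's network or training setup: `toy_clean_estimate` is a hypothetical stand-in for a learned conditional denoiser, and the node count, step count, and linear beta schedule are assumptions chosen only for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Assumed noise schedule (a common DDPM default, not taken from the paper).
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def toy_clean_estimate(partial_cloud, num_nodes):
    # Hypothetical stand-in for a learned conditional denoiser: guess the clean
    # DLO state as a straight line between the two cloud points with extreme x.
    start = min(partial_cloud, key=lambda p: p[0])
    end = max(partial_cloud, key=lambda p: p[0])
    return [
        [s + (e - s) * i / (num_nodes - 1) for s, e in zip(start, end)]
        for i in range(num_nodes)
    ]


def sample_dlo_state(partial_cloud, num_nodes=8, num_steps=50, seed=0):
    rng = random.Random(seed)
    betas = linear_beta_schedule(num_steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    # The condition enters only through the denoiser's estimate of the clean state.
    x0_hat = toy_clean_estimate(partial_cloud, num_nodes)

    # Start from pure Gaussian noise over all node coordinates.
    x = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(num_nodes)]

    for t in reversed(range(num_steps)):
        sqrt_ab = math.sqrt(alpha_bars[t])
        sqrt_one_minus_ab = math.sqrt(1.0 - alpha_bars[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the last step
        for n in range(num_nodes):
            for d in range(3):
                # Noise implied by the current sample and the clean-state estimate.
                eps_hat = (x[n][d] - sqrt_ab * x0_hat[n][d]) / sqrt_one_minus_ab
                # Standard DDPM posterior mean for the previous timestep.
                mean = (x[n][d] - betas[t] / sqrt_one_minus_ab * eps_hat) / math.sqrt(alphas[t])
                x[n][d] = mean + sigma * rng.gauss(0.0, 1.0)
    return x


# Partial observation with the middle of the object missing (occluded).
partial_cloud = [[0.0, 0.0, 0.0], [0.1, 0.02, 0.0], [0.9, 0.0, 0.1], [1.0, 0.0, 0.0]]
state = sample_dlo_state(partial_cloud)
```

Because the stand-in denoiser ignores the noisy sample and always returns the same straight-line guess, the final noise-free step lands on that guess. A learned denoiser would instead predict the noise from the current sample, the timestep, and features of the partial point cloud, which is where the mapping from partial observations to full DLO states would be captured.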

Paper Structure

Table of Contents

Figures (20)