LangPert: Detecting and Handling Task-level Perturbations for Robust Object Rearrangement
Xu Yin, Min-Sung Yoon, Yuchi Huo, Kang Zhang, Sung-Eui Yoon
TL;DR
LangPert tackles Task-Level Perturbations (TLP) in tabletop object rearrangement by combining a Vision Language Model (VLM) for global task monitoring with a Large Language Model (LLM) planner that uses Hierarchical Chain-of-Thought (HCoT) reasoning and a language-conditioned low-level policy. The framework uses dual-view perception to detect both execution outcomes and perturbations, and applies HCoT reasoning to generate adaptive corrective plans, enabling robust re-planning in dynamic environments. Empirical results show that LangPert achieves higher task completion rates and greater efficiency than baselines across object addition (ADD), removal (RMV), and displacement (DIS) perturbations, with potential generalization to unseen scenarios. The approach offers a practical route toward robust autonomous rearrangement in unstructured settings, with future work oriented toward real-world deployment and richer sensing modalities.
Abstract
Task execution for object rearrangement can be disrupted by Task-Level Perturbations (TLP), i.e., unexpected object additions, removals, and displacements that can break the underlying visual policies and fundamentally compromise task feasibility and progress. To address these challenges, we present LangPert, a language-based framework designed to detect and mitigate TLP situations in tabletop rearrangement tasks. LangPert integrates a Visual Language Model (VLM) to monitor both the policy's skill execution and environmental TLP, and uses a Hierarchical Chain-of-Thought (HCoT) reasoning mechanism to enhance the Large Language Model (LLM)'s contextual understanding and to generate adaptive, corrective skill-execution plans. Our experimental results demonstrate that LangPert handles diverse TLP situations more effectively than baseline methods, achieving higher task completion rates, improved execution efficiency, and potential generalization to unseen scenarios.
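To make the three TLP categories concrete, the following is a minimal illustrative sketch, not the LangPert method itself. LangPert detects perturbations with a VLM; this sketch instead assumes that a symbolic scene state (object name mapped to a 2D table position) is already available, and simply compares the expected state with the observed one. The names `Perturbation` and `detect_perturbations`, the object labels, and the distance tolerance `tol` are all hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Perturbation:
    kind: str  # "ADD", "RMV", or "DIS"
    name: str  # label of the affected object


def detect_perturbations(expected, observed, tol=0.05):
    """Compare two scene states given as {object_name: (x, y)} dicts.

    Assumption: positions are in metres and `tol` is an arbitrary
    threshold below which a position change is ignored.
    """
    events = []
    # Objects seen now but not expected: unexpected additions.
    for name in sorted(observed.keys() - expected.keys()):
        events.append(Perturbation("ADD", name))
    # Objects expected but no longer seen: unexpected removals.
    for name in sorted(expected.keys() - observed.keys()):
        events.append(Perturbation("RMV", name))
    # Objects present in both states but moved beyond the tolerance.
    for name in sorted(expected.keys() & observed.keys()):
        if dist(expected[name], observed[name]) > tol:
            events.append(Perturbation("DIS", name))
    return events


if __name__ == "__main__":
    expected = {"red_block": (0.10, 0.20), "bowl": (0.50, 0.50)}
    observed = {"red_block": (0.30, 0.20), "cup": (0.70, 0.10)}
    for event in detect_perturbations(expected, observed):
        print(event.kind, event.name)
```

In a closed-loop system, a non-empty result of this kind is what would trigger re-planning; in LangPert that role is played by the VLM's monitoring output and the HCoT-guided LLM planner rather than by a geometric comparison.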
