KinScene: Model-Based Mobile Manipulation of Articulated Scenes

Cheng-Chun Hsu, Ben Abbatematteo, Zhenyu Jiang, Yuke Zhu, Roberto Martín-Martín, Joydeep Biswas

TL;DR

This study builds scene-level articulation models of indoor scenes through autonomous exploration, enabling long-horizon tasks that involve articulated objects, and introduces KinScene, a full-stack approach to long-horizon manipulation of articulated objects.

Abstract

Sequentially interacting with articulated objects is crucial for a mobile manipulator to operate effectively in everyday environments. To enable long-horizon tasks involving articulated objects, this study explores building scene-level articulation models for indoor scenes through autonomous exploration. While previous research has studied mobile manipulation with articulated objects by considering object kinematic constraints, it primarily focuses on individual-object scenarios and lacks extension to a scene-level context for task-level planning. To manipulate multiple object parts sequentially, the robot needs to reason about the resultant motion of each part and anticipate its impact on future actions. We introduce KinScene, a full-stack approach for long-horizon manipulation tasks with articulated objects. The robot maps the scene, detects and physically interacts with articulated objects, collects observations, and infers the articulation properties. For sequential tasks, the robot plans a feasible series of object interactions based on the inferred articulation model. We demonstrate that our approach repeatably constructs accurate scene-level kinematic and geometric models, enabling long-horizon mobile manipulation in a real-world scene. Code and additional results are available at https://chengchunhsu.github.io/KinScene/
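The abstract states that the robot infers articulation properties from the observations it collects while interacting, but this summary does not describe the estimator. As a hedged illustration only, the sketch below shows one common way to do this: fit both a line (prismatic joint) and a circle (revolute joint) to a recorded handle trajectory and keep the model that fits clearly better. The input format (an N×3 NumPy array of handle positions), the function names, and the 0.5 residual ratio are hypothetical choices for this sketch, not KinScene's implementation.

```python
import numpy as np


def fit_prismatic(points: np.ndarray):
    """Fit a 3D line to handle positions; return (unit direction, RMS residual)."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]  # dominant direction of motion
    along = np.outer(centered @ direction, direction)
    residual = np.sqrt(np.mean(np.sum((centered - along) ** 2, axis=1)))
    return direction, residual


def fit_revolute(points: np.ndarray):
    """Fit a circle in the best-fit plane; return (axis point, unit axis, radius, RMS residual)."""
    mean = points.mean(axis=0)
    centered = points - mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    if s[1] < 1e-6 * s[0]:
        # Points are (nearly) collinear: there is no well-defined circle to fit.
        return None, None, None, np.inf
    u, v, normal = vt[0], vt[1], vt[2]  # in-plane basis and plane normal
    x, y = centered @ u, centered @ v
    # Algebraic circle fit: x^2 + y^2 + D*x + E*y + F = 0, solved by least squares.
    A = np.column_stack([x, y, np.ones_like(x)])
    (D, E, F), *_ = np.linalg.lstsq(A, -(x**2 + y**2), rcond=None)
    cx, cy = -D / 2.0, -E / 2.0
    r_sq = cx**2 + cy**2 - F
    if r_sq <= 0:
        return None, None, None, np.inf
    radius = np.sqrt(r_sq)
    radial_err = np.hypot(x - cx, y - cy) - radius
    planar_err = centered @ normal  # signed distance out of the fitted plane
    residual = np.sqrt(np.mean(radial_err**2 + planar_err**2))
    axis_point = mean + cx * u + cy * v  # circle center lifted back to 3D
    return axis_point, normal, radius, residual


def classify_joint(points: np.ndarray, ratio: float = 0.5) -> str:
    """Prefer the simpler prismatic model unless the circle fit is clearly better."""
    _, res_prismatic = fit_prismatic(points)
    *_, res_revolute = fit_revolute(points)
    return "revolute" if res_revolute < ratio * res_prismatic else "prismatic"
```

For a clean arc traced by a door handle, `fit_revolute` recovers the hinge axis and radius and `classify_joint` returns `"revolute"`; for a drawer pulled along a straight line, the circle fit is degenerate and the prismatic model is chosen. Requiring the revolute fit to be clearly better (the `ratio` parameter) biases the decision toward the simpler model when both fit noisy data about equally well.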

Paper Structure (13 sections, 3 equations, 7 figures, 2 tables, 1 algorithm)

Figures (7)

  • Figure 1: KinScene enables scene-level reasoning about articulated objects. In this scenario, attempting to open the dishwasher would obstruct the path for subsequent interactions. KinScene constructs a scene-level articulation model and plans a feasible trajectory.
  • Figure 2: KinScene System Overview. Our approach involves three stages: a mapping stage (left), where the robot conducts a 3D scan and detects handles; an articulation discovery stage (middle), where the robot navigates, interacts, collects observations, and estimates the scene-level articulation model; and a scene-level manipulation stage (right), where the robot plans tasks using the scene-level articulation model and executes long-horizon actions through the articulation planner.
  • Figure 3: Interaction through autonomous exploration vs. articulation planning. In the Articulation Discovery stage (green), the agent has no information about the articulation and relies on heuristics: it positions the base in front of the interaction point and manipulates with an admittance controller that moves normal to the plane. In the Scene-Level Manipulation stage (light red), the robot plans efficient base positioning and an arm trajectory from the articulation model estimated during exploration. Thanks to this model, KinScene achieves a higher manipulation success rate and interacts faster than the baselines (see the corresponding results section of the paper).
  • Figure 4: Our mobile manipulator and the indoor kitchen scene. (Left) We integrate a custom omnidirectional base with a 7-DoF torque-controlled arm and a two-fingered hand, combined with two RGB-D sensors and one LiDAR sensor. (Right) Our kitchen environment contains 7 degrees of freedom, including revolute and prismatic joints, across objects of different shapes, weights, and heights.
  • Figure 5: Results of scene-level manipulation. KinScene enables planning sequences of articulated-object interactions. The lower-left figure illustrates an example of part collision, where the rightmost cabinet blocks actuation of the middle cabinet. The lower-right figure illustrates a path-blocking failure, where actuating the cabinet first blocks the robot's path to the other objects in the plan.
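Figures 1 and 5 describe two ways an interaction order can fail: part collision, where an already-opened part blocks actuation of another, and path blocking, where an opened part obstructs the robot's route to later targets. The sketch below illustrates that ordering check under strong simplifying assumptions: every part stays open once actuated, an opened part is approximated by an axis-aligned box on the floor plane, and the robot base needs a second box (its approach region) to operate each part. The names `Part`, `step_feasible`, and `plan_order`, the box geometry, and the brute-force search over orderings are hypothetical; KinScene plans from its estimated articulation model, which is not reproduced here.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Optional

Box = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) on the floor plane


def overlaps(a: Box, b: Box) -> bool:
    """Axis-aligned boxes overlap only if they intersect with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


@dataclass(frozen=True)
class Part:
    name: str
    open_footprint: Box   # floor area the part occupies once opened
    approach_region: Box  # floor area the robot base needs to operate the part


def step_feasible(part: Part, opened: list[Part]) -> bool:
    """Check one interaction against every part that is already open."""
    for other in opened:
        if overlaps(part.open_footprint, other.open_footprint):
            return False  # part collision: an opened part blocks actuation
        if overlaps(part.approach_region, other.open_footprint):
            return False  # path blocking: the robot cannot reach this part
    return True


def plan_order(parts: list[Part]) -> Optional[list[str]]:
    """Return the first ordering in which every part can be opened in turn, else None."""
    for order in permutations(parts):
        opened: list[Part] = []
        for part in order:
            if not step_feasible(part, opened):
                break
            opened.append(part)
        else:  # no break: every step in this ordering was feasible
            return [p.name for p in order]
    return None
```

For example, if a dishwasher door opens onto the floor area the robot needs in order to reach a neighboring cabinet, `plan_order` rejects "dishwasher first" and returns the cabinet before the dishwasher, mirroring the scenario in Figure 1. If every one of the 7 degrees of freedom in the Figure 4 kitchen were a target, there would be 7! = 5040 orderings, so exhaustive search stays cheap; larger scenes would call for pruning or a dedicated task planner.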