Cognitive Manipulation: Semi-supervised Visual Representation and Classroom-to-real Reinforcement Learning for Assembly in Semi-structured Environments

Chuang Wang, Lie Yang, Ze Lin, Yizhi Liao, Gang Chen, Longhan Xie

TL;DR

The paper tackles fixture-free, high-precision robotic assembly in semi-structured environments, where purely end-to-end deep reinforcement learning (DRL) struggles because priors and data are scarce. It introduces CM4RASSE, a cognitive manipulation framework built on a skill graph that integrates learning-based object detection with a residual fine-manipulation policy, guided by semi-supervised visual learning and classroom-to-real reinforcement learning. Key contributions include a neural-symbolic skill graph, semi-supervised object detection and calibration, and a curriculum-based residual RL approach that transfers from controlled classrooms to real semi-structured settings. In simulation, the method improves the success rate by 13% and needs 15.4% fewer steps; real-world tasks validate its robustness. The work advances practical robotic assembly by reducing labeling effort, improving sample efficiency, and enabling knowledge-guided transfer to semi-structured environments, with potential extensions to offline/continual learning and language-guided reasoning.
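The classroom-to-real idea can be illustrated with a generic success-gated curriculum. This is a minimal sketch, not the authors' method: it assumes the "classroom" corresponds to a narrow range of task-board position offsets that widens toward the full workspace once the rolling success rate clears a threshold. The class `CurriculumScheduler` and the values of `levels`, `window`, and `threshold` are hypothetical.

```python
import random
from collections import deque


class CurriculumScheduler:
    """Hypothetical success-gated curriculum (illustrative assumption):
    start with a narrow 'classroom' range of board offsets and widen it
    toward the full semi-structured workspace as the agent succeeds."""

    def __init__(self, levels, window=20, threshold=0.8):
        self.levels = levels              # max planar offsets (m), ordered easy -> hard
        self.level = 0
        self.threshold = threshold
        self.results = deque(maxlen=window)

    @property
    def max_offset(self):
        return self.levels[self.level]

    def sample_board_offset(self, rng):
        # Uniformly sample an (x, y) offset of the task board at the current level.
        r = self.max_offset
        return (rng.uniform(-r, r), rng.uniform(-r, r))

    def report(self, success):
        # Record one episode outcome; advance a level only when a full window
        # of recent episodes reaches the success threshold.
        self.results.append(bool(success))
        rate = sum(self.results) / len(self.results)
        window_full = len(self.results) == self.results.maxlen
        if window_full and rate >= self.threshold and self.level < len(self.levels) - 1:
            self.level += 1
            self.results.clear()          # re-measure performance at the new difficulty
        return self.level


# Usage: five consecutive successes promote the agent to the next level.
sched = CurriculumScheduler(levels=[0.0, 0.01, 0.03, 0.05], window=5, threshold=0.8)
for _ in range(5):
    sched.report(True)
offset = sched.sample_board_offset(random.Random(0))
```

Clearing the window after each promotion is a design choice that forces the agent to prove itself again on the harder distribution before the range widens further.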

Abstract

Assembling a slave object into a fixture-free master object is a critical challenge in flexible manufacturing. Existing deep reinforcement learning-based methods, while benefiting from visual or operational priors, often struggle with small-batch precise assembly tasks because they rely on insufficient priors and costly model development. To address these limitations, this paper introduces a cognitive manipulation and learning approach that uses skill graphs to integrate learning-based object detection with fine manipulation models into a cohesive modular policy. This approach detects the master object from both global and local perspectives to accommodate positional uncertainties and variable backgrounds, and uses a parametric residual policy to handle pose errors and intricate contact dynamics. Leveraging the skill graph, our method supports knowledge-informed learning: semi-supervised learning for object detection and classroom-to-real reinforcement learning for fine manipulation. Simulation experiments on a gear-assembly task demonstrate that skill-graph-enabled coarse-operation planning and visual attention are essential for efficient learning and robust manipulation, yielding improvements of 13% in success rate and a 15.4% reduction in the number of completion steps over competing methods. Real-world experiments further validate that our system is highly effective for robotic assembly in semi-structured environments.
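The residual policy described above follows the general residual-RL pattern: a learned correction is added on top of a coarse, hand-designed action. The sketch below shows that generic composition; it is not the authors' code, and the function name `compose_action` and the parameters `residual_scale` and `limit` are hypothetical choices for illustration.

```python
import numpy as np


def compose_action(base_action, residual_action, residual_scale=0.2, limit=1.0):
    """Generic residual-RL composition (illustrative sketch, assumed scaling):
    the learned residual (pi_theta) corrects the coarse action from pi_H."""
    base = np.asarray(base_action, dtype=float)
    res = np.asarray(residual_action, dtype=float)
    # Bound the raw residual to [-1, 1], then scale it so the learned policy
    # can only make small corrections around the coarse command.
    res = np.clip(res, -1.0, 1.0) * residual_scale
    # Keep the combined command inside the (assumed) normalized action limits.
    return np.clip(base + res, -limit, limit)


# Usage: the residual nudges a coarse Cartesian increment.
action = compose_action([0.5, -0.1, 0.0], [1.0, 2.0, -0.5])
```

Scaling the residual keeps early, untrained policy outputs from overriding the coarse plan, which is one common reason residual formulations tend to be more sample-efficient than learning the full action from scratch.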

Paper Structure (35 sections, 19 equations, 14 figures, 4 tables, 1 algorithm)

Figures (14)

  • Figure 1: Robotic assembly tasks in a semi-structured environment and cognitive manipulation.
  • Figure 2: Robotic assembly in a semi-structured environment. (a) This work considers assembly tasks in a semi-structured deployment environment that simulates flexible manufacturing scenarios, with a task board randomly placed in a predefined workspace. (b) We focus on the subtask of the robotic assembly task that uses vision and haptics to assemble the grasped slave object into the master object. (c) The eye-to-hand camera can see the whole workspace, but the assembly objects are occluded by the robot during the contact-rich manipulation phase. (d) The eye-in-hand camera can see the assembly objects clearly, but its limited field of view does not allow a continuous view of the entire work area.
  • Figure 3: The cognitive manipulation architecture with semi-supervised visual representation learning and classroom-to-real reinforcement learning.
  • Figure 4: Components of cognitive manipulation for precise assembly tasks in a semi-structured environment. A skill graph divides the manipulation into multiple stages, which are solved by multiple modules. Object detection estimates the rough location ${}^BX_O$ and provides task-related features $I_{atten}$. The planner generates a trajectory and stiffness as the coarse operation policy $\pi_H$. In free space, the impedance-controlled trajectory $\pi_H^{cf}$ moves the robot to the contact-rich region $S_{cr}$. Once the robot enters this region, the policy switches to $\pi_H^{cr}$ and the residual policy $\pi_\theta$ is enabled for fine alignment and contact dynamics during contact-rich manipulation.
  • Figure 5: The geometric information and temporal logic of manipulation.
  • ...and 9 more figures
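The stage switching in the Figure 4 caption can be sketched as a small state machine: coarse motion alone in free space, and coarse motion plus a learned residual once the end effector is inside $S_{cr}$. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes $S_{cr}$ is a ball of radius `cr_radius` around the detected location ${}^BX_O$, replaces the planned trajectory with a proportional step (`step_gain`), and omits stiffness scheduling. `SkillGraphController` and all of its parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class SkillGraphController:
    """Hypothetical two-stage switch modeled on the Figure 4 caption."""
    target: np.ndarray            # rough master-object position from detection (^B X_O)
    cr_radius: float = 0.02       # assumed radius (m) of the contact-rich region S_cr
    step_gain: float = 0.5        # assumed gain of the stand-in coarse trajectory

    def __post_init__(self):
        self.target = np.asarray(self.target, dtype=float)

    def in_contact_rich_region(self, ee_pos: np.ndarray) -> bool:
        return bool(np.linalg.norm(ee_pos - self.target) <= self.cr_radius)

    def coarse_step(self, ee_pos: np.ndarray) -> np.ndarray:
        # Stand-in for the planned trajectory: move a fraction of the remaining distance.
        return self.step_gain * (self.target - ee_pos)

    def act(self, ee_pos, residual_policy: Callable[[np.ndarray], np.ndarray]):
        ee_pos = np.asarray(ee_pos, dtype=float)
        base = self.coarse_step(ee_pos)
        if not self.in_contact_rich_region(ee_pos):
            return "free_space", base                          # coarse policy only
        return "contact_rich", base + residual_policy(ee_pos)  # coarse + residual


# Usage: far from the target only the coarse step is applied.
ctrl = SkillGraphController(target=[0.4, 0.0, 0.1])
mode, cmd = ctrl.act([0.2, 0.0, 0.3], lambda p: np.zeros(3))
```

Gating the residual on region membership mirrors the caption's point that learning is confined to the contact-rich phase, while free-space motion is left to the planner.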