Cognitive Manipulation: Semi-supervised Visual Representation and Classroom-to-real Reinforcement Learning for Assembly in Semi-structured Environments
Chuang Wang, Lie Yang, Ze Lin, Yizhi Liao, Gang Chen, Longhan Xie
TL;DR
The paper tackles fixture-free, high-precision robotic assembly in semi-structured environments, where purely end-to-end deep reinforcement learning (DRL) struggles because priors and training data are scarce. It introduces CM4RASSE, a cognitive manipulation framework built on a skill graph that integrates learning-based object detection with a residual fine-manipulation policy, trained through semi-supervised visual learning and classroom-to-real reinforcement learning. Key contributions include a neural-symbolic skill graph, semi-supervised object detection and calibration, and a curriculum-based residual RL approach that transfers from controlled classroom settings to real semi-structured ones. Simulations show a 13% gain in success rate and 15.4% fewer completion steps than competing methods, and real-world tasks validate robustness. The work advances practical robotic assembly by reducing labeling effort, improving sample efficiency, and enabling knowledge-guided transfer to semi-structured environments, with potential extensions to offline/continual learning and language-guided reasoning.
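The classroom-to-real curriculum can be illustrated with a minimal sketch, assuming that training advances from a controlled "classroom" stage to progressively less structured stages once the recent success rate clears a threshold. The stage names, the `pose_noise_mm` and `background_clutter` parameters, the 0.8 threshold, and the 20-episode window are all hypothetical choices for illustration, not settings reported in the paper.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    pose_noise_mm: float      # hypothetical: spread of master-object pose perturbation
    background_clutter: bool  # hypothetical: whether distractors appear in the scene


# Hypothetical stages, ordered from controlled classroom to semi-structured conditions.
STAGES = [
    Stage("classroom", 0.0, False),
    Stage("noisy-pose", 2.0, False),
    Stage("semi-structured", 5.0, True),
]


class Curriculum:
    """Advance to the next stage once the recent success rate reaches a threshold."""

    def __init__(self, stages, threshold=0.8, window=20):
        self.stages = stages
        self.threshold = threshold
        self.window = window
        self.index = 0
        self.history = []

    @property
    def stage(self) -> Stage:
        return self.stages[self.index]

    def record(self, success: bool) -> None:
        """Log one episode outcome and promote if the rolling success rate is high enough."""
        self.history.append(success)
        recent = self.history[-self.window:]
        ready = len(recent) == self.window and sum(recent) / self.window >= self.threshold
        if ready and self.index < len(self.stages) - 1:
            self.index += 1
            self.history.clear()  # start a fresh success window for the new stage
```

A success-rate gate is only one way to schedule a curriculum; it keeps the agent in easy, well-structured conditions until it is competent there, which is the intuition behind moving from a classroom setting to a real one.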
Abstract
Assembling a slave object into a fixture-free master object is a critical challenge in flexible manufacturing. Existing deep reinforcement learning-based methods, although they benefit from visual or operational priors, often struggle with small-batch, precise assembly tasks because they rely on insufficient priors and costly model development. To address these limitations, this paper introduces a cognitive manipulation and learning approach that uses skill graphs to integrate learning-based object detection with fine manipulation models into a cohesive modular policy. The approach detects the master object from both global and local perspectives to accommodate positional uncertainty and variable backgrounds, and uses a parametric residual policy to handle pose errors and intricate contact dynamics. Leveraging the skill graph, our method supports knowledge-informed learning: semi-supervised learning for object detection and classroom-to-real reinforcement learning for fine manipulation. Simulation experiments on a gear-assembly task demonstrate that skill-graph-enabled coarse-operation planning and visual attention are essential for efficient learning and robust manipulation, yielding improvements of 13% in success rate and 15.4% in the number of completion steps over competing methods. Real-world experiments further validate that our system is highly effective for robotic assembly in semi-structured environments.
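The modular policy can be sketched under stated assumptions: the skill graph is reduced to a hypothetical linear chain (`global_detect`, then `local_detect`, then `fine_insert`), and the fine-insertion action is a proportional step toward the detected target plus a clipped residual computed from a force/torque reading. The names `SKILL_GRAPH`, `base_action`, `residual_action`, and `fine_insert_step`, the linear form of the residual, the 0.5 gain, and the 2 mm clip are all hypothetical; in the paper the residual policy is learned with reinforcement learning, and its exact form is not given here.

```python
import numpy as np

# Hypothetical skill graph: each skill names its successor (None means the task is done).
SKILL_GRAPH = {
    "global_detect": "local_detect",
    "local_detect": "fine_insert",
    "fine_insert": None,
}


def run_skills(start: str = "global_detect") -> list:
    """Walk the skill graph from `start` and return the order in which skills execute."""
    order, skill = [], start
    while skill is not None:
        order.append(skill)
        skill = SKILL_GRAPH[skill]
    return order


def base_action(target: np.ndarray, current: np.ndarray, gain: float = 0.5) -> np.ndarray:
    """Nominal proportional step from the current position toward the detected target."""
    return gain * (target - current)


def residual_action(wrench: np.ndarray, weights: np.ndarray, limit: float) -> np.ndarray:
    """Hypothetical linear residual on a 6-D force/torque reading, clipped to +/- limit."""
    return np.clip(weights @ wrench, -limit, limit)


def fine_insert_step(target, current, wrench, weights, limit=0.002):
    """Residual composition: executed action = nominal action + bounded correction (metres)."""
    return base_action(target, current) + residual_action(wrench, weights, limit)
```

For example, with a zero wrench and zero weights, `fine_insert_step(np.array([0.40, 0.10, 0.05]), np.array([0.39, 0.10, 0.06]), np.zeros(6), np.zeros((3, 6)))` reduces to the nominal step of roughly `[0.005, 0.0, -0.005]`. Clipping the residual keeps the learned correction small relative to the nominal motion, which is the usual motivation for residual policies: the base controller supplies the coarse behavior, and learning only has to account for pose error and contact effects.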
