Robotic Test Tube Rearrangement Using Combined Reinforcement Learning and Motion Planning
Hao Chen, Weiwei Wan, Masaki Matsushita, Takeyuki Kotaka, Kensuke Harada
TL;DR
This work tackles multi-class in-rack test tube rearrangement by coupling task-level reinforcement learning with motion planning in a closed loop. Task planning uses specialist Dueling Double Deep Q Networks (D3QN) within a distributed ApeX-style framework, augmented by A${}^\star$-based post-processing to amplify data and improve convergence. Motion planning handles grasp reasoning, shared grasp poses, and RRT-Connect-based execution, while maintaining per-slot condition sets to enable replanning after failures. The approach is validated through simulations and real-world ABB/Yumi experiments, showing superior robustness and efficiency relative to traditional A${}^\star$-based task planners, with practical resilience to sensing and control perturbations. The framework supports sensory feedback such as vision and force/torque and maintains extensibility for broader rearrangement tasks.
Abstract
A combined task-level reinforcement learning and motion planning framework is proposed in this paper to address a multi-class in-rack test tube rearrangement problem. At the task level, the framework uses reinforcement learning to infer a sequence of swap actions while ignoring robotic motion details. At the motion level, the framework accepts the swap action sequences inferred by task-level agents and plans the detailed robotic pick-and-place motion. The task-level and motion-level planning form a closed loop with the help of a condition set maintained for each rack slot, which allows the framework to replan and effectively find solutions in the presence of low-level failures. For reinforcement learning in particular, the framework leverages a distributed deep Q-learning structure with the Dueling Double Deep Q Network (D3QN) to acquire near-optimal policies and uses an A${}^\star$-based post-processing technique to amplify the collected training data. The D3QN and distributed learning help increase training efficiency. The post-processing helps complete unfinished action sequences and remove redundancy, thus making the training data more effective. We carry out both simulations and real-world studies to understand the performance of the proposed framework. The results verify the effectiveness of the RL and post-processing and show that the closed-loop combination improves robustness. The framework is ready to incorporate various sensory feedback, and the real-world studies demonstrate this incorporation.
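The abstract names D3QN without spelling out its two ingredients. As a minimal sketch in plain Python, the following shows the standard dueling aggregation and the standard Double DQN target. The function names, the list-based inputs, and the numbers in the usage lines are illustrative assumptions, not the authors' implementation; in practice the values would come from neural network outputs.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def double_dqn_target(
    reward: float,
    gamma: float,
    done: bool,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
) -> float:
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        # Terminal transition: no bootstrapped future value.
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical values for three candidate swap actions.
q = dueling_q_values(1.0, [0.5, -0.5, 0.0])
y = double_dqn_target(1.0, 0.5, False, [0.2, 0.8, 0.1], [2.0, 0.5, 3.0])
```

Here the online estimates pick action 1, so the target uses the target network's value for action 1 (0.5) rather than its own maximum (3.0). Decoupling selection from evaluation in this way is what reduces the overestimation bias of plain Q-learning.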
