Robotic Test Tube Rearrangement Using Combined Reinforcement Learning and Motion Planning

Hao Chen; Weiwei Wan; Masaki Matsushita; Takeyuki Kotaka; Kensuke Harada

TL;DR

This work tackles multi-class in-rack test tube rearrangement by coupling task-level reinforcement learning with motion planning in a closed loop. Task planning uses specialist Dueling Double Deep Q Networks (D3QN) within a distributed ApeX-style framework, augmented by A${}^\star$-based post-processing to amplify data and improve convergence. Motion planning handles grasp reasoning, shared grasp poses, and RRT-Connect-based execution, while maintaining per-slot condition sets to enable replanning after failures. The approach is validated through simulations and real-world experiments on an ABB YuMi robot, showing better robustness and efficiency than traditional A${}^\star$-based task planners, with practical resilience to sensing and control perturbations. The framework supports sensory feedback such as vision and force/torque and remains extensible to broader rearrangement tasks.

Abstract

A combined task-level reinforcement learning and motion planning framework is proposed in this paper to address a multi-class in-rack test tube rearrangement problem. At the task level, the framework uses reinforcement learning to infer a sequence of swap actions while ignoring robotic motion details. At the motion level, the framework accepts the swap action sequences inferred by task-level agents and plans the detailed robotic pick-and-place motion. The task-level and motion-level planning form a closed loop with the help of a condition set maintained for each rack slot, which allows the framework to replan and effectively find solutions in the presence of low-level failures. For reinforcement learning in particular, the framework leverages a distributed deep Q-learning structure with the Dueling Double Deep Q Network (D3QN) to acquire near-optimal policies and uses an A${}^\star$-based post-processing technique to amplify the collected training data. The D3QN and distributed learning help increase training efficiency. The post-processing helps complete unfinished action sequences and remove redundancy, thus making the training data more effective. We carry out both simulations and real-world studies to understand the performance of the proposed framework. The results verify the performance of the RL and post-processing and show that the closed-loop combination improves robustness. The framework can readily incorporate various kinds of sensory feedback, and the real-world studies demonstrate this incorporation.
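
For readers unfamiliar with D3QN, the two ingredients named in the abstract can be written in their standard textbook forms. This is a sketch under assumptions, not the paper's own notation: the symbols $V$, $A$, $\theta$, $\theta^{-}$, $\gamma$ and $\mathcal{A}$ are introduced here for illustration, and the paper's exact formulation may differ. The dueling architecture splits the Q-value into a state value and a mean-centered advantage:

$$Q(s, a; \theta) = V(s; \theta) + A(s, a; \theta) - \frac{1}{|\mathcal{A}|} \sum_{a' \in \mathcal{A}} A(s, a'; \theta)$$

Double Q-learning selects the next action with the online network (parameters $\theta$) and evaluates it with the target network (parameters $\theta^{-}$):

$$y = r + \gamma \, Q\bigl(s', \arg\max_{a'} Q(s', a'; \theta); \theta^{-}\bigr)$$

Decoupling action selection from action evaluation in this way is the usual remedy for the overestimation bias of plain Q-learning.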

Paper Structure (54 sections, 16 equations, 20 figures, 4 tables, 2 algorithms)

Introduction
Related Work
Rearrangement Planning
Conventional AI-Based Methods
Learning-Based Methods
Heuristics and Data Relabeling in RL
Heuristic Acceleration in RL
Pre-training on heuristically generated datasets
Reward shaping using heuristics
Heuristically guided exploration
Data Relabeling in RL
Problem Definition and Solution Overview
In-Rack Test Tube Arrangement Problem
Overview of the Proposed Framework
Reinforcement Learning at the Task Level
...and 39 more sections

Figures (20)

Figure 1: (a) An example of the goal pattern arrangement $\Lambda_\mathrm{goal}$ for a rack with $5\times10$ slots. $n_r$, $n_c$, $n_t$ denote the number of rows, columns and tube types, respectively. The types of test tubes are differentiated by color. (b) The objective of the multi-class in-rack test tube rearrangement problem is to transfer test tubes from an initial arrangement $\Lambda_\mathrm{init}$ to a target arrangement $\Lambda \in \Gamma(\Lambda_\mathrm{goal})$ using a minimal number of pick-and-place actions.
Figure 2: Flowchart of the proposed framework.
Figure 3: "Arrangement" refers to the arrangement of test tubes in a rack. "State" is a "Matrix Representation" that encodes the rack arrangement.
Figure 4: All potential actions for a rack with $2\times2$ slots. An action is defined as a swap between two slots without discerning the start and goal. Each distinct swap is uniquely encoded as a one-hot vector $\bar{\mathbf{a}}$ and can either represent moving a tube from a first slot to a second one or vice versa.
Figure 5: (a.1)-(f.1) show six different conditions. A test tube is considered acceptable by a gripper if at least one of these conditions is satisfied. The white circle in the center represents the tube to be picked. Empty neighboring grids mark slots that must be empty so that the fingers can be positioned there without collision. Grids with purple circles carry no requirement: they may be either empty or filled with obstacle tubes. (a.2)-(f.2) show the corresponding collision-free grasp poses for each condition. The test tubes in pink are obstacle test tubes, and the test tube in white is the one to be manipulated.
...and 15 more figures
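
The swap-action encoding described in the Figure 4 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of `0` for an empty slot, and the rule that a swap corresponds to one pick-and-place only when exactly one of the two slots is occupied are all hypothetical choices made here.

```python
from itertools import combinations


def enumerate_swap_actions(n_rows, n_cols):
    """Return every unordered pair of distinct slots, each slot as (row, col)."""
    slots = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return list(combinations(slots, 2))


def one_hot(index, size):
    """Encode the action at `index` as a one-hot list of length `size`."""
    return [1 if i == index else 0 for i in range(size)]


def is_single_move(state, action):
    """Assumed validity rule: exactly one of the two slots holds a tube."""
    (r1, c1), (r2, c2) = action
    return (state[r1][c1] == 0) != (state[r2][c2] == 0)


def apply_swap(state, action):
    """Return a new state with the contents of the two slots exchanged."""
    (r1, c1), (r2, c2) = action
    new_state = [row[:] for row in state]
    new_state[r1][c1], new_state[r2][c2] = state[r2][c2], state[r1][c1]
    return new_state


# A 2x2 rack: 0 = empty slot, positive integers = tube types (assumed convention).
actions = enumerate_swap_actions(2, 2)
state = [[1, 0], [0, 2]]
next_state = apply_swap(state, actions[0])  # swaps slots (0, 0) and (0, 1)
```

Because a swap is unordered, a rack with $n_r \times n_c$ slots yields $\binom{n_r n_c}{2}$ actions, which gives six for the $2\times2$ rack in Figure 4. The direction of the move is implied by which of the two slots currently holds a tube.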
