ArrayBot: Reinforcement Learning for Generalizable Distributed Manipulation through Touch

Zhengrong Xue; Han Zhang; Jingwen Cheng; Zhengmao He; Yuanchen Ju; Changyi Lin; Gu Zhang; Huazhe Xu

TL;DR

ArrayBot introduces a scalable, distributed tabletop manipulator composed of a $16 \times 16$ pillar array equipped with tactile sensing and a learned control policy. By reshaping the action space into a $5 \times 5$ Local Action Patch and a $2$-D DCT frequency-domain representation with low-frequency channels, the authors enable model-free RL to learn tactile-only manipulation policies that generalize to unseen shapes and transfer to the real robot without domain randomization. The study demonstrates lifting, flipping, and a general relocate-via-touch task in simulation, with zero-shot sim-to-real transfer achieving $74\%$ success on unseen objects in physical experiments. Derived skills such as trajectory following, parallel manipulation, and resilience to visual degradation illustrate the practical potential of RL for distributed manipulation in both industrial and household settings.

Abstract

We present ArrayBot, a distributed manipulation system consisting of a $16 \times 16$ array of vertically sliding pillars integrated with tactile sensors, which can simultaneously support, perceive, and manipulate tabletop objects. Towards generalizable distributed manipulation, we leverage reinforcement learning (RL) algorithms for the automatic discovery of control policies. Because the action space is massively redundant, we propose to reshape it by considering a spatially local action patch and low-frequency actions in the frequency domain. With this reshaped action space, we train RL agents that can relocate diverse objects through tactile observations only. Surprisingly, we find that the discovered policy not only generalizes to unseen object shapes in the simulator but also transfers to the physical robot without any domain randomization. Leveraging the deployed policy, we present a range of real-world manipulation tasks, illustrating the potential of RL on ArrayBot for distributed manipulation.
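To make the action space reshaping concrete, the following is a minimal sketch, not the authors' implementation. It decodes $6$ low-frequency 2D DCT coefficients into a $5 \times 5$ patch of per-pillar actions and embeds that patch in a $16 \times 16$ action map. The names `LOW_FREQ`, `dct_matrix`, `decode_patch`, and `place_patch` are hypothetical. The choice of which six channels count as "lowest" (here, index pairs with $u + v \le 2$), the orthonormal DCT-II convention, and the border clipping are all assumptions.

```python
import numpy as np

PATCH = 5    # side length of the local action patch (from the text)
ARRAY = 16   # side length of the pillar array (from the text)

# Assumption: the six lowest-frequency channels are the (u, v) pairs with u + v <= 2.
LOW_FREQ = [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so X = C @ x @ C.T and x = C.T @ X @ C."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # spatial index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row has a different normalization
    return c


def decode_patch(coeffs):
    """Map 6 low-frequency coefficients to a 5x5 patch of per-pillar actions."""
    spectrum = np.zeros((PATCH, PATCH))
    for (u, v), value in zip(LOW_FREQ, coeffs):
        spectrum[u, v] = value
    c = dct_matrix(PATCH)
    return c.T @ spectrum @ c   # inverse 2D DCT


def place_patch(patch, center_row, center_col):
    """Embed the patch in a 16x16 action map, dropping cells outside the array."""
    full = np.zeros((ARRAY, ARRAY))
    half = PATCH // 2
    for dr in range(PATCH):
        for dc in range(PATCH):
            r = center_row - half + dr
            col = center_col - half + dc
            if 0 <= r < ARRAY and 0 <= col < ARRAY:
                full[r, col] = patch[dr, dc]
    return full


# Usage: a policy would output 6 numbers; here they are random placeholders.
rng = np.random.default_rng(0)
action_map = place_patch(decode_patch(rng.normal(size=6)), 8, 8)
```

Under these assumptions the policy outputs $6$ numbers per step instead of $256$ independent pillar commands, and the decoded patch varies smoothly across neighboring pillars because the high-frequency channels are fixed at zero. Where the patch is centered (for example, on the estimated object position) is left as an input here because the text above does not specify it.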

Paper Structure (24 sections, 11 figures, 4 tables)

Introduction
A Sketch for the Hardware Design
Action Space Reshaping
Learning the Control Policies
Simulator Setup
Environments
Training the RL Agents
Simulated Experiments for Lifting and Flipping
Simulated Experiments for General Relocate-via-Touch
Deploying the Control Policies
Zero-Shot Sim-to-Real Transfer
Experiments on the Physical Robot
Derived Real-World Manipulation Skills
Related Works
Discussion on Potential Applications
...and 9 more sections

Figures (11)

Figure 1: We present ArrayBot, a distributed manipulation system. With the aim of generalizable manipulation, we train RL agents on the simulated ArrayBot, where the only accessible observation is the tactile information. Afterwards, we deploy the learned control policy to the physical robot and showcase bird's-eye views of the trajectories for real-world manipulation tasks: relocating novel-shaped objects, manipulating two objects in parallel, trajectory following, and manipulation under visual degradations. Please refer to the videos on our project website: https://steven-xzr.github.io/ArrayBot/.
Figure 2: The hardware of ArrayBot is a $16 \times 16$ array of vertically sliding pillars. (a) The exploded view of an atom unit, which consists of the actuator, the pillar, and the end-effector. (b) Every two atom units are assembled with one STM32 board as a modular unit.
Figure 3: The visualization of a $5 \times 5$ 2D DCT map. We select the lowest $6$ frequency channels marked in green.
Figure 4: An overview of the RL framework on ArrayBot for general relocate-via-touch. The state is the combination of the estimated object position, the specified target position, the residual goal direction, and the robot state in the frequency domain. Without any visual input, the states are inferred purely from proprioceptive observations of the robot joint configuration and from the tactile sensor array.
Figure 5: (a) The training curves in terms of episode returns and survival steps. The results are averaged over $5$ seeds. The shaded area shows the standard deviation. (b)(c) Example trajectories of the policies learned for flipping by (b) LAP+DCT-6 and (c) LAP only.
...and 6 more figures
