REASSEMBLE: A Multimodal Dataset for Contact-rich Robotic Assembly and Disassembly
Daniel Sliwowski, Shail Jadav, Sergej Stanovcic, Jedrzej Orbik, Johannes Heidersberger, Dongheui Lee
TL;DR
REASSEMBLE addresses the scarcity of long-horizon, contact-rich robotic manipulation datasets by introducing a multimodal collection built on the NIST Assembly Task Board 1. It covers four actions (pick, insert, remove, and place) across 17 objects, with 4,551 demonstrations totaling 781 minutes of data from RGB cameras, event cameras, microphones, a 6-axis force-torque sensor, and robot proprioception, plus motion capture-based localization. The dataset supports multi-task benchmarks in hierarchical temporal action segmentation (TAS), motion policy learning (MPL), and anomaly detection. Baseline results include a DiffAct evaluation for TAS, a DMP-based MPL framework conditioned on language and goals, and an anomaly monitoring pipeline using ConditionNET. The work highlights the practical value of rich multimodal sensing for robust, generalizable contact-rich manipulation in both assembly and disassembly, and provides open data and tools, including a visualization suite and an HDF5 data structure for easy integration. The event camera data and detailed teleoperation metadata make REASSEMBLE a useful resource for researchers modeling and learning complex interaction dynamics under real-world uncertainty.
Abstract
Robotic manipulation remains a core challenge in robotics, particularly for contact-rich tasks such as industrial assembly and disassembly. Existing datasets have significantly advanced learning in manipulation but focus primarily on simpler tasks such as object rearrangement, falling short of capturing the complexity and physical dynamics involved in assembly and disassembly. To bridge this gap, we present REASSEMBLE (Robotic assEmbly disASSEMBLy datasEt), a new dataset designed specifically for contact-rich manipulation tasks. Built around the NIST Assembly Task Board 1 benchmark, REASSEMBLE includes four actions (pick, insert, remove, and place) involving 17 objects. The dataset contains 4,551 demonstrations, of which 4,035 were successful, spanning a total of 781 minutes. It features multi-modal sensor data from event cameras, force-torque sensors, microphones, and multi-view RGB cameras. This diverse dataset supports research in areas such as learning contact-rich manipulation, task condition identification, action segmentation, and task inversion learning. REASSEMBLE will be a valuable resource for advancing robotic manipulation in complex, real-world scenarios. The dataset is publicly available on our project website: https://tuwien-asl.github.io/REASSEMBLE_page/.
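The headline figures in the abstract imply a few derived quantities that are useful when sizing experiments. The sketch below computes them from the three numbers quoted above. It assumes the 781 minutes cover all 4,551 demonstrations, both successful and failed; the function and key names are illustrative and are not part of the dataset's tooling.

```python
# Figures quoted in the abstract.
TOTAL_DEMOS = 4551
SUCCESSFUL_DEMOS = 4035
TOTAL_MINUTES = 781


def summarize(total, successful, minutes):
    """Derive failure count, success rate, and mean demonstration length.

    Assumes `minutes` spans all `total` demonstrations (an assumption,
    not something the abstract states explicitly).
    """
    return {
        "failed": total - successful,
        "success_rate": successful / total,
        "mean_seconds": minutes * 60 / total,
    }


stats = summarize(TOTAL_DEMOS, SUCCESSFUL_DEMOS, TOTAL_MINUTES)
print(f"failed demonstrations: {stats['failed']}")
print(f"success rate: {stats['success_rate']:.1%}")
print(f"mean demonstration length: {stats['mean_seconds']:.1f} s")
```

Under that assumption, roughly one demonstration in nine is a failure, which is the portion relevant to anomaly detection, and each demonstration lasts about ten seconds on average.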
