REASSEMBLE: A Multimodal Dataset for Contact-rich Robotic Assembly and Disassembly

Daniel Sliwowski, Shail Jadav, Sergej Stanovcic, Jedrzej Orbik, Johannes Heidersberger, Dongheui Lee

TL;DR

REASSEMBLE addresses the scarcity of long-horizon, contact-rich robotic manipulation datasets with a multimodal collection built on NIST Task Board #1. It covers four actions across 17 objects, with 4,551 demonstrations and 781 minutes of data from RGB cameras, event cameras, microphones, a 6-axis force-torque sensor, and robot proprioception, plus motion-capture-based localization. The dataset supports benchmarks in hierarchical temporal action segmentation (TAS), motion policy learning (MPL), and anomaly detection. Baseline results include a DiffAct evaluation for TAS, a dynamic movement primitive (DMP) based MPL framework conditioned on language and goals, and an anomaly monitoring pipeline using ConditionNET. The work highlights the value of rich multimodal sensing for robust, generalizable contact-rich manipulation in both assembly and disassembly, and it provides open data and tools, including a visualization suite and an HDF5 data structure, to ease integration and accelerate future research. The event camera data and detailed teleoperation metadata make REASSEMBLE a useful resource for researchers who model and learn from complex interaction dynamics under real-world uncertainty.

Abstract

Robotic manipulation remains a core challenge in robotics, particularly for contact-rich tasks such as industrial assembly and disassembly. Existing datasets have significantly advanced learning in manipulation but focus primarily on simpler tasks such as object rearrangement, falling short of capturing the complexity and physical dynamics involved in assembly and disassembly. To bridge this gap, we present REASSEMBLE (Robotic assEmbly disASSEMBLy datasEt), a new dataset designed specifically for contact-rich manipulation tasks. Built around the NIST Assembly Task Board 1 benchmark, REASSEMBLE includes four actions (pick, insert, remove, and place) involving 17 objects. The dataset contains 4,551 demonstrations, of which 4,035 were successful, spanning a total of 781 minutes. Our dataset features multi-modal sensor data, including event cameras, force-torque sensors, microphones, and multi-view RGB cameras. This diverse dataset supports research in areas such as learning contact-rich manipulation, task condition identification, action segmentation, and task inversion learning. REASSEMBLE will be a valuable resource for advancing robotic manipulation in complex, real-world scenarios. The dataset is publicly available on our project website: https://tuwien-asl.github.io/REASSEMBLE_page/.

Paper Structure

This paper contains 21 sections, 13 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: Overview of the sensor placement. We use two external and one wrist-mounted RGB cameras (marked in orange). Additionally, we use an externally mounted event camera (in blue), three microphones (in yellow), and one wrist-mounted force/torque (F/T) sensor (in red). The omega.6 haptic teleoperation device is also visible.
  • Figure 2: Visualization of event camera data. In this example, a peg becomes stuck after insertion, and the robot applies a nudge to properly insert it. (a) shows a snapshot of the event stream before the nudge, and (b) after. The motion of the peg is clearly visible in the event camera stream.
  • Figure 3: Overview of the teleoperation control system. The operator controls the robot's motion through the haptic device, which simultaneously feeds back forces measured at the robot's end effector.
  • Figure 4: Sankey diagram showing the hierarchical structure and how skills are distributed within actions. The REASSEMBLE dataset contains 121 unique skill-object pairs.
  • Figure 5: Number of demonstrations of each action-object pair. In REASSEMBLE, we have 4 actions: pick, insert, remove, and place, and 17 objects, resulting in 68 unique action-object pairs. The number of executions of each unique action is almost equal, making it a balanced dataset.
  • ...and 9 more figures
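
The counts quoted in the abstract and in the Figure 5 caption can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `dataset_summary` and the constant names are hypothetical, and the mean duration assumes that the 781 minutes cover all 4,551 demonstrations, which the abstract does not state explicitly.

```python
# Counts taken from the abstract and the Figure 5 caption.
NUM_ACTIONS = 4          # pick, insert, remove, place
NUM_OBJECTS = 17
TOTAL_DEMOS = 4551
SUCCESSFUL_DEMOS = 4035
TOTAL_MINUTES = 781


def dataset_summary():
    """Derive summary figures from the published counts (hypothetical helper)."""
    action_object_pairs = NUM_ACTIONS * NUM_OBJECTS
    unsuccessful = TOTAL_DEMOS - SUCCESSFUL_DEMOS
    success_rate = SUCCESSFUL_DEMOS / TOTAL_DEMOS
    # Assumption: the 781 minutes span every demonstration, successful or not.
    mean_seconds = TOTAL_MINUTES * 60 / TOTAL_DEMOS
    return {
        "action_object_pairs": action_object_pairs,
        "unsuccessful": unsuccessful,
        "success_rate": success_rate,
        "mean_seconds": mean_seconds,
    }


if __name__ == "__main__":
    for name, value in dataset_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions, the 68 action-object pairs match the Figure 5 caption, roughly 89% of demonstrations are successful, and a demonstration lasts about 10 seconds on average.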