RoboMNIST: A Multimodal Dataset for Multi-Robot Activity Recognition Using WiFi Sensing, Video, and Audio

Kian Behzad; Rojin Zandi; Elaheh Motamedi; Hojjat Salehinejad; Milad Siami

TL;DR

A novel dataset for multi-robot activity recognition (MRAR) is introduced. It was collected with two robotic arms and integrates WiFi channel state information (CSI), video, and audio data, with the aim of advancing robotic perception and autonomous systems.

Abstract

We introduce a novel dataset for multi-robot activity recognition (MRAR), collected with two robotic arms, that integrates WiFi channel state information (CSI), video, and audio data. This multimodal dataset utilizes signals of opportunity, leveraging existing WiFi infrastructure to provide detailed indoor environmental sensing without additional sensor deployment. Data were collected using two Franka Emika robotic arms, complemented by three cameras, three WiFi sniffers to collect CSI, and three microphones capturing distinct yet complementary audio data streams. The combination of CSI, visual, and auditory data can enhance robustness and accuracy in MRAR. This comprehensive dataset enables a holistic understanding of robotic environments, facilitating advanced autonomous operations that mimic human-like perception and interaction. By repurposing ubiquitous WiFi signals for environmental sensing, this dataset offers significant potential to advance robotic perception and autonomous systems. It provides a valuable resource for developing sophisticated decision-making and adaptive capabilities in dynamic environments.

Paper Structure (29 sections, 3 equations, 23 figures, 8 tables)

Background & Summary
Methods
Data Records
Technical Validation
Code availability
Acknowledgements
Author Contributions
Competing interests

Figures (23)

Figure 1: Floor plan of the data collection environment, where the robots are performing different activities while various sensors mounted on our sensor-rich modules capture data.
Figure 2: Sensor-rich module used in the data collection setup consisting of a Raspberry Pi to capture CSI, a stereo camera to capture video, and a microphone to capture audio.
Figure 3: Denavit-Hartenberg parameters [DH9646185] and image [frankaemika2024] of the Franka Emika robotic arm, with $q_i$ being the joint angle of the $i$th revolute joint.
Figure 4: Numbers $0$ through $9$ are drawn by the robotic arm on a vertical imaginary plane, resulting in $10$ distinct classes of activities. The plot shows the end effector trajectories that form these numbers, with the robotic arm and background removed for clarity. For illustration purposes the initial and final parts of the robot's trajectory, where the robot positions itself from its starting point to the imaginary plane and back, are omitted.
Figure 5: Each activity involves writing a digit using a robotic arm, defined by seven distinct waypoints. The left plot illustrates the waypoints in the end-effector space, while the right plot represents them in the joint space. Each digit is written by following a specific sequence of waypoints in the joint space, with the robot always starting and ending at waypoint 1. For example, to write the digit $7$, the robot follows the sequence $\{1, 2, 5, 7, 1\}$.
...and 18 more figures
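
The waypoint scheme described in the Figure 5 caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the names `validate_sequence`, `to_segments`, `NUM_WAYPOINTS`, and `HOME` are hypothetical, and the only sequence taken from the source is the one given for the digit $7$. The sequences for the other digits are not stated in this excerpt and are deliberately left out.

```python
from typing import List, Tuple

NUM_WAYPOINTS = 7  # Figure 5: each activity is defined by seven distinct waypoints
HOME = 1           # Figure 5: the robot always starts and ends at waypoint 1


def validate_sequence(seq: List[int]) -> None:
    """Raise ValueError if seq breaks the constraints stated in the caption."""
    if len(seq) < 2:
        raise ValueError("a sequence needs at least a start and an end waypoint")
    if seq[0] != HOME or seq[-1] != HOME:
        raise ValueError("a sequence must start and end at waypoint 1")
    for w in seq:
        if not 1 <= w <= NUM_WAYPOINTS:
            raise ValueError(f"waypoint {w} is outside 1..{NUM_WAYPOINTS}")


def to_segments(seq: List[int]) -> List[Tuple[int, int]]:
    """Split a waypoint sequence into consecutive (from, to) moves."""
    validate_sequence(seq)
    return list(zip(seq, seq[1:]))


# The only sequence given in the caption: the digit 7.
DIGIT_7 = [1, 2, 5, 7, 1]
print(to_segments(DIGIT_7))  # [(1, 2), (2, 5), (5, 7), (7, 1)]
```

Each `(from, to)` pair corresponds to one joint-space move between two waypoints, so the digit $7$ consists of four moves, the last of which returns the arm to waypoint 1.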
