The LuViRA Dataset: Synchronized Vision, Radio, and Audio Sensors for Indoor Localization

Ilayda Yaman, Guoda Tian, Martin Larsson, Patrik Persson, Michiel Sandra, Alexander Dürr, Erik Tegler, Nikhil Challa, Henrik Garde, Fredrik Tufvesson, Kalle Åström, Ove Edfors, Steffen Malkowsky, Liang Liu

TL;DR

The LuViRA dataset addresses the challenge of indoor localization by providing a synchronized, multimodal resource that combines vision, 5G radio, and audio data with precise 6DoF ground truth. It details the measurement setup, calibration, synchronization, trajectory design (grid and random), and baseline validation across modalities, demonstrating cm-scale localization potential. By making the data publicly available, LuViRA enables robust sensor fusion benchmarking and supports research in low-power, real-time localization and related 5G and audio-visual applications. The dataset thus offers a comprehensive platform for evaluating multisensory fusion algorithms in controlled indoor environments.

Abstract

We present a synchronized multisensory dataset for accurate and robust indoor localization: the Lund University Vision, Radio, and Audio (LuViRA) Dataset. The dataset includes color images, corresponding depth maps, inertial measurement unit (IMU) readings, channel responses between a 5G massive multiple-input multiple-output (MIMO) testbed and user equipment, audio recorded by 12 microphones, and six degrees of freedom (6DOF) pose ground truth accurate to 0.5 mm. We synchronize these sensors to ensure that all data is recorded simultaneously. A camera, speaker, and transmit antenna are placed on top of a slowly moving service robot, and 89 trajectories are recorded. Each trajectory includes 20 to 50 seconds of recorded sensor data and ground truth labels. Data from different sensors can be used separately or jointly to perform localization tasks, and data from the motion capture (mocap) system is used to verify the results obtained by the localization algorithms. The main aim of this dataset is to enable research on sensor fusion with the sensors most commonly used for localization tasks. The full dataset, or parts of it, can also be used in other research areas such as channel estimation and image classification. Our dataset is available at: https://github.com/ilaydayaman/LuViRA_Dataset
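
As an illustration of how mocap ground truth can be used to verify a localization result, the sketch below computes a position RMSE between estimated positions and ground truth interpolated to the estimate timestamps. It is a minimal sketch, not the authors' evaluation code: the function names, the tuple-based data layout, and the synthetic numbers are all assumptions made for illustration, and it assumes the estimates are already expressed in the ground truth coordinate frame and time base.

```python
import math
from bisect import bisect_left


def interpolate_position(gt_times, gt_positions, t):
    """Linearly interpolate a ground-truth position at time t.

    gt_times must be sorted in increasing order. Times outside the
    recorded range are clamped to the first or last sample.
    """
    if t <= gt_times[0]:
        return gt_positions[0]
    if t >= gt_times[-1]:
        return gt_positions[-1]
    i = bisect_left(gt_times, t)  # gt_times[i - 1] < t <= gt_times[i]
    t0, t1 = gt_times[i - 1], gt_times[i]
    w = (t - t0) / (t1 - t0)
    p0, p1 = gt_positions[i - 1], gt_positions[i]
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))


def position_rmse(est_times, est_positions, gt_times, gt_positions):
    """Root-mean-square Euclidean position error against ground truth."""
    squared_errors = []
    for t, p in zip(est_times, est_positions):
        g = interpolate_position(gt_times, gt_positions, t)
        squared_errors.append(sum((a - b) ** 2 for a, b in zip(p, g)))
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic data (hypothetical, not taken from the dataset): positions in meters.
gt_times = [0.0, 1.0, 2.0]
gt_positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est_times = [0.5, 1.5]
est_positions = [(0.5, 0.03, 0.0), (1.5, -0.04, 0.0)]

rmse = position_rmse(est_times, est_positions, gt_times, gt_positions)
print(f"position RMSE: {rmse:.4f} m")
```

If an algorithm reports poses in its own coordinate frame, a rigid alignment step would be needed before computing this error; that step is omitted here.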

Paper Structure

This paper contains 15 sections, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: A photo of the sensors and the ground truth system, with labels for the different components; the bottom-left corner focuses on the devices on the robot.
  • Figure 2: Top-down view of the studio, containing twelve microphones (1-12), antenna (A), RGB-D camera (C), speaker (S), and LuMaMi testbed.
  • Figure 3: Examples of the decorations in the environment that can be used in vision-based localization algorithms.
  • Figure 4: Markers and the microphone are shown on the left side of the image; the LuMaMi testbed, with the wide antenna configuration marked by the green circle, is on the right.
  • Figure 5: The overview block diagram of the synchronization system and the connections to different sensors.
  • ...and 3 more figures