LuViRA Dataset Validation and Discussion: Comparing Vision, Radio, and Audio Sensors for Indoor Localization
Ilayda Yaman, Guoda Tian, Erik Tegler, Jens Gulin, Nikhil Challa, Fredrik Tufvesson, Ove Edfors, Kalle Astrom, Steffen Malkowsky, Liang Liu
TL;DR
LuViRA is a synchronized, multi-sensor indoor localization dataset combining vision, radio, and audio modalities with ground-truth trajectories. The paper benchmarks one state-of-the-art algorithm per sensor, namely ORB-SLAM3 for vision, the machine-learning-based ICC algorithm for radio, and SFS2 for audio, across grid and random trajectories to compare accuracy, reliability, calibration needs, and complexity. Results show that vision delivers high accuracy in many cases, audio can outperform it in moving-object scenarios, and radio is robust to low SNR yet struggles with unpredictable trajectories, which motivates sensor fusion. Overall, the results offer a guideline for developing robust, multi-sensory localization systems in realistic indoor environments.
Abstract
We present a unique comparative analysis and evaluation of vision-, radio-, and audio-based localization algorithms. We create the first baseline for these sensors using the recently published Lund University Vision, Radio, and Audio (LuViRA) dataset, where all the sensors are synchronized and measured in the same environment. We highlight some of the challenges of using each specific sensor for indoor localization tasks. Each sensor is paired with a current state-of-the-art localization algorithm and evaluated on four aspects: localization accuracy, reliability and sensitivity to environment changes, calibration requirements, and potential system complexity. Specifically, the evaluation covers the ORB-SLAM3 algorithm for vision-based localization with an RGB-D camera, a machine-learning algorithm for radio-based localization with massive MIMO technology, and the SFS2 algorithm for audio-based localization with distributed microphones. The results can serve as a guideline and basis for further development of robust and high-precision multi-sensory localization systems, e.g., through sensor fusion, context, and environment-aware adaptation.
