BeSound: Bluetooth-Based Position Estimation Enhancing with Cross-Modality Distillation

Hymalai Bello; Sungho Suh; Bo Zhou; Paul Lukowicz

TL;DR

This paper tackles privacy and scalability concerns in indoor worker localization by replacing camera-based tracking with a BLE-RSSI–driven approach enhanced by ultrasound coordinates. The authors introduce BeSound, a teacher–student knowledge distillation framework where a multimodal teacher (BLE-RSSI plus ultrasound) guides a lightweight BLE-only student, enabling inference using only BLE inputs. In a smart factory test bed with 12 participants, distilled models substantially outperform BLE baselines, achieving up to an 11.79% improvement in F1-score while maintaining a compact, energy-efficient footprint suitable for smartphone-enabled deployment. The work demonstrates a practical, privacy-preserving path to accurate, scalable RTLS in industrial settings, with clear avenues for reducing sensor dependence and validating performance in crowded or varied environments.

Abstract

Smart factories leverage advanced technologies to optimize manufacturing processes and enhance efficiency. Worker tracking systems, primarily camera-based, provide accurate monitoring. However, concerns about worker privacy and technology protection make it necessary to explore alternative approaches. We propose a non-visual, scalable solution using Bluetooth Low Energy (BLE) and ultrasound coordinates. BLE position estimation is very low-power and cost-effective: the technology is available on smartphones and scales with the large number of smartphone users, facilitating worker localization and safety protocol transmission. Ultrasound signals provide faster response times and higher accuracy but require custom hardware, increasing costs. To combine the benefits of both modalities, we employ knowledge distillation (KD) from ultrasound signals to BLE-RSSI data. Once the student model is trained, it takes only BLE-RSSI data as input for inference, retaining the ubiquity and low cost of BLE-RSSI. We tested our approach using data from an experiment with twelve participants in a smart factory test bed environment. We obtained an increase of 11.79% in the F1-score compared to the baseline (target model without KD, trained with BLE-RSSI data only).
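The abstract describes distilling knowledge from a multimodal teacher into a BLE-only student, and the Figure 2 caption states that distillation is applied at the logit level. The exact loss is not given in this summary, so the following is only a minimal sketch assuming the common temperature-softened formulation (KL divergence to the teacher plus cross-entropy to the label). The names `softmax` and `kd_loss` and the values of `temperature` and `alpha` are illustrative assumptions, not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Logit-level distillation loss for a single sample (assumed form).

    alpha weights the soft teacher term and (1 - alpha) the hard-label
    cross-entropy. Both hyperparameters are hypothetical defaults.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # Cross-entropy of the student against the ground-truth position class
    ce = -math.log(softmax(student_logits)[label])
    # T^2 keeps the soft-term gradient scale comparable across temperatures
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Teacher logits would come from the BLE-RSSI + ultrasound model,
# student logits from the BLE-RSSI-only model (values here are made up).
loss = kd_loss([0.2, 1.1, -0.3], [0.1, 2.5, -1.0], label=1)
```

At inference time only the student is evaluated, so the ultrasound input and the teacher are needed during training alone, which matches the deployment setting the abstract describes.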

Paper Structure (11 sections, 3 equations, 4 figures, 4 tables)

Introduction
Background and Related Work
Bluetooth Low Energy-Based Indoor Localization Methods
Ultrasound-Based Indoor Localization Methods
Multimodal Knowledge Distillation
Method
Hardware and Experimental Setting
Experimental Procedure
BeSound Knowledge Distillation
Result and Discussion
Conclusion

Figures (4)

Figure 1: Experimental Setting Map. Left: Volunteer Wearing the BLE-RSSI Emitter (Upper Arm) and the Ultrasound Hedge (Black Box on the Lower Back), Together with the Spatial Positions of the RSSI-Receivers in Green and the Positions of the Ultrasound Beacons in Red. Right: Top View Showing the Distances of the RSSI-Receivers and Ultrasound Beacons in the Smart Factory Test Bed.
Figure 2: BeSound Knowledge Distillation Approach. Left: The Multimodal and Multipositional Teacher Consists of Two Concatenated Networks: One for Cross-Channel Interaction Feature Extraction and One (LSTM-Based) for Causality Extraction. Right: The Student Consists of a Multipositional Feature Extractor and a Classifier. The Distillation Is Applied at the Logit Level to Improve the Position Estimation of the Student.
Figure 3: Multimodal and Multipositional Teacher Average Results with 5-Fold Cross-Validation and a Leave-Session-Out Evaluation Scheme. Left: Confusion Matrix for a Window Size of Two Seconds, with an F1-Score of 79.85%. Right: Confusion Matrix for a Window Size of Ten Seconds, with an F1-Score of 84.41%.
Figure 4: Target Models and Distilled Student Models Average Results with 5-Fold Cross-Validation and a Leave-One-Session-Out Evaluation Scheme.
