Multi-Surrogate-Teacher Assistance for Representation Alignment in Fingerprint-based Indoor Localization
Son Minh Nguyen, Linh Duy Tran, Duc Viet Le, Paul J. M. Havinga
TL;DR
This work tackles the challenge of transferring learned representations across heterogeneous RSS fingerprint datasets for indoor localization. It introduces a plug-and-play framework with two phases: Expert Training, which builds surrogate teachers for source datasets, and Expert Distilling, which aligns target representations with these surrogates using three constraints ($J_{Sim}$, $J_{MI}$, $J_{FI}$). The approach achieves significant improvements over state-of-the-art specialized models across three benchmark datasets and proves robust to source-relevance variations, all while preserving architectural integrity and data privacy. Practically, this framework enables broad, environment-agnostic localization performance without requiring access to source data or substantial model changes, facilitating deployment in privacy-sensitive or multi-tenant settings.
Abstract
Despite remarkable progress in knowledge transfer across visual and textual domains, extending these achievements to indoor localization, particularly to learning transferable representations among Received Signal Strength (RSS) fingerprint datasets, remains a challenge. This is due to inherent discrepancies among these RSS datasets, chiefly variations in building structure, in the number of WiFi anchors serving as inputs, and in the placement of those anchors. Consequently, specialized networks, which lack the ability to discern transferable representations, readily incorporate environment-sensitive cues into the learning process, which limits their potential when applied to specific RSS datasets. In this work, we propose a plug-and-play (PnP) knowledge-transfer framework that enables specialized networks to exploit transferable representations directly on target RSS datasets through two main phases. First, we design an Expert Training phase featuring multiple surrogate generative teachers, which together serve as a global adapter that homogenizes the input disparities among independent source RSS datasets while preserving their unique characteristics. In the subsequent Expert Distilling phase, we introduce a triplet of underlying constraints that minimize the differences in essential knowledge between the specialized network and the surrogate teachers by refining the network's representation learning on the target dataset. This process implicitly fosters a representational alignment that is less sensitive to specific environmental dynamics. Extensive experiments on three benchmark WiFi RSS fingerprint datasets demonstrate the effectiveness of the framework, which helps specialized networks realize their full potential in localization.
