WiLoc: Massive Measured Dataset of Wi-Fi Channel State Information with Application to Machine-Learning Based Localization
Yuning Zhang, Lei Chu, Omer Gokalp Serbetci, Jorge Gomez-Ponce, Andreas F. Molisch
TL;DR
WiLoc addresses the need for large-scale, labeled CSI data to advance ML-based Wi-Fi localization. It is built from precision measurement campaigns in which beacon-based CSI was captured by a USRP-based receiver (RX). The campaigns cover more than 12 million UE locations and more than 3,000 APs across 16 indoor buildings and more than 30 outdoor streets, with detailed metadata and ground-truth trajectories. The work documents the dataset structure, measurement methodology, validation, and ML baseline results, and demonstrates the benefits of transfer learning across diverse environments. The resource is intended to enable more accurate and robust localization and to support channel-aware research beyond localization.
Abstract
Localization is a key component of the wireless ecosystem. Machine learning (ML)-based localization using channel state information (CSI) is one of the most popular methods for achieving high-accuracy localization at low cost. However, to be accurate and robust, ML-based algorithms need to be trained and tested with large amounts of data. This data must cover not only many user equipment (UE)/target locations but also many different locations of the access points (APs) to which the UEs connect, in a variety of environment types. This paper presents a massive CSI dataset, WiLoc (Wi-Fi Localization), and makes it publicly available. WiLoc was obtained through a series of precision measurement campaigns spanning three months, and it is massive in all three of these dimensions: more than 12 million UE locations, more than 3,000 APs, and environments spanning 16 buildings for indoor localization and more than 30 streets for outdoor use. The paper describes the dataset structure, measurement environments, measurement protocols, and dataset validation. Comprehensive case studies demonstrate the advantages of large datasets for ML-driven localization, in both "standard" and transfer learning settings. We envision this dataset, which is by far the largest of its kind, becoming a standard resource for researchers in the field of ML-based localization.
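To make "ML-based localization using CSI" concrete, here is a minimal sketch of k-nearest-neighbour (kNN) fingerprinting, a common baseline for this task, in plain Python. It is not the method, feature set, or data format used in WiLoc. The function names, the amplitude-only features, and the toy fingerprints are hypothetical assumptions chosen for readability. Each stored fingerprint pairs a CSI-derived feature vector with a ground-truth position, and a query is located by averaging the positions of its k closest fingerprints in feature space.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical 2-D position type (x, y) in metres.
Position = Tuple[float, float]


def csi_to_features(csi: Sequence[complex]) -> List[float]:
    """Turn complex per-subcarrier CSI into amplitude features.

    Phase is discarded here for simplicity (an assumption, not WiLoc's choice).
    """
    return [abs(h) for h in csi]


def knn_localize(
    train_features: Sequence[Sequence[float]],
    train_positions: Sequence[Position],
    query: Sequence[float],
    k: int = 3,
) -> Position:
    """Estimate a position as the mean of the k nearest stored fingerprints."""
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the number of fingerprints")
    # Euclidean distance in feature space from the query to every fingerprint.
    scored = [
        (math.dist(feat, query), pos)
        for feat, pos in zip(train_features, train_positions)
    ]
    scored.sort(key=lambda item: item[0])
    nearest = [pos for _, pos in scored[:k]]
    # Unweighted average of the neighbours' ground-truth positions.
    x = sum(p[0] for p in nearest) / k
    y = sum(p[1] for p in nearest) / k
    return (x, y)


if __name__ == "__main__":
    # Toy fingerprints (hypothetical) at the corners of a 1 m x 1 m square.
    train_csi = [[1 + 0j, 0j], [0.8j, 0.2 + 0j], [0.2 + 0j, 0.8j], [0j, 1j]]
    train_pos = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    feats = [csi_to_features(c) for c in train_csi]
    query = csi_to_features([0.9 + 0j, 0.1j])
    print(knn_localize(feats, train_pos, query, k=2))  # prints (0.5, 0.0)
```

The linear scan keeps the idea visible. With millions of fingerprints, as in a dataset of WiLoc's scale, one would more likely use an indexed nearest-neighbour search or a trained model. Either way, the estimate depends on how densely and diversely the fingerprints cover UE locations, APs, and environments.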
