Your Signal, Their Data: An Empirical Privacy Analysis of Wireless-scanning SDKs in Android

Aniketh Girish, Joel Reardon, Juan Tapiador, Srdjan Matic, Narseo Vallina-Rodriguez

TL;DR

This paper addresses the privacy risks posed by wireless-scanning SDKs embedded in Android apps by performing the first large-scale, hybrid static-dynamic analysis of 52 beacon-enabled SDKs across 9,976 apps. Using a multi-stage pipeline that covers dataset curation, beacon-SDK detection, control-flow analysis, and runtime signal injection on instrumented devices, the authors reveal extensive data collection (including GPS coordinates, WiFi/BLE scan results, and various identifiers) as well as frequent cross-SDK data sharing and ID bridging. Key findings show that 86% of beacon-enabled apps collect at least one sensitive data type, and the authors observe numerous cases in which persistent and resettable identifiers are linked to each other, often without transparent user consent or compliance with platform policies. The work argues for stronger SDK sandboxing, stricter enforcement of platform rules, and enhanced transparency mechanisms to curb covert tracking and give users control, with broader implications for regulation and privacy safeguards in mobile ecosystems.
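To make the ID-bridging concern concrete, here is a minimal Python sketch, assuming a tracker receives payloads that carry both a persistent identifier (the Android ID) and the resettable advertising ID (AAID). The field names `android_id` and `aaid` and the helpers `bridge_ids` and `defeated_resets` are hypothetical and are not taken from any SDK studied in the paper; the point is only that one persistent value sent alongside the AAID is enough to re-link a user across advertising-ID resets.

```python
from collections import defaultdict

# Hypothetical payload shape: field names are illustrative, not from a real SDK.
Payload = dict[str, str]


def bridge_ids(payloads: list[Payload]) -> dict[str, set[str]]:
    """Group resettable advertising IDs by the persistent ID sent alongside them."""
    linked: dict[str, set[str]] = defaultdict(set)
    for p in payloads:
        persistent = p.get("android_id")  # survives advertising-ID resets
        resettable = p.get("aaid")        # user can reset this at any time
        if persistent and resettable:
            linked[persistent].add(resettable)
    return dict(linked)


def defeated_resets(linked: dict[str, set[str]]) -> dict[str, int]:
    """For each persistent ID, count resets that were re-linked anyway."""
    return {pid: len(aids) - 1 for pid, aids in linked.items() if len(aids) > 1}
```

For example, two payloads `{"android_id": "a1b2", "aaid": "aaid-old"}` and `{"android_id": "a1b2", "aaid": "aaid-new"}` (the user reset their advertising ID in between) are grouped under `a1b2`, so `defeated_resets` reports one reset that did not break the link.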

Abstract

Mobile apps frequently use Bluetooth Low Energy (BLE) and WiFi scanning permissions to discover nearby devices, such as peripherals, and to connect to WiFi Access Points (APs). However, wireless interfaces also serve as a covert proxy for geolocation data, enabling continuous user tracking and profiling. This includes technologies like BLE beacons: BLE devices that broadcast unique identifiers so that a device's indoor physical location can be determined, and that are commonly deployed in places such as shopping centres. Despite the widespread use of wireless scanning APIs and their potential for privacy abuse, the interplay between commercial mobile SDKs and wireless sensing and beaconing technologies remains largely unexplored. In this work, we conduct the first systematic analysis of 52 wireless-scanning SDKs, revealing their data collection practices and privacy risks. We develop a comprehensive analysis pipeline that enables us to detect beacon scanning capabilities, inject wireless events to trigger app behaviors, and monitor runtime execution on instrumented devices. Our findings show that 86% of apps integrating these SDKs collect at least one sensitive data type, including device and user identifiers such as the Android Advertising ID (AAID) and email addresses, along with GPS coordinates and WiFi and Bluetooth scan results. We uncover widespread SDK-to-SDK data sharing and evidence of ID bridging, where persistent and resettable identifiers are shared and synchronized among SDKs embedded in applications, which can be used to construct detailed mobility profiles, compromising user anonymity and enabling long-term tracking. We provide evidence of key actors engaging in these practices and conclude by proposing mitigation strategies such as stronger SDK sandboxing, stricter enforcement of platform policies, and improved transparency mechanisms to limit unauthorized tracking.
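An iBeacon advertisement carries exactly the kind of identifiers that appear in the exfiltrated payloads of Figure 4: a 16-byte proximity UUID plus 2-byte major and minor values. The sketch below assumes the standard iBeacon layout inside Apple's manufacturer-specific data (company ID 0x004C), starting at the 0x02 0x15 type/length prefix, which is the byte array Android's `ScanRecord.getManufacturerSpecificData(0x004C)` returns for such a beacon. `parse_ibeacon` and `IBeacon` are illustrative names, not part of any SDK analysed in the paper.

```python
import struct
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IBeacon:
    proximity_uuid: uuid.UUID
    major: int
    minor: int
    measured_power: int  # calibrated RSSI at 1 m, in dBm (signed byte)


def parse_ibeacon(mfr_data: bytes) -> Optional[IBeacon]:
    """Parse Apple manufacturer data (bytes after company ID 0x004C) as an iBeacon.

    Layout assumed: 0x02, 0x15, 16-byte UUID, big-endian major, big-endian minor,
    signed 1-byte measured power. Returns None for non-iBeacon data.
    """
    if len(mfr_data) < 23 or mfr_data[0] != 0x02 or mfr_data[1] != 0x15:
        return None
    proximity_uuid = uuid.UUID(bytes=bytes(mfr_data[2:18]))
    major, minor, power = struct.unpack(">HHb", bytes(mfr_data[18:23]))
    return IBeacon(proximity_uuid, major, minor, power)
```

Because a deployed beacon's (UUID, major, minor) triple is normally fixed to one physical spot, an SDK that reports these values, or looks them up in a table of venues, effectively reports the user's indoor location.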

Paper Structure

This paper contains 23 sections, 8 figures, 13 tables.

Figures (8)

  • Figure 1: Beacon data ecosystem.
  • Figure 2: Methodology overview. Processes 2.a and 2.b run in parallel.
  • Figure 3: Cross-library interactions between beacon and non-beacon SDKs (square nodes). Dotted lines represent interactions between beacon SDKs, while solid lines show interactions with non-beacon SDKs. Arrows indicate directionality, from caller to callee SDK, with colors denoting the specific APIs accessed.
  • Figure 4: iBeacon advertisements and geofence data exfiltrated to Radar.io, including Android ID, beacon details (UUID, major, minor, RSSI), and geofence metadata (coordinates and type).
  • Figure 5: The UpSet plot illustrates how SDKs collect different combinations of ID categories. The top bars represent the percentage of SDKs collecting specific combinations (indicated by connected dots below), while the left bars show the total percentage of SDKs collecting each category, regardless of other data types collected.
  • ...and 3 more figures
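Figure 4 lists RSSI among the beacon fields sent to Radar.io. As a rough illustration of why received signal strength matters for indoor positioning, the sketch below applies the common log-distance path-loss approximation, d = 10 ^ ((P_1m - RSSI) / (10 n)), where P_1m is the calibrated power at 1 m (for example an iBeacon's measured-power byte) and n is the path-loss exponent. This textbook model is chosen here for illustration only; the paper does not state that the analysed SDKs use it, and real indoor readings are noisy. `estimate_distance_m` is a hypothetical helper name.

```python
def estimate_distance_m(
    rssi_dbm: float,
    measured_power_dbm: float,
    path_loss_exponent: float = 2.0,
) -> float:
    """Estimate distance in metres from RSSI via the log-distance path-loss model.

    measured_power_dbm: calibrated RSSI at 1 m from the transmitter.
    path_loss_exponent: about 2 in free space, usually larger indoors.
    """
    if path_loss_exponent <= 0:
        raise ValueError("path_loss_exponent must be positive")
    exponent = (measured_power_dbm - rssi_dbm) / (10 * path_loss_exponent)
    return 10 ** exponent
```

With a measured power of -59 dBm, an RSSI of -79 dBm and n = 2, the estimate is 10 m; raising n to 4 for a cluttered indoor space shrinks it to about 3.2 m, which shows how sensitive such estimates are to the assumed environment.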