Honesty is the Best Policy: On the Accuracy of Apple Privacy Labels Compared to Apps' Privacy Policies
Mir Masood Ali, David G. Balash, Monica Kodwani, Chris Kanich, Adam J. Aviv
TL;DR
The paper investigates whether Apple's privacy labels reliably reflect app privacy practices by comparing the labels against content derived from the privacy policies of 474,669 iOS apps. It builds a hierarchical framework of fine-tuned transformer models (based on PrivBERT) to extract policy features, maps those features to Apple's label taxonomy, and evaluates discrepancies at scale. The results reveal substantial under-reporting and misalignment, especially for data linked to users and data used for tracking, across multiple dimensions (data types, purposes, categories, pricing, and templates). The authors present case studies and network-traffic evidence illustrating why policies often diverge from labels, propose a first-order, policy-driven check to help developers improve label accuracy, and publicly release their code and data. The work highlights the need for better alignment between label taxonomies and policy content, for NLP tools that consider context across policy segments, and for regulatory or platform-supported measures that improve transparency and user protection in app ecosystems.
Abstract
Apple introduced privacy labels in Dec. 2020 as a way for developers to report the privacy behaviors of their apps. While Apple does not validate labels, it does require developers to provide a privacy policy, which offers an important comparison point. In this paper, we fine-tuned BERT-based language models to extract privacy policy features for 474,669 apps on the iOS App Store and compared the output to the privacy labels. We identify discrepancies between the policies and the labels, particularly for collected data that is linked to users. We find that the privacy policies of 228K apps may indicate more data collection linked to users than is reported in their privacy labels. More alarming, nearly all (97%) of the apps with a Data Not Collected privacy label have a privacy policy indicating otherwise. We provide insights into potential sources of these discrepancies, including the use of policy templates and confusion around Apple's definitions and requirements. These results suggest that significant work is still needed to help developers label their apps more accurately. Our system can be incorporated as a first-order check to inform developers when privacy labels are possibly misapplied.
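To make the idea of a first-order check concrete, the sketch below compares the data types a policy classifier says are collected and linked to the user against what an app's privacy label declares. It is a minimal illustration under stated assumptions, not the authors' system. The policy-side keys (`"contact"`, `"location"`, and so on), the `POLICY_TO_APPLE` mapping, and the `PrivacyLabel` / `check_label` names are all hypothetical. The values on the right side of the mapping are data-type categories from Apple's privacy label taxonomy. The classifier output is assumed to already be a set of data-type strings.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from policy-classifier data types to Apple privacy
# label categories. Keys are illustrative assumptions, not the paper's schema.
POLICY_TO_APPLE = {
    "contact": "Contact Info",
    "location": "Location",
    "financial": "Financial Info",
    "device_id": "Identifiers",
    "browsing": "Browsing History",
}


@dataclass
class PrivacyLabel:
    """Simplified stand-in for an app's privacy label (hypothetical)."""
    data_not_collected: bool = False
    linked_to_user: set[str] = field(default_factory=set)


def check_label(policy_linked: set[str], label: PrivacyLabel) -> dict[str, list[str]]:
    """Flag label entries that may under-report linked data relative to the policy.

    policy_linked: data types the (assumed) policy classifier marked as
    collected and linked to the user. Unknown types are ignored.
    """
    # Translate policy-side data types into Apple's label categories.
    expected = {POLICY_TO_APPLE[t] for t in policy_linked if t in POLICY_TO_APPLE}
    if not expected:
        return {}
    if label.data_not_collected:
        # The label claims no collection, but the policy suggests otherwise.
        return {"data_not_collected_conflict": sorted(expected)}
    # Categories the policy implies are linked but the label omits.
    missing = expected - label.linked_to_user
    return {"possibly_underreported_linked": sorted(missing)} if missing else {}


if __name__ == "__main__":
    print(check_label({"location", "device_id"}, PrivacyLabel(data_not_collected=True)))
    # {'data_not_collected_conflict': ['Identifiers', 'Location']}
    print(check_label({"location", "contact"}, PrivacyLabel(linked_to_user={"Location"})))
    # {'possibly_underreported_linked': ['Contact Info']}
```

Because policies may overstate collection (for example, when they are copied from templates), a flag from a check like this is best read as a prompt for the developer to review the label, not as proof that the label is wrong.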
