Honesty is the Best Policy: On the Accuracy of Apple Privacy Labels Compared to Apps' Privacy Policies

Mir Masood Ali, David G. Balash, Monica Kodwani, Chris Kanich, Adam J. Aviv

TL;DR

The paper investigates whether Apple’s privacy labels reliably reflect app privacy practices by comparing the labels against features extracted from the privacy policies of 474,669 iOS apps. It develops a hierarchical, transformer-based framework (PrivBERT) to extract policy features, maps them to Apple’s label taxonomy, and evaluates discrepancies at scale. The comparison reveals substantial under-reporting and misalignment, especially for data linked to users and tracking, across multiple dimensions (types, purposes, categories, pricing, and templates). The authors provide case studies and network-traffic evidence illustrating why policies often diverge from labels, propose a first-level, policy-driven check to help developers improve label accuracy, and publicly release their code and data. The work highlights the need for better alignment between label taxonomies and policy content, improved NLP tools that consider cross-segment context, and regulatory or platform-supported measures to enhance transparency and user protection in app ecosystems.

Abstract

Apple introduced privacy labels in Dec. 2020 as a way for developers to report the privacy behaviors of their apps. While Apple does not validate labels, it also requires developers to provide a privacy policy, which offers an important comparison point. In this paper, we fine-tuned BERT-based language models to extract privacy policy features for 474,669 apps on the iOS App Store, comparing the output to the privacy labels. We identify discrepancies between the policies and the labels, particularly as they relate to data collected and linked to users. We find that 228K apps' privacy policies may indicate more data collection linked to users than what is reported in the privacy labels. More alarming, a large share (97%) of the apps with a Data Not Collected privacy label have a privacy policy indicating otherwise. We provide insights into potential sources for discrepancies, including the use of templates and confusion around Apple's definitions and requirements. These results suggest that significant work is still needed to help developers more accurately label their apps. Our system can be incorporated as a first-order check to inform developers when privacy labels are possibly misapplied.
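
To make the idea of a first-order check concrete, the sketch below compares the privacy types declared in a label against the privacy types inferred from a policy. It is a minimal illustration, not the authors' system: the function names and the set-based representation are assumptions, and the policy-side input is assumed to come from some upstream classifier that has already mapped policy text to Apple's privacy types.

```python
# Privacy types as named in the paper's figures.
COLLECTION_TYPES = frozenset({
    "Data Used to Track You",
    "Data Linked to You",
    "Data Not Linked to You",
})
NOT_COLLECTED = "Data Not Collected"


def underreported_types(label_types, policy_types):
    """Return, sorted, the privacy types the policy suggests but the label omits.

    Both arguments are iterables of privacy-type names (hypothetical inputs:
    one from the App Store label, one from a policy classifier).
    """
    declared = set(label_types) & COLLECTION_TYPES
    inferred = set(policy_types) & COLLECTION_TYPES
    return sorted(inferred - declared)


def contradicts_not_collected(label_types, policy_types):
    """True if the label says Data Not Collected but the policy suggests collection."""
    has_not_collected = NOT_COLLECTED in set(label_types)
    policy_collects = bool(set(policy_types) & COLLECTION_TYPES)
    return has_not_collected and policy_collects


# Example: a label claiming no collection, a policy suggesting otherwise.
label = {"Data Not Collected"}
policy = {"Data Linked to You", "Data Used to Track You"}
flags = underreported_types(label, policy)
```

A flagged type is only a prompt for the developer to re-read the label, since the policy-side prediction may itself be wrong or the policy may cover more than the app.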

Paper Structure (57 sections, 11 figures, 6 tables)

Figures (11)

  • Figure 1: Anatomy of a Privacy Label.
  • Figure 2: An overview of the measurement workflow.
  • Figure 3: The hierarchical structure of the Polisis classifiers.
  • Figure 4: An overview of apps declaring data collection with corresponding Privacy Types within their privacy policies (top) and on the App Store via privacy labels (bottom). The denominator is the total number of apps analyzed, i.e., 474,669 apps. Note that the privacy types, except for Data Not Collected, are not mutually exclusive.
  • Figure 5: The ratios of the six purposes for the Data Linked to You and Data Not Linked to You privacy types. The denominator is the number of apps with the designated privacy type in either their privacy label or their privacy policy, i.e., 419,504 apps for Data Linked to You and 294,391 apps for Data Not Linked to You. The privacy types shown are not mutually exclusive. Two other Privacy Types are not shown: Data Used to Track You refers to collection for the purpose of tracking, while Data Not Collected refers to the absence of any data collection.
  • ...and 6 more figures