Longitudinal Analysis of Privacy Labels in the Apple App Store

David G. Balash, Mir Masood Ali, Monica Kodwani, Xiaoyuan Wu, Chris Kanich, Adam J. Aviv

TL;DR

This study provides a large-scale, longitudinal analysis of Apple’s App Store privacy labels, tracking adoption, changes, and associations with app metadata across 66 weeks for over 1.6 million apps. It shows a steady rise in label adoption, driven largely by new apps, but notes that labels are self-reported and that, when developers do update them, the new labels tend to declare more data collection rather than less, raising concerns about accuracy and utility for users. The work highlights systematic issues such as label misalignment, under-reporting of location data, and the potential for privacy labels to function as a set-once, convenience-driven mechanism rather than a reliable privacy guide for users. The findings offer baselines for future measurement and underscore the need for regulatory or design interventions to improve label accuracy, the communication of label updates, and user empowerment.

Abstract

In December of 2020, Apple started to require app developers to self-report privacy label annotations on their apps indicating what data is collected and how it is used. To understand the adoption of and shifts in privacy labels in the App Store, we collected nearly weekly snapshots of over 1.6 million apps for over a year (July 15, 2021 to October 25, 2022) and studied the dynamics of the privacy label ecosystem. Nearly two years after privacy labels launched, only 70.1% of apps have privacy labels, but we observed an increase of 28% during the measurement period. Privacy label adoption rates are mostly driven by new apps rather than older apps coming into compliance. Of apps with labels, 18.1% collect data used to track users, 38.1% collect data that is linked to a user identity, and 42.0% collect data that is not linked. A surprisingly large share (41.8%) of apps with labels indicate that they do not collect any data. While we do not perform direct analysis of the apps to verify this claim, we observe that many of these apps are likely choosing a Does Not Collect label because they are forced to select a label, rather than because this is the true behavior of the app. Moreover, of the apps that assigned labels during the measurement period, nearly all do not change their labels, and when they do, the new labels indicate more data collection rather than less. This suggests that privacy labels may be a "set once" mechanism for developers that may not actually provide users with the clarity needed to make informed privacy decisions.
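
The snapshot comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the snapshot layout (app id mapped to a label, or `None` when no label exists), the function names, the app ids, and the example data are all hypothetical. Counting declared Data Types as a proxy for "more" or "less" data collection is likewise an assumption made only for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical layout: a label maps a Privacy Type to the Data Types declared
# under it; None means the app has no privacy label in that snapshot.
Label = Optional[Dict[str, Set[str]]]
Snapshot = Dict[str, Label]  # app id -> label


def adoption_rate(snapshot: Snapshot) -> float:
    """Share of apps in one snapshot that have any privacy label."""
    if not snapshot:
        return 0.0
    labeled = sum(1 for label in snapshot.values() if label is not None)
    return labeled / len(snapshot)


def declared_items(label: Dict[str, Set[str]]) -> int:
    """Count declared (Privacy Type, Data Type) pairs.

    Data Not Collected contributes nothing, since it declares no data.
    """
    return sum(
        len(data_types)
        for privacy_type, data_types in label.items()
        if privacy_type != "Data Not Collected"
    )


def classify_change(old: Label, new: Label) -> str:
    """Classify how one app's label differs between two snapshots."""
    if old is None and new is None:
        return "unlabeled"
    if old is None:
        return "adopted"
    if new is None:
        return "removed"
    if old == new:
        return "unchanged"
    before, after = declared_items(old), declared_items(new)
    if after > before:
        return "more"
    if after < before:
        return "less"
    return "different"


# Invented example data for two consecutive snapshots.
week_a: Snapshot = {
    "app1": None,
    "app2": {"Data Not Collected": set()},
    "app3": {"Data Linked to You": {"Email Address"}},
    "app4": None,
}
week_b: Snapshot = {
    "app1": {"Data Not Collected": set()},
    "app2": {"Data Not Collected": set()},
    "app3": {"Data Linked to You": {"Email Address", "Precise Location"}},
    "app4": None,
}

changes = {app: classify_change(week_a[app], week_b[app]) for app in week_a}
```

On this toy data, `adoption_rate` rises from 0.5 to 0.75, and the classifier separates a newly adopted label (`app1`) from an existing label that grew (`app3`). That separation mirrors the abstract's distinction between adoption driven by newly labeled apps and changes to labels that already existed.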

Paper Structure

This paper contains 27 sections, 16 figures, 1 table.

Figures (16)

  • Figure 1: (left) An illustrative example of a privacy label from the Apple App Store, and (right) an illustrative example of the privacy label details from the Apple App Store. The details display the Purposes for the data collection and the detailed information about the Data Types collected.
  • Figure 2: A privacy label consists of four hierarchical levels of information. The Privacy Type broadly identifies how the data collected will be used. The Purpose provides more detail on how each data type is used. The Data Category is a categorization of the Data Type, which is a detailed description of the type of data collected.
  • Figure 3: A longitudinal view over the year-long collection period of the total number of apps and the total number of apps with privacy labels (compliant apps). For comparison, we also display the four Privacy Types over the same period. Each data point represents a snapshot of the Apple App Store on that date.
  • Figure 4: A Venn diagram of the number of apps in each of the four Privacy Types. Data Not Collected is mutually exclusive with the other three Privacy Types.
  • Figure 5: The ratios of the six Purposes for the Data Linked to You and Data Not Linked to You Privacy Types. The denominator is the number of apps in the specific Privacy Type.
  • ...and 11 more figures