To Be or Not to Be (in the EU): Measurement of Discrepancies Presented in Cookie Paywalls (LONG)

Andreas Stenwreth, Simon Täng, Victor Morel

TL;DR

This study examines how cookie paywalls vary with client characteristics by deploying a large-scale crawler across 804 websites and 26 client configurations (geography, OS, and browser). It shows that geographic location has the strongest influence on paywall presence, while browser and OS also modulate paywall behavior and data-processing attributes, as captured through TC Strings. A key contribution is the first dataset on double paywalls, which were found on about 11% of sites, predominantly European news outlets, with Germany leading in occurrence. The findings have regulatory relevance for GDPR/ePrivacy and the IAB's TCF framework, revealing nuanced cross-country and CMP-related patterns that affect how consent and data sharing are implemented. Overall, the work provides a benchmark for regulators and researchers studying paywall transparency and consent mechanisms, and it points to directions for expanding vantage points and evaluating the broader ecosystem of tracking and subscriptions.

Abstract

Cookie paywalls allow visitors to access the content of a website only after choosing between paying a fee (paying option) and accepting tracking (cookie option). Previous research has studied the practice with regard to its prevalence and legal standing, but the effects of the client's device and geographic location remain unexplored. To address this gap, this study explores the effects of three factors on the presence and behavior of cookie paywalls and on the handling of users' data: 1) the client's browser, 2) the device type (desktop or mobile), and 3) the geographic location. Using an automated crawler on our dataset of 804 websites that present a cookie paywall, we observed that the presence of a cookie paywall was most affected by the geographic location of the user. We further showed that both the behavior of a cookie paywall and the processing of user data are affected by all three factors, although no significant patterns could be found. Finally, we discovered an additional type of paywall on approximately 11% of the studied websites, which we coin the "double paywall": a cookie paywall complemented by another paywall once tracking is accepted.

Paper Structure

This paper contains 26 sections, 7 figures, 12 tables.

Figures (7)

  • Figure 1: A cookie paywall with a cookie option and a pay option on https://www.buzzfeed.de/.
  • Figure 2: The program flow for detecting cookie paywalls in different stages.
  • Figure 3: A Premium article for "Stimme+" subscribers under a free article on https://www.stimme.de.
  • Figure 4: Number of cookie paywalls detected for each combination of the studied factors.
  • Figure 5: Distribution of CMPs used by websites that present a cookie paywall when accessed from the Swedish vantage point but not from the USA vantage point. Only CMPs used on more than 10 websites are considered.
  • ...and 2 more figures