DiffAudit: Auditing Privacy Practices of Online Services for Children and Adolescents

Olivia Figueira, Rahmadi Trimananda, Athina Markopoulou, Scott Jordan

TL;DR

DiffAudit is a platform-agnostic privacy auditing methodology for general audience services. Applied to popular mobile and web services, it reveals problematic data processing practices prior to consent and age disclosure, a lack of differentiation between age-specific data flows, inconsistent privacy policy disclosures, and sharing of linkable data with third parties, including advertising and tracking services.

Abstract

Children's and adolescents' online data privacy is regulated by laws such as the Children's Online Privacy Protection Act (COPPA) and the California Consumer Privacy Act (CCPA). Online services that are directed towards general audiences (i.e., including children, adolescents, and adults) must comply with these laws. In this paper, first, we present DiffAudit, a platform-agnostic privacy auditing methodology for general audience services. DiffAudit performs differential analysis of network traffic data flows to compare data processing practices (i) between child, adolescent, and adult users and (ii) before and after consent is given and user age is disclosed. We also present a data type classification method that utilizes GPT-4 and our data type ontology based on COPPA and CCPA, allowing us to identify considerably more data types than prior work. Second, we apply DiffAudit to a set of popular general audience mobile and web services and observe a rich set of behaviors extracted from over 440K outgoing requests, containing 3,968 unique data types that we extracted and classified. We reveal problematic data processing practices prior to consent and age disclosure, a lack of differentiation between age-specific data flows, inconsistent privacy policy disclosures, and sharing of linkable data with third parties, including advertising and tracking services.
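
The differential analysis described in the abstract can be illustrated with a minimal sketch. It assumes a data flow is modeled as a pair of (data type, destination domain) and that each age group's trace has already been reduced to a set of such pairs. The function name `diff_flows`, the group labels, and the example domains are hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
from typing import Dict, Set, Tuple

# Assumption: a data flow is a (data type, destination domain) pair.
Flow = Tuple[str, str]


def diff_flows(
    flows_by_group: Dict[str, Set[Flow]], baseline: str
) -> Dict[str, Dict[str, Set[Flow]]]:
    """Compare each group's flows against a baseline group using set operations."""
    base = flows_by_group[baseline]
    report: Dict[str, Dict[str, Set[Flow]]] = {}
    for group, flows in flows_by_group.items():
        if group == baseline:
            continue
        report[group] = {
            # Flows observed for both the group and the baseline.
            "shared_with_baseline": flows & base,
            # Flows observed only for this group.
            "only_in_group": flows - base,
            # Flows observed only for the baseline.
            "only_in_baseline": base - flows,
        }
    return report


if __name__ == "__main__":
    # Hypothetical flows; real input would come from the collected traces.
    flows = {
        "child": {("device_id", "tracker.example"), ("age", "service.example")},
        "adult": {("device_id", "tracker.example"), ("email", "service.example")},
    }
    for group, result in diff_flows(flows, baseline="adult").items():
        print(group, result)
```

A large `shared_with_baseline` set for the child group would suggest little differentiation between age-specific data flows. The same comparison could be applied to flows observed before and after consent by treating the two phases as two groups.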

Paper Structure (31 sections, 5 figures, 5 tables)

Figures (5)

  • Figure 1: DiffAudit Framework Overview. First, we select the general audience services for auditing. Next, we perform network traffic analysis: (1) Collect network traffic in three different ways: account creation trace (we collect traffic while creating a user account), logged-in trace (we collect traffic only while logged in to each account on the website/app and throughout using the service), and logged-out trace (we collect traffic only while logged out of any account on the website/app). (2) For each trace, we convert the raw HAR (for websites) or PCAP (for mobile apps) data to JSON and extract the payload data and destinations of the packets. (3) We construct the data flows by processing packet destinations (i.e., 1st or 3rd party and entity analysis) and perform data type classification using a GPT-4 classifier and the COPPA/CCPA data type ontology. Next, we perform the differential audit: (4) compare the data flows by age group and (5) audit the flows in the context of each age group, the applicable law, and information from each service's privacy policy.
  • Figure 2: Data Type Classification System Diagram. On the left we present an excerpt of the data type ontology. The data type category labels (third level) are used as labels for the classifier. We input the GPT-4 model prompt, data type ontology labels, and raw data types extracted from the network traffic to the GPT-4 Chat Completions API, which outputs classification results.
  • Figure 3: Counts of Third Parties Sent Linkable Data Types Per Service and Trace Category. Counts include third party domains, both ATS (advertising and tracking services) and non-ATS, that were sent linkable data types from each service per trace category (i.e., child, adolescent, adult, and logged out).
  • Figure 4: Sizes of Largest Sets of Linkable Data Types. A set of linkable data types is defined as all the data types that were shared with third party domains, including both ATS and non-ATS. The graph shows the size of the largest set of linkable data types shared by each service per trace category (i.e., child, adolescent, adult, and logged out).
  • Figure 5: Most Frequent Third Party ATS Domains Sent Linkable Data Types. The alluvial diagram visualizes the top 10 most contacted third party ATS domains, grouped by organization, that were sent linkable data types by each service per trace category (i.e., child, adolescent, adult, and logged out).
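
Steps (2) and (3) of Figure 1 and the counts in Figures 3 and 4 can be sketched roughly as follows for a web trace. The sketch reads a HAR-style dictionary, pulls raw keys from query strings and JSON request bodies, labels each destination as first or third party, and summarizes third-party sharing. Several parts are assumptions made for illustration: the helper names, the naive "last two labels" domain rule (a real audit would more likely use the Public Suffix List plus entity analysis), and the use of raw keys in place of the classified data types, since the GPT-4 classification step is omitted here. Treating the largest per-destination key set as the "largest set" is also one possible reading of Figure 4, not a confirmed definition.

```python
import json
from collections import defaultdict
from urllib.parse import urlsplit


def base_domain(host: str) -> str:
    # Assumption: approximate the registrable domain by the last two labels.
    return ".".join(host.lower().split(".")[-2:])


def extract_flows(har: dict, first_party_host: str) -> list:
    """Return (raw key, destination domain, party) for each outgoing request."""
    flows = []
    for entry in har["log"]["entries"]:
        request = entry["request"]
        dest = base_domain(urlsplit(request["url"]).hostname or "")
        party = "first" if dest == base_domain(first_party_host) else "third"
        # Keys from the URL query string.
        keys = [param["name"] for param in request.get("queryString", [])]
        # Keys from a JSON request body, if there is one.
        body = request.get("postData", {}).get("text", "")
        try:
            parsed = json.loads(body) if body else {}
        except json.JSONDecodeError:
            parsed = {}
        if isinstance(parsed, dict):
            keys.extend(parsed.keys())
        flows.extend((key, dest, party) for key in keys)
    return flows


def third_party_summary(flows: list) -> tuple:
    """Return (number of third-party domains, size of the largest key set sent to one)."""
    keys_per_dest = defaultdict(set)
    for key, dest, party in flows:
        if party == "third":
            keys_per_dest[dest].add(key)
    largest = max((len(keys) for keys in keys_per_dest.values()), default=0)
    return len(keys_per_dest), largest


if __name__ == "__main__":
    # Hypothetical two-request trace in HAR 1.2 shape.
    sample = {"log": {"entries": [
        {"request": {"url": "https://api.example.com/login?user_id=1",
                     "queryString": [{"name": "user_id", "value": "1"}]}},
        {"request": {"url": "https://t.tracker.net/collect",
                     "queryString": [],
                     "postData": {"mimeType": "application/json",
                                  "text": "{\"device_id\": \"abc\", \"age\": 12}"}}},
    ]}}
    flows = extract_flows(sample, "www.example.com")
    print(flows)
    print(third_party_summary(flows))
```

Running the summary once per trace category (child, adolescent, adult, logged out) would give per-category numbers that could then be compared across age groups.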