Why am I seeing this: Democratizing End User Auditing for Online Content Recommendations

Chaoran Chen, Leyang Li, Luke Cao, Yanfang Ye, Tianshi Li, Yaxing Yao, Toby Jia-jun Li

TL;DR

Personalized recommendations rely on private user data, yet users struggle to verify how attributes influence the outcomes. The authors introduce a Privacy Auditing Sandbox that uses LLM-generated personas and controlled attribute variation to test causal links between user characteristics and online content, demonstrated in a targeted-ad case study. Technical evaluations show strong persona quality, high ad-identification accuracy, and stable ad-rating scores, while the user study confirms usability and perceived empowerment in privacy auditing. The approach advances end-user agency and privacy literacy and offers a pathway to broader applicability in auditing algorithmic accountability across privacy-sensitive domains.

Abstract

Personalized recommendation systems tailor content based on user attributes, which are either provided or inferred from private data. Research suggests that users often hypothesize about the reasons behind the content they encounter (e.g., "I see this jewelry ad because I am a woman"), but they lack the means to confirm these hypotheses due to the opacity of these systems. This hinders informed decision-making about privacy and system use and contributes to the lack of algorithmic accountability. To address these challenges, we introduce a new interactive sandbox approach. This approach creates sets of synthetic user personas and corresponding personal data that embody realistic variations in personal attributes, allowing users to test their hypotheses by observing how a website's algorithms respond to these personas. We tested the sandbox in the context of targeted advertising. Our user study demonstrates its usability, usefulness, and effectiveness in empowering end-user auditing in a case study of targeted ads.
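
The controlled attribute variation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Persona` fields, the `make_variants` helper, and the example values are all hypothetical, and the real system generates personas with an LLM rather than by hand. The point is only that each variant differs from the base persona in exactly one attribute, so differences in the observed content can be attributed to that attribute.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Persona:
    # Hypothetical attribute set; the paper's personas carry richer data.
    name: str
    age: int
    gender: str
    location: str


def make_variants(base, attribute, values):
    """Return one persona per value, differing from `base` only in `attribute`."""
    valid = {f.name for f in fields(base)}
    if attribute not in valid:
        raise ValueError(f"unknown attribute: {attribute!r}")
    # dataclasses.replace copies the base persona and overrides a single field.
    return [replace(base, **{attribute: value}) for value in values]


base = Persona(name="Alex", age=30, gender="woman", location="Chicago")
variants = make_variants(base, "gender", ["woman", "man", "non-binary"])
for variant in variants:
    print(variant)
```

Holding every other field fixed is the design choice that matters here: if two variants differed in both gender and location, a change in the ads shown could not be tied to either attribute alone.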

Paper Structure (45 sections, 9 figures, 5 tables)

Figures (9)

  • Figure 1: An overview of our privacy auditing sandbox approach
  • Figure 2: System Overview: (a) An input field for generating the base persona profile. (b) A control panel to select personal attributes for persona variants, choose a website to audit, and set visit frequencies. (c) A display showing selected personal attribute values, along with the names and descriptions of color-coded persona variants. As users scroll, persona variants shift to the left for continued reference. (d) Visualization of ad distribution based on rated scores, with a scatter plot where each point represents an ad. The ads in the image were obtained by visiting https://www.thepioneerwoman.com/ twice using a persona set consisting of 3 variants. The X-axis shows the Ad-Attribute Alignment score, and the Y-axis shows score probability density. Points are color-coded by persona variants. Users can hover over a point to view the corresponding persona, ad image, description, and rating in the ad list below.
  • Figure 3: Persona variation. (a) Providing guidance for generating the base persona's profile. (b) Selecting a personal attribute to generate persona variants.
  • Figure 4: Hypothesis consolidation. (a) Replacing the user's real data with each persona variant's data, including Google account, location, IP address, user agent, and browsing history. (b) Detecting online ads displayed on the selected website. (c) Translating ad images into textual descriptions and rating them on an Ad-Attribute Alignment Score to evaluate their alignment with personal attributes. (d) Visualizing the ad distribution based on the rated scores. Each point on the scatter plot represents an ad, with the X-axis indicating the Ad-Attribute Alignment Score and the Y-axis representing the probability density of the scores based on a normal distribution curve. Points are color-coded by persona variant (three in this example).
  • Figure 5: An overview of our technical evaluation. (a) Conducted an expert review to evaluate the quality of the generated personas. (b) Evaluated the accuracy of ad identification. (c) Assessed the stability of LLM-generated ad scores based on privacy attributes. (d) Investigated the impact of persona substitution on the ad content.
  • ...and 4 more figures
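
The scatter plot described in Figures 2(d) and 4(d) places each ad at its Ad-Attribute Alignment Score on the X-axis and at a normal probability density on the Y-axis. The sketch below shows one way such coordinates could be computed. It rests on assumptions the captions do not confirm: a single normal curve is fitted to all scores pooled across persona variants, and the scores, variant names, and the `scatter_points` helper are hypothetical.

```python
import math
from statistics import mean, stdev


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def scatter_points(scores_by_variant):
    """Return (variant, score, density) triples, one per ad.

    Assumption: one normal curve is fitted to the pooled scores.
    """
    pooled = [s for scores in scores_by_variant.values() for s in scores]
    mu, sigma = mean(pooled), stdev(pooled)
    if sigma == 0:
        raise ValueError("all scores are identical; density is undefined")
    return [
        (variant, score, normal_pdf(score, mu, sigma))
        for variant, scores in scores_by_variant.items()
        for score in scores
    ]


# Hypothetical alignment scores for ads shown to three persona variants.
example_scores = {
    "variant_a": [2.0, 3.0, 4.0],
    "variant_b": [5.0, 6.0],
    "variant_c": [7.0, 8.0, 9.0],
}
points = scatter_points(example_scores)
for variant, score, density in points:
    print(f"{variant}: score={score:.1f}, density={density:.4f}")
```

With this layout, ads whose scores sit near the pooled mean appear high on the curve, and a variant whose points cluster toward one tail stands out by color.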