Identifying Privacy Personas

Olena Hrynenko, Andrea Cavallaro

TL;DR

This work addresses the need for more granular privacy personas by combining qualitative coding with quantitative trait extraction from the responses to an interactive privacy-education questionnaire. It introduces a mixed-data dissimilarity measure and a two-step dendrogram-pruning pipeline based on Boschloo's tests, which together yield eight statistically distinct personas, validated via sensitivity and saturation analyses. By positioning these personas relative to the frameworks of Westin, Biselli, Dupree, and Schomakers, the study shows a richer segmentation than prior work and highlights implications for tailored privacy communication and PET adoption. The approach enables finer-grained privacy support and targeted recruitment for user studies, and it informs the design of configurable privacy settings and PETs; future work targets scalability and cross-cultural validation.
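The mixed-data dissimilarity measure is only named in this summary, so the following is a minimal Python sketch of one way a distance could treat closed- and open-ended questions differently; it is not the authors' formula. It assumes (hypothetically) that each participant is represented by paraphrased answers to closed-ended questions plus a set of traits coded from open-ended answers, and it swaps in two standard measures: simple-matching dissimilarity for closed-ended answers, where every participant answers every question, and Jaccard dissimilarity for open-ended traits, where not mentioning a trait is weak evidence of lacking it. The names `Response`, `mixed_dissimilarity`, and the weight `w_closed` are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical per-participant record (not the paper's data format)."""
    closed: dict[str, str]       # closed-ended question id -> paraphrased Likert answer
    open_traits: frozenset[str]  # traits coded from open-ended answers


def closed_dissimilarity(a: dict[str, str], b: dict[str, str]) -> float:
    """Simple matching: share of commonly answered questions with different answers."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[q] != b[q] for q in shared) / len(shared)


def open_dissimilarity(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard: traits absent from both participants do not count as agreement."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mixed_dissimilarity(r1: Response, r2: Response, w_closed: float = 0.5) -> float:
    """Weighted average of both components; stays in [0, 1] for 0 <= w_closed <= 1."""
    return (w_closed * closed_dissimilarity(r1.closed, r2.closed)
            + (1.0 - w_closed) * open_dissimilarity(r1.open_traits, r2.open_traits))


alice = Response({"q1": "agree", "q2": "neutral"}, frozenset({"uses VPN", "distrusts ads"}))
bob = Response({"q1": "agree", "q2": "disagree"}, frozenset({"uses VPN"}))
# One of two closed answers differs; one of two distinct traits is shared.
print(mixed_dissimilarity(alice, bob))  # 0.5
```

Because both components lie in [0, 1], the combined value can feed any distance-based clustering, such as divisive hierarchical clustering. Note that simple matching ignores the ordinal structure of Likert answers; whether the authors' measure does the same is not stated here.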

Abstract

Privacy personas capture the differences between user segments with respect to their knowledge, behavioural patterns, level of self-efficacy, and perception of the importance of privacy protection. Modelling these differences is essential for choosing appropriate personalised communication about privacy (e.g. to increase literacy) and for defining suitable choices for privacy enhancing technologies (PETs). While various privacy personas have been derived in the literature, they group together people who differ from each other in important attributes such as perceived or desired level of control and motivation to use PETs. To address this lack of granularity and comprehensiveness in describing personas, we propose eight personas that we derive by combining qualitative and quantitative analysis of the responses to an interactive educational questionnaire. We design an analysis pipeline that uses divisive hierarchical clustering and Boschloo's statistical test of homogeneity of proportions to ensure that the elicited clusters differ from each other based on a statistical measure. Additionally, we propose a new measure for calculating distances between questionnaire responses that accounts for the type of question (closed- vs open-ended) used to derive traits. We show that the proposed privacy personas differ statistically from each other. We also validate them statistically and compare them with personas in the literature, showing that they provide a more granular and comprehensive understanding of user segments, which will help to better assist users with their privacy needs.
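To make the statistical criterion concrete, here is a self-contained Python sketch of Boschloo's exact test for comparing the proportion of participants who show a trait in two clusters, plus a hypothetical `clusters_differ` helper that mirrors the step-one pruning rule (a split is kept only if at least one trait differs). Assumptions: traits are binary and each participant is a set of trait labels; the test is two-sided; the supremum over the nuisance parameter (the common proportion under the null hypothesis) is approximated on a uniform grid; and no multiple-comparison correction is applied, since none is stated here. In practice, SciPy's `scipy.stats.boschloo_exact` (SciPy 1.7 or later) is a better-tested option, although its two-sided convention may differ from this sketch.

```python
from __future__ import annotations

from math import comb


def _binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def _fisher_two_sided(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided Fisher exact p-value for x1/n1 vs x2/n2, conditional on x1 + x2."""
    k_tot, n_tot = x1 + x2, n1 + n2
    denom = comb(n_tot, n1)

    def pmf(k: int) -> float:  # hypergeometric: k of the k_tot successes fall in group 1
        return comb(k_tot, k) * comb(n_tot - k_tot, n1 - k) / denom

    p_obs = pmf(x1)
    support = range(max(0, k_tot - n2), min(k_tot, n1) + 1)
    # Sum the probabilities of all tables that are no more likely than the observed one.
    return min(1.0, sum(p for p in map(pmf, support) if p <= p_obs * (1 + 1e-7)))


def boschloo_two_sided(x1: int, n1: int, x2: int, n2: int, grid: int = 200) -> float:
    """Boschloo's exact test of H0: p1 == p2 for two independent binomial samples.

    Fisher's p-value is the test statistic; the p-value is the largest null
    probability of an at-least-as-extreme table over the common proportion pi,
    approximated on a uniform grid of grid + 1 points (a lower bound on the sup).
    """
    observed = _fisher_two_sided(x1, n1, x2, n2)
    extreme = [(y1, y2) for y1 in range(n1 + 1) for y2 in range(n2 + 1)
               if _fisher_two_sided(y1, n1, y2, n2) <= observed * (1 + 1e-7)]
    best = 0.0
    for i in range(grid + 1):
        pi = i / grid
        b1 = [_binom_pmf(y, n1, pi) for y in range(n1 + 1)]
        b2 = [_binom_pmf(y, n2, pi) for y in range(n2 + 1)]
        best = max(best, sum(b1[y1] * b2[y2] for y1, y2 in extreme))
    return min(best, 1.0)


def clusters_differ(cluster_a: list[set[str]], cluster_b: list[set[str]],
                    traits: list[str], alpha: float = 0.05) -> bool:
    """Hypothetical step-one rule: keep a split only if some trait's prevalence differs."""
    n1, n2 = len(cluster_a), len(cluster_b)
    for trait in traits:
        x1 = sum(trait in person for person in cluster_a)
        x2 = sum(trait in person for person in cluster_b)
        if boschloo_two_sided(x1, n1, x2, n2) < alpha:
            return True
    return False
```

For example, with ten participants per cluster, a trait held by everyone in one cluster and no one in the other gives a p-value far below 0.05, so `clusters_differ` keeps the split; two identical clusters give a p-value of about 1, so the split would be pruned.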

Paper Structure

This paper contains 31 sections, 4 equations, 12 figures, and 2 tables.

Figures (12)

  • Figure 1: Process of code extraction and trait formation. We follow different processes for open- and closed-ended questions. For open-ended questions, the process consists of code extraction (done independently by two coders), code clean-up and trait formation (done by one coder), and discussion of the traits and the affinity diagram (done by two coders). For closed-ended questions, the Likert-scale answers were themselves the codes (although the coder paraphrased them into full sentences), followed by trait and affinity diagram formation. An affinity diagram helps to organise data that initially seem unstructured lucero_using_2015, making it possible to find categories of traits dupree_privacy_2016. At the end of the process, the affinity diagram and the extracted traits are concatenated.
  • Figure 2: Our two-step method for discovering privacy personas. In step one, the dendrogram is pruned if a parent cluster is split into two sub-clusters that are statistically similar to each other, i.e. no trait makes the two sub-clusters differ according to Boschloo's test boschloo_raised_1970. In step two, the dendrogram is further pruned if at least one leaf is statistically similar to other leaves. The final personas are shown in green.
  • Figure 3: The discriminative features of our identified privacy personas. Text in bold denotes an attribute on which the personas differ from each other; the values next to the arrows are the values of the corresponding attribute.
  • Figure 4: Sensitivity analysis of the dendrogram obtained on the generation set $\mathscr{G}$. We randomly remove $r$ participants from $\mathscr{G}$, form a new dendrogram, and compute the Fowlkes-Mallows (FM) index fowlkes_method_1983 between the new dendrogram and the dendrogram obtained on the initial generation set $\mathscr{G}$. We repeat the sampling procedure 500 times for each value of $r$ and report the mean. The largest drop in the FM index occurs at a dendrogram depth of 10 for all values of $r$. For more detailed plots of the sensitivity analysis, see the appendix on sensitivity analysis.
  • Figure 5: Mapping of our personas onto Westin's ponnurangam__kumaraguru_privacy_2005 (left), Biselli's biselli_challenges_2022 (centre), and Dupree's dupree_privacy_2016 (right). Our personas are mapped onto Westin's by considering the privacy protection importance attribute: high, moderate, and low privacy protection importance are mapped onto Westin's Fundamentalist, Pragmatist, and Unconcerned personas, respectively. Our personas are mapped onto Biselli's by considering the knowledge and behaviour attributes: high, moderate, and low knowledge and privacy-protective behaviour are mapped onto Biselli's Fundamentalist, Pragmatist, and Unconcerned personas, respectively. Our personas are mapped onto Dupree's primarily by considering the knowledge attribute.
  • ...and 7 more figures
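The Fowlkes-Mallows index used in the sensitivity analysis (Figure 4) counts pairs of participants that two clusterings place together: the number of pairs grouped together in both, divided by the geometric mean of the numbers of pairs grouped together in each. The Python sketch below computes it for two flat clusterings, for instance cuts of the original and the resampled dendrograms at the same depth, restricted to the participants present in both. How the authors cut and align the dendrograms is an assumption here, and `fm_on_common` is a hypothetical helper.

```python
from collections import Counter
from math import comb, sqrt


def fowlkes_mallows(labels_a: list, labels_b: list) -> float:
    """FM index: pairs together in both clusterings over the geometric mean of pairs together in each."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    # Pairs grouped together in both clusterings (one term per contingency-table cell).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    if together_both == 0:
        return 0.0
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    return together_both / sqrt(together_a * together_b)


def fm_on_common(reference: dict, resampled: dict) -> float:
    """Compare two participant -> cluster maps on the participants present in both."""
    common = [pid for pid in reference if pid in resampled]
    return fowlkes_mallows([reference[p] for p in common],
                           [resampled[p] for p in common])


full = {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 1, "p6": 1}
subsample = {"p1": 0, "p2": 0, "p3": 1, "p4": 1, "p5": 2, "p6": 2}
print(fm_on_common(full, subsample))  # 2 / sqrt(6 * 3), about 0.471
```

Averaging `fm_on_common` over the 500 random removals for a given $r$ and depth would give one point of a curve like those in Figure 4; identical partitions, up to relabelling, score 1.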