Recipient Profiling: Predicting Characteristics from Messages

Martin Borquez, Mikaela Keller, Michael Perrot, Damien Sileo

TL;DR

This work formalizes Recipient Profiling, a task that predicts a recipient's sensitive attributes from messages, highlighting a previously overlooked privacy risk in text data. Using three transformer-based encoders (BERT, MPNet, DeBERTa) across three dialogue-focused datasets (SWDA, MDC, TIC), the authors demonstrate better-than-chance gender prediction for recipients and show partial cross-dataset transferability. They also analyze gender-driven accuracy differences and model agreement, revealing complementary patterns across models. The study argues for further work on explainability and privacy mitigation, and suggests extending profiling to multi-modal data and joint author-recipient analyses. Overall, the paper opens a new line of inquiry into how conversational text can reveal recipient attributes and what safeguards may be needed in real-world communications.

Abstract

It has been shown in the field of Author Profiling that texts may inadvertently reveal sensitive information about their authors, such as gender or age. This raises important privacy concerns that have been extensively addressed in the literature, in particular with the development of methods to hide such information. We argue that, when these texts are in fact messages exchanged between individuals, this is not the end of the story. Indeed, in this case, a second party, the intended recipient, is also involved and should be considered. In this work, we investigate the potential privacy leaks affecting recipients; that is, we propose and address the problem of Recipient Profiling. We provide empirical evidence that such a task is feasible on several publicly accessible datasets (https://huggingface.co/datasets/sileod/recipient_profiling). Furthermore, we show that the learned models can be transferred to other datasets, albeit with a loss in accuracy.

Paper Structure

This paper contains 23 sections, 1 equation, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Performance of fine-tuned models for recipient gender classification in terms of balanced accuracy. The barplot shows the performance of each model when trained and tested within the same domain. The error bars represent the standard deviation of the measurements, calculated after running each model with three different seeds.
  • Figure 2: Balanced accuracy transfer performance of fine-tuned models for recipient gender classification. The values represent the mean balanced accuracy, over three seeds for each model, with their respective standard deviations.
  • Figure 3: Difference in accuracy of fine-tuned models when predicting recipient gender. The values represent the average accuracy across three seeds. The models were trained and tested within the same domain.
  • Figure 4: Agreement between fine-tuned models for recipient profiling, measured using the Kappa Coefficient to account for random agreement, as defined in Equation (1). The fine-tuned models were evaluated within the same domain. Each agreement value represents the average coefficient across three seeds used in the experiments. The agreement values for different models range from 0.46 to 0.61.
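
The figure captions rely on two metrics, balanced accuracy and a Kappa coefficient that corrects for chance agreement. The sketch below illustrates both on toy labels. It assumes the coefficient is the standard two-rater Cohen's kappa, since the paper's own equation is not reproduced on this page; the function names and the toy data are hypothetical and are not taken from the authors' code.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; 0.5 corresponds to chance for two classes."""
    recalls = []
    for label in sorted(set(y_true)):
        # Positions where the true class is `label`.
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def cohen_kappa(pred_a, pred_b):
    """Agreement between two models' predictions, corrected for chance.

    Assumption: standard Cohen's kappa, (p_o - p_e) / (1 - p_e).
    """
    n = len(pred_a)
    # Observed agreement: fraction of items on which both models agree.
    p_o = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    # Expected agreement if the two models predicted independently,
    # each keeping its own label frequencies.
    count_a, count_b = Counter(pred_a), Counter(pred_b)
    labels = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in labels) / (n * n)
    if p_e == 1.0:
        # Both models always emit the same single label.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy example (hypothetical data): true recipient gender and two models' outputs.
y_true = ["F", "F", "M", "M"]
model_a = ["F", "M", "M", "M"]
model_b = ["F", "F", "M", "F"]

print(balanced_accuracy(y_true, model_a))
print(cohen_kappa(model_a, model_b))
```

Balanced accuracy averages recall over classes, so a model that always predicts the majority gender scores 0.5 rather than the majority rate. Kappa subtracts the agreement two models would reach by chance given their label frequencies, which is why it can be low even when raw agreement looks high.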