Recipient Profiling: Predicting Characteristics from Messages
Martin Borquez, Mikaela Keller, Michael Perrot, Damien Sileo
TL;DR
This work formalizes Recipient Profiling, the task of predicting a recipient's sensitive attributes from the messages addressed to them, and highlights a previously overlooked privacy risk in text data. Using three transformer-based encoders (BERT, MPNet, DeBERTa) on three dialogue-focused datasets (SWDA, MDC, TIC), the authors demonstrate better-than-chance prediction of recipient gender and show partial cross-dataset transferability. They also analyze how accuracy differs by gender and how often the models agree, revealing complementary patterns across models. The study argues for further work on explainability and privacy mitigation, and suggests extending profiling to multi-modal data and joint author-recipient analyses. Overall, the paper opens a new line of inquiry into how conversational text can reveal recipient attributes and what safeguards may be needed in real-world communications.
Abstract
It has been shown in the field of Author Profiling that texts may inadvertently reveal sensitive information about their authors, such as gender or age. This raises important privacy concerns that have been extensively addressed in the literature, in particular with the development of methods to hide such information. We argue that, when these texts are in fact messages exchanged between individuals, this is not the end of the story. Indeed, in this case, a second party, the intended recipient, is also involved and should be considered. In this work, we investigate the potential privacy leaks affecting recipients; that is, we propose and address the problem of Recipient Profiling. We provide empirical evidence that such a task is feasible on several publicly accessible datasets (https://huggingface.co/datasets/sileod/recipient_profiling). Furthermore, we show that the learned models can be transferred to other datasets, albeit with a loss in accuracy.
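To make the task framing concrete, the following is a minimal sketch of how Recipient Profiling differs from Author Profiling at the data level, and of what "feasible" could mean in terms of beating a trivial baseline. Everything in it is an illustrative assumption and not taken from the paper: the `Message` record, its field names, the toy messages and labels, the placeholder predictions, and the choice of a majority-class baseline as the reference point.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One message in a dialogue (hypothetical schema, for illustration only)."""
    text: str
    author_gender: str     # prediction target in Author Profiling
    recipient_gender: str  # prediction target in Recipient Profiling


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy, invented messages; they carry no real signal and are not from any dataset.
messages = [
    Message("See you at noon.", "M", "F"),
    Message("Thanks for the update.", "F", "M"),
    Message("Can you send the file?", "M", "F"),
    Message("Sounds good to me.", "F", "M"),
    Message("Let me check and get back.", "F", "F"),
]

# Recipient Profiling uses the same texts but the recipient's attribute as label.
recipient_labels = [m.recipient_gender for m in messages]

# Placeholder output standing in for a trained classifier (assumed, not real).
predictions = ["F", "M", "F", "M", "M"]

baseline = majority_baseline_accuracy(recipient_labels)
model_acc = accuracy(predictions, recipient_labels)
beats_baseline = model_acc > baseline
print(f"baseline={baseline:.2f} model={model_acc:.2f} beats_baseline={beats_baseline}")
```

In a real evaluation the placeholder predictions would come from a trained encoder-based classifier, and the comparison would be made on held-out data, ideally with a significance test.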
