Benchmarking LLM Privacy Recognition for Social Robot Decision Making
Dakota Sullivan, Shirley Zhang, Jennica Li, Heather Kirkorian, Bilge Mutlu, Kassem Fawaz
TL;DR
This paper addresses privacy in home-based human-robot interaction by grounding scenarios in the Contextual Integrity framework and surveying users' privacy preferences and orientations (N = 450). It then benchmarks ten LLMs under multiple prompting strategies to assess their alignment with human privacy preferences. The results show that although LLMs default to privacy-preserving choices, they often miss nuanced privacy reasoning. Few-shot prompting is a strong method for improving conformity with human preferences, though gaps remain. This suggests that LLMs can act as privacy controllers for robots, but only with careful design and possibly local-model deployment. The findings have practical implications for signaling data collection, adapting robot behavior to individual privacy needs, and guiding the development of privacy-aware robotic systems. Overall, the work offers a structured path toward integrating human-centric privacy controls into LLM-powered robots, which matters for user trust and deployment in private environments.
Abstract
While robots have previously utilized rule-based systems or probabilistic models for user interaction, the rapid evolution of large language models (LLMs) presents new opportunities to develop LLM-powered robots for enhanced human-robot interaction (HRI). To fully realize these capabilities, however, robots need to collect data such as audio, fine-grained images, video, and locations. As a result, LLMs often process sensitive personal information, particularly within private environments, such as homes. Given the tension between utility and privacy risks, evaluating how current LLMs manage sensitive data is critical. Specifically, we aim to explore the extent to which out-of-the-box LLMs are privacy-aware in the context of household robots. In this work, we present a set of privacy-relevant scenarios developed using the Contextual Integrity (CI) framework. We first surveyed users' privacy preferences regarding in-home robot behaviors and then examined how their privacy orientations affected their choices of these behaviors (N = 450). We then provided the same set of scenarios and questions to state-of-the-art LLMs (N = 10) and found that the agreement between humans and LLMs was generally low. To further investigate the capabilities of LLMs as potential privacy controllers, we implemented four additional prompting strategies and compared their results. We discuss the performance of the evaluated models as well as the implications and potential of AI privacy awareness in human-robot interaction.
