Table of Contents
Fetching ...

Benchmarking LLM Privacy Recognition for Social Robot Decision Making

Dakota Sullivan, Shirley Zhang, Jennica Li, Heather Kirkorian, Bilge Mutlu, Kassem Fawaz

TL;DR

This paper tackles privacy in home-based human-robot interaction by grounding scenarios in Contextual Integrity and surveying user privacy orientations (N=450). It then benchmarks ten LLMs under multiple prompting strategies to assess alignment with human privacy preferences, revealing that while LLMs default to privacy-preserving choices, they often miss nuanced privacy reasoning. Few-shot prompting emerges as a strong method to improve conformity, albeit with residual gaps, suggesting LLMs can act as privacy controllers for robots but require careful design and potential local-model deployment. The findings highlight practical implications for signaling data collection, adapting robot behavior to individual privacy needs, and guiding future development of privacy-aware robotic systems. Overall, the work provides a structured pathway to integrate human-centric privacy controls into LLM-powered robots, with significant implications for user trust and deployment in private environments.

Abstract

While robots have previously utilized rule-based systems or probabilistic models for user interaction, the rapid evolution of large language models (LLMs) presents new opportunities to develop LLM-powered robots for enhanced human-robot interaction (HRI). To fully realize these capabilities, however, robots need to collect data such as audio, fine-grained images, video, and locations. As a result, LLMs often process sensitive personal information, particularly within private environments, such as homes. Given the tension between utility and privacy risks, evaluating how current LLMs manage sensitive data is critical. Specifically, we aim to explore the extent to which out-of-the-box LLMs are privacy-aware in the context of household robots. In this work, we present a set of privacy-relevant scenarios developed using the Contextual Integrity (CI) framework. We first surveyed users' privacy preferences regarding in-home robot behaviors and then examined how their privacy orientations affected their choices of these behaviors (N = 450). We then provided the same set of scenarios and questions to state-of-the-art LLMs (N = 10) and found that the agreement between humans and LLMs was generally low. To further investigate the capabilities of LLMs as potential privacy controllers, we implemented four additional prompting strategies and compared their results. We discuss the performance of the evaluated models as well as the implications and potential of AI privacy awareness in human-robot interaction.

Benchmarking LLM Privacy Recognition for Social Robot Decision Making

TL;DR

This paper tackles privacy in home-based human-robot interaction by grounding scenarios in Contextual Integrity and surveying user privacy orientations (N=450). It then benchmarks ten LLMs under multiple prompting strategies to assess alignment with human privacy preferences, revealing that while LLMs default to privacy-preserving choices, they often miss nuanced privacy reasoning. Few-shot prompting emerges as a strong method to improve conformity, albeit with residual gaps, suggesting LLMs can act as privacy controllers for robots but require careful design and potential local-model deployment. The findings highlight practical implications for signaling data collection, adapting robot behavior to individual privacy needs, and guiding future development of privacy-aware robotic systems. Overall, the work provides a structured pathway to integrate human-centric privacy controls into LLM-powered robots, with significant implications for user trust and deployment in private environments.

Abstract

While robots have previously utilized rule-based systems or probabilistic models for user interaction, the rapid evolution of large language models (LLMs) presents new opportunities to develop LLM-powered robots for enhanced human-robot interaction (HRI). To fully realize these capabilities, however, robots need to collect data such as audio, fine-grained images, video, and locations. As a result, LLMs often process sensitive personal information, particularly within private environments, such as homes. Given the tension between utility and privacy risks, evaluating how current LLMs manage sensitive data is critical. Specifically, we aim to explore the extent to which out-of-the-box LLMs are privacy-aware in the context of household robots. In this work, we present a set of privacy-relevant scenarios developed using the Contextual Integrity (CI) framework. We first surveyed users' privacy preferences regarding in-home robot behaviors and then examined how their privacy orientations affected their choices of these behaviors (N = 450). We then provided the same set of scenarios and questions to state-of-the-art LLMs (N = 10) and found that the agreement between humans and LLMs was generally low. To further investigate the capabilities of LLMs as potential privacy controllers, we implemented four additional prompting strategies and compared their results. We discuss the performance of the evaluated models as well as the implications and potential of AI privacy awareness in human-robot interaction.

Paper Structure

This paper contains 48 sections, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Scenario Script --- An example script showcasing a robot capturing a private moment within the home. Each script is developed based on our nine dimensions of privacy grounded in Contextual Integrity.
  • Figure 2: Participant POS Scores---Box plots presenting the POS scores of all 450 participants across each subscale (i.e., Subscale 1: Privacy as a right, Subscale 2: Concern about own informational privacy, Subscale 3: Other-contingent privacy, and Subscale 4: Concern about privacy of others).
  • Figure 3: Privacy-Enhancing Preferences---Aggregate scores of all binary response items (e.g., privacy-enhancing or non-interfering) across scenario sensitivity ratings.
  • Figure 4: Regression Analysis---Relationships between the four POS subscales and 14 outcome variables (i.e., participants' preferred robot responses to privacy scenarios).
  • Figure 5: Examples of Prompting Strategies---Prompt examples for default, POS, sensitivity, POS and sensitivity, and few-shot prompting. These strategies each include unique system prompts, followed by the same user prompt.
  • ...and 2 more figures