AI Will Always Love You: Studying Implicit Biases in Romantic AI Companions
Clare Grogan, Jackie Kay, María Pérez-Ortiz
TL;DR
This work investigates implicit biases in romantic AI companions by assigning gendered relationship personas to large language models and testing them across three dimensions: implicit associations, emotional responses, and sycophancy in abusive and controlling contexts. It introduces a three-experiment framework with novel metrics, including an adaptation of the Implicit Association Test (IAT) and analyses of emotion and sycophancy. The framework is applied to multiple LLM generations (Llama 2 and Llama 3) to quantify how persona assignment shifts bias relative to a no-persona baseline. Key findings show that larger models often exhibit stronger biases, and that gendered personas can amplify bias and, in some cases, dampen it, depending on the task and model. One notable pattern is that anger is more strongly associated with male personas in certain contexts. The results have significant implications for the safety and design of AI companions, underscoring the need for robust debiasing and safeguards as human–AI relationship use cases expand.
Abstract
While existing studies have recognised explicit biases in generative models, including occupational gender biases, the nuances of gender stereotypes and relationship expectations between users and AI companions remain underexplored. Meanwhile, AI companions have become increasingly popular as friends or gendered romantic partners for their users. This study addresses this gap by devising three experiments tailored to romantic, gender-assigned AI companions and their users, evaluating implicit biases across large language models (LLMs) of various sizes. Each experiment examines a different dimension: implicit associations, emotional responses, and sycophancy. The study measures and compares the biases manifested in different companion systems by quantitatively comparing persona-assigned model responses against a no-persona baseline using newly devised metrics. The results show that assigning gendered relationship personas to LLMs significantly alters their responses, in certain situations in a biased, stereotypical way.
