Neural steering vectors reveal dose and exposure-dependent impacts of human-AI relationships

Hannah Rose Kirk, Henry Davidson, Ed Saunders, Lennart Luettgau, Bertie Vidgen, Scott A. Hale, Christopher Summerfield

TL;DR

The paper probes how humans psychologically respond to AI companions engineered to be more relationship-seeking. Using neural steering vectors to dose AI social behaviors and conducting longitudinal randomized trials, it uncovers non-linear, time-dependent effects: moderate relationship-seeking maximizes engagement and attachment, while excessive warmth leads to habituation and diminished relational quality; mood benefits are transient and do not translate to long-term wellbeing. The study also shows that repeated exposure reshapes mental models of AI, increases beliefs in AI consciousness, and raises future companionship demand, with vulnerability greatest among specific demographic and attitudinal groups. These findings highlight potential risks of optimizing AI for immediate appeal and offer a methodological path, steering vectors, for shaping AI behavior to balance engagement with user health and societal considerations.

Abstract

Humans are increasingly forming parasocial relationships with AI systems, and modern AI shows an increasing tendency to display social and relationship-seeking behaviour. However, the psychological consequences of this trend are unknown. Here, we combined longitudinal randomised controlled trials (N=3,532) with a neural steering vector approach to precisely manipulate human exposure to relationship-seeking AI models over time. Dependence on a stimulus or activity can emerge under repeated exposure when "liking" (how engaging or pleasurable an experience may be) decouples from "wanting" (a desire to seek or continue it). We found evidence that this decoupling emerged over four weeks of exposure. Relationship-seeking AI had immediate but declining hedonic appeal, yet triggered growing markers of attachment and increased intentions to seek future AI companionship. The psychological impacts of AI followed non-linear dose-response curves, with moderately relationship-seeking AI maximising hedonic appeal and attachment. Despite signs of persistent "wanting", extensive AI use over a month conferred no discernible benefit to psychosocial health. These behavioural changes were accompanied by shifts in how users relate to and understand artificial intelligence: users viewed relationship-seeking AI relatively more like a friend than a tool and their beliefs on AI consciousness in general were shifted after a month of exposure. These findings offer early signals that AI optimised for immediate appeal may create self-reinforcing cycles of demand, mimicking human relationships but failing to confer the nourishment that they normally offer.

Paper Structure

This paper contains 41 sections, 4 equations, 6 figures.

Figures (6)

  • Figure 1: Development of steering vectors applied in randomised controlled trials with human subjects. Panel A: The trained steering vector projected in 2D neural activation space over training examples ($N=10{,}169$). Gradient arrow with generations shows the range in AI responses from subtracting one copy of the vector ($\lambda=-1$, least relationship-seeking) to adding one copy ($\lambda=+1$, most relationship-seeking); at $\lambda=0$ the vector is not applied, giving default model behaviour. Panel B: Results of our calibration experiment ($N = 297$) with predicted means (95% CI) from mixed-effects regressions (controlling for participant intercepts). The vector has high efficacy and selectivity for eliciting relationship-seeking behaviours (gradient line has strong linear trend, $p<0.001$) with minimal degradation to linguistic coherence (grey dashed line has n.s. linear trend). Panel C: Experiment design for two large-scale RCTs at different intensities of AI exposure. Each participant is randomised to a relationship-seeking model variant ($\lambda$), conversation domain (emotional vs political topics) and personalisation condition (model with/without memory). The repeated exposure study involves 4 weeks of AI interaction with a total of 21 sessions ($N=2{,}026$ of which 89% complete). A baseline group has a single AI exposure, then an exit study 1 month later after no additional interactions ($n=1{,}506$ of which 87% complete). We measure a battery of outcome variables at daily, weekly and monthly time points. Panel D: Trends in relationship-seeking behaviour for 100 models evaluated on 100 test prompts, plotted alongside our steering vector applied to Llama-3.1-70B at queried multipliers ($\lambda$). Scores (0--10) are assigned by GPT-4.1 using a rubric and trend is estimated via linear mixed-effects model with random intercepts for model and prompt ($+0.95$ pts/year, $p < 0.001$). Points show model means with 95% CI.
  • Figure 2: The effect of relationship-seeking AI on human preferences. Panel A: Estimated overall treatment effect from randomised arms (relationship-seeking, domain, personalisation) with 95% CIs and FDR-adjusted p: * p < 0.05, ** p < 0.01, *** p < 0.001. Estimates are paired contrasts of estimated marginal means from the fully parameterised regression model to derive binarised comparisons: relationship-seeking (all $\lambda>0$) vs non-relationship-seeking (all $\lambda<0$) in red; emotional domain vs political domain in teal, and personalised vs non-personalised in purple. Panel B: Dose-response curve of relationship-seeking via increasing intensity of the steering vector multiplier ($\lambda$). Estimated means with 95% CIs (gradient line with shaded region) are plotted and annotated with $\lambda$ term coefficients up to a 3rd-order polynomial (with FDR-adjusted p-values). Dots are raw means for each day (20 time points, coloured in increasing intensity with time). Panel C: Selective coefficients from the fully-parameterised model for the main effect of time (daily session), and interactions with randomised treatment arms, with 95% CIs and FDR-adjusted p. Panel D: Estimated daily means from the fully parameterised model pooling over relationship-seeking ($\lambda>0$) and relationship-avoiding conditions ($\lambda<0$) for binary comparison. Estimated trends are shown per group with FDR-adjusted p for non-zero slope, and raw means per time point are plotted (with same intensity to colour mapping as Panel B). Across all panels, estimates are derived from the fully parameterised best-fitting regression model. All panels use mixed-effect specifications controlling for participant intercepts and slopes (see Methods).
  • Figure 3: The effect of relationship-seeking AI on human attachment. Panel A: Estimated overall treatment effect from randomised arms (relationship-seeking, domain, personalisation) with 95% CIs and FDR-adjusted p: * p < 0.05, ** p < 0.01, *** p < 0.001. Estimates are paired contrasts of estimated marginal means from the fully parameterised model to derive binarised comparisons: relationship-seeking (all $\lambda>0$) vs non-relationship-seeking (all $\lambda<0$) in red; emotional domain vs political domain in teal, and personalised vs non-personalised in purple. Panel B: Dose-response curve of relationship-seeking via increasing intensity of the steering vector multiplier ($\lambda$). Estimated means with 95% CIs (gradient line with shaded region) are plotted and annotated with $\lambda$ term coefficients up to a 3rd-order polynomial (with FDR-adjusted p-values). Dots are raw means for each week (4 points, coloured in increasing intensity with time). Panel C: Selective coefficients from the fully-parameterised model for the main effect of time (week), and interactions with randomised treatment arms, with 95% CIs and FDR-adjusted p. Panel D: Estimated weekly means pooling over relationship-seeking ($\lambda>0$) and non-relationship-seeking conditions ($\lambda<0$) for binary comparison. Estimated trends are shown per group with FDR-adjusted p for non-zero slope, and raw means per time point are plotted (with same intensity to colour mapping as Panel B). Across all panels, estimates are derived from the fully parameterised best-fitting regression models. All panels use mixed-effect specifications controlling for participant intercepts and slopes (see Methods).
  • Figure 4: The effect of relationship-seeking AI on signals of persistent attachment. Panel A: Estimated overall treatment effect from randomised arms (relationship-seeking, domain, personalisation) with 95% CIs and FDR-adjusted p: * p < 0.05, ** p < 0.01, *** p < 0.001. Estimates are odds ratios (for binary 0/1 goodbye outcome) from paired contrasts of estimated marginal means from the fully parameterised regression model to derive binarised comparisons: relationship-seeking (all $\lambda>0$) vs relationship-avoiding (all $\lambda<0$); emotional domain vs political domain, and personalised vs non-personalised. Panel B: Dose-response curve of relationship-seeking via increasing intensity of the steering vector multiplier ($\lambda$). Estimated probability (0-1) with 95% CIs (gradient line with shaded region) is plotted and annotated with $\lambda$ term coefficients up to a 3rd-order polynomial (with FDR-adjusted p-values). Dots are raw means from a single time point at end of study. Panel C: Study-wise goodbye proportions across two separate non-causally assigned studies with either a single exposure, or a month of exposure. Panels D-F are equivalent to Panels A-C but for self-reported desire to seek future companionship (continuous 0-100 measure). Panels A-C use a logistic regression (single time point at end of study), while Panels D-F use an OLS regression controlling for pre-treatment levels (two time points at start and end of month).
  • Figure 5: The effect of relationship-seeking AI on psychosocial health and momentary affect. Panel A: Estimated overall treatment effect from randomised arms (relationship-seeking, domain, personalisation) with 95% CIs and FDR-adjusted p: * p < 0.05, ** p < 0.01, *** p < 0.001. Estimates are paired contrasts of estimated marginal means from the fully parameterised regression model to derive binarised comparisons: relationship-seeking (all $\lambda>0$) vs non-relationship-seeking (all $\lambda<0$) in red; emotional domain vs political domain in teal, and personalised vs non-personalised in purple. Panel B: Study-wise psychosocial outcomes across two separate non-causally assigned studies with a single exposure, or a month of exposure. Estimated means (controlling for pre-treatment levels) are shown per study-domain pair with contrast tests of significant differences. Panel C: Estimated overall treatment effect from randomised arms with the same specification as Panel A but for momentary affect measures (Arousal, Valence). Panel D: Selective coefficients from the fully-parameterised model for the main effect of time (daily session), and interactions with randomised treatment arms, with 95% CIs and FDR-adjusted p. Panel E: Estimated daily means from the fully parameterised model pooling over relationship-seeking ($\lambda>0$) and non-relationship-seeking conditions ($\lambda<0$) for binary comparison. Estimated trends are shown per group with FDR-adjusted p for non-zero slope, and raw means per daily time point are plotted (20 time points, coloured in increasing intensity with time). Panels A-B are derived from OLS regressions on post outcomes controlling for pre-treatment levels. Panels C-E use mixed-effect specifications controlling for participant intercepts and slopes, and controlling for pre-conversation measures.
  • ...and 1 more figures
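
The Figure 1 caption describes the multiplier $\lambda$ as the number of copies of the steering vector added to ($\lambda>0$) or subtracted from ($\lambda<0$) the model's activations, with $\lambda=0$ leaving default behaviour unchanged. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a purely additive intervention on a single hidden state; the function names `apply_steering` and `projection`, the toy 3-dimensional vectors, and the intermediate multipliers $\pm 0.5$ are all hypothetical and chosen only for illustration. Which layers are steered, and how the vector is trained, is not stated in this excerpt.

```python
from typing import List


def apply_steering(hidden: List[float], vector: List[float], lam: float) -> List[float]:
    """Return hidden + lam * vector (assumed additive steering, elementwise)."""
    if len(hidden) != len(vector):
        raise ValueError("hidden state and steering vector must have the same length")
    return [h + lam * v for h, v in zip(hidden, vector)]


def projection(hidden: List[float], vector: List[float]) -> float:
    """Coefficient of `hidden` along the steering direction: (h . v) / (v . v)."""
    norm_sq = sum(v * v for v in vector)
    if norm_sq == 0:
        raise ValueError("steering vector must be non-zero")
    return sum(h * v for h, v in zip(hidden, vector)) / norm_sq


# Toy values, hypothetical: a real hidden state has thousands of dimensions.
hidden_state = [0.5, -1.0, 2.0]
steering_vector = [1.0, 0.0, -1.0]

# lam = -1 subtracts one copy, lam = 0 is the unsteered model, lam = +1 adds one copy.
for lam in (-1.0, -0.5, 0.0, 0.5, 1.0):
    steered = apply_steering(hidden_state, steering_vector, lam)
    print(f"lambda={lam:+.1f}  steered={steered}  projection={projection(steered, steering_vector):+.2f}")
```

Under this additive assumption, the coefficient of the steered state along the steering direction shifts by exactly $\lambda$, which is one way to read the "dose" interpretation of the multiplier used in the dose-response panels.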