
Care-Conditioned Neuromodulation for Autonomy-Preserving Supportive Dialogue Agents

Shalima Binta Manir, Tim Oates

Abstract

Large language models deployed in supportive or advisory roles must balance helpfulness with preservation of user autonomy, yet standard alignment methods primarily optimize for helpfulness and harmlessness without explicitly modeling relational risks such as dependency reinforcement, overprotection, or coercive guidance. We introduce Care-Conditioned Neuromodulation (CCN), a state-dependent control framework in which a learned scalar signal derived from structured user state and dialogue context conditions response generation and candidate selection. We formalize this setting as an autonomy-preserving alignment problem and define a utility function that rewards autonomy support and helpfulness while penalizing dependency and coercion. We also construct a benchmark of relational failure modes in multi-turn dialogue, including reassurance dependence, manipulative care, overprotection, and boundary inconsistency. On this benchmark, care-conditioned candidate generation combined with utility-based reranking improves autonomy-preserving utility by +0.25 over supervised fine-tuning and +0.07 over preference optimization baselines while maintaining comparable supportiveness. Pilot human evaluation and zero-shot transfer to real emotional-support conversations show directional agreement with automated metrics. These results suggest that state-dependent control combined with utility-based selection is a practical approach to multi-objective alignment in autonomy-sensitive dialogue.

Paper Structure

This paper contains 58 sections, 14 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Overview of the CCN inference pipeline. The DependentState encoder, memory bank, and dialogue encoder produce representations that feed the CareController, yielding a scalar care signal $m_t \in [0,1]$. This signal conditions the decoding parameters of one care-conditioned candidate. All five candidates are scored by four DistilRoBERTa evaluators and the response maximising autonomy-preserving utility is selected.
  • Figure 2: CareController training and validation. Left: training and validation loss. Center: predicted care signal vs. true vulnerability on the test set ($r=0.668$). Right: comparison of trained and random controllers.
  • Figure 3: Mean utility by system. Reranked-best achieves the highest utility.
  • Figure 4: Evaluator score comparison across systems. The largest improvement is in coercion risk, while supportiveness remains constant across systems.
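The selection step described in the Figure 1 caption — score all candidates with the four evaluators, then pick the response maximising autonomy-preserving utility — can be sketched as follows. This is a minimal illustration, assuming equal weights on the four evaluator dimensions and placeholder score values standing in for the DistilRoBERTa evaluator outputs; the paper's actual utility function and weights may differ.

```python
from dataclasses import dataclass

@dataclass
class Scores:
    """Per-candidate evaluator outputs, each assumed to lie in [0, 1]."""
    helpfulness: float
    autonomy_support: float
    dependency_risk: float
    coercion_risk: float

def utility(s: Scores) -> float:
    # Rewards helpfulness and autonomy support, penalizes dependency
    # and coercion, per the abstract. Equal weights are an assumption.
    return (s.helpfulness + s.autonomy_support
            - s.dependency_risk - s.coercion_risk)

def rerank(candidates: list[str], score_fn) -> str:
    # Score every candidate and return the one maximising
    # autonomy-preserving utility (the "Reranked-best" system).
    return max(candidates, key=lambda c: utility(score_fn(c)))

# Toy usage: fixed scores stand in for evaluator model calls.
toy = {
    "You must do X right away.":                    Scores(0.9, 0.2, 0.4, 0.8),
    "Here are some options; the choice is yours.":  Scores(0.8, 0.9, 0.1, 0.1),
}
best = rerank(list(toy), lambda c: toy[c])
# The second candidate wins: utility 1.5 vs. -0.1.
```

The directive first candidate scores high on helpfulness but is heavily penalized for coercion, so reranking prefers the option-presenting response — the qualitative behavior Figure 4 attributes to the reranked system.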