Strategic Shaping of Human Prosociality: A Latent-State POMDP Framework

Zahra Zahedi, Xinyue Hu, Shashank Mehrotra, Mark Steyvers, Kumar Akash

TL;DR

A decision-theoretic framework is proposed in which a robot can strategically shape a human's inferred prosocial state over repeated interactions. The resulting belief-based policy balances task and social objectives, selecting actions that maximize long-term cooperative outcomes.

Abstract

We propose a decision-theoretic framework in which a robot can strategically shape a human's inferred prosocial state over repeated interactions. Modeling the human's prosociality as a latent state that evolves over time, the robot learns to infer and influence this state through its own actions, including helping and signaling. We formalize this as a latent-state POMDP with limited observations and learn the transition and observation dynamics using expectation maximization. The resulting belief-based policy balances task and social objectives, selecting actions that maximize long-term cooperative outcomes. We evaluate the model using data from user studies and show that the learned policy outperforms baseline strategies in both team performance and observed human cooperative behavior.
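To make the idea of inferring a latent prosocial state concrete, here is a minimal sketch of a single belief update in a discrete POMDP. It does not reproduce the authors' model: the two-state space, the matrices `T_help` and `O_help`, and every number in them are hypothetical values chosen for illustration, whereas the paper learns its transition and observation dynamics with expectation maximization.

```python
import numpy as np

def belief_update(belief, T_a, O_a, obs):
    """One Bayes-filter step for a discrete POMDP under a fixed robot action.

    belief: current distribution over latent states, shape (S,)
    T_a:    transition matrix for the action, T_a[s, s2] = P(s2 | s, a)
    O_a:    observation matrix for the action, O_a[s2, o] = P(o | s2, a)
    obs:    index of the observation actually received
    """
    predicted = belief @ T_a                 # predict: sum_s b(s) * P(s2 | s, a)
    unnormalized = predicted * O_a[:, obs]   # correct: weight by P(obs | s2, a)
    return unnormalized / unnormalized.sum()

# Hypothetical two-state model: index 0 = low prosociality, 1 = high.
# All values below are assumptions for illustration, not learned parameters.
T_help = np.array([[0.7, 0.3],
                   [0.1, 0.9]])
O_help = np.array([[0.8, 0.2],    # low state: mostly observation 0 (no cooperation)
                   [0.3, 0.7]])   # high state: mostly observation 1 (cooperation)

belief = np.array([0.5, 0.5])     # uniform prior over the latent state
new_belief = belief_update(belief, T_help, O_help, obs=1)
print(new_belief)
```

With these illustrative numbers, observing cooperation after a helping action moves the belief from a uniform prior toward the high-prosociality state. A belief-based policy would then choose the next action from this updated belief rather than from the unobservable true state.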

Paper Structure (14 sections, 10 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: ls-POMDP model framework: the robot learns prosocial state dynamics via EM and plans actions using belief-space decision-making to strategically shape inferred human prosociality over repeated interactions.
  • Figure 2: An overview of the game setup, the different modes $M_k$, and signaling as a robot action.
  • Figure 3: State reward values under various values of the reward gradient $r$, illustrating how the reward increases with prosociality under different settings.
  • Figure 4: Policy comparison over 9 interaction steps. In the Cumulative Tokens panel, Never Help / Never Signal is used as the baseline.
  • Figure 5: Sensitivity of Robot Policy to Reward and Cost Parameters.