Strategic Shaping of Human Prosociality: A Latent-State POMDP Framework

Zahra Zahedi; Xinyue Hu; Shashank Mehrotra; Mark Steyvers; Kumar Akash

TL;DR

A decision-theoretic framework is proposed in which a robot can strategically shape a human's inferred prosocial state during repeated interactions. The resulting belief-based policy balances task and social objectives, selecting actions that maximize long-term cooperative outcomes.

Abstract

We propose a decision-theoretic framework in which a robot can strategically shape a human's inferred prosocial state during repeated interactions. Modeling the human's prosociality as a latent state that evolves over time, the robot learns to infer and influence this state through its own actions, including helping and signaling. We formalize this as a latent-state POMDP with limited observations and learn the transition and observation dynamics using expectation maximization. The resulting belief-based policy balances task and social objectives, selecting actions that maximize long-term cooperative outcomes. We evaluate the model using data from user studies and show that the learned policy outperforms baseline strategies both in team performance and in increasing observed human cooperative behavior.
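The belief tracking that a latent-state POMDP relies on can be sketched with the standard discrete Bayes filter. This is a minimal illustration and not the paper's implementation: the two-state prosociality space (low/high), the two robot actions, the two observations, and every probability below are assumptions chosen for the example; in the paper the transition and observation models are learned with expectation maximization.

```python
def belief_update(belief, action, observation, T, O):
    """One Bayes-filter step: b'(s') is proportional to
    O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    n = len(belief)
    # Predict: push the current belief through the transition model.
    predicted = [
        sum(belief[s] * T[action][s][s_next] for s in range(n))
        for s_next in range(n)
    ]
    # Correct: weight each predicted state by the observation likelihood.
    unnorm = [O[action][s_next][observation] * predicted[s_next]
              for s_next in range(n)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnorm]


# Hypothetical model (illustrative numbers, not taken from the paper).
# States: 0 = low prosociality, 1 = high prosociality.
# Actions: 0 = robot does not help, 1 = robot helps.
# Observations: 0 = human does not cooperate, 1 = human cooperates.
T = [
    [[0.9, 0.1], [0.2, 0.8]],  # T[0][s][s']: transitions when not helping
    [[0.6, 0.4], [0.1, 0.9]],  # T[1][s][s']: transitions when helping
]
O = [
    [[0.8, 0.2], [0.3, 0.7]],  # O[0][s'][o]
    [[0.8, 0.2], [0.3, 0.7]],  # O[1][s'][o]
]

b0 = [0.5, 0.5]  # uniform prior over the latent state
b1 = belief_update(b0, action=1, observation=1, T=T, O=O)
```

Under these assumed numbers, helping and then observing cooperation shifts belief toward the high-prosociality state. A belief-based policy would then select the next action as a function of `b1` rather than of the unobserved state itself.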

Paper Structure (14 sections, 10 equations, 5 figures, 2 tables)

Introduction
Related Work
Model
Problem Setting
Model Formulation
Model Learning
Planning and Decision-Making
Experiment
User Study
Model Implementation
Evaluation
Policy Evaluation and Baseline Comparison
Sensitivity Analysis
Discussion and Conclusion

Figures (5)

Figure 1: ls-POMDP model framework: the robot learns prosocial state dynamics via EM and plans actions using belief-space decision-making to strategically shape inferred human prosociality over repeated interactions.
Figure 2: An overview of the game setup, different modes $M_k$ and signaling as robot action.
Figure 3: State reward values under various values of the reward gradient $r$, illustrating how the reward increases with prosociality under different settings.
Figure 4: Policy comparison over 9 interaction steps. For the cumulative tokens, Never Help / Never Signal is taken as the baseline.
Figure 5: Sensitivity of Robot Policy to Reward and Cost Parameters.
