Attaining Human's Desirable Outcomes in Human-AI Interaction via Structural Causal Games
Anjie Liu, Jianhong Wang, Haoxuan Li, Xu Chen, Jun Wang, Samuel Kaski, Mengyue Yang
TL;DR
This work frames human-AI collaboration as a game in which several Nash Equilibria may exist, only some of which match the outcome the human wants. It introduces Structural Causal Games (SCGs) and a pre-policy intervention mechanism that steers agent policies toward the Nash Equilibrium representing the human's goals, supported by a reinforcement learning-like procedure for learning the pre-policy. The authors test the method in gridworlds and in dialogue scenarios with large language models, showing that it adapts to a range of problems. The approach contributes to human-AI alignment by combining causal reasoning with strategic policy design, offering a plug-in intervention for improving real-world interactions. Its potential practical impact lies in more reliable and interpretable control of AI assistants in domains such as manufacturing, healthcare, and decision-making.
Abstract
In human-AI interaction, a prominent goal is to attain the human's desirable outcome with the assistance of AI agents. Ideally, this can be cast as the problem of seeking the optimal Nash Equilibrium that matches the human's desirable outcome. However, reaching that outcome is usually challenging because there exist multiple Nash Equilibria that are related to the assisting task but do not correspond to the human's desirable outcome. To tackle this issue, we employ a theoretical framework called the structural causal game (SCG) to formalize the human-AI interactive process. Furthermore, we introduce a strategy, referred to as pre-policy intervention on the SCG, to steer AI agents towards attaining the human's desirable outcome. In more detail, a pre-policy is learned as a generalized intervention that guides the agents' policy selection, under a transparent and interpretable procedure determined by the SCG. To make the framework practical, we propose a reinforcement learning-like algorithm to search for this pre-policy. The proposed algorithm is tested in both gridworld environments and realistic dialogue scenarios with large language models, demonstrating its adaptability to a broader class of problems and its potential effectiveness in real-world situations.
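The equilibrium-selection problem described in the abstract can be illustrated with a toy example. The sketch below is not the paper's SCG or its pre-policy algorithm; it only enumerates the pure Nash Equilibria of a hypothetical 2x2 coordination game between a human and an AI agent. The payoff values, the action names "A" and "B", and the function names are all assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical payoffs (human, AI) for a 2x2 coordination game.
# Keys are (human_action, ai_action). Values are illustrative only.
PAYOFFS = {
    ("A", "A"): (2, 2),  # assumed to be the outcome the human desires
    ("A", "B"): (0, 0),
    ("B", "A"): (0, 0),
    ("B", "B"): (1, 1),  # also an equilibrium, but not the desired one
}
ACTIONS = ("A", "B")
DESIRED = ("A", "A")  # assumption: the human's desirable outcome


def is_pure_nash(human, ai):
    """True if neither player gains by deviating unilaterally."""
    u_human, u_ai = PAYOFFS[(human, ai)]
    human_ok = all(PAYOFFS[(h, ai)][0] <= u_human for h in ACTIONS)
    ai_ok = all(PAYOFFS[(human, a)][1] <= u_ai for a in ACTIONS)
    return human_ok and ai_ok


def pure_nash_equilibria():
    """Enumerate all pure-strategy Nash Equilibria of the toy game."""
    return [(h, a) for h, a in product(ACTIONS, ACTIONS) if is_pure_nash(h, a)]


if __name__ == "__main__":
    equilibria = pure_nash_equilibria()
    for eq in equilibria:
        tag = "desired" if eq == DESIRED else "undesired"
        print(eq, PAYOFFS[eq], tag)
```

In this toy game both ("A", "A") and ("B", "B") are stable, so agents that simply settle into an equilibrium may end up at the one the human does not want. This is the gap that, according to the abstract, the pre-policy intervention is meant to close.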
