Computational Basis of LLM's Decision Making in Social Simulation
Ji Ma
TL;DR
This work introduces activation-based steering to investigate how social concepts are encoded in an LLM's internal representations and how they causally influence its decisions. By extracting, orthogonalizing, and injecting targeted steering vectors into the residual stream, the author conducts controlled virtual social experiments (a Dictator Game) to probe variable-specific effects such as framing and gender. The results show three things. Internal representations encode social variables with varying strength and at varying depths. Directional alignment does not always predict magnitude. Well-posed perturbations can bias decisions in a controlled, interpretable way. The approach provides a theory-grounded, auditable framework for interrogating and shaping LLM behavior in social simulations, with implications for alignment, debiasing, and industry applications, although its findings still need external sociological validation against human data.
Abstract
Large language models (LLMs) increasingly serve as human-like decision-making agents in social science and applied settings. These LLM agents are typically assigned human-like characters and placed in real-life contexts. However, how these characters and contexts shape an LLM's behavior remains underexplored. This study proposes and tests methods for probing, quantifying, and modifying an LLM's internal representations in a Dictator Game, a classic behavioral experiment on fairness and prosocial behavior. We extract "vectors of variable variations" (e.g., "male" to "female") from the LLM's internal state. Manipulating these vectors during the model's inference can substantially alter how those variables relate to the model's decision-making. This approach offers a principled way to study and regulate how social concepts are encoded and engineered within transformer-based models. It has implications for alignment, for debiasing, and for designing AI agents for social simulations in both academic and commercial applications. It also helps strengthen sociological theory and measurement.
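The extract, orthogonalize, and inject steps described in the TL;DR can be sketched in a few lines. This is a minimal illustration, not the paper's implementation, and it rests on three assumptions. The direction is estimated as a difference of means. It is made orthogonal to other variables' directions with Gram-Schmidt. It is injected by adding a scaled unit vector to the hidden state. The helper names (`extract_direction`, `orthogonalize`, `inject`), the scale `alpha`, and the synthetic activations are all hypothetical. They stand in for residual-stream activations collected from a real model.

```python
import numpy as np

def extract_direction(acts_a, acts_b):
    """Mean-difference direction from condition A to condition B (assumed method).

    acts_a, acts_b: arrays of shape (n_prompts, d_model) holding residual-stream
    activations at one layer (hypothetical inputs for illustration).
    """
    return acts_b.mean(axis=0) - acts_a.mean(axis=0)

def orthogonalize(v, others):
    """Remove from v its components along every vector in `others` (Gram-Schmidt)."""
    basis = []
    for u in others:
        # Orthonormalize the other directions first so projections don't double-count.
        for b in basis:
            u = u - (u @ b) * b
        norm = np.linalg.norm(u)
        if norm > 1e-12:
            basis.append(u / norm)
    for b in basis:
        v = v - (v @ b) * b
    return v

def inject(hidden, direction, alpha):
    """Add alpha * unit(direction) to every position of a (seq_len, d_model) hidden state."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

# Synthetic demo: random activations stand in for "male" vs. "female" prompt sets.
rng = np.random.default_rng(0)
d_model = 16
male = rng.normal(size=(32, d_model))
female = rng.normal(size=(32, d_model)) + 0.5  # artificial shift between conditions
framing = rng.normal(size=d_model)             # hypothetical direction for a second variable

gender_dir = extract_direction(male, female)
gender_perp = orthogonalize(gender_dir, [framing])
print(abs(gender_perp @ framing) < 1e-9)  # True: no overlap with the framing direction

hidden = rng.normal(size=(5, d_model))  # stand-in hidden state for 5 token positions
steered = inject(hidden, gender_perp, alpha=4.0)
print(np.allclose(np.linalg.norm(steered - hidden, axis=1), 4.0))  # True: each position moved by alpha
```

In an actual model, the `inject` step would be applied to the hidden state at a chosen layer during the forward pass, for example through a PyTorch forward hook. This sketch does not settle which layer or which scale to use; those are empirical choices.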
