An Identity Based Agent Model for Value Alignment
Karthik Sama, Janvi Chhabra, Arpitha Srivatsha Malavalli, Jayati Deshmukh, Srinath Srinivasa
TL;DR
This work tackles AI value alignment in multi-agent settings by extending the Computational Transcendence (CT) framework to model agents that identify with abstract human values. It introduces schemas that bridge identity objects with contextual observables, enables adaptive updates to identity associations, and incorporates conformity as an external social factor. These extensions are applied to urban transit choices between taxi and bus. The study finds that values like Frugalism and Individualism can strongly influence decisions, that initial belief distributions significantly shape final outcomes, and that conformity can drive polarization and reduce behavioral diversity, highlighting the social dynamics of value-driven AI behavior. The approach yields an interpretable, domain-adaptable framework for designing value-aligned agents in socio-technical contexts, with potential policy implications for shaping collective outcomes through value encodings and social influence mechanisms.
Abstract
Social identities play an important role in the dynamics of human societies, and it can be argued that some sense of identification with a larger cause or idea plays a critical role in making humans act responsibly. Social activists often strive to get populations to identify with some cause or notion, such as green energy or diversity, in order to bring about desired social changes. We explore the problem of designing computational models for social identities in the context of autonomous AI agents. For this, we propose an agent model that enables agents to identify with certain notions and show how this affects collective outcomes. We also contrast identity associations with rational preferences. The proposed model is simulated in an application context of urban mobility, where we show how changes in social identity affect mobility patterns and collective outcomes.
