Agent Alignment in Evolving Social Norms

Shimin Li; Tianxiang Sun; Qinyuan Cheng; Xipeng Qiu

TL;DR

Experimental results assessing the agents from multiple perspectives in aligning with social norms demonstrate that EvolutionaryAgent can align progressively better with the evolving social norms while maintaining its proficiency in general tasks.

Abstract

Agents based on Large Language Models (LLMs) are increasingly permeating various domains of human production and life, highlighting the importance of aligning them with human values. Current alignment of AI systems primarily focuses on passively aligning LLMs through human intervention. However, agents possess characteristics such as receiving environmental feedback and self-evolution, rendering existing LLM alignment methods inadequate. In response, we propose an evolutionary framework for agent evolution and alignment, named EvolutionaryAgent, which transforms agent alignment into a process of evolution and selection under the principle of survival of the fittest. In an environment where social norms continuously evolve, agents better adapted to the current social norms have a higher probability of survival and proliferation, while those inadequately aligned dwindle over time. Experimental results assessing the agents from multiple perspectives in aligning with social norms demonstrate that EvolutionaryAgent can align progressively better with the evolving social norms while maintaining its proficiency in general tasks. Experiments with various open-source and closed-source LLMs as the agents' foundation models further support the applicability of our approach.
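
The selection loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each agent's strategy is a single number, that fitness is the negative distance to a numeric "norm", and that the norm moves toward the mean strategy of the fittest agents. The names `fitness`, `evolve`, `update_norm`, and `run`, and all parameters, are hypothetical stand-ins for the LLM-based agents and observers in the paper.

```python
import random


def fitness(strategy, norm):
    """Toy fitness (assumption): strategies closer to the norm score higher."""
    return -abs(strategy - norm)


def evolve(population, norm, rng, survivors=4, mutation_scale=0.1):
    """One generation: keep the fittest agents, refill by crossover and mutation."""
    ranked = sorted(population, key=lambda s: fitness(s, norm), reverse=True)
    parents = ranked[:survivors]
    children = []
    while len(parents) + len(children) < len(population):
        a, b = rng.sample(parents, 2)
        child = (a + b) / 2                      # crossover: average two parents
        child += rng.gauss(0.0, mutation_scale)  # mutation: small random change
        children.append(child)
    # Surviving parents come first, followed by their offspring.
    return parents + children


def update_norm(norm, elites, rate=0.5):
    """Toy norm evolution (assumption): move toward the elites' mean strategy."""
    mean_elite = sum(elites) / len(elites)
    return norm + rate * (mean_elite - norm)


def run(generations=20, size=8, survivors=4, seed=0):
    """Alternate selection and norm updates; record mean fitness per generation."""
    rng = random.Random(seed)
    population = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    norm = 0.5
    history = []
    for _ in range(generations):
        population = evolve(population, norm, rng, survivors=survivors)
        history.append(sum(fitness(s, norm) for s in population) / size)
        norm = update_norm(norm, population[:survivors])
    return population, norm, history
```

Calling `run()` returns the final population, the final norm, and the per-generation mean fitness. In the paper, fitness comes from observers judging agent behavior against the current social norms rather than from a numeric distance.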

Paper Structure

This paper contains 42 sections, 8 equations, 10 figures, 7 tables, 1 algorithm.

Introduction
Related Work
LLM Alignment
Self-Evolution of AI System
Agent Alignment
LLM Alignment
Agent Alignment
Evolutionary Agent in Evolving World
Initialization of Agent and Evolving Society
Environmental Interaction
Fitness Evaluation with Feedback
Evolution of Agent
Evolving Social Norms
Experiments
Configuration
...and 27 more sections

Figures (10)

Figure 1: Disparities between LLM alignment and agent alignment. (a) An LLM is iteratively aligned with values under human intervention. (b) Agents perceive values from the environment, take actions that affect the environment, and self-evolve after receiving feedback from the environment.
Figure 2: The framework primarily comprises four processes: a) Agents interact with others or the environment within a societal context. b) Observers evaluate the fitness of agents based on current social norms and assessment criteria. c) Agents better aligned with current social norms engage in crossover and mutation behaviors, thereby propagating new agents. d) The strategies of agents with higher fitness prompt the evolution and establishment of social norms.
Figure 3: When using different open-source and closed-source LLMs as the foundational models for the EvolutionaryAgent and the compared baselines, we observe variations in fitness within an EvolvingSociety. Social norms evolve at the start of each generation, marked by the black vertical lines. The EvolutionaryAgent consistently demonstrates an adaptive capability to adjust to these changing social norms continually.
Figure 4: Evaluating the performance of EvolutionaryAgent in aligning with social norms while executing functional downstream tasks. The "Overall Score" is the average of the functionality score and alignment score. The EvolutionaryAgent can adapt to social norms while maintaining its performance in completing downstream tasks.
Figure 5: a) The influence of foundation models of varying quality on the EvolutionaryAgent. b) The use of diverse LLMs as observers.
...and 5 more figures
