Dynamic Alignment for Collective Agency: Toward a Scalable Self-Improving Framework for Open-Ended LLM Alignment

Panatchakorn Anantaprayoon, Nataliia Babina, Jad Tarifi, Nima Asgharbeygi

TL;DR

This work proposes Collective Agency (CA) as an open-ended alignment objective and Dynamic Alignment as a self-improving framework that leverages automated data generation and self-evaluation with Group Relative Policy Optimization. The CA-aligned model demonstrates stronger alignment to CA and maintains strong performance on standard NLP benchmarks. By removing reliance on human labels and enabling iterative self-improvement, the approach aims to scale alignment with growing model capability while encouraging integrated agentic growth. Empirical results suggest open-ended, self-guided alignment can enhance value alignment without sacrificing general functionality.

Abstract

Large Language Models (LLMs) are typically aligned with human values using preference data or predefined principles such as helpfulness, honesty, and harmlessness. However, as AI systems progress toward Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI), such value systems may become insufficient. In addition, alignment based on human feedback remains resource-intensive and difficult to scale. While self-improving alignment methods based on AI feedback have been explored as a scalable alternative, they have largely remained constrained to conventional alignment values. In this work, we explore both a more holistic alignment objective and a scalable, self-improving alignment approach. Aiming to transcend conventional alignment norms, we introduce Collective Agency (CA), a unified and open-ended alignment value that encourages integrated agentic capabilities. We also propose Dynamic Alignment, an alignment framework that enables an LLM to iteratively align itself. Dynamic Alignment comprises two key components: (1) automated training dataset generation with LLMs, and (2) a self-rewarding mechanism, where the policy model evaluates its own output candidates and assigns rewards for learning with Group Relative Policy Optimization (GRPO). Experimental results demonstrate that our approach successfully aligns the model to CA while preserving general NLP capabilities.
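
To make the self-rewarding step concrete, the sketch below shows how a group of self-assigned scores for one prompt could be turned into group-relative advantages, the quantity GRPO uses in place of a learned value baseline. This is an illustration only: the function name, the example scores, the use of the population standard deviation, and the `eps` stabilizer are assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize one prompt's candidate rewards against their own group.

    Each candidate's advantage is its reward minus the group mean,
    divided by the group standard deviation (eps avoids division by zero
    when every candidate receives the same score).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scores the policy assigned to four of its own candidates
# for a single prompt (the scale is illustrative).
scores = [2.0, 4.0, 4.0, 6.0]
advantages = group_relative_advantages(scores)
```

Candidates scored above the group mean receive positive advantages and are reinforced, those below are discouraged, and a group with identical scores contributes no learning signal.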

Paper Structure

This paper contains 18 sections, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: Interrelationship among the four pillars of Collective Agency, our proposed open-ended alignment value.
  • Figure 2: Dynamic alignment framework. Prompts used in each step are in Appendix \ref{sec:app-ex}.
  • Figure 3: Pairwise similarity (ROUGE-L score) and length distribution of the generated training data prompts.
  • Figure 4: Example responses from the base and CA-aligned gpt-oss-20b models.
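
The pairwise ROUGE-L similarity mentioned in the Figure 3 caption can be sketched as follows. This is a minimal illustration assuming lowercase whitespace tokenization and the balanced F1 variant of ROUGE-L; the paper's exact tokenizer and ROUGE configuration are not stated here, and the example prompts are hypothetical.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two strings (whitespace tokens; an assumption)."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


# Hypothetical generated prompts; low pairwise scores indicate diversity.
prompts = [
    "How should a team divide limited resources fairly?",
    "How should a city divide limited water fairly?",
    "Describe a plan for learning a new language.",
]
pairwise = [rouge_l_f1(p, q) for p, q in combinations(prompts, 2)]
```

A histogram of `pairwise` would give a similarity distribution of the kind the caption describes, with mass near zero suggesting that the generated prompts rarely repeat one another.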