Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion

Yuanhong Wu, Djallel Bouneffouf, D. Frank Hsu

Abstract

Aligning large language models (LLMs) with human values is a central challenge for ensuring trustworthy and safe deployment. While existing methods such as Reinforcement Learning from Human Feedback (RLHF) and its variants have improved alignment, they often rely on a single evaluator or narrowly defined reward signals, limiting their ability to capture ethical pluralism. In this work, we propose the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA), a framework that operationalizes multi-agent fusion alignment. It instantiates multiple moral agents, each fine-tuned to represent a distinct normative perspective, and fuses their outputs using CFA with both rank- and score-based aggregation. This design leverages cognitive diversity between agents to mitigate conflicts and redundancies among them, producing responses that better reflect human values. Empirical evaluation demonstrates that VAS-CFA outperforms both single-agent baselines and prior aggregation approaches on standard metrics, showing that multi-agent fusion provides a robust and effective mechanism for advancing value alignment in LLMs.

Paper Structure

This paper contains 5 sections, 2 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Diagram of the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA).
  • Figure 2: Rank-score function graph for question $q_{8657}$: $f_A$, $f_B$, $f_C$, $f_D$, and $f_E$ are the rank-score functions of agents A, B, C, D, and E, which correspond to Authority, Care, Fairness, Loyalty, and Sanctity, respectively.
  • Figure 3: BERTScore F1 across 26 combinations under four CFA combination types (ASC, WSCDS, ARC, WRCDS) for question $q_{8657}$ (within each model group, sorted in non-decreasing order of ASC).
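
The quantities named in the captions above can be illustrated with a minimal sketch. It rests on several assumptions that this page does not confirm: that the 26 combinations are the subsets of two or more of the five agents, that ASC and ARC denote average score and average rank combination, that WSCDS denotes a score combination weighted by diversity strength, and that cognitive diversity is measured as the root-mean-square gap between two rank-score functions, as is common in the CFA literature. WRCDS is left out because its weighting scheme is not described here. All function names and the toy scores are hypothetical.

```python
from itertools import combinations
import math

# Agent labels from the Figure 2 caption: Authority, Care, Fairness, Loyalty, Sanctity.
AGENTS = ["A", "B", "C", "D", "E"]

def agent_subsets(agents):
    """All subsets with at least two agents (assumed origin of the 26 combinations)."""
    return [c for k in range(2, len(agents) + 1) for c in combinations(agents, k)]

def ranks(scores):
    """Rank 1 goes to the highest score; ties are broken by candidate index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    out = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        out[idx] = position
    return out

def rank_score_function(scores):
    """f(i) = score held by the candidate at rank i, for i = 1..n."""
    return sorted(scores, reverse=True)

def cognitive_diversity(scores_a, scores_b):
    """Assumed definition: RMS difference between two rank-score functions."""
    fa, fb = rank_score_function(scores_a), rank_score_function(scores_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa))

def diversity_strength(agent_scores):
    """Assumed definition: mean cognitive diversity of an agent to all others."""
    names = list(agent_scores)
    return {
        a: sum(cognitive_diversity(agent_scores[a], agent_scores[b])
               for b in names if b != a) / (len(names) - 1)
        for a in names
    }

def combine(per_agent_values, weights=None):
    """Weighted mean per candidate; falls back to a plain mean without usable weights."""
    names = list(per_agent_values)
    if weights is None or sum(weights[n] for n in names) == 0:
        weights = {n: 1.0 for n in names}
    total = sum(weights[n] for n in names)
    n_items = len(per_agent_values[names[0]])
    return [sum(weights[n] * per_agent_values[n][i] for n in names) / total
            for i in range(n_items)]

def average_score_combination(agent_scores):
    """ASC (assumed expansion): higher combined score is better."""
    return combine(agent_scores)

def average_rank_combination(agent_scores):
    """ARC (assumed expansion): lower combined rank is better."""
    return combine({n: ranks(s) for n, s in agent_scores.items()})

def weighted_score_combination(agent_scores):
    """WSCDS (assumed expansion): scores weighted by diversity strength."""
    return combine(agent_scores, diversity_strength(agent_scores))

# Toy scores: three agents rating four candidate responses (illustrative only).
toy_scores = {
    "A": [0.9, 0.1, 0.5, 0.3],
    "B": [0.2, 0.8, 0.6, 0.4],
    "C": [0.7, 0.3, 1.0, 0.1],
}

if __name__ == "__main__":
    print(len(agent_subsets(AGENTS)), "agent subsets of size two or more")
    print("ASC:", average_score_combination(toy_scores))
    print("ARC:", average_rank_combination(toy_scores))
    print("WSCDS:", weighted_score_combination(toy_scores))
```

Under these assumptions, a score combination picks the candidate with the highest fused score, while a rank combination picks the candidate with the lowest fused rank. The rank-score function discards candidate identity, so the diversity measure compares how agents distribute their scores rather than which candidates they prefer.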