SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat

Yuru Jiang, Wenxuan Ding, Shangbin Feng, Greg Durrett, Yulia Tsvetkov

TL;DR

SPARTA ALIGNMENT is a collective, combat-based framework for aligning multiple large language models: pairs of models duel on an instruction while the remaining models act as peer judges. The judges' scores are aggregated through a reputation-weighted scheme and converted into preference data, which the models learn from via Direct Preference Optimization, enabling iterative self-improvement without external reward models. Across 12 tasks, SPARTA ALIGNMENT outperforms the initial models and 4 self-alignment baselines on 10 tasks, with a 7.0% average improvement; gains are particularly strong in reasoning, instruction following, and cross-domain generalization, and they grow with model diversity and pool size. The approach also generalizes to unseen tasks and reveals an emergent hierarchical stratification among the participating models.

Abstract

We propose SPARTA ALIGNMENT, an algorithm to collectively align multiple LLMs through competition and combat. To compensate for a single model's lack of diversity in generation and biases in evaluation, multiple LLMs form a "sparta tribe" to compete against each other in fulfilling instructions while serving as judges for the competitions of others. In each iteration, one instruction and two models are selected for a duel, the other models evaluate the two responses, and their evaluation scores are aggregated through an adapted Elo-ranking-based reputation system, where winners/losers of combat gain/lose weight in evaluating others. The peer-evaluated combat results then become preference pairs in which the winning response is preferred over the losing one, and all models learn from these preferences at the end of each iteration. SPARTA ALIGNMENT enables the self-evolution of multiple LLMs in an iterative and collective competition process. Extensive experiments demonstrate that SPARTA ALIGNMENT outperforms initial models and 4 self-alignment baselines across 10 out of 12 tasks and datasets with 7.0% average improvement. Further analysis reveals that SPARTA ALIGNMENT generalizes more effectively to unseen tasks and leverages the expertise diversity of participating models to produce more logical, direct, and informative outputs.
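The duel, judge, and update steps described in the abstract can be illustrated with a small sketch. This is not the paper's exact formulation: the reputation-weighted mean, the textbook Elo expected-score formula, the `k_factor` and `scale` values, and every name below (`PreferencePair`, `weighted_score`, `expected_win`, `resolve_duel`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # response of the duel winner
    rejected: str  # response of the duel loser


def weighted_score(scores: Dict[str, float], reputation: Dict[str, float]) -> float:
    """Reputation-weighted mean of judge scores (assumes positive reputations)."""
    total = sum(reputation[j] for j in scores)
    return sum(reputation[j] * s for j, s in scores.items()) / total


def expected_win(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Standard Elo expected score of A against B (an assumed stand-in)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def resolve_duel(
    prompt: str,
    duelists: Tuple[str, str],
    responses: Dict[str, str],
    judge_scores: Dict[str, Tuple[float, float]],
    reputation: Dict[str, float],
    k_factor: float = 32.0,
) -> Optional[PreferencePair]:
    """Aggregate judge scores, update reputations in place, return a pair.

    judge_scores maps each judge to (score for duelist a, score for duelist b).
    Returns None on an exact tie, since no preference can be formed.
    """
    a, b = duelists
    s_a = weighted_score({j: s[0] for j, s in judge_scores.items()}, reputation)
    s_b = weighted_score({j: s[1] for j, s in judge_scores.items()}, reputation)
    outcome = 1.0 if s_a > s_b else 0.0 if s_a < s_b else 0.5

    # Elo-style update: the winner gains what the loser gives up.
    delta = k_factor * (outcome - expected_win(reputation[a], reputation[b]))
    reputation[a] += delta
    reputation[b] -= delta

    if outcome == 0.5:
        return None
    winner, loser = (a, b) if outcome == 1.0 else (b, a)
    return PreferencePair(prompt, responses[winner], responses[loser])
```

In this sketch, a judge with a higher reputation pulls the aggregated score more strongly toward its own rating, and the pairs returned over an iteration would form the preference dataset that all models are then trained on.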

Paper Structure

This paper contains 46 sections, 4 equations, 16 figures, 7 tables, 1 algorithm.

Figures (16)

  • Figure 1: Overview of Sparta Alignment, an algorithm that collectively aligns multiple models via combat. Sparta Alignment requires a dataset $\mathcal{X}$ and a pool of models $\mathcal{M}^t$. In each iteration $t$, we repeatedly sample a prompt $x$ from $\mathcal{X}$. For $x$, we first select a model $M_i^t$ and then select its opponent $M_{i'}^t$ with our Match-Making Strategy ($i = 3, i' = 4$ in the example). We then employ three steps: (1) Combat: the two models generate responses $y_i, y_{i'}$ to $x$. (2) Judge: the other models in the pool act as judges and score $y_i$ and $y_{i'}$. (3) Update: we update the reputation scores of $M_i^t$ and $M_{i'}^t$ based on the scores from the Judge phase and add a new preference pair to the preference dataset $\mathcal{P}$. At the end of each iteration, we align the models via preference learning on $\mathcal{P}$.
  • Figure 2: Cross-subset generalization accuracy on the MATH benchmark. Each group of bars corresponds to a training subset (Easy, Medium, Hard), with OOD performance measured on the two held-out subsets.
  • Figure 3: Effect of pool size on alignment performance. We vary the number of candidate models participating in each training round and measure the final performance. Results show that larger pools lead to better outcomes, indicating that Sparta Alignment benefits from having more diverse LLMs as participants.
  • Figure 4: Impact of model pool diversity on alignment performance. The x-axis shows the configurations of model pool diversity: $1 \times 10$, $2 \times 5$, $5 \times 2$, and $10 \times 1$. The results demonstrate that increasing the diversity of the model pool consistently improves performance.
  • Figure 5: Correlation between a model's average performance on a specific task and its average reputation in the model pool. The 10 points in each subplot correspond to the 10 models. $r$ denotes the Pearson correlation coefficient.
  • ...and 11 more figures
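For reference, the Pearson correlation coefficient $r$ mentioned in the Figure 5 caption can be computed as in the sketch below. The function name `pearson_r` and its argument names are illustrative assumptions, and no values from the paper are reproduced here.

```python
import math
from typing import Sequence


def pearson_r(performance: Sequence[float], reputation: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences.

    Here each position would hold one model's average task performance
    and its average reputation, respectively.
    """
    n = len(performance)
    if n != len(reputation) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(performance) / n
    mean_r = sum(reputation) / n
    cov = sum((p - mean_p) * (q - mean_r) for p, q in zip(performance, reputation))
    var_p = sum((p - mean_p) ** 2 for p in performance)
    var_r = sum((q - mean_r) ** 2 for q in reputation)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_r)
```

A value of $r$ near 1 would indicate that models performing better on a task also tend to hold a higher reputation in the pool.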