SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat
Yuru Jiang, Wenxuan Ding, Shangbin Feng, Greg Durrett, Yulia Tsvetkov
TL;DR
SPARTA ALIGNMENT is a collaborative, combat-based framework for aligning multiple large language models: pairs of models are pitted against each other while the remaining models act as peer judges. The judge scores are aggregated through a reputation-weighted scheme and converted into preference data, which the models learn from through Direct Preference Optimization, enabling iterative self-improvement without external reward models. Across 12 tasks, SPARTA ALIGNMENT outperforms the initial models and 4 self-alignment baselines on 10, with an average improvement of 7.0% and particularly strong gains in reasoning, instruction following, and cross-domain generalization, driven by model diversity and larger pools. The approach generalizes robustly to unseen tasks and reveals an emergent hierarchical stratification among models, highlighting the value of collective competition in producing diverse and reliable outputs.
Abstract
We propose SPARTA ALIGNMENT, an algorithm to collectively align multiple LLMs through competition and combat. To compensate for a single model's lack of diversity in generation and biases in evaluation, multiple LLMs form a "sparta tribe" to compete against each other in fulfilling instructions while serving as judges for the competitions of others. In each iteration, one instruction and two models are selected for a duel, the other models evaluate the two responses, and their evaluation scores are aggregated through an adapted Elo-ranking-based reputation system, where winners/losers of combat gain/lose weight in evaluating others. The peer-evaluated combat results then become preference pairs in which the winning response is preferred over the losing one, and all models learn from these preferences at the end of each iteration. SPARTA ALIGNMENT enables the self-evolution of multiple LLMs in an iterative and collective competition process. Extensive experiments demonstrate that SPARTA ALIGNMENT outperforms initial models and 4 self-alignment baselines across 10 out of 12 tasks and datasets with 7.0% average improvement. Further analysis reveals that SPARTA ALIGNMENT generalizes more effectively to unseen tasks and leverages the expertise diversity of participating models to produce more logical, direct and informative outputs.
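The duel step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract only says the reputation system is an "adapted" Elo ranking, so the sketch substitutes the standard Elo update and assumes that judge scores are combined as a mean weighted by each judge's current rating. The function names, the K-factor of 32, the 400-point scale, and the dictionary layout are all hypothetical choices made for illustration.

```python
def expected_score(r_a, r_b, scale=400.0):
    """Standard Elo expected score of a model rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def aggregate_scores(judge_scores, reputations):
    """Reputation-weighted mean of the judges' scores for the two responses.

    judge_scores maps judge name -> (score_for_a, score_for_b).
    Assumption: weights are the judges' raw ratings, which must be positive.
    """
    total = sum(reputations[j] for j in judge_scores)
    agg_a = sum(reputations[j] * s[0] for j, s in judge_scores.items()) / total
    agg_b = sum(reputations[j] * s[1] for j, s in judge_scores.items()) / total
    return agg_a, agg_b


def duel(prompt, a, b, resp_a, resp_b, judge_scores, reputations, k=32.0):
    """Resolve one duel: update reputations in place, return a preference pair.

    Returns None on a tie, since a tie yields no winner/loser preference.
    """
    agg_a, agg_b = aggregate_scores(judge_scores, reputations)
    outcome_a = 1.0 if agg_a > agg_b else 0.0 if agg_a < agg_b else 0.5

    # Standard Elo update (stand-in for the paper's adapted variant):
    # the winner gains rating, and therefore judging weight, the loser loses it.
    exp_a = expected_score(reputations[a], reputations[b])
    reputations[a] += k * (outcome_a - exp_a)
    reputations[b] += k * ((1.0 - outcome_a) - (1.0 - exp_a))

    if outcome_a == 0.5:
        return None
    chosen, rejected = (resp_a, resp_b) if outcome_a == 1.0 else (resp_b, resp_a)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


# Hypothetical tribe of four models: m1 and m2 duel, m3 and m4 judge.
reputations = {"m1": 1000.0, "m2": 1000.0, "m3": 1200.0, "m4": 800.0}
judge_scores = {"m3": (8, 6), "m4": (5, 7)}
pair = duel("Explain Elo.", "m1", "m2", "answer A", "answer B",
            judge_scores, reputations)
print(pair, reputations)
```

In this example the two judges disagree, but the higher-rated judge m3 carries more weight, so m1's response wins and becomes the "chosen" side of the pair. Pairs collected this way over an iteration would then be used as DPO training data for every model in the tribe.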
