Adaptive Preference Aggregation
Benjamin Heymann
TL;DR
This work addresses AI alignment beyond RLHF by framing preference aggregation as a social-choice problem and targeting maximal lotteries, a Condorcet-consistent solution concept, through an urn-process mechanism. It introduces Adaptive Preference Aggregation (APA), an online approach in which a neural urn learns to map user embeddings to a distribution over alternatives and converges toward maximal lotteries, with distillation used to stabilize oscillations. The framework connects social-choice theory with replicator dynamics and neural function approximation, and demonstrates competitive performance in a toy environment against principled baselines such as local maximal lotteries and Borda. The study highlights the potential of context-adaptive, Condorcet-consistent aggregation for scalable alignment, while acknowledging limitations and outlining directions for scaling, convergence guarantees, and integration with recommender systems and large language models.
Abstract
AI alignment, the challenge of ensuring AI systems act in accordance with human values, has emerged as a critical problem in the development of systems such as foundation models and recommender systems. Still, the current dominant approach, reinforcement learning with human feedback (RLHF), faces known theoretical limitations in aggregating diverse human preferences. Social choice theory provides a framework for aggregating preferences, but it was not developed for the multidimensional applications typical of AI. Leveraging insights from a recently published urn process, this work introduces a preference aggregation strategy that adapts to the user's context and inherits the desirable properties of the maximal lottery, a Condorcet-consistent solution concept.
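To make the notion of a maximal lottery concrete, the following is a minimal sketch that is not the urn process or the APA method of this work. It assumes a small, fixed set of alternatives and a hypothetical skew-symmetric margin matrix, where `margins[i][j]` is the share of voters preferring alternative i to j minus the share preferring j to i. A maximal lottery is a distribution p for which no alternative has a positive expected margin against p. The sketch approximates one with a standard multiplicative-weights update (a discrete relative of replicator dynamics) and returns the time-averaged distribution. The function name, step count, and learning rate are illustrative choices.

```python
import math


def maximal_lottery_mw(margins, steps=5000, eta=0.1):
    """Approximate a maximal lottery with multiplicative weights.

    margins: skew-symmetric matrix (list of lists); margins[i][j] is the
             majority margin of alternative i over alternative j.
    Returns the time-averaged distribution over alternatives.
    """
    n = len(margins)
    p = [1.0 / n] * n      # start from the uniform lottery
    avg = [0.0] * n        # running time average of the iterates
    for _ in range(steps):
        for i in range(n):
            avg[i] += p[i] / steps
        # Expected margin of each alternative against the current lottery.
        payoff = [sum(margins[i][j] * p[j] for j in range(n)) for i in range(n)]
        # Reweight alternatives exponentially in their payoff, then renormalize.
        weights = [p[i] * math.exp(eta * payoff[i]) for i in range(n)]
        total = sum(weights)
        p = [w / total for w in weights]
    return avg


# Hypothetical profile with a Condorcet winner (alternative 0 beats 1 and 2).
condorcet = [[0.0, 0.2, 0.4],
             [-0.2, 0.0, 0.2],
             [-0.4, -0.2, 0.0]]

# Hypothetical symmetric Condorcet cycle (0 beats 1, 1 beats 2, 2 beats 0).
cycle = [[0.0, 0.2, -0.2],
         [-0.2, 0.0, 0.2],
         [0.2, -0.2, 0.0]]

lottery_winner = maximal_lottery_mw(condorcet)
lottery_cycle = maximal_lottery_mw(cycle)
```

In the first profile almost all of the mass ends up on the Condorcet winner, which illustrates Condorcet consistency. In the symmetric cycle no alternative beats all others, and the lottery stays uniform.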
