Adaptive Preference Aggregation

Benjamin Heymann

TL;DR

This work addresses AI alignment beyond RLHF by framing preference aggregation as a problem that calls for a Condorcet-consistent solution, and it adopts maximal lotteries computed through an urn process. It introduces Adaptive Preference Aggregation (APA), an online neural-urn approach that learns to map user embeddings to a distribution over alternatives and converges toward maximal lotteries, with a distillation step to stabilize oscillations. The framework connects social choice theory with replicator dynamics and neural function approximation, and demonstrates competitive performance in a toy environment against principled baselines such as local maximal lotteries and Borda. The study highlights the potential of context-adaptive, Condorcet-consistent aggregation for scalable alignment, while acknowledging limitations and outlining directions for scale, convergence guarantees, and integration with recommender systems and large language models.

Abstract

AI alignment, the challenge of ensuring AI systems act in accordance with human values, has emerged as a critical problem in the development of systems such as foundation models and recommender systems. Still, the current dominant approach, reinforcement learning with human feedback (RLHF), faces known theoretical limitations in aggregating diverse human preferences. Social choice theory provides a framework to aggregate preferences, but was not developed for the multidimensional applications typical of AI. Leveraging insights from a recently published urn process, this work introduces a preference aggregation strategy that adapts to the user's context and that inherits the good properties of the maximal lottery, a Condorcet-consistent solution concept.

Paper Structure

This paper contains 25 sections, 7 equations, 5 figures, 1 table, 2 algorithms.

Figures (5)

  • Figure 1: Graphical representation of the urn process introduced by brandlNaturalAdaptiveProcess2024. For simplicity, the mutation rate is omitted from this representation. Iteratively, (1) two alternatives are sampled from the urn, then (2) a randomly sampled user expresses their preference between the two options, and (3) the ball of the less preferred option is replaced in the urn by a ball for the preferred option, which (4) changes the state of the urn.
  • Figure 2: Illustration of the mathematical model of Section \ref{subsec:mathematical-model}. $\mathcal{A}$ is embodied by the three actions of rock-paper-scissors. The users, in white, prefer alternatives that are closer to them. The criterion from Equation \ref{eq:criterion} induces a non-transitive structure similar to that of the classical zero-sum game rock-paper-scissors. Notably, such a structure is not well captured by reward-based approaches, as explained in Section \ref{sec:scoring-method}.
  • Figure 3: The algorithm might cycle (top) or not (bottom), depending on the presence of non-transitivity in the data. Depending on the downstream application, this observation might call for a distillation step at the end.
  • Figure 4: Example of generated data. The squares represent the alternatives, and the points represent the users. The color of each point indicates its atom in the partition induced by the embedding. The green square marks the overall Condorcet winner.
  • Figure 5: Performance against adaptive maximal lottery during learning (before distillation)
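
The four steps in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`make_voter`, `urn_step`, `run_urn`), the urn size, the mutation rate value, and the three-voter cyclic electorate are all hypothetical choices made for the example. The mutation step, which the caption omits, is included here in one plausible form (relabeling a random ball with a random alternative).

```python
import random

# Hypothetical electorate: three equally likely voter types whose rankings
# (best first) form a cycle, so every alternative loses to another by 2/3.
CYCLIC_RANKINGS = [
    ("rock", "scissors", "paper"),
    ("scissors", "paper", "rock"),
    ("paper", "rock", "scissors"),
]


def make_voter(rankings):
    """Build a function that samples one voter and returns their preferred option."""

    def prefer(a, b, rng):
        ranking = rng.choice(rankings)  # step (2): sample a random user
        return a if ranking.index(a) < ranking.index(b) else b

    return prefer


def urn_step(urn, alternatives, prefer, rng, mutation_rate):
    """Apply one round of the urn process, modifying `urn` in place."""
    if rng.random() < mutation_rate:
        # Assumed mutation step: relabel a random ball with a random alternative.
        urn[rng.randrange(len(urn))] = rng.choice(alternatives)
        return
    i, j = rng.sample(range(len(urn)), 2)  # step (1): draw two distinct balls
    a, b = urn[i], urn[j]
    if a == b:
        return  # same alternative drawn twice: nothing to compare
    winner = prefer(a, b, rng)
    loser_index = j if winner == a else i
    urn[loser_index] = winner  # steps (3)-(4): the loser's ball becomes a winner ball


def run_urn(alternatives, prefer, balls_per_alternative=10, steps=5000,
            mutation_rate=0.0, seed=0):
    """Run the process from a balanced urn and return the final list of balls."""
    rng = random.Random(seed)
    urn = [a for a in alternatives for _ in range(balls_per_alternative)]
    for _ in range(steps):
        urn_step(urn, alternatives, prefer, rng, mutation_rate)
    return urn


if __name__ == "__main__":
    final = run_urn(["rock", "paper", "scissors"], make_voter(CYCLIC_RANKINGS),
                    steps=2000, mutation_rate=0.05)
    shares = {a: final.count(a) / len(final) for a in ["rock", "paper", "scissors"]}
    print(shares)
```

The proportions of balls in the urn play the role of the lottery over alternatives. With the cyclic electorate above, the composition may keep fluctuating rather than settle, which matches the cycling behavior noted in the Figure 3 caption. When every voter shares the same ranking and there is no mutation, the top-ranked alternative can only gain balls.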