On the Pros and Cons of Active Learning for Moral Preference Elicitation
Vijay Keswani, Vincent Conitzer, Hoda Heidari, Jana Schaich Borg, Walter Sinnott-Armstrong
TL;DR
This work questions the effectiveness of active learning for eliciting moral preferences. It formalizes preferences with a utility function $u(x)$, observes responses to comparison queries via $R(x,x')$, and evaluates two active-learning approaches (version-space and Bayesian BALD) against random querying under challenges specific to moral judgments. Using simulations that introduce preference instability, model misspecification, and noise, the study shows that active learning can outperform random querying in small, stable, and well-specified regimes, but can underperform when instability or misspecification is pronounced or when feature dimensionality is high. The results stress the need for context-aware deployment and for robust, adaptable elicitation strategies for moral judgments, especially in high-stakes domains such as organ allocation or autonomous systems. The findings motivate cautious use of active learning and call for human-subject studies that quantify how often these assumptions about moral preferences are violated in practice and that inform the design of more resilient elicitation methods.
Abstract
Computational preference elicitation methods are tools used to learn people's preferences quantitatively in a given context. Recent works on preference elicitation advocate for active learning as an efficient method to iteratively construct queries (framed as comparisons between context-specific cases) that are likely to be most informative about an agent's underlying preferences. In this work, we argue that the use of active learning for moral preference elicitation relies on certain assumptions about the underlying moral preferences, which can be violated in practice. Specifically, we highlight the following common assumptions: (a) preferences are stable over time and not sensitive to the sequence of presented queries, (b) the appropriate hypothesis class is chosen to model moral preferences, and (c) noise in the agent's responses is limited. While these assumptions can be appropriate for preference elicitation in certain domains, prior research in moral psychology suggests they may not be valid for moral judgments. Through a synthetic simulation of preferences that violate the above assumptions, we observe that active learning can perform similarly to or worse than a basic random query selection method in certain settings. Yet, simulation results also demonstrate that active learning can still be viable if the degree of instability or noise is relatively small and when the agent's preferences can be approximately represented with the hypothesis class used for learning. Our study highlights the nuances associated with effective moral preference elicitation in practice and advocates for the cautious use of active learning as a methodology to learn moral preferences.
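
To make the comparison described above concrete, the following is a minimal sketch of a synthetic elicitation loop, not the simulation used in this work. It rests on several assumptions that do not come from the text: a linear utility $u(x) = w \cdot x$, a logistic noise model for the response $R(x,x')$, and a simple uncertainty-sampling heuristic that stands in for (and is not) the version-space or BALD approaches. All function names and parameters (`utility`, `respond`, `fit`, `pick_query`, `run`, `noise`) are hypothetical.

```python
import math
import random


def utility(w, x):
    """Linear utility u(x) = w . x (an assumed functional form)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def respond(w_true, x, x_alt, noise, rng):
    """Simulated response R(x, x'): 1 if x is chosen over x', else 0.

    With noise == 0 the agent always picks the higher-utility case;
    larger noise pushes the choice toward a coin flip (assumed logistic model).
    """
    diff = utility(w_true, x) - utility(w_true, x_alt)
    if noise == 0:
        return 1 if diff > 0 else 0
    z = max(-30.0, min(30.0, diff / noise))  # clamp to avoid overflow in exp
    return 1 if rng.random() < 1.0 / (1.0 + math.exp(-z)) else 0


def fit(data, dim, epochs=100, lr=0.1):
    """Estimate weights by logistic regression on feature differences (SGD)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, x_alt, r in data:
            d = [a - b for a, b in zip(x, x_alt)]
            z = max(-30.0, min(30.0, utility(w, d)))
            p = 1.0 / (1.0 + math.exp(-z))
            w = [wi + lr * (r - p) * di for wi, di in zip(w, d)]
    return w


def pick_query(pool, w_hat, strategy, rng):
    """Random choice, or the pair whose predicted utility gap is smallest."""
    if strategy == "random":
        return rng.choice(pool)
    return min(pool, key=lambda p: abs(utility(w_hat, p[0]) - utility(w_hat, p[1])))


def run(strategy, w_true, noise, n_queries=30, pool_size=200, seed=0):
    """Run one elicitation session and return held-out agreement in [0, 1]."""
    rng = random.Random(seed)
    dim = len(w_true)

    def draw_pair():
        return ([rng.uniform(-1, 1) for _ in range(dim)],
                [rng.uniform(-1, 1) for _ in range(dim)])

    pool = [draw_pair() for _ in range(pool_size)]
    test = [draw_pair() for _ in range(500)]
    data, w_hat = [], [0.0] * dim
    for _ in range(n_queries):
        pair = pick_query(pool, w_hat, strategy, rng)
        pool.remove(pair)  # each pair is asked at most once
        r = respond(w_true, pair[0], pair[1], noise, rng)
        data.append((pair[0], pair[1], r))
        w_hat = fit(data, dim)
    # Agreement with the agent's noise-free preference on unseen pairs.
    hits = sum(
        (utility(w_hat, a) > utility(w_hat, b)) == (utility(w_true, a) > utility(w_true, b))
        for a, b in test
    )
    return hits / len(test)


if __name__ == "__main__":
    w_true = [1.0, -0.5, 0.25]  # hypothetical "true" preference weights
    for noise in (0.0, 0.5):
        for strategy in ("random", "uncertainty"):
            print(noise, strategy, run(strategy, w_true, noise))
```

Only assumption (c), response noise, is varied in this sketch. Instability could be approximated by letting `w_true` drift between queries, and misspecification by generating responses from a non-linear utility while still fitting the linear model. Which strategy comes out ahead depends on these settings and on the random seed, so a single run should not be read as evidence either way.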
