Efficient Personalization of Generative Models via Optimal Experimental Design
Guy Schacht, Ziyad Sheebaelhamd, Riccardo De Santi, Mojmír Mutný, Andreas Krause
TL;DR
Preference-based personalization of generative models is often bottlenecked by costly human feedback. This work introduces ED-PBRL, a framework based on Optimal Experimental Design that selects a small set of informative preference queries by solving a convex optimization problem over state visitation measures, using a theta-agnostic surrogate and a per-timestep decomposition of the Fisher Information. The approach provides a mean-squared-error bound for the regularized MLE, proves convergence of the Frank-Wolfe optimization, and demonstrates strong data efficiency on text-to-image personalization in experiments with both a synthetic ground-truth (GT) model and LLM-simulated preferences. Overall, ED-PBRL offers principled, scalable query design that accelerates learning of user-specific reward models, enabling practical personalization with fewer human labels while acknowledging potential biases and ethical considerations in adaptive querying.
Abstract
Preference learning from human feedback can align generative models with the needs of end-users. However, human feedback is costly and time-consuming to obtain, which creates demand for data-efficient query selection methods. This work presents a novel approach that leverages optimal experimental design to ask humans the most informative preference queries, from which we can efficiently elucidate the latent reward function modeling user preferences. We formulate preference query selection as the problem of maximizing the information about the underlying latent preference model. We show that this problem admits a convex optimization formulation, and introduce ED-PBRL, a statistically and computationally efficient algorithm that is supported by theoretical guarantees and can efficiently construct structured queries such as images or text. We empirically evaluate the proposed framework by personalizing a text-to-image generative model to user-specific styles, showing that it requires fewer preference queries than random query selection.
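To make the idea of information-maximizing query selection concrete, the following is a minimal sketch of a much simpler, static analogue and not the ED-PBRL algorithm itself. It assumes a finite pool of candidate queries with known feature vectors, a linear model, and a regularized log-determinant (D-optimality) criterion, and it runs Frank-Wolfe over the design weights on the probability simplex. The function names, the regularizer `reg`, and the step-size schedule are illustrative assumptions; the paper instead optimizes over state visitation measures with a Fisher Information objective.

```python
import numpy as np

def log_det_information(X, w, reg=1e-3):
    """Log-determinant of the regularized information matrix for design weights w."""
    d = X.shape[1]
    info = X.T @ (w[:, None] * X) + reg * np.eye(d)
    return np.linalg.slogdet(info)[1]

def d_optimal_frank_wolfe(X, n_iters=200, reg=1e-3):
    """Frank-Wolfe for a D-optimal design over a finite candidate pool.

    X has shape (n, d): one (assumed) feature vector per candidate query.
    Returns weights w on the simplex that approximately maximize
    log det(sum_i w_i x_i x_i^T + reg * I), which is concave in w.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)  # start from the uniform design
    for t in range(1, n_iters + 1):
        info = X.T @ (w[:, None] * X) + reg * np.eye(d)
        inv = np.linalg.inv(info)
        # Gradient of the objective w.r.t. w_i is x_i^T info^{-1} x_i.
        grad = np.einsum("ij,jk,ik->i", X, inv, X)
        # Linear maximization over the simplex is attained at a single vertex.
        best = int(np.argmax(grad))
        step = 2.0 / (t + 2.0)  # standard diminishing Frank-Wolfe step size
        w *= 1.0 - step
        w[best] += step
    return w
```

In this simplified setting, the candidates with the largest weights are the queries one would ask first. Comparing `log_det_information(X, w)` for the optimized weights against the uniform design gives a rough sense of how much information a designed query set can gain over random selection.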
