Efficient Personalization of Generative Models via Optimal Experimental Design

Guy Schacht, Ziyad Sheebaelhamd, Riccardo De Santi, Mojmír Mutný, Andreas Krause

TL;DR

Preference-based personalization of generative models is often bottlenecked by costly human feedback. This work introduces ED-PBRL, a framework based on Optimal Experimental Design (OED) that selects a small set of informative preference queries by solving a convex optimization problem over state visitation measures, using a $\theta$-agnostic surrogate and a per-timestep decomposition of the Fisher Information. The approach provides a mean-squared-error bound for the regularized MLE, proves convergence of the Frank-Wolfe optimization, and demonstrates strong data efficiency in text-to-image personalization experiments with both synthetic ground-truth (GT) models and LLM-simulated preferences. Overall, ED-PBRL offers principled, scalable query design that accelerates learning of user-specific reward models, enabling practical personalization with fewer human labels while acknowledging potential biases and ethical considerations in adaptive querying.

Abstract

Preference learning from human feedback can align generative models with the needs of end users. Human feedback is costly and time-consuming to obtain, which creates demand for data-efficient query selection methods. This work presents a novel approach that leverages optimal experimental design to ask humans the most informative preference queries, from which the latent reward function modeling user preferences can be inferred efficiently. We formulate preference query selection as the problem of maximizing the information about the underlying latent preference model. We show that this problem has a convex optimization formulation, and introduce a statistically and computationally efficient algorithm, ED-PBRL, that is supported by theoretical guarantees and can efficiently construct structured queries such as images or text. We empirically evaluate the proposed framework by personalizing a text-to-image generative model to user-specific styles, showing that it requires fewer preference queries than random query selection.
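
To make the idea of "convex optimization for informative queries" concrete, here is a minimal sketch of Frank-Wolfe applied to a classical D-optimal design problem. It is not the authors' ED-PBRL, which optimizes over state visitation measures; it only assumes a fixed pool of candidate query embeddings `X` (hypothetical) and maximizes the log-determinant of a regularized information matrix over the probability simplex. The function names, the regularizer `reg`, and the step size $2/(t+2)$ are illustrative choices.

```python
import numpy as np

def frank_wolfe_d_optimal(X, n_iters=200, reg=1e-3):
    """Maximize log det(sum_i w_i x_i x_i^T + reg * I) over the probability simplex."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)  # start from the uniform design
    for t in range(n_iters):
        M = X.T @ (w[:, None] * X) + reg * np.eye(d)  # information matrix
        # The gradient of log det with respect to w_i is x_i^T M^{-1} x_i.
        grad = np.einsum("ij,ij->i", X, np.linalg.solve(M, X.T).T)
        best = int(np.argmax(grad))  # linear maximization over the simplex picks a vertex
        step = 2.0 / (t + 2.0)       # standard open-loop Frank-Wolfe step size
        w *= 1.0 - step
        w[best] += step
    return w

def log_det_information(X, w, reg=1e-3):
    """Objective value: log-determinant of the regularized information matrix."""
    M = X.T @ (w[:, None] * X) + reg * np.eye(X.shape[1])
    return np.linalg.slogdet(M)[1]

# Hypothetical candidate pool: 8 copies of one direction, 2 copies of another.
X = np.array([[1.0, 0.0]] * 8 + [[0.0, 1.0]] * 2)
w = frank_wolfe_d_optimal(X)
```

In this toy pool, the uniform design over-samples the first direction, while the optimized design splits its mass roughly evenly between the two directions, which is where the log-determinant is largest.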

Paper Structure

This paper contains 67 sections, 11 theorems, 87 equations, 13 figures, 2 tables, 2 algorithms.

Key Result

theorem 4.0

Under mild conditions (see the MSE derivation in the appendix), the expected squared error of the regularized estimator $\hat{\theta}_\lambda$ under the multinomial likelihood satisfies an upper bound whose constant $C^{\lambda}_{\theta^*} = (1-r^{\lambda}_{\theta^*})^{-4}$ depends on a local consistency radius $r^{\lambda}_{\theta^*}\!\in[0,1)$.
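
As an illustration of the estimator the theorem refers to, the sketch below fits a regularized maximum-likelihood estimate under a multinomial logit choice model. The linear reward $\theta^\top \phi$, the softmax choice probabilities, the L2 penalty weighted by `lam`, plain gradient descent, and all names and data here are assumptions made for the example; the paper's exact likelihood and regularizer may differ.

```python
import numpy as np

def fit_regularized_mle(features, choices, lam=1.0, lr=0.5, n_steps=500):
    """features: (n, K, d) option embeddings; choices: (n,) index of the preferred option."""
    n, K, d = features.shape
    theta = np.zeros(d)
    chosen = features[np.arange(n), choices]  # (n, d) embedding of each chosen option
    for _ in range(n_steps):
        logits = features @ theta                      # (n, K) assumed linear rewards
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        expected = np.einsum("nk,nkd->nd", probs, features)
        # Gradient of the negative log-likelihood plus (lam / 2) * ||theta||^2.
        grad = (expected - chosen).sum(axis=0) + lam * theta
        theta -= lr * grad / n
    return theta

# Synthetic check with a hypothetical ground-truth parameter and K = 4 options per query.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -1.0, 0.5])
features = rng.normal(size=(2000, 4, 3))
logits = features @ theta_true
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
choices = np.array([rng.choice(4, p=p) for p in probs])
theta_hat = fit_regularized_mle(features, choices)
cosine = theta_hat @ theta_true / (np.linalg.norm(theta_hat) * np.linalg.norm(theta_true))
```

With randomly drawn queries the estimate already aligns well with `theta_true` at this sample size; the point of experimental design is to reach comparable accuracy with far fewer queries.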

Figures (13)

  • Figure 1: The personalization workflow: ED-PBRL calculates policies $\pi_1,\ldots,\pi_K$ which select prompts in the combinatorial token space. These prompts are embedded (CLIP) and rendered with Stable Diffusion 1.4 (Rombach et al., 2022); preferences on the resulting images are collected and used to estimate the guidance model $\hat{\theta}$. Each prompt is formed by a sequence of tokens, that is, a trajectory of a policy $\pi_i$. The $K$ policies are chosen so that, for a given budget, the preferences and embeddings yield an accurate estimate of the guidance model. Note that each policy can be parameterized via a large table or as a separate generative language model.
  • Figure 2: Performance of ED-PBRL on the Sunny synthetic Ground Truth (GT) model. We plot the Cosine Error (left) and Preference Prediction Error (right) against the number of interaction episodes. These results demonstrate the efficiency of our OED approach. Numerical results for all GT models (Sunny, Medieval, and Technological) are presented in the Appendix (Figure 5).
  • Figure 3: Sunny GT qualitative personalization and experimental context.
  • Figure 4: LLM-simulated preference held-out accuracy. The accuracy-curve panel shows accuracy as we vary the number of training episodes at $\lambda=100$, with shaded standard errors across 10 styles. ED-PBRL maintains ${\sim}60\%$ accuracy across all training sizes, while random exploration degrades significantly with less data. The bar-chart panel summarizes results at 50 training episodes. The random-guess baseline is $1/K=25\%$.
  • Figure 5: Performance of ED-PBRL on Sunny, Medieval, and Technological synthetic Ground Truth (GT) models. For each GT model, we plot the Cosine Error (left column) and Preference Prediction Error (right column) against the number of interaction episodes. Results are averaged over N=25 independent runs, and the shaded regions represent the standard error of the mean. The Sunny GT model results are also shown in the main paper (Figure 2).
  • ...and 8 more figures
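
The captions above report a Cosine Error and a Preference Prediction Error without defining them here. The sketch below shows one plausible reading, stated as an assumption rather than the paper's definition: cosine error as one minus the cosine similarity between the estimated and true parameter vectors, and prediction error as the fraction of held-out queries where the highest-scoring option under $\hat{\theta}$ differs from the option actually chosen. Function and argument names are illustrative.

```python
import numpy as np

def cosine_error(theta_hat, theta_true):
    """Assumed definition: 1 - cosine similarity between estimate and ground truth."""
    cos = theta_hat @ theta_true / (np.linalg.norm(theta_hat) * np.linalg.norm(theta_true))
    return 1.0 - cos

def preference_prediction_error(theta_hat, features, choices):
    """Assumed definition: share of queries where the top-scoring option is not the chosen one.

    features: (n, K, d) option embeddings; choices: (n,) index of the preferred option.
    """
    predicted = np.argmax(features @ theta_hat, axis=1)
    return float(np.mean(predicted != choices))

# Tiny worked example with two queries and two options each.
features = np.array([[[1.0, 0.0], [0.0, 1.0]],
                     [[0.0, 1.0], [1.0, 0.0]]])
choices = np.array([0, 0])
theta_hat = np.array([1.0, 0.0])
error = preference_prediction_error(theta_hat, features, choices)
```

With $K=4$ options per query, a predictor that guesses uniformly at random would have an error of $75\%$ under this definition, which matches the $25\%$ random-guess accuracy baseline quoted for Figure 4.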

Theorems & Definitions (18)

  • theorem 4.0: Maximizing FIM improves Estimation
  • theorem 4.1: Truncated trajectory
  • theorem 5.0
  • theorem B.1: Upper Bound on MSE
  • proof
  • lemma B.1
  • proof: Proof of Lemma B.1
  • theorem B.1
  • proof: Proof of Theorem B.1
  • lemma B.1
  • ...and 8 more