Human-in-the-loop: Real-time Preference Optimization
Wenbin Wang, Wenjie Xu, Colin N. Jones
TL;DR
This work tackles real-time, human-in-the-loop optimization where user preferences are provided as pairwise comparisons. It introduces an online feedback optimization controller that uses random exploration and binary preference feedback to estimate a descent direction for the latent utility $\tilde{\Phi}(u)=\Phi(h(u),u)$, and proves closed-loop stability and convergence to the unique optimizer $u^*$ under mild assumptions. The analysis relies on Lyapunov-based stability bounds, and the resulting convergence guarantees carry error terms that vanish as $\delta\to 0$ and $\mu\to 0$. Numerical experiments on a simple quadratic plant and on a thermal comfort problem based on PMV/PPD demonstrate practical effectiveness and show how plant transients and feedback noise affect convergence.
Abstract
Optimization with preference feedback is an active research area with many applications in engineering systems where humans play a central role, such as building control and autonomous vehicles. While most existing studies focus on optimizing a static user utility, few have investigated the closed-loop behavior of such schemes when system transients are taken into account. In this work, we propose an online feedback optimization controller that optimizes user utility using pairwise comparison feedback, with both optimality and closed-loop stability guarantees. By adding a random exploration signal, the controller estimates the gradient from the binary utility comparison between two consecutive time steps. We analyze its closed-loop behavior when interacting with a nonlinear plant and show that, under mild assumptions, the controller converges to the optimal point without inducing instability. The theoretical findings are further validated through numerical experiments.
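The core idea of the abstract, random exploration combined with binary comparison feedback, can be illustrated with a minimal Python sketch. This is not the authors' controller: it assumes a static plant (no transients), a noiseless comparison oracle, and a hypothetical quadratic utility with optimizer `u_star`. The names `preference`, `run_controller`, `delta` (exploration radius), and `eta` (step size) are illustrative choices and are not taken from the paper.

```python
import numpy as np

def preference(utility, v_new, v_old):
    """Binary feedback: +1 if v_new is preferred over v_old, else -1.

    The user is modeled (assumption) as a noiseless comparison of a
    latent utility that the controller never observes directly.
    """
    return 1.0 if utility(v_new) > utility(v_old) else -1.0

def run_controller(utility, u0, steps=2000, delta=0.05, eta=0.01, seed=0):
    """Comparison-based ascent on a latent utility (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(steps):
        # Random exploration direction on the unit sphere.
        w = rng.standard_normal(u.shape)
        w /= np.linalg.norm(w)
        # Apply the baseline input, then the perturbed one, and ask
        # which of the two consecutive inputs the user prefers.
        s = preference(utility, u + delta * w, u)
        # Move along w if the perturbation was preferred, against it otherwise.
        u = u + eta * s * w
    return u

# Hypothetical latent utility: concave quadratic with optimizer u_star.
u_star = np.array([1.0, -2.0])
utility = lambda u: -np.sum((u - u_star) ** 2)

u_final = run_controller(utility, u0=np.zeros(2))
print("distance to optimizer:", np.linalg.norm(u_final - u_star))
```

With a fixed `delta` and `eta`, the iterate is only expected to settle in a small neighborhood of `u_star` rather than converge exactly, which is in line with the TL;DR's remark that error terms vanish only as the corresponding parameters go to zero.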
