Learning the MPC objective function from human preferences
Pablo Krupa, Hasna El Hasnaouy, Mario Zanon, Alberto Bemporad
TL;DR
The paper addresses the challenge of tuning MPC objectives when explicit performance metrics are unavailable by learning an objective from human preferences over pairs of trajectories. It formulates a surrogate preference function as a binary classification problem and embeds it into the MPC formulation through a scalar σ whose minimization yields preferred trajectories. Two numerical studies on a three-mass oscillator, one with a quadratic-cost preference and one with a more complex preference based on settling time, show that the learned objectives can closely reproduce the human preferences, with accuracy improving as the amount of training data increases. The approach offers a data-driven, interpretable route to MPC tuning that uses pairwise human feedback to shape closed-loop behavior. Although stability is not guaranteed in general, the method remains practically valuable for aligning MPC with expert preferences.
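The classification view of preference learning can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the learned cost is linear in hypothetical quadratic features (per-component sums of squares along a trajectory) and that the probability of trajectory a being preferred over trajectory b is a logistic function of the cost difference, so fitting reduces to logistic regression on feature differences. The "human" labels come from a hidden synthetic cost rather than from any data in the paper, and the names `features`, `fit_preferences`, and `hidden_w` are invented here.

```python
import math
import random


def features(traj):
    # Hypothetical feature map: sum of squares of each component over the
    # trajectory. A cost linear in these features is a diagonal quadratic cost.
    dim = len(traj[0])
    return [sum(step[i] ** 2 for step in traj) for i in range(dim)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_preferences(pairs, labels, lr=0.05, epochs=500):
    """Logistic regression on pairwise feature differences.

    Assumed model: P(a preferred over b) = sigmoid(J(b) - J(a)) with
    J(tau) = w . features(tau), so a lower learned cost means "more preferred".
    labels[k] is 1 if pairs[k][0] was preferred, else 0.
    """
    diffs = [[fb - fa for fa, fb in zip(features(a), features(b))] for a, b in pairs]
    dim = len(diffs[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for d, y in zip(diffs, labels):
            p = sigmoid(sum(wi * di for wi, di in zip(w, d)))
            for i in range(dim):
                grad[i] += (p - y) * d[i]  # gradient of the log loss
        w = [wi - lr * gi / len(diffs) for wi, gi in zip(w, grad)]
    return w


def linear_cost(weights, traj):
    return sum(wi * fi for wi, fi in zip(weights, features(traj)))


def random_traj(steps=10, dim=2):
    return [[random.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(steps)]


random.seed(0)
# Synthetic "human": prefers whichever trajectory has the lower hidden cost.
hidden_w = [1.0, 0.2]
pairs = [(random_traj(), random_traj()) for _ in range(200)]
labels = [1 if linear_cost(hidden_w, a) < linear_cost(hidden_w, b) else 0 for a, b in pairs]

w = fit_preferences(pairs, labels)
# Fraction of training pairs whose preference the learned cost reproduces.
accuracy = sum(
    (linear_cost(w, a) < linear_cost(w, b)) == bool(y)
    for (a, b), y in zip(pairs, labels)
) / len(pairs)
print(w, accuracy)
```

Because only cost differences enter the model, the ranking of trajectories induced by the learned weights is unchanged by any positive rescaling of `w`, which is all that matters when the cost is later minimized.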
Abstract
In Model Predictive Control (MPC), the objective function plays a central role in determining the closed-loop behavior of the system, and must therefore be designed to achieve the desired closed-loop performance. However, in real-world scenarios, its design is often challenging, as it requires balancing complex trade-offs and accurately capturing a performance criterion that may not be easily quantifiable in terms of an objective function. This paper explores preference-based learning as a data-driven approach to constructing an objective function from human preferences over trajectory pairs. We formulate the learning problem as a machine learning classification task to learn a surrogate model that estimates the likelihood of one trajectory being preferred over another. The resulting surrogate model can be used directly as an MPC objective function. Numerical results show that we can learn objective functions that yield closed-loop trajectories aligned with the expressed human preferences.
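To illustrate what it means to use a learned surrogate directly as an MPC objective, the following Python sketch runs a brute-force receding-horizon loop on a hypothetical scalar integrator x+ = x + 0.1 u. The weights `W_X` and `W_U` stand in for learned values, and the controller simply enumerates constant input sequences over a small grid instead of solving an optimization problem; a practical MPC implementation would use a proper optimization solver together with the actual plant model and learned objective. All names and numbers here are assumptions for illustration.

```python
# Hypothetical "learned" weights for the quadratic surrogate cost
# J = W_X * sum(x_k^2) + W_U * sum(u_k^2); in practice these would come
# from preference learning.
W_X, W_U = 1.0, 0.2
DT = 0.1  # hypothetical integrator gain: x+ = x + DT * u
HORIZON = 5
U_GRID = [i / 10 for i in range(-10, 11)]  # candidate constant inputs in [-1, 1]


def predict(x, u, horizon=HORIZON):
    """Predicted states x_1..x_N under a constant input u."""
    states = []
    for _ in range(horizon):
        x = x + DT * u
        states.append(x)
    return states


def surrogate_cost(states, inputs):
    return W_X * sum(s * s for s in states) + W_U * sum(v * v for v in inputs)


def mpc_step(x):
    """Pick the grid input whose constant-input prediction has the lowest surrogate cost."""
    return min(U_GRID, key=lambda u: surrogate_cost(predict(x, u), [u] * HORIZON))


def closed_loop(x0, steps=60):
    # Receding horizon: apply only the first input, then re-plan from the new state.
    xs = [x0]
    for _ in range(steps):
        u = mpc_step(xs[-1])
        xs.append(xs[-1] + DT * u)
    return xs


xs = closed_loop(1.0)
print(round(xs[-1], 3))
```

The coarse input grid leaves a small residual offset near the origin; the point of the sketch is only that the controller's behavior is shaped entirely by the surrogate cost it minimizes.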
