
Learning the MPC objective function from human preferences

Pablo Krupa, Hasna El Hasnaouy, Mario Zanon, Alberto Bemporad

TL;DR

The paper addresses the challenge of tuning MPC objectives when no explicit performance metric is available, by learning the objective from human preferences over pairs of trajectories. It learns a surrogate preference function by binary classification and embeds it in the MPC problem through a scalar function $\sigma$, whose minimization yields trajectories that the surrogate predicts to be preferred. Two numerical studies on a three-mass oscillator, one with a quadratic-cost preference and one with a more complex preference based on time to settle, show that the learned objectives closely reproduce the expressed preferences, with accuracy improving as the amount of training data grows. The approach offers a data-driven, interpretable route to MPC tuning that uses pairwise human feedback to shape closed-loop behavior. Closed-loop stability is not guaranteed in general, but the method remains practically useful for aligning MPC with expert preferences.
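Why minimizing $\sigma$ yields preferred trajectories can be seen from a minimal sketch that assumes a logistic surrogate, in which the estimated preference probability depends only on the difference of $\sigma$ values. This specific form, and the symbols $\hat{\pi}$, $z_a$, $z_b$ and $\mathcal{Z}$, are illustrative assumptions rather than the paper's exact formulation.

```latex
% Assumed logistic surrogate: estimated probability that z_a is preferred over z_b
\hat{\pi}(z_a, z_b) = \frac{1}{1 + \exp\big(\sigma(z_a) - \sigma(z_b)\big)}

% \hat{\pi} is strictly decreasing in \sigma(z_a), so for any fixed comparison trajectory z_b
\arg\min_{z \in \mathcal{Z}} \sigma(z) \;=\; \arg\max_{z \in \mathcal{Z}} \hat{\pi}(z, z_b)

% \mathcal{Z}: trajectories the MPC can produce (model dynamics and constraints)
```

Under this assumption, using $\sigma$ as the MPC cost selects, among feasible trajectories, the one the surrogate rates most likely to be preferred against any fixed alternative.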

Abstract

In Model Predictive Control (MPC), the objective function plays a central role in determining the closed-loop behavior of the system, and must therefore be designed to achieve the desired closed-loop performance. However, in real-world scenarios, its design is often challenging, as it requires balancing complex trade-offs and accurately capturing a performance criterion that may not be easily quantifiable in terms of an objective function. This paper explores preference-based learning as a data-driven approach to constructing an objective function from human preferences over trajectory pairs. We formulate the learning problem as a machine learning classification task to learn a surrogate model that estimates the likelihood of a trajectory being preferred over another. The approach provides a surrogate model that can directly be used as an MPC objective function. Numerical results show that we can learn objective functions that provide closed-loop trajectories that align with the expressed human preferences.
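The classification step can be illustrated with a minimal sketch. It assumes a logistic (Bradley-Terry-style) preference model, a diagonal quadratic $\sigma$, and random stand-in trajectories in place of closed-loop simulations. All function names (`features`, `sigma`, `prob_prefer`, `fit`, `random_traj`, `demo`) and the hidden weights used to generate labels are hypothetical and are not taken from the paper.

```python
import math
import random

# Hypothetical sketch: a trajectory is a list of state vectors (lists of floats).


def features(traj):
    """Per-component sum of squares over the trajectory: phi_i(z) = sum_t x_{t,i}^2."""
    n = len(traj[0])
    return [sum(x[i] ** 2 for x in traj) for i in range(n)]


def sigma(w, traj):
    """Assumed diagonal quadratic surrogate: sigma(z) = sum_i w_i * phi_i(z)."""
    return sum(wi * fi for wi, fi in zip(w, features(traj)))


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def prob_prefer(w, za, zb):
    """Estimated probability that za is preferred over zb (lower sigma is better)."""
    return sigmoid(sigma(w, zb) - sigma(w, za))


def fit(pairs, labels, n, lr=0.05, epochs=500):
    """Logistic regression (no bias) on feature differences phi(zb) - phi(za).

    labels[k] is 1 if pairs[k][0] was preferred and 0 otherwise.
    """
    diffs = [[fb - fa for fa, fb in zip(features(za), features(zb))]
             for za, zb in pairs]
    w = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for d, y in zip(diffs, labels):
            p = sigmoid(sum(wi * di for wi, di in zip(w, d)))
            for i in range(n):
                grad[i] += (p - y) * d[i]  # gradient of binary cross-entropy
        w = [wi - lr * gi / len(diffs) for wi, gi in zip(w, grad)]
    return w


def random_traj(rng, steps=10, n=2):
    """Random stand-in for a closed-loop trajectory (not a plant simulation)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(steps)]


def demo(seed=0, n_train=200, n_test=200):
    """Label random pairs with a hidden quadratic cost, fit sigma, report test accuracy."""
    rng = random.Random(seed)
    hidden_w = [1.0, 4.0]  # hypothetical "human" weighting, unknown to the learner

    def make(k):
        pairs, labels = [], []
        for _ in range(k):
            za, zb = random_traj(rng), random_traj(rng)
            pairs.append((za, zb))
            labels.append(1 if sigma(hidden_w, za) < sigma(hidden_w, zb) else 0)
        return pairs, labels

    train_pairs, train_labels = make(n_train)
    test_pairs, test_labels = make(n_test)
    w = fit(train_pairs, train_labels, n=2)
    correct = sum((prob_prefer(w, za, zb) > 0.5) == (y == 1)
                  for (za, zb), y in zip(test_pairs, test_labels))
    return w, correct / n_test


if __name__ == "__main__":
    w, acc = demo()
    print("learned weights:", w, "test accuracy:", acc)
```

Because the synthetic labels here are noise-free, only the ratio of the learned weights is meaningful; their overall scale keeps growing with more gradient steps. The learned `sigma` could then serve as a stage-cost model in an MPC solver, which this sketch does not include.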

Paper Structure

This paper contains 9 sections, 21 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Results for quadratic-based preference function $\Pi_\phi$ and quadratic $\sigma$.
  • Figure 2: Results for preference function $\Pi_{\kappa_{\varepsilon}}$ and quadratic $\sigma$.
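The two preference functions named in the figure captions can be mimicked with simple synthetic oracles. The following is a minimal sketch that assumes $\Pi_\phi$ prefers the trajectory with the lower quadratic cost, and that $\Pi_{\kappa_{\varepsilon}}$ prefers the trajectory that enters and stays within an $\varepsilon$-band around the origin earlier. The diagonal weights `q`, the infinity-norm band, and the tie-breaking rule are hypothetical choices made for illustration, not the paper's definitions.

```python
def quadratic_cost(traj, q):
    """phi(z) = sum_t sum_i q_i * x_{t,i}^2 with hypothetical diagonal weights q."""
    return sum(qi * xi ** 2 for x in traj for qi, xi in zip(q, x))


def settling_time(traj, eps):
    """Smallest index k with max_i |x_{t,i}| <= eps for every t >= k.

    Returns len(traj) if the trajectory has not settled by the end of the horizon.
    """
    k = len(traj)
    for t in range(len(traj) - 1, -1, -1):
        if max(abs(xi) for xi in traj[t]) <= eps:
            k = t
        else:
            break
    return k


def prefer_quadratic(za, zb, q):
    """1 if za is preferred (strictly lower quadratic cost), else 0."""
    return 1 if quadratic_cost(za, q) < quadratic_cost(zb, q) else 0


def prefer_settling(za, zb, eps, q):
    """1 if za settles earlier; equal settling times fall back to quadratic cost (assumed)."""
    ka, kb = settling_time(za, eps), settling_time(zb, eps)
    if ka != kb:
        return 1 if ka < kb else 0
    return prefer_quadratic(za, zb, q)


za = [[1.0], [0.5], [0.05], [0.01]]  # large initial error, settles quickly
zb = [[0.3], [0.2], [0.15], [0.05]]  # small error, settles late
print(prefer_quadratic(za, zb, [1.0]))      # 0: zb has the lower quadratic cost
print(prefer_settling(za, zb, 0.1, [1.0]))  # 1: za settles at k=2, zb at k=3
```

The example pair shows why a settling-time preference can disagree with a quadratic one, which is what makes it harder to capture with a quadratic $\sigma$.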

Theorems & Definitions (3)

  • Remark 1
  • Remark 2
  • Remark 3