How to Capture Human Preference: Commissioning of a Robotic Use-Case via Preferential Bayesian Optimisation
Sander De Witte, Jeroen Taets, Andras Retzler, Guillaume Crevecoeur, Tom Lefebvre
TL;DR
The paper tackles the problem of capturing expert preferences for industrial commissioning when scalar objectives are hard to define, by applying Preferential Bayesian Optimization (PBO), which relies on pairwise comparisons by a human expert, so-called duels. It benchmarks state-of-the-art PBO methods on a planar pushing robotic task, compares the resulting decisions against an expert-designed cost, and demonstrates a data-driven GP cost learned from preferences that can drive standard BO. The key findings are that preference-based costs can reflect expert judgments better than the expert-designed cost and enable effective automated tuning. The paper also discusses limitations and points to future work on richer preferences and adaptive exploration-exploitation strategies. This work provides a practical pathway to automate expert decision-making in hardware commissioning using observed preferences rather than explicit scalar objectives.
Abstract
The popularity of Bayesian Optimization (BO) to automate or support the commissioning of engineering systems is rising. Conventional BO, however, relies on the availability of a scalar objective function. Such a function is often difficult to define and rarely captures the nuanced judgement of expert operators in industrial settings. Preferential Bayesian Optimization (PBO) addresses this limitation by relying solely on pairwise preference feedback from a human expert, so-called duels. In this paper, we study PBO's capacity to commission a particular setup in which a manipulator needs to push a block towards a target position. We benchmark state-of-the-art algorithms both in simulation and in the real world. Our results confirm that PBO can commission the setup to the satisfaction of an expert operator whilst relying solely on binary preference feedback. To evaluate to what extent the same result can be achieved using conventional BO, we investigate the experts' decision consistency against an expert-designed cost function. Our study reveals that the experts fail to define a cost function that is in full agreement with their own decision process as witnessed in the PBO experiments. We then show that the auxiliary cost function constructed as a by-product of the PBO algorithms outperforms the expert-designed cost function in terms of decision consistency. Furthermore, we demonstrate that this cost function can be used with conventional BO algorithms in an effort to reproduce the optimal design. This shows that the preference-based cost function captures the experts' preferences perhaps more effectively than the experts could articulate them themselves. In conclusion, we discuss downsides and propose directions for future research.
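The notion of decision consistency used above can be made concrete with a small sketch. The abstract does not give the exact definition, so the code below assumes one plausible reading: the fraction of recorded duels in which a cost function assigns the lower cost to the design the expert preferred. The names `decision_consistency` and `distance_cost`, the choice of a distance-to-target cost, and the duel data are all hypothetical and purely illustrative; none of them are taken from the paper.

```python
from typing import Callable, Sequence, Tuple

# A design outcome is summarised here by the block's final (x, y) position.
# This representation is an assumption made for the sake of the example.
Outcome = Tuple[float, float]
# A duel is stored as (preferred outcome, rejected outcome).
Duel = Tuple[Outcome, Outcome]


def decision_consistency(
    cost: Callable[[Outcome], float], duels: Sequence[Duel]
) -> float:
    """Fraction of duels in which `cost` ranks the preferred outcome lower."""
    if not duels:
        raise ValueError("at least one duel is required")
    agreements = sum(
        1 for preferred, rejected in duels if cost(preferred) < cost(rejected)
    )
    return agreements / len(duels)


def distance_cost(outcome: Outcome, target: Outcome = (0.0, 0.0)) -> float:
    """Hypothetical hand-designed cost: Euclidean distance to the target."""
    dx = outcome[0] - target[0]
    dy = outcome[1] - target[1]
    return (dx * dx + dy * dy) ** 0.5


# Synthetic duels for illustration only (not measured data).
example_duels = [
    ((0.1, 0.0), (0.5, 0.5)),  # cost agrees with the expert
    ((0.2, 0.1), (0.0, 0.4)),  # cost agrees with the expert
    ((0.3, 0.3), (0.1, 0.1)),  # cost disagrees: expert preferred the farther one
    ((0.0, 0.2), (0.6, 0.0)),  # cost agrees with the expert
]

consistency = decision_consistency(distance_cost, example_duels)
print(f"decision consistency: {consistency:.2f}")
```

Under this reading, the same function could score both an expert-designed cost and a cost learned from preferences on the same set of duels, which is the kind of comparison the abstract describes.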
