Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit
Tian Huang, Shengbo Wang, Ke Li
TL;DR
The paper tackles interactive multi-objective optimization by learning user preferences directly from human feedback, bypassing explicit fitness models. It introduces D-PBEMO, a two-module framework: a consultation module built on a clustering-based stochastic dueling bandit, and a density-ratio-based preference-elicitation module that yields a probabilistic guide $\widetilde{\Pr}(\mathbf{x})$ to steer MOEAs such as NSGA-II and MOEA/D. A regret bound of $\mathcal{O}(K^2 \log T)$ is established for the clustering-based consultation module, complemented by a KL-divergence termination criterion that limits the effort required of the decision maker (DM). Empirical results on 33 synthetic benchmarks, plus RNA inverse design and protein structure prediction (PSP), show performance competitive with state-of-the-art preference-based evolutionary multi-objective optimization (PBEMO) methods, with notable improvements as the number of objectives grows, while the DM workload remains manageable.
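To make the idea of a KL-divergence termination criterion concrete, here is a minimal sketch. It assumes (this is not stated in the summary above) that the criterion compares the discrete preference distributions from two consecutive consultation rounds and stops querying the DM once they barely differ. The function names and the threshold value are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            # Clamp q away from zero so the logarithm stays finite.
            total += pi * math.log(pi / max(qi, eps))
    return total


def should_stop_consulting(prev_dist, curr_dist, threshold=1e-3):
    """Hypothetical rule: stop once the preference distribution has stabilized."""
    return kl_divergence(curr_dist, prev_dist) < threshold
```

A stopping rule of this shape trades a small loss in preference accuracy for fewer queries to the DM; the threshold controls that trade-off.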
Abstract
Optimization problems arise widely in both single-objective and multi-objective settings. In practical applications, users seek solutions that converge to the region of interest (ROI) along the Pareto front (PF). The conventional approach approximates a fitness function or an objective function to reflect user preferences; this paper explores an alternative. Specifically, we aim for a method that sidesteps the need to calculate a fitness function and relies solely on human feedback. Our proposed approach performs direct preference learning by means of an active dueling bandit algorithm. The experiments are structured into three sessions. First, we assess the performance of our active dueling bandit algorithm. Second, we implement our proposed method within multi-objective evolutionary algorithms (MOEAs). Finally, we deploy our method on a practical problem, protein structure prediction (PSP). This research presents a novel interactive preference-based MOEA framework that not only addresses the limitations of traditional techniques but also opens new possibilities for optimization problems.
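As an illustration of learning from pairwise feedback alone, the sketch below runs a generic leader-versus-challenger dueling loop. It is not the paper's clustering-based algorithm: the simulated decision maker, its logistic preference model, the hidden `utilities`, and all function names are assumptions made for the example. The learner never sees the utilities, only the outcome of each duel.

```python
import math
import random


def simulated_dm_prefers(i, j, utilities, rng):
    """Hypothetical decision maker: prefers arm i over j with logistic probability."""
    p_i_wins = 1.0 / (1.0 + math.exp(-(utilities[i] - utilities[j])))
    return rng.random() < p_i_wins


def duel_for_best_arm(utilities, rounds=2000, explore=0.1, seed=0):
    """Leader-versus-challenger dueling loop; returns (best_arm, win_rates)."""
    rng = random.Random(seed)
    k = len(utilities)  # requires at least two arms
    wins = [0] * k
    plays = [0] * k

    def win_rate(a):
        # Arms that have never dueled get a neutral estimate of 0.5.
        return wins[a] / plays[a] if plays[a] else 0.5

    for _ in range(rounds):
        if rng.random() < explore:
            # Occasionally compare a uniformly random pair.
            i, j = rng.sample(range(k), 2)
        else:
            # Otherwise pit the current empirical leader against a random challenger.
            i = max(range(k), key=win_rate)
            j = rng.choice([a for a in range(k) if a != i])
        winner = i if simulated_dm_prefers(i, j, utilities, rng) else j
        wins[winner] += 1
        plays[i] += 1
        plays[j] += 1

    rates = [win_rate(a) for a in range(k)]
    return max(range(k), key=lambda a: rates[a]), rates


if __name__ == "__main__":
    best, rates = duel_for_best_arm([0.0, 1.0, 5.0])
    print("preferred arm:", best)
    print("empirical win rates:", rates)
```

In a PBEMO setting the arms would stand for candidate solutions (or groups of them) shown to the DM, and the estimated win rates would play the role that an explicit fitness function plays in the conventional approach.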
