Provably Efficient Multi-Objective Bandit Algorithms under Preference-Centric Customization
Linfeng Cao, Ming Shi, Ness B. Shroff
TL;DR
This work introduces Preference-Aware MO-MAB (PAMO-MAB), in which each user has a D-dimensional preference vector and the learner maximizes the cumulative inner-product reward, optimizing within the Pareto front rather than settling for blanket Pareto optimality. The authors design a framework with two components, preference estimation and preference-aware optimization, and apply it to two settings: unknown and hidden user preferences. They propose two algorithms. PRUCB-HP targets the hidden-preference setting and combines a weighted least-squares preference estimator with a dual-exploration policy that adds both reward and preference bonuses. PRUCB-UP targets the unknown-preference setting and uses a simplified estimator and optimization step that achieves near-optimal regret. Theoretical results establish sublinear regret bounds in both settings, and numerical experiments show strong empirical gains over traditional MO-MAB baselines, indicating that online preference learning and customized optimization within the Pareto front are effective. Overall, the work offers provable performance guarantees for personalized MO-MAB and demonstrates practical effectiveness for preference-centric customization in multi-objective decision problems.
Abstract
Multi-objective multi-armed bandit (MO-MAB) problems traditionally aim to achieve Pareto optimality. However, real-world scenarios often involve users with varying preferences across objectives, so a Pareto-optimal arm may score high for one user but perform quite poorly for another. This highlights the need for customized learning, a factor often overlooked in prior research. To address this, we study a preference-aware MO-MAB framework in the presence of explicit user preferences. It shifts the focus from achieving Pareto optimality to further optimizing within the Pareto front under preference-centric customization. To our knowledge, this is the first theoretical study of customized MO-MAB optimization with explicit user preferences. Motivated by practical applications, we explore two scenarios: unknown preference and hidden preference, each presenting unique challenges for algorithm design and analysis. At the core of our algorithms are preference estimation and preference-aware optimization mechanisms that adapt to user preferences effectively. We further develop novel analytical techniques to establish near-optimal regret of the proposed algorithms. Strong empirical performance confirms the effectiveness of our approach.
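The motivating claim, that a Pareto-optimal arm can suit one user and fail another, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the mean reward vectors, the two preference vectors, and the helper names `dominates`, `pareto_front`, and `preference_reward` do not come from the paper. The sketch also assumes the mean rewards and preferences are known exactly, whereas the proposed algorithms must estimate them online.

```python
# Illustrative sketch only: all numbers and helper names are hypothetical.

def dominates(u, v):
    """True if u is at least as good as v on every objective and strictly better on one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(means):
    """Indices of arms whose mean reward vector is not dominated by any other arm."""
    return [
        i for i, mu in enumerate(means)
        if not any(dominates(other, mu) for j, other in enumerate(means) if j != i)
    ]

def preference_reward(mu, c):
    """Inner product of a mean reward vector mu with a preference vector c."""
    return sum(m * w for m, w in zip(mu, c))

# Hypothetical mean rewards for 4 arms over D = 2 objectives.
means = [
    (0.9, 0.1),
    (0.5, 0.5),
    (0.1, 0.9),
    (0.4, 0.4),  # dominated by arm 1, so it lies outside the Pareto front
]

# Hypothetical preference vectors for two users.
user_a = (0.9, 0.1)
user_b = (0.1, 0.9)

front = pareto_front(means)
scores_a = [preference_reward(mu, user_a) for mu in means]
scores_b = [preference_reward(mu, user_b) for mu in means]
best_a = max(range(len(means)), key=lambda i: scores_a[i])
best_b = max(range(len(means)), key=lambda i: scores_b[i])

print("Pareto front:", front)
print("Best arm for user A:", best_a)
print("Best arm for user B:", best_b)
```

Under these assumed numbers, arms 0, 1, and 2 are all Pareto-optimal, yet arm 0 is the best choice for user A and the worst of the four for user B. Selecting any Pareto-optimal arm is therefore not enough; the preference vector determines which point on the front to target.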
