Expert with Clustering: Hierarchical Online Preference Learning Framework

Tianyue Zhou; Jung-Hoon Cho; Babak Rahimi Ardabili; Hamed Tabkhi; Cathy Wu

TL;DR

This work introduces Expert with Clustering (EWC), a hierarchical contextual bandit framework that integrates clustering techniques with prediction with expert advice. EWC incorporates a novel Loss-guided Distance metric, which is instrumental in generating more representative cluster centroids.

Abstract

Emerging mobility systems are increasingly capable of recommending options to mobility users, guiding them towards personalized yet sustainable system outcomes. Even more so than in a typical recommendation system, it is crucial to minimize regret, because 1) the mobility options directly affect the lives of the users, and 2) system sustainability relies on sufficient user participation. In this study, we consider accelerating user preference learning by exploiting a low-dimensional latent space that captures the mobility preferences of users. We introduce a hierarchical contextual bandit framework named Expert with Clustering (EWC), which integrates clustering techniques and prediction with expert advice. EWC efficiently utilizes hierarchical user information and incorporates a novel Loss-guided Distance metric, which is instrumental in generating more representative cluster centroids. In a recommendation scenario with $N$ users, $T$ rounds per user, and $K$ options, our algorithm achieves a regret bound of $O(N\sqrt{T\log K} + NT)$. This bound consists of two parts: the first term is the regret from the Hedge algorithm, and the second term depends on the average loss from clustering. To the best of the authors' knowledge, this is the first work to analyze the regret of an integrated expert algorithm with k-Means clustering. The regret bound supports the theoretical efficacy of EWC, and the experiments support its practical efficacy, particularly in scenarios that demand rapid learning and adaptation. Experimental results show that EWC reduces regret by 27.57% compared to the LinUCB baseline. Our work offers a data-efficient approach to capturing both individual and collective behaviors, making it highly applicable to contexts with hierarchical structures. We expect the algorithm to be applicable to other settings with layered nuances of user preferences and information.
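The expert-advice layer described in the abstract can be illustrated with a minimal sketch. It assumes the cluster centroids are already available (for example, from k-Means on offline user data), treats each centroid as one Hedge expert, and uses a 0-1 loss that is 1 whenever an expert's recommendation differs from the user's choice. The function name `hedge_over_centroids`, the loss, the learning rate, and the toy data are all illustrative assumptions, not details taken from the paper, and the sketch uses no Loss-guided Distance.

```python
import numpy as np

def hedge_over_centroids(centroids, contexts, choices, eta, seed=0):
    """Hypothetical sketch: run Hedge for one user with centroids as experts.

    centroids: (K, d) candidate preference vectors (assumed precomputed).
    contexts:  (T, A, d) feature vectors of the A options in each round.
    choices:   (T,) index of the option the user actually chose.
    eta:       Hedge learning rate (an assumption, not the paper's value).
    """
    rng = np.random.default_rng(seed)
    n_experts = centroids.shape[0]
    log_w = np.zeros(n_experts)  # log-weights, for numerical stability
    total_loss = 0.0
    for x_t, y_t in zip(contexts, choices):
        # Each expert recommends the option its centroid scores highest.
        preds = np.argmax(x_t @ centroids.T, axis=0)  # shape (K,)
        # Sample one expert in proportion to its current weight.
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        k = rng.choice(n_experts, p=p)
        total_loss += float(preds[k] != y_t)
        # Exponentially down-weight every expert that was wrong this round.
        log_w -= eta * (preds != y_t).astype(float)
    p = np.exp(log_w - log_w.max())
    return total_loss, p / p.sum()

# Toy data (assumed): two centroids, two options, user always picks option 0.
centroids = np.array([[1.0, 0.0], [0.0, 1.0]])
T = 20
contexts = np.tile(np.array([[1.0, 0.0], [0.0, 1.0]]), (T, 1, 1))
choices = np.zeros(T, dtype=int)
eta = np.sqrt(2 * np.log(len(centroids)) / T)
total_loss, weights = hedge_over_centroids(centroids, contexts, choices, eta)
```

In this toy run the first centroid always agrees with the user, so its weight grows toward 1 while the other expert is discounted after each mistake. This mirrors the intuition in the abstract: the Hedge term covers the cost of identifying the best centroid, while any mismatch between that centroid and the user's true preference falls under the clustering term.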

Paper Structure (15 sections, 5 theorems, 10 equations, 2 figures, 2 algorithms)

Introduction
Related Works
Contributions
Problem Formulation
Expert with Clustering (EWC)
General Framework
EWC for Online Preference Learning
Clustering with Loss-guided Distance
Regret analysis
Regret Bound of EWC
Comparison
Experiments
Experimental Setup
Results
Conclusion

Key Result

Theorem 4.1

Let $P$ be any distribution of $\boldsymbol{\theta}_{i}\in\mathbbm{R}^d$ with $\boldsymbol{\mu} = \mathbbm{E}_P[\boldsymbol{\theta}_{i}]$, $\sigma^2 =\mathbbm{E}_P[||\boldsymbol{\theta}_{i}-\boldsymbol{\mu}||^2]$, and finite kurtosis. Let $\{\mathbf{c}_{k}\}_{k\in [K]}$ be any set of centroids, $k^*

Figures (2)

Figure 1: Illustrative figure for Expert with Clustering algorithm.
Figure 2: Comparative regret analysis of online learning algorithms: Expert with Clustering (EWC, Ours) shows lower regret than the baseline algorithms (Follow-the-Leader, LinUCB, Oracle FTL) and approaches the consistency of the Oracle methods.

Theorems & Definitions (9)

Theorem 4.1: Regret Bound of EWC
proof
Corollary 4.1.1
proof
Lemma 4.2.1: Regret Bound of SupLinUCB
Lemma 4.2.2: Regret Bound of Oracle FTL
proof
Corollary 4.2.1: Advantage of EWC
proof
