Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles
Jung-hun Kim, Se-Young Yun, Minchan Jeong, Jun Hyun Nam, Jinwoo Shin, Richard Combes
TL;DR
Contextual linear bandits with noisy and missing features create a mismatch between the observed data and the latent features that determine rewards, which defeats standard realizability-based guarantees. The authors define a Bayesian oracle that selects actions based on the conditional distribution of the latent features given the observed data, and show that this oracle can differ substantially from the naive argmax of scores computed on the observed features. To approximate it, they propose Contextual Linear Bandits on Bayesian Features (CLBBF), which estimates Bayesian features from data and applies OFUL-style optimism, achieving regret $R(T)=\tilde{O}\left(d\sqrt{T}+\frac{d^2}{p^{3/2}}\sqrt{\frac{T}{K}}+\frac{d}{p^4K}\right)$, where $d$ is the feature dimension, $T$ the horizon, and $K$ the number of arms; this reduces to $\tilde{O}(d\sqrt{T})$ when $K$ is large. Experiments on synthetic and real-world data show robustness to missing features and demonstrate practical relevance for privacy-preserving or noisy-feature settings.
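The optimism step mentioned above can be illustrated with a generic OFUL/LinUCB-style learner. This is a minimal sketch, not the authors' CLBBF: it assumes the per-arm feature vectors (for CLBBF, the estimated Bayesian features) are computed elsewhere and passed in, and it holds the confidence multiplier `beta` fixed, whereas OFUL's confidence radius grows slowly with the number of rounds. The class `OptimisticLinearBandit`, its parameters `lam` and `beta`, and the toy simulation with fully observed Gaussian features are all hypothetical.

```python
import numpy as np

class OptimisticLinearBandit:
    """OFUL/LinUCB-style arm selection over supplied feature vectors (sketch).

    lam:  ridge regularizer (hypothetical default)
    beta: confidence-radius multiplier, held fixed here for simplicity
    """

    def __init__(self, d, lam=1.0, beta=1.0):
        self.V = lam * np.eye(d)  # regularized Gram matrix
        self.b = np.zeros(d)      # running sum of reward * feature
        self.beta = beta

    def theta_hat(self):
        # Ridge-regression estimate V^{-1} b
        return np.linalg.solve(self.V, self.b)

    def select(self, feats):
        """feats: (K, d) array, e.g. estimated features of the K arms."""
        V_inv = np.linalg.inv(self.V)
        # Exploration bonus ||x||_{V^{-1}} for each arm (row of feats)
        bonus = np.sqrt(np.sum((feats @ V_inv) * feats, axis=1))
        ucb = feats @ self.theta_hat() + self.beta * bonus
        return int(np.argmax(ucb))  # optimistic choice

    def update(self, x, reward):
        # Add the played arm's feature and observed reward to the statistics
        self.V += np.outer(x, x)
        self.b += reward * x

# Toy run with fully observed features (assumed setup, not the paper's data)
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -0.5, 0.2])
learner = OptimisticLinearBandit(d=3)
for t in range(2000):
    feats = rng.normal(size=(5, 3))
    k = learner.select(feats)
    reward = feats[k] @ theta_true + 0.1 * rng.normal()
    learner.update(feats[k], reward)
print("estimate:", learner.theta_hat())
```

For a fresh learner the ridge estimate is zero, so the first choice is driven entirely by the bonus term and favors the arm with the largest feature norm; as rewards accumulate, the bonus shrinks in well-explored directions and the estimated reward dominates.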
Abstract
We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries. To address the challenges posed by this noise, we analyze Bayesian oracles given the observed noisy features. Our Bayesian analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizability function, depending on the noise characteristics. These deviations are highly non-intuitive and do not occur in classical noiseless setups. This implies that classical approaches cannot guarantee a non-trivial regret bound. Therefore, we propose an algorithm that aims to approximate the Bayesian oracle based on the observed information under this model, achieving an $\tilde{O}(d\sqrt{T})$ regret bound when there is a large number of arms. We demonstrate the proposed algorithm using synthetic and real-world datasets.
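To make the deviation concrete, here is a small sketch under an assumed Gaussian model, offered only as an illustration and not necessarily the model analyzed in the paper: latent features $x \sim \mathcal{N}(\mu, \Sigma)$, observations $y = x + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$, and missing entries marked as NaN. Because the expected reward $\theta^\top x$ is linear, the Bayesian score of an arm is $\theta^\top \mathbb{E}[x \mid \text{observed entries}]$, and the conditional mean follows from standard Gaussian conditioning. The naive rule instead imputes missing entries with the prior mean and takes the argmax of $\theta^\top y$. The functions `bayesian_feature` and `naive_and_bayes_scores` and all numerical values are hypothetical.

```python
import numpy as np

def bayesian_feature(y, mu, Sigma, noise_var):
    """Posterior mean E[x | observed entries of y] under the assumed Gaussian model."""
    obs = ~np.isnan(y)
    if not obs.any():
        return mu.copy()  # nothing observed: fall back to the prior mean
    # Covariance of the observed noisy entries, and cross-covariance with x
    S_oo = Sigma[np.ix_(obs, obs)] + noise_var * np.eye(int(obs.sum()))
    S_xo = Sigma[:, obs]
    # Gaussian conditioning: mu + S_xo S_oo^{-1} (y_obs - mu_obs)
    return mu + S_xo @ np.linalg.solve(S_oo, y[obs] - mu[obs])

def naive_and_bayes_scores(Y, theta, mu, Sigma, noise_var):
    """Score each arm (row of Y) under mean imputation and under the Bayesian oracle."""
    naive = np.where(np.isnan(Y), mu, Y) @ theta
    bayes = np.array([bayesian_feature(y, mu, Sigma, noise_var) @ theta for y in Y])
    return naive, bayes

# Two arms, d = 2, strongly negatively correlated coordinates (assumed values)
theta = np.array([1.0, 1.0])
mu = np.zeros(2)
Sigma = np.array([[1.0, -0.9], [-0.9, 1.0]])
noise_var = 0.1
Y = np.array([[3.0, np.nan],   # arm 0: second coordinate missing
              [1.0, 1.0]])     # arm 1: fully observed

naive, bayes = naive_and_bayes_scores(Y, theta, mu, Sigma, noise_var)
print("naive scores:", naive, "-> arm", int(np.argmax(naive)))  # arm 0
print("Bayes scores:", bayes, "-> arm", int(np.argmax(bayes)))  # arm 1
```

With negatively correlated coordinates, a large observed first entry implies a low expected second entry, so the Bayesian oracle scores arm 0 at $3/11 \approx 0.27$ and prefers the fully observed arm 1 (score $1$), even though the naive scores ($3$ versus $2$) favor the partially observed arm 0.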
