Information maximization for a broad variety of multi-armed bandit games

Alex Barbier-Chebbah; Christian L. Vestergaard; Jean-Baptiste Masson

TL;DR

This work extends the information-maximization paradigm to three structured multi-armed bandit problems (Explore-$m$, linear, and many-armed bandits), addressing over-exploration through problem-tailored observables and tractable approximations. It introduces three algorithms: PacAIM, which identifies an $\epsilon$-optimal top-$m$ subset via a separator $\theta_b$ and a stopping rule based on $P_{\rm top}P_{\rm bot}$; LinAIM, which adapts AIM to linear payoffs by weighting information gains with the likelihood of suboptimality; and ARM, a finite-horizon strategy that minimizes upcoming regret by balancing exploration of new arms against exploitation of the current best. The methods rely on entropy-based gains, Gaussian/posterior approximations, and extreme-value analyses to yield tractable, implementable policies with competitive empirical performance against standard baselines. The results indicate robust gains across Gaussian and Bernoulli reward settings and provide a unified framework for extending information-based decision-making to broader structured bandit problems. Future work is aimed at theoretical performance guarantees and further extensions to non-Gaussian and heavier-tailed rewards.

Abstract

Information and free-energy maximization are physics principles that provide general rules for an agent to optimize actions in line with specific goals and policies. These principles are the building blocks for designing decision-making policies capable of efficient performance with only partial information. Notably, the information maximization principle has shown remarkable success in the classical bandit problem and has recently been shown to yield optimal algorithms for Gaussian and sub-Gaussian reward distributions. This article explores a broad extension of physics-based approaches to more complex and structured bandit problems. To this end, we cover three distinct types of bandit problems, where information maximization is adapted and leads to strong performance. Since the main challenge of information maximization lies in avoiding over-exploration, we highlight how information is tailored at various levels to mitigate this issue, paving the way for more efficient and robust decision-making strategies.
