A Survey on Practical Applications of Multi-Armed and Contextual Bandits

Djallel Bouneffouf, Irina Rish

TL;DR

The survey addresses how multi-armed and contextual bandits can be applied across real-world domains and machine learning pipelines, emphasizing a taxonomy-based organization of applications and methods. It consolidates state-of-the-art techniques such as LinUCB, Thompson Sampling, muMAB, and Hyperband, and analyzes domain-specific insights on stationary versus non-stationary settings. Key contributions include a structured mapping of applications to bandit formulations, identification of gaps (e.g., multitask transfer and continual learning), and guidance for practitioners implementing bandit-based solutions. The work highlights the practical impact of bandits in healthcare, finance, pricing, information retrieval, dialogue systems, and ML tooling, while outlining promising directions for future research and cross-domain knowledge transfer.

Abstract

In recent years, the multi-armed bandit (MAB) framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance, due to its stellar performance combined with certain attractive properties, such as learning from less feedback. The multi-armed bandit field is currently flourishing, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical bandit problem. This article aims to provide a comprehensive review of the top recent developments in multiple real-life applications of the multi-armed bandit. Specifically, we introduce a taxonomy of common MAB-based applications and summarize the state of the art for each of those domains. Furthermore, we identify important current trends and provide new perspectives pertaining to the future of this exciting and fast-growing field.

Paper Structure

This paper contains 21 sections, 2 tables.