Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype

Nikola Tankovic, Robert Sajina

TL;DR

Contextual Multi-Armed Bandits (CMABs) enable context-aware sequential decision-making but face trade-offs between scalability and interpretability in dynamic retail settings. The paper presents a scalable, interpretable offer-selection prototype that operates on category-level contexts using Member Purchase Gap (MPG), Matrix Factorization (MF) signals, and logistic regression learned via SGD, with Beta-based exploration and explicit weight trajectories exposed to large language models (LLMs) for explanations. It surveys key CMAB families (UCB, Epsilon-Greedy, Posterior Sampling) and GLMs, situating the prototype among established paradigms like LinUCB and Thompson Sampling, and demonstrates a practical reference implementation that emphasizes interpretability without resorting to full neural-bandit complexity. The work contributes a controllable baseline for understanding bandit behavior at scale, offers a path to transparent, data-efficient personalized offer optimization, and outlines concrete steps toward production-ready deployment with richer representations and non-stationarity handling.

Abstract

This paper presents a concise review of Contextual Multi-Armed Bandit (CMAB) methods and introduces an experimental framework for scalable, interpretable offer selection, addressing the challenge of fast-changing offers. The approach models context at the product category level, allowing offers to span multiple categories and enabling knowledge transfer across similar offers. This improves learning efficiency and generalization in dynamic environments. The framework extends standard CMAB methodology to support multi-category contexts, and achieves scalability through efficient feature engineering and modular design. Advanced features such as MPG (Member Purchase Gap) and MF (Matrix Factorization) capture nuanced user-offer interactions, with implementation in Python for practical deployment. A key contribution is interpretability at scale: logistic regression models yield transparent weight vectors, accessible via a large language model (LLM) interface for real-time, user-level tracking and explanation of evolving preferences. This enables the generation of detailed member profiles and identification of behavioral patterns, supporting personalized offer optimization and enhancing trust in automated decisions. By situating our prototype alongside established paradigms like Generalized Linear Models and Thompson Sampling, we demonstrate its value for both research and real-world CMAB applications.
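The core loop described above (a logistic regression updated by SGD, Beta-based exploration, and a recorded weight trajectory) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class and function names, the feature layout `[bias, MPG, MF score]`, the learning rate, and the way the Beta distribution is parameterized around the predicted clip probability are all assumptions made for the example.

```python
import math
import random


class OnlineLogisticModel:
    """Logistic regression updated one observation at a time with SGD (hypothetical helper)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr
        # Keep a snapshot of the weights after every update so that the
        # trajectory can be inspected or summarized later.
        self.trajectory = [list(self.w)]

    def predict(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, clipped):
        # For the log loss, the per-example gradient is (p - y) * x.
        err = self.predict(x) - clipped
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.trajectory.append(list(self.w))


def beta_sampled_score(p, strength, rng):
    """Draw from a Beta distribution with mean p.

    Assumption: exploration is modeled by sampling around the predicted
    probability; a larger `strength` means less exploration.
    """
    a = max(p * strength, 1e-3)
    b = max((1.0 - p) * strength, 1e-3)
    return rng.betavariate(a, b)


def select_offer(model, offers, strength=20.0, rng=random):
    """Return the name of the offer with the highest Beta-sampled score.

    `offers` maps an offer name to its feature vector, here assumed to be
    [bias, MPG, MF score].
    """
    scores = {
        name: beta_sampled_score(model.predict(x), strength, rng)
        for name, x in offers.items()
    }
    return max(scores, key=scores.get)
```

Because every update appends a copy of the weight vector, `model.trajectory` is the kind of per-feature history that could be plotted or passed to an LLM for explanation. How the prototype actually stores and exposes these trajectories is not specified here.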

Paper Structure

This paper contains 11 sections, 5 equations, 3 figures, 1 table, 1 algorithm.

Figures (3)

  • Figure 1: Predicted clip probabilities for a single category present in all offers over time, showing both clip (red) and non-clip (blue) events, with purchase dates marked by green vertical lines.
  • Figure 2: Model weight trajectories over time for an example user-category pair, showing how different features' influence evolves during the simulation.
  • Figure : Implementation and Evaluation Procedure for the proposed CMAB System