
Adaptive Budget Optimization for Multichannel Advertising Using Combinatorial Bandits

Briti Gangopadhyay, Zhao Wang, Alberto Silvio Chiappa, Shingo Takamatsu

TL;DR

This work tackles adaptive budget allocation for multichannel digital advertising under non-stationary market dynamics. It introduces a public, long-horizon simulation environment with logged data and a saturating-mean GP-based combinatorial bandit algorithm (TUCB-MAE) that incorporates change-point detection and a targeted exploration strategy to adapt quickly to regime shifts. Theoretical analysis yields a sublinear regret bound of $\tilde{O}(\sqrt{TN \sum_{j=1}^{N} \gamma_T(\hat{n}_j)})$, and extensive empirical evaluation across real campaigns and a public dataset demonstrates higher rewards, lower regret, and lower cost per click (CPC) compared to state-of-the-art baselines. Ablation studies validate the contributions of the saturating mean, efficiency-driven exploration, and change-point detection, underscoring practical gains for long-running, non-stationary advertising campaigns. The work also promotes reproducibility by releasing the simulation environment and datasets publicly.
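To make the idea of a "saturating-mean GP" concrete, here is a minimal pure-Python sketch of a Gaussian-process posterior over a single sub-campaign's budget-to-reward curve, where the prior mean is a saturating curve instead of zero, followed by a UCB score. The exponential saturation form, the squared-exponential kernel, and every hyperparameter (`max_reward`, `scale`, `length`, `var`, `noise_var`, `beta`) are illustrative assumptions, not values or functional forms taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def saturating_mean(b: float, max_reward: float = 100.0, scale: float = 50.0) -> float:
    """Hypothetical saturating prior mean: rises with budget b, then plateaus at max_reward."""
    return max_reward * (1.0 - math.exp(-b / scale))


def rbf(x: float, y: float, length: float = 30.0, var: float = 400.0) -> float:
    """Squared-exponential kernel; length scale and variance are assumed values."""
    return var * math.exp(-0.5 * ((x - y) / length) ** 2)


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with L @ L^T == a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_sub(low: Matrix, v: Sequence[float]) -> List[float]:
    """Solve L x = v for lower-triangular L."""
    x: List[float] = []
    for i, row in enumerate(low):
        x.append((v[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def backward_sub(low: Matrix, v: Sequence[float]) -> List[float]:
    """Solve L^T x = v, reading L^T implicitly from lower-triangular L."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (v[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(train_b: Sequence[float], train_y: Sequence[float],
                 query_b: Sequence[float], noise_var: float = 1.0) -> Tuple[List[float], List[float]]:
    """Posterior mean and std of a GP whose prior mean is saturating_mean."""
    n = len(train_b)
    k_mat = [[rbf(train_b[i], train_b[j]) + (noise_var if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    low = cholesky(k_mat)
    # Fit residuals around the prior mean: alpha = (K + noise_var I)^{-1} (y - m(B)).
    resid = [y - saturating_mean(b) for b, y in zip(train_b, train_y)]
    alpha = backward_sub(low, forward_sub(low, resid))
    means, stds = [], []
    for q in query_b:
        k_star = [rbf(q, b) for b in train_b]
        means.append(saturating_mean(q) + sum(k * a for k, a in zip(k_star, alpha)))
        v = forward_sub(low, k_star)  # k_*^T (K + noise_var I)^{-1} k_* == v^T v
        stds.append(math.sqrt(max(rbf(q, q) - sum(x * x for x in v), 0.0)))
    return means, stds


def ucb(means: Sequence[float], stds: Sequence[float], beta: float = 4.0) -> List[float]:
    """Optimistic score mu + sqrt(beta) * sigma for each candidate budget."""
    return [m + math.sqrt(beta) * s for m, s in zip(means, stds)]


if __name__ == "__main__":
    mu, sd = gp_posterior([20.0, 50.0], [30.0, 80.0], [50.0, 400.0])
    print(ucb(mu, sd))
```

With this prior, a budget far from any observation (here 400) falls back to the plateau near `max_reward` with the full prior uncertainty, rather than to zero as a zero-mean GP would. One plausible reading of the design is that this encodes diminishing returns in spend directly into the prior.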

Abstract

Effective budget allocation is crucial for optimizing the performance of digital advertising campaigns. However, the development of practical budget allocation algorithms remains limited, primarily due to the lack of public datasets and comprehensive simulation environments capable of capturing the intricacies of real-world advertising. While multi-armed bandit (MAB) algorithms have been extensively studied, their efficacy diminishes in non-stationary environments where quick adaptation to changing market dynamics is essential. In this paper, we advance the field of budget allocation in digital advertising through three key contributions. First, we develop a simulation environment designed to mimic multichannel advertising campaigns over extended time horizons, incorporating logged real-world data. Second, we propose an enhanced combinatorial bandit budget allocation strategy that leverages a saturating mean function and a targeted exploration mechanism with change-point detection. This approach dynamically adapts to changing market conditions and improves allocation efficiency by filtering target regions based on domain knowledge. Finally, we present both theoretical analysis and empirical results, demonstrating that our method consistently outperforms baseline strategies, achieving higher rewards and lower regret across multiple real-world campaigns.
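The combinatorial step of such a strategy can be sketched as a multiple-choice knapsack: each sub-campaign's optimistic (UCB) reward is tabulated over a discretized budget grid, and a dynamic program picks one budget level per sub-campaign so that total spend stays within the overall budget. This is a generic sketch assuming a uniform budget grid shared by all sub-campaigns; the function `allocate_budget` and its parameters are hypothetical, and the paper's own optimizer, including its domain-knowledge filtering of target regions, is not reproduced here.

```python
from typing import List, Sequence, Tuple


def allocate_budget(ucb_tables: Sequence[Sequence[float]], budget_step: float,
                    total_units: int) -> Tuple[float, List[float]]:
    """Pick one budget level per sub-campaign to maximize the summed UCB score.

    ucb_tables[j][u] is the optimistic reward of sub-campaign j when it receives
    u * budget_step; total spend may not exceed total_units * budget_step.
    """
    neg_inf = float("-inf")
    # best[used] = best summed score spending exactly `used` units on campaigns seen so far
    best = [0.0] + [neg_inf] * total_units
    choices: List[List[int]] = []
    for table in ucb_tables:
        new_best = [neg_inf] * (total_units + 1)
        pick = [0] * (total_units + 1)
        for used in range(total_units + 1):
            for u in range(min(used, len(table) - 1) + 1):
                prev = best[used - u]
                if prev == neg_inf:
                    continue  # this spend level is unreachable
                score = prev + table[u]
                if score > new_best[used]:
                    new_best[used] = score
                    pick[used] = u
        best = new_best
        choices.append(pick)
    # Allow total spend below the cap: take the best over all spend levels.
    used = max(range(total_units + 1), key=lambda k: best[k])
    value = best[used]
    # Walk the recorded choices backwards to recover each sub-campaign's level.
    alloc: List[float] = []
    for pick in reversed(choices):
        u = pick[used]
        alloc.append(u * budget_step)
        used -= u
    alloc.reverse()
    return value, alloc


if __name__ == "__main__":
    tables = [[0, 5, 8, 9], [0, 6, 7, 7.5]]  # made-up UCB values per budget level
    print(allocate_budget(tables, budget_step=10.0, total_units=3))
```

For the made-up tables in the usage example, the best split under 30 units of spend gives 20 to the first sub-campaign and 10 to the second (score 14), illustrating how concave, saturating reward curves push the optimizer to spread budget across channels.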

Paper Structure

This paper contains 18 sections, 4 theorems, 63 equations, 7 figures, 5 tables, 1 algorithm.

Key Result

lemma 1

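Given the realization of a GP $f(\cdot)$ and the estimates $\hat{\mu}_{t-1}(b)$ of the mean and $\hat{\sigma}^2_{t-1}(b)$ of the variance for an input $b$ belonging to the input space $B$, for each $\beta \in \mathbb{R}^+$ the following condition holds:

$$\mathbb{P}\left(\left|f(b) - \hat{\mu}_{t-1}(b)\right| > \sqrt{\beta}\,\hat{\sigma}_{t-1}(b)\right) \leq e^{-\beta/2},$$

for each $b \in B$.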
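Under the GP model, $f(b)$ conditioned on past observations is Gaussian with mean $\hat{\mu}_{t-1}(b)$ and standard deviation $\hat{\sigma}_{t-1}(b)$, so the left-hand side of Lemma 1 is the two-sided standard-normal tail $\mathbb{P}(|Z| > \sqrt{\beta})$. The short check below compares that tail with $e^{-\beta/2}$ and shows how one might pick $\beta$ for a target failure probability $\delta$ by inverting the bound, $\beta = 2\ln(1/\delta)$. The inversion and the helper names are our illustration; the paper's actual exploration schedule for $\beta$ is not given in this excerpt.

```python
import math


def gaussian_two_sided_tail(beta: float) -> float:
    """P(|Z| > sqrt(beta)) for Z ~ N(0, 1), i.e. the left-hand side of Lemma 1."""
    return math.erfc(math.sqrt(beta / 2.0))


def lemma1_bound(beta: float) -> float:
    """Right-hand side of Lemma 1: exp(-beta / 2)."""
    return math.exp(-beta / 2.0)


def beta_for_confidence(delta: float) -> float:
    """Hypothetical helper: smallest beta with exp(-beta / 2) <= delta."""
    return 2.0 * math.log(1.0 / delta)


if __name__ == "__main__":
    for beta in [0.5, 1.0, 2.0, 4.0, 9.0]:
        tail, bound = gaussian_two_sided_tail(beta), lemma1_bound(beta)
        print(f"beta={beta:>4}: P(|f - mu| > sqrt(beta)*sigma) = {tail:.4f} <= {bound:.4f}")
    print("beta for delta=0.05:", beta_for_confidence(0.05))
```

The gap between the two columns grows with $\beta$, which reflects that $e^{-\beta/2}$ is a convenient but loose bound on the Gaussian tail.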

Figures (7)

  • Figure 1: Budget allocation across multiple sub-campaigns in digital advertising
  • Figure 2: a) Architecture of the simulation environment, where the reward function is learned from the logged data b) Variability of budget-to-cost consumption in the environment c) Changing reward functions over different months in the environment
  • Figure 3: A simple representation of the GP estimation with saturated mean and targeted UCB exploration
  • Figure 4: Comparison with respect to the human operator's budget allocation from the logged dataset
  • Figure 5: Reward comparison over approximately 300 days for the attendance management campaign.
  • ...and 2 more figures

Theorems & Definitions (4)

  • lemma 1: From 10.5555/3104322.3104451
  • proposition 1
  • lemma 2: From 10.5555/3104322.3104451
  • proposition 2