Adaptive Budget Optimization for Multichannel Advertising Using Combinatorial Bandits
Briti Gangopadhyay, Zhao Wang, Alberto Silvio Chiappa, Shingo Takamatsu
TL;DR
This work tackles adaptive budget allocation for multichannel digital advertising under non-stationary market dynamics. It introduces a public, long-horizon simulation environment with logged data and a saturating-mean GP-based combinatorial bandit algorithm (TUCB-MAE) that incorporates change-point detection and a targeted exploration strategy to adapt quickly to regime shifts. Theoretical analysis yields a sublinear regret bound of $\tilde{O}(\sqrt{TN \sum_{j=1}^{N} \gamma_T(\hat{n}_j)})$, and extensive empirical evaluation across real campaigns and a public dataset demonstrates higher rewards, lower regret, and lower CPC compared to state-of-the-art baselines. Ablation studies validate the contributions of the saturating mean, efficiency-driven exploration, and change-point detection, underscoring practical gains for long-running, non-stationary advertising campaigns. The work also promotes reproducibility by releasing the simulation environment and datasets publicly.
Abstract
Effective budget allocation is crucial for optimizing the performance of digital advertising campaigns. However, the development of practical budget allocation algorithms remains limited, primarily due to the lack of public datasets and comprehensive simulation environments capable of capturing the intricacies of real-world advertising. While multi-armed bandit (MAB) algorithms have been extensively studied, their efficacy diminishes in non-stationary environments where quick adaptation to changing market dynamics is essential. In this paper, we advance the field of budget allocation in digital advertising with three key contributions. First, we develop a simulation environment designed to mimic multichannel advertising campaigns over extended time horizons, incorporating logged real-world data. Second, we propose an enhanced combinatorial bandit budget allocation strategy that leverages a saturating mean function and a targeted exploration mechanism with change-point detection. This approach dynamically adapts to changing market conditions, improving allocation efficiency by filtering target regions based on domain knowledge. Finally, we present both theoretical analysis and empirical results, demonstrating that our method consistently outperforms baseline strategies, achieving higher rewards and lower regret across multiple real-world campaigns.
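To make the combinatorial allocation step concrete, the following is a minimal sketch and not the paper's TUCB-MAE implementation. It assumes a hypothetical saturating response curve of the form v_max * b / (k + b) for each channel and a discretized budget, and it selects one budget level per channel with a knapsack-style dynamic program. The function names, the curve shape, and the parameters `v_max` and `k` are illustrative assumptions; in a bandit setting the entries of `values` would typically be upper confidence bounds from a learned model rather than known means.

```python
import numpy as np


def saturating_mean(budget, v_max, k):
    """Hypothetical saturating response: grows with budget and plateaus at v_max."""
    return v_max * budget / (k + budget)


def allocate(values, total_units):
    """Choose one budget level per channel, spending at most total_units.

    values[j][b] is the estimated reward of giving channel j exactly b units,
    for b = 0..total_units. Solved exactly with a knapsack-style dynamic program.
    """
    n = len(values)
    # best[r]: maximum reward from the channels processed so far using at most r units
    best = np.zeros(total_units + 1)
    choice = np.zeros((n, total_units + 1), dtype=int)
    for j in range(n):
        new_best = np.full(total_units + 1, -np.inf)
        for r in range(total_units + 1):
            for b in range(r + 1):
                cand = best[r - b] + values[j][b]
                if cand > new_best[r]:
                    new_best[r] = cand
                    choice[j, r] = b
        best = new_best
    # Backtrack from the last channel to recover the per-channel allocation.
    alloc = [0] * n
    r = total_units
    for j in reversed(range(n)):
        alloc[j] = int(choice[j, r])
        r -= alloc[j]
    return alloc, float(best[total_units])


if __name__ == "__main__":
    units = 10  # total budget in discrete units (assumed granularity)
    params = [(100.0, 2.0), (150.0, 8.0), (60.0, 1.0)]  # assumed (v_max, k) per channel
    values = [[saturating_mean(b, v, k) for b in range(units + 1)] for v, k in params]
    print(allocate(values, units))
```

The dynamic program runs in O(N * B^2) time for N channels and B budget units, which is practical for coarse discretizations. Because the response curves saturate, the optimal allocation tends to spread budget across channels instead of concentrating it on the one with the highest ceiling.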
