Hierarchical Multi-agent Meta-Reinforcement Learning for Cross-channel Bidding
Shenghong He, Chao Yu
TL;DR
We address cross-channel real-time bidding (RTB) under a shared daily budget by introducing HMMCB, a hierarchical multi-agent meta-reinforcement learning framework. The top level employs a CPC-constrained diffusion model to dynamically allocate the budget across channels. The bottom level uses a state-action (SA) decoupled actor-critic, combined with context-based meta-channel knowledge (CMCK) learning and centralized value guidance, to coordinate bidding decisions. Empirical results on Meituan data show state-of-the-art ROI with reduced CPC and robust online gains, and ablations confirm the contribution of each component: diffusion budgeting, SA decoupling, CMCK, and centralized training. The approach enables scalable, budget-aware cross-channel bidding in real-world RTB systems and shows how hierarchical RL can be integrated with multi-channel dynamics.
Abstract
Real-time bidding (RTB) plays a pivotal role in online advertising ecosystems. Advertisers bid strategically to maximize their advertising impact while adhering to financial constraints such as return on investment (ROI) and cost per click (CPC). Traditional approaches focus primarily on bidding under fixed budget constraints, so they cannot effectively handle dynamic budget allocation, where the goal is to globally optimize bidding performance across multiple channels that share a single budget. In this paper, we propose a hierarchical multi-agent reinforcement learning framework for multi-channel bidding optimization. The top-level strategy applies a CPC-constrained diffusion model to dynamically allocate budgets among the channels according to their distinct features and complex interdependencies. The bottom-level strategy combines two components: a state-action decoupled actor-critic method, which addresses the extrapolation errors that out-of-distribution actions cause in offline learning, and a context-based meta-channel knowledge learning method, which improves the policy's state representation by exploiting knowledge shared among channels. Comprehensive experiments on a large-scale real-world industrial dataset from the Meituan ad bidding platform demonstrate that our method achieves state-of-the-art performance.
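
The two-level structure described above can be illustrated with a toy sketch. It is not the paper's method: the top-level diffusion model is replaced here by a simple proportional rule, the bottom-level learned policy by a fixed CPC-based bid, and every name and number (Channel, allocate_budget, channel_bid, the channel names and statistics) is a hypothetical chosen for illustration. The sketch only shows how a shared budget and a CPC limit constrain both levels.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    # Hypothetical per-channel statistics; both are assumed to be positive.
    name: str
    value_per_click: float  # assumed expected value earned per click
    expected_cpc: float     # assumed expected cost per click


def allocate_budget(total_budget, channels, cpc_limit):
    """Toy top level: split the shared budget among channels whose expected
    CPC respects the limit, in proportion to value per unit of cost.
    This proportional rule is a stand-in, not a diffusion model."""
    allocation = {c.name: 0.0 for c in channels}
    eligible = [c for c in channels if c.expected_cpc <= cpc_limit]
    if not eligible:
        return allocation
    scores = {c.name: c.value_per_click / c.expected_cpc for c in eligible}
    total_score = sum(scores.values())
    for name, score in scores.items():
        allocation[name] = total_budget * score / total_score
    return allocation


def channel_bid(predicted_ctr, cpc_limit, remaining_budget):
    """Toy bottom level: bid the CPC limit scaled by the predicted
    click-through rate, never exceeding the channel's remaining budget."""
    return max(0.0, min(cpc_limit * predicted_ctr, remaining_budget))


# Illustrative channels and numbers only.
channels = [
    Channel("feed", value_per_click=2.0, expected_cpc=1.0),
    Channel("search", value_per_click=3.0, expected_cpc=3.0),
    Channel("banner", value_per_click=5.0, expected_cpc=6.0),
]
allocation = allocate_budget(100.0, channels, cpc_limit=4.0)
bid = channel_bid(predicted_ctr=0.05, cpc_limit=4.0,
                  remaining_budget=allocation["feed"])
```

In this toy setup the "banner" channel receives no budget because its expected CPC exceeds the limit, and the remaining budget is split between "feed" and "search" by value per unit of cost. In the framework described in the abstract, both decisions are learned rather than hand-coded.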
