Hierarchical Multi-agent Meta-Reinforcement Learning for Cross-channel Bidding

Shenghong He, Chao Yu

TL;DR

We address cross-channel real-time bidding (RTB) under a shared daily budget by introducing HMMCB, a hierarchical multi-agent meta-reinforcement learning framework. The top level employs a CPC-constrained diffusion model to dynamically allocate budgets across channels. The bottom level uses a state-action (SA) decoupled actor-critic with context-based meta-channel knowledge (CMCK) learning and centralized value guidance to coordinate bidding decisions. Empirical results on Meituan data show state-of-the-art ROI with reduced CPC and robust online gains, and ablations confirm the contribution of each component: diffusion budgeting, SA decoupling, CMCK, and centralized training. The approach enables scalable, budget-aware cross-channel bidding in real-world RTB systems and offers a practical blueprint for integrating hierarchical RL with multi-channel dynamics.

Abstract

Real-time bidding (RTB) plays a pivotal role in online advertising ecosystems. Advertisers bid strategically to maximize their advertising impact while adhering to financial constraints such as return on investment (ROI) and cost per click (CPC). Traditional approaches focus primarily on bidding under fixed budget constraints and cannot effectively handle dynamic budget allocation, where the goal is to globally optimize bidding performance across multiple channels that share a budget. In this paper, we propose a hierarchical multi-agent reinforcement learning framework for multi-channel bidding optimization. The top-level strategy applies a CPC-constrained diffusion model to dynamically allocate budgets among the channels according to their distinct features and complex interdependencies. The bottom-level strategy combines two components: a state-action decoupled actor-critic method, which addresses extrapolation errors in offline learning caused by out-of-distribution actions, and a context-based meta-channel knowledge learning method, which improves the policy's state representation by exploiting knowledge shared among channels. Comprehensive experiments on a large-scale real-world industrial dataset from the Meituan ad bidding platform demonstrate that our method achieves state-of-the-art performance.
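
To make the problem setting concrete, the following is a minimal sketch of the bookkeeping behind a shared budget with ROI and CPC constraints. It is not the paper's method: the proportional split in `allocate` is a simple stand-in for the CPC-constrained diffusion model, ROI is assumed here to mean revenue divided by cost (the paper may define it differently), and the names `ChannelStats`, `allocate`, `summarize`, and the channel labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChannelStats:
    """Hypothetical one-day outcome for a single channel."""
    name: str
    cost: float     # amount spent on this channel
    clicks: int     # clicks obtained
    revenue: float  # value attributed to this channel


def cpc(cost: float, clicks: int) -> float:
    """Cost per click; infinite when nothing was clicked."""
    return cost / clicks if clicks > 0 else float("inf")


def roi(revenue: float, cost: float) -> float:
    """Return on investment, assumed here to be revenue / cost."""
    return revenue / cost if cost > 0 else 0.0


def allocate(total_budget: float, weights: dict[str, float]) -> dict[str, float]:
    """Split a shared budget in proportion to non-negative channel weights.

    A stand-in for a learned top-level allocator, used only for illustration.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: total_budget * w / total for name, w in weights.items()}


def summarize(stats: list[ChannelStats], total_budget: float, cpc_cap: float) -> dict:
    """Aggregate channel outcomes and check the global constraints."""
    cost = sum(s.cost for s in stats)
    clicks = sum(s.clicks for s in stats)
    revenue = sum(s.revenue for s in stats)
    return {
        "roi": roi(revenue, cost),
        "cpc": cpc(cost, clicks),
        "within_budget": cost <= total_budget,
        "within_cpc": cpc(cost, clicks) <= cpc_cap,
    }


# Illustrative numbers only; they do not come from the paper.
budgets = allocate(100.0, {"feed": 1.0, "search": 3.0})
report = summarize(
    [ChannelStats("feed", 20.0, 10, 60.0), ChannelStats("search", 60.0, 20, 180.0)],
    total_budget=100.0,
    cpc_cap=3.0,
)
```

The point of the sketch is that the constraints are global: a channel with a high individual CPC can still be acceptable if the aggregate stays under the cap, which is why budget allocation across channels has to be decided jointly rather than per channel.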

Paper Structure

This paper contains 24 sections, 21 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: An overview of the Meituan advertising system. Advertising platforms need to bid on different channels based on the budget and constraints of merchants. CTR and CVR represent average click-through rate and conversion rate respectively.
  • Figure 2: The HMMCB framework.
  • Figure 3: Offline convergence process of the six methods. The vertical axis of each subplot represents different evaluation metrics, while the horizontal axis denotes the number of training steps. Each method is executed with 10 randomly selected seeds.
  • Figure 4: Results of online experiments with seven methods.
  • Figure 5: Comparison results of HMMCB with existing methods. The results are based on 5 runs of the experiment, each with a random seed, and the y-axis represents the ROI. SA denotes the state-action decoupled actor-critic method.
  • ...and 1 more figure