Session-Level Dynamic Ad Load Optimization using Offline Robust Reinforcement Learning

Tao Liu; Qi Xu; Wei Shi; Zhigang Hua; Shuang Yang

TL;DR

We develop an offline deep Q-network (DQN)-based framework that mitigates confounding bias in dynamic systems and achieves more than 80% offline gains over the best causal learning-based production baseline.

Abstract

Session-level dynamic ad load optimization aims to personalize the density and types of advertisements delivered in real time during a user's online session, dynamically balancing user experience quality against ad monetization. Traditional causal learning-based approaches struggle with key technical challenges, especially confounding bias and distribution shifts. In this paper, we develop an offline deep Q-network (DQN)-based framework that effectively mitigates confounding bias in dynamic systems and demonstrates more than 80% offline gains compared to the best causal learning-based production baseline. To make the framework more robust to unanticipated distribution shifts, we further enhance it with a novel offline robust dueling DQN approach. This approach achieves more stable rewards on multiple OpenAI Gym datasets as perturbations increase, and provides an additional 5% offline gain on real-world ad delivery data. Deployed across multiple production systems, our approach has achieved outsized topline gains. Post-launch online A/B tests have shown double-digit improvements in the engagement-ad score trade-off efficiency, significantly enhancing our platform's capability to serve both consumers and advertisers.

Paper Structure (24 sections, 3 theorems, 13 equations, 6 figures, 4 tables)

Introduction
Preliminaries and Related Works
Offline Reinforcement Learning
Robust Reinforcement Learning
Area Under Cost Curve Metric
Ad Allocation
Problem Formulation and Methods
Problem Formulation
Methodology
Linear Function Approximation
General Function Approximation
Experimental Results
Data Analysis and Metrics
Production Data from the Same Distribution
Data with Distribution Shifts
...and 9 more sections

Key Result

Proposition 1

For the IPM uncertainty set with $\mathcal{F}$ defined in Eq. (\ref{eqn:func-class}), we have $\inf_{P \in \mathcal{P}_{s, a}} P^\top V_w = (P_{s, a}^0)^\top V_w - \delta \|w_{2:d}\|$.
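
The closed form in Proposition 1 can be illustrated numerically. The following is a minimal sketch, not the paper's implementation. It assumes a linear value function $V_w(s') = \phi(s')^\top w$ whose first feature is a constant (which would explain why $w_1$ is excluded from the penalty) and it assumes the norm is Euclidean; neither detail is stated in this excerpt. The function name `robust_expected_value`, the feature matrix, and all numbers are hypothetical.

```python
import numpy as np


def robust_expected_value(p0, phi_next, w, delta):
    """Evaluate (P^0_{s,a})^T V_w - delta * ||w_{2:d}||.

    Assumptions (not confirmed by the source): V_w(s') = phi(s')^T w,
    and the norm is the Euclidean norm.
    """
    v = phi_next @ w                                # V_w at each candidate next state
    nominal = float(p0 @ v)                         # expectation under the nominal kernel
    penalty = delta * float(np.linalg.norm(w[1:]))  # w[1:] is w_{2:d} in 1-based indexing
    return nominal - penalty


# Hypothetical toy problem: 3 next states, 3 features (first feature constant).
p0 = np.array([0.5, 0.3, 0.2])
phi_next = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 2.0, 0.0],
    [1.0, 1.0, 1.0],
])
w = np.array([0.5, 3.0, 4.0])

nominal_value = robust_expected_value(p0, phi_next, w, delta=0.0)
robust_value = robust_expected_value(p0, phi_next, w, delta=0.1)
print(nominal_value, robust_value)
```

Under these assumptions the worst-case expectation needs no inner minimization over the uncertainty set: the robust value is the nominal value minus a penalty that grows linearly with the radius $\delta$.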

Figures (6)

Figure 1: Structure of the session-level dynamic ad load optimization system. A novel offline robust reinforcement learning approach is applied in the prediction module, which generates state-action values as inputs to the decision module (see Fig. 2 for the detailed structure).
Figure 2: Structure of robust dueling DQN. Robustness is incorporated through the empirical robust Bellman operator (Equations (\ref{eqn:emp-bell}) and (\ref{eqn:general-emp-bell})).
Figure 3: Data analysis on confounding bias (X|T-shifts) and time-wise user behavior shift (Y|X-shifts).
Figure 4: Cumulative rewards of robust dueling DQN and dueling DQN under perturbation.
Figure 5: Test AUCC of offline DQN and T-learner on session-level production data.
...and 1 more figures

Theorems & Definitions (3)

Proposition 1
Proposition 2
Theorem 1
