Counterfactual Evaluation of Ads Ranking Models through Domain Adaptation

Mohamed A. Radwan; Himaghna Bhattacharjee; Quinn Lanners; Jiasheng Zhang; Serkan Karakulak; Houssam Nassif; Murat Ali Bayir

TL;DR

This work tackles offline evaluation of large-scale ads ranking under complex system interactions and selection bias, where standard inverse propensity scoring (IPS) methods struggle. It introduces a domain-adapted reward model that sits atop an offline A/B testing simulation and estimates lift across domains through a per-domain weighting scheme $w^k_a = p_{T_k}(a\mid x)/p_S(a\mid x)$, the ratio of the probability of action $a$ given context $x$ under target domain $T_k$ to that under the source domain $S$. The training objective enforces cross-domain consistency via a multi-domain loss that balances emphasis on non-overlapping regions against inter-domain weight alignment, aiming to minimize recovery error across target domains. Empirical results on synthetic and real online data show improved Recovery CV ($Rec_{CV}$) over baselines and IPS, including a 17.6% gain in Recovery CV in a CTR-related online experiment.
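The per-domain weight above can be illustrated with a small sketch. This is not the paper's implementation: the function names `domain_weight` and `weighted_reward`, the toy propensities, and the plain importance-weighted mean are all assumptions made for illustration. It only shows how a ratio $p_{T_k}(a\mid x)/p_S(a\mid x)$ reweights rewards logged in the source domain.

```python
from typing import Sequence


def domain_weight(p_target: float, p_source: float) -> float:
    """Hypothetical helper: w = p_T(a|x) / p_S(a|x) for one logged (x, a) pair."""
    if p_source <= 0.0:
        # No overlap: the source domain never takes this action, so the ratio is undefined.
        raise ValueError("source propensity must be positive")
    return p_target / p_source


def weighted_reward(
    rewards: Sequence[float],
    p_target: Sequence[float],
    p_source: Sequence[float],
) -> float:
    """Importance-weighted mean of source-domain rewards, reweighted toward one target domain."""
    weights = [domain_weight(pt, ps) for pt, ps in zip(p_target, p_source)]
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


# Toy data (assumed): four logged impressions with binary click rewards.
rewards = [1.0, 0.0, 1.0, 0.0]
p_source = [0.5, 0.5, 0.5, 0.5]      # propensities under the source domain S
p_target = [0.75, 0.25, 0.75, 0.25]  # propensities under one target domain T_k

estimate = weighted_reward(rewards, p_target, p_source)
print(estimate)
```

In this toy case the target domain favors the clicked actions, so the reweighted estimate rises above the raw source-domain click rate of 0.5. When $p_S(a\mid x)$ is close to zero the ratio becomes very large, which is one reason purely weight-based estimators can be unstable in practice.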

Abstract

We propose a domain-adapted reward model that works alongside an Offline A/B testing system for evaluating ranking models. This approach effectively measures reward for ranking model changes in large-scale Ads recommender systems, where model-free methods like IPS are not feasible. Our experiments demonstrate that the proposed technique outperforms both the vanilla IPS method and approaches using non-generalized reward models.

Paper Structure (12 sections, 15 equations, 2 figures)

Introduction
Methodology
Overview
Notation and set-up
Metric
Estimating Lifts Between Domains
Reward Model Training
Experimental Results
Metric Details
Derivation of Recovery Loss
Single-Domain Recovery Optimization
Multi-Domain Optimization

Figures (2)

Figure 1: Counterfactual evaluation setup
Figure 2: $Rec_{CV}$ of each method used with synthetic data
