Counterfactual Evaluation of Ads Ranking Models through Domain Adaptation

Mohamed A. Radwan, Himaghna Bhattacharjee, Quinn Lanners, Jiasheng Zhang, Serkan Karakulak, Houssam Nassif, Murat Ali Bayir

TL;DR

This work tackles offline evaluation of large-scale ads ranking under complex system interactions and selection bias, where standard IPS-based methods struggle. It introduces a domain-adapted reward model that sits atop an offline A/B testing simulation, enabling cross-domain lift estimation through a per-domain weighting scheme $w^k_a = p_{T_k}(a\mid x)/p_S(a\mid x)$. The training objective enforces cross-domain consistency via a multi-domain loss that balances non-overlapping region emphasis and inter-domain weight alignment, aiming to minimize recovery error across target domains. Empirical results on synthetic and real online data show improved Recovery $CV$ over baselines and IPS, with a 17.6% gain in Recovery $CV$ in a CTR-related online experiment, highlighting the approach’s practical impact for robust offline evaluation of ads ranking changes.
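To make the weighting scheme concrete, the following is a minimal sketch of how the per-domain weights $w^k_a = p_{T_k}(a\mid x)/p_S(a\mid x)$ could be computed for a single context $x$. The function name `domain_weights`, the ad identifiers, and the toy probabilities are all hypothetical and chosen only for illustration. The sketch covers the weight ratio alone, not the paper's multi-domain training loss.

```python
from typing import Dict, List


def domain_weights(
    p_source: Dict[str, float],
    p_targets: List[Dict[str, float]],
    eps: float = 1e-12,
) -> List[Dict[str, float]]:
    """Return w^k_a = p_{T_k}(a|x) / p_S(a|x) for each target domain k.

    p_source maps each ad a to p_S(a|x) for one fixed context x.
    p_targets holds one such mapping per target domain T_k.
    """
    weights = []
    for p_t in p_targets:
        w_k = {}
        for ad, p_s in p_source.items():
            # Assumption: clip the denominator so ads that the source
            # domain almost never shows do not cause a division by zero.
            w_k[ad] = p_t.get(ad, 0.0) / max(p_s, eps)
        weights.append(w_k)
    return weights


# Toy example (made-up numbers): one source domain, two target domains.
p_source = {"ad_1": 0.5, "ad_2": 0.25, "ad_3": 0.25}
p_targets = [
    {"ad_1": 0.25, "ad_2": 0.5, "ad_3": 0.25},
    {"ad_1": 0.5, "ad_2": 0.125, "ad_3": 0.375},
]
weights = domain_weights(p_source, p_targets)
```

A weight above 1 marks an ad that a target domain shows more often than the source domain does, and a weight below 1 marks the opposite. When the source domain gives every ad nonzero probability, the source-weighted sum of the weights for each target domain equals 1, which is a quick sanity check on the inputs.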

Abstract

We propose a domain-adapted reward model that works alongside an Offline A/B testing system for evaluating ranking models. This approach effectively measures reward for ranking model changes in large-scale Ads recommender systems, where model-free methods like IPS are not feasible. Our experiments demonstrate that the proposed technique outperforms both the vanilla IPS method and approaches using non-generalized reward models.

Paper Structure (12 sections, 15 equations, 2 figures)

Figures (2)

  • Figure 1: Counterfactual evaluation setup
  • Figure 2: $Rec_{CV}$ of each method used with synthetic data