Counterfactual Evaluation of Ads Ranking Models through Domain Adaptation
Mohamed A. Radwan, Himaghna Bhattacharjee, Quinn Lanners, Jiasheng Zhang, Serkan Karakulak, Houssam Nassif, Murat Ali Bayir
TL;DR
This work tackles offline evaluation of large-scale ads ranking under complex system interactions and selection bias, where standard inverse propensity scoring (IPS) methods struggle. It introduces a domain-adapted reward model that sits atop an offline A/B testing simulation and enables cross-domain lift estimation through a per-domain weighting scheme $w^k_a = p_{T_k}(a\mid x)/p_S(a\mid x)$, the ratio of the probability of action $a$ given context $x$ in target domain $T_k$ to its probability in the source domain $S$. The training objective enforces cross-domain consistency via a multi-domain loss that balances emphasis on non-overlapping regions against inter-domain weight alignment, with the aim of minimizing recovery error across target domains. Empirical results on synthetic and real online data show improved Recovery CV over baselines and IPS, including a 17.6% gain in Recovery CV in a CTR-related online experiment, which highlights the approach’s practical value for robust offline evaluation of ads ranking changes.
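As a rough illustration of the weighting scheme above, the following sketch computes $w^k_a$ for a handful of logged samples and uses it to reweight observed rewards. The function names, the toy probabilities, and the self-normalized weighted average are assumptions made for illustration only; the summary does not specify how the weights enter the multi-domain loss, so this is not the authors' training objective.

```python
import numpy as np


def domain_weights(p_target, p_source, eps=1e-12):
    """Per-sample weight w = p_T(a|x) / p_S(a|x).

    p_target and p_source hold the probability of the logged action under
    the target-domain and source-domain policies (hypothetical inputs).
    eps guards against division by zero where the source has no support.
    """
    p_target = np.asarray(p_target, dtype=float)
    p_source = np.asarray(p_source, dtype=float)
    return p_target / np.maximum(p_source, eps)


def weighted_reward(rewards, weights):
    """Self-normalized weighted mean of logged rewards (an assumed estimator)."""
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * rewards) / np.sum(weights))


# Toy data: three logged impressions from the source domain.
p_source = np.array([0.5, 0.25, 0.25])
p_target = np.array([0.25, 0.5, 0.25])
rewards = np.array([1.0, 0.0, 1.0])

w = domain_weights(p_target, p_source)
estimate = weighted_reward(rewards, w)
```

Samples whose logged action is more likely under the target domain than under the source receive weights above 1, so they count for more in the reweighted average.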
Abstract
We propose a domain-adapted reward model that works alongside an offline A/B testing system for evaluating ranking models. This approach effectively measures reward for ranking model changes in large-scale ads recommender systems, where model-free methods such as inverse propensity scoring (IPS) are not feasible. Our experiments demonstrate that the proposed technique outperforms both the vanilla IPS method and approaches using non-generalized reward models.
