Open Benchmarking for Click-Through Rate Prediction
Jieming Zhu, Jinyang Liu, Shuai Yang, Qi Zhang, Xiuqiang He
TL;DR
This paper addresses the lack of standardized benchmarking for click-through rate (CTR) prediction by introducing the open BARS-CTR benchmark. It conducts a large-scale, reproducible evaluation of 24 models across two real-world datasets (Criteo and Avazu), totaling over 7,000 experiments and more than 12,000 GPU hours, with a fully documented evaluation protocol and an open-source toolkit (FuxiCTR). The findings show that with thorough hyper-parameter tuning, performance differences among many modern CTR models shrink, and that some prior results were biased by inconsistent data splits and preprocessing. By releasing code, settings, and results, the work aims to promote fair comparisons, reproducible research, and easier adoption for practitioners and newcomers.
Abstract
Click-through rate (CTR) prediction is a critical task for many applications, as its accuracy has a direct impact on user experience and platform revenue. In recent years, CTR prediction has been widely studied in both academia and industry, resulting in a wide variety of CTR prediction models. Unfortunately, there is still a lack of standardized benchmarks and uniform evaluation protocols for CTR prediction research. This leads to non-reproducible or even inconsistent experimental results among existing studies, which largely limits the practical value and potential impact of this research. In this work, we build an open benchmark for CTR prediction, namely BARS-CTR, and present a rigorous comparison of different models in a reproducible manner. To this end, we ran over 7,000 experiments for more than 12,000 GPU hours in total to re-evaluate 24 existing models on multiple datasets and settings. Surprisingly, our experiments show that with sufficient hyper-parameter search and model tuning, the differences among many deep models are smaller than expected. The results also reveal that making real progress on modeling for CTR prediction is indeed a very challenging research task. We believe that our benchmarking work can not only allow researchers to conveniently gauge the effectiveness of new models but also enable fair comparison with the state of the art. We have publicly released the benchmarking code, evaluation protocols, and hyper-parameter settings of our work to promote reproducible research in this field.
