Calibration-then-Calculation: A Variance Reduced Metric Framework in Deep Click-Through Rate Prediction Models
Yewen Fan, Nian Si, Xiangchen Song, Kun Zhang
TL;DR
This work addresses the high variance, caused by randomness in training, that hampers the evaluation of deep CTR prediction pipelines. It introduces the Calibrated Loss Metric, notably Calibrated Log Loss, which calibrates the bias term on a holdout set to reduce variance while preserving mean performance. The authors provide theoretical guarantees in a linear regression setting. They also run extensive experiments on synthetic data and the Avazu CTR dataset, which show that calibrated metrics reduce variance and improve the accuracy of model-pipeline comparisons. This variance-reduced evaluation framework enables more reliable benchmarking and can improve AutoML-driven pipeline selection in practice.
Abstract
The adoption of deep learning across various fields has been extensive, yet little attention has been paid to evaluating the performance of deep learning pipelines. Typically, with the increased use of large datasets and complex models, the training process is run only once and the result is compared to previous benchmarks. This practice can lead to imprecise comparisons due to the variance in neural network evaluation metrics, which stems from the inherent randomness in the training process. Traditional solutions, such as running the training process multiple times, are often infeasible due to computational constraints. In this paper, we introduce a novel metric framework, the Calibrated Loss Metric, designed to address this issue by reducing the variance present in its conventional counterpart. Consequently, this new metric improves the accuracy of detecting effective modeling improvements. Our approach is substantiated by theoretical justifications and extensive experimental validations in the context of Deep Click-Through Rate Prediction Models.
