Calibration-then-Calculation: A Variance Reduced Metric Framework in Deep Click-Through Rate Prediction Models

Yewen Fan, Nian Si, Xiangchen Song, Kun Zhang

TL;DR

This work addresses the problem that comparisons between deep CTR prediction pipelines are unreliable because randomness in training makes evaluation metrics highly variable. It introduces the Calibrated Loss Metric framework, notably Calibrated Log Loss, which calibrates the bias term on a holdout set to reduce variance while preserving mean performance. The authors provide theoretical guarantees under a linear regression setting and demonstrate, through extensive experiments on synthetic data and the Avazu CTR dataset, that calibrated metrics have lower variance and improve the accuracy of model-pipeline comparisons. This variance-reduced evaluation framework enables more reliable benchmarking and can enhance AutoML-driven pipeline selection in practice.
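
The calibrate-then-calculate idea can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the bias correction is a single additive shift `c` on the logit scale, chosen so that the mean calibrated prediction on the holdout split matches the mean label, and that log loss is then computed on a separate evaluation split. The function names, the bisection solver, and the split into `*_cal` and `*_eval` arrays are all hypothetical choices made for this sketch.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def _logit(p, eps=1e-12):
    # Clip to avoid infinities at p = 0 or p = 1.
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p) - np.log1p(-p)


def log_loss(y, p, eps=1e-12):
    """Plain (uncalibrated) binary log loss."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def fit_bias_shift(y_cal, p_cal, lo=-20.0, hi=20.0, iters=100):
    """Find a logit shift c so that mean(sigmoid(logit(p) + c)) == mean(y).

    Assumption of this sketch: calibration adjusts only one bias term.
    The mean calibrated prediction is increasing in c, so bisection works.
    """
    logits = _logit(np.asarray(p_cal, dtype=float))
    target = float(np.mean(y_cal))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if _sigmoid(logits + mid).mean() < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def calibrated_log_loss(y_cal, p_cal, y_eval, p_eval):
    """Calibrate the bias on the holdout split, then score the eval split."""
    c = fit_bias_shift(y_cal, p_cal)
    p_adjusted = _sigmoid(_logit(np.asarray(p_eval, dtype=float)) + c)
    return log_loss(np.asarray(y_eval, dtype=float), p_adjusted)
```

In this sketch, a run whose predictions are systematically too high or too low is corrected before scoring, so run-to-run fluctuations in the learned bias no longer contribute to the metric. Keeping the calibration and evaluation splits disjoint avoids scoring on the same samples used to fit `c`.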

Abstract

The adoption of deep learning across various fields has been extensive, yet there is a lack of focus on evaluating the performance of deep learning pipelines. Typically, with the increased use of large datasets and complex models, the training process is run only once and the result is compared to previous benchmarks. This practice can lead to imprecise comparisons due to the variance in neural network evaluation metrics, which stems from the inherent randomness in the training process. Traditional solutions, such as running the training process multiple times, are often infeasible due to computational constraints. In this paper, we introduce a novel metric framework, the Calibrated Loss Metric, designed to address this issue by reducing the variance present in its conventional counterpart. Consequently, this new metric enhances the accuracy in detecting effective modeling improvements. Our approach is substantiated by theoretical justifications and extensive experimental validations within the context of Deep Click-Through Rate Prediction Models.

Paper Structure

This paper contains 16 sections, 5 theorems, 35 equations, 1 figure, 12 tables, 1 algorithm.

Key Result

Theorem 4

Suppose that the features $X\in \mathbb{R}^d$ and the label $Y$ are jointly Gaussian. We consider linear regression $h(x) = \beta^\top x + \alpha$. Let $\hat{\beta}_n$ be the coefficient learned from training data of sample size $n$. Then we have … where the expectation is taken over the randomness in both the training and test samples.

Figures (1)

  • Figure 1: Batch Normalization Experiment

Theorems & Definitions (8)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 4
  • Corollary 5
  • Lemma 6
  • Theorem 7
  • Corollary 8