Mutual Learning for Finetuning Click-Through Rate Prediction Models

Ibrahim Can Yilmaz; Said Aldemir

TL;DR

This work investigates mutual learning as a finetuning strategy for CTR prediction, motivated by the observation that CTR models differ little in complexity and performance, which makes one-way distillation from a "complex" to a "simple" model hard to justify. It formalizes a mutual-learning objective in which $N$ networks each minimize a supervised BCE loss plus a mutual $L_{MSE}$ loss averaged over their peers, i.e., $L_{\theta_n} = L_{BCE_n} + \frac{1}{N - 1} \sum_{i \neq n} L_{MSE}(p_i, p_n)$ with $L_{MSE}(p_1, p_2) = (p_1 - p_2)^2$. Experiments on Criteo and Avazu with four baseline CTR models (DeepFM, DCN, PNN, FiBiNET) show that mutual learning from scratch does not help, but finetuning pretrained models with mutual learning yields consistent relative improvements of up to $0.66\%$ on Avazu and $0.37\%$ on Criteo; gains are larger when the accompanying models have higher pretrained scores. The results suggest that the benefits of mutual learning come from aggregating diverse, non-model-specific knowledge, that the best co-training size is around four models, and that the improvements hold across datasets and architectures.
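To make the objective above concrete, the following is a minimal per-example sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`bce`, `mse`, `mutual_losses`) and the clipping constant `eps` are hypothetical, and the sketch evaluates the loss for a single example without covering batching or how gradients flow through peer predictions.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and label y in {0, 1}.

    eps is an assumed clipping constant that keeps log() finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mse(p1, p2):
    """L_MSE(p1, p2) = (p1 - p2)^2, as defined in the summary."""
    return (p1 - p2) ** 2


def mutual_losses(preds, y):
    """Return [L_theta_1, ..., L_theta_N] for one example.

    preds holds the click probability predicted by each of the N networks.
    Each network's loss is its own BCE plus the MSE to every peer,
    averaged over the N - 1 peers.
    """
    n = len(preds)
    if n < 2:
        raise ValueError("mutual learning needs at least two networks")
    losses = []
    for k, p_k in enumerate(preds):
        peer_term = sum(
            mse(p_i, p_k) for i, p_i in enumerate(preds) if i != k
        ) / (n - 1)
        losses.append(bce(p_k, y) + peer_term)
    return losses


# Hypothetical predictions from four co-trained models for one clicked example.
example_losses = mutual_losses([0.80, 0.60, 0.70, 0.75], y=1)
```

When all networks agree, the peer term vanishes and each loss reduces to the ordinary BCE, so the mutual term only acts where the models disagree.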

Abstract

Click-Through Rate (CTR) prediction has become an essential task in digital industries such as digital advertising and online shopping. Many deep learning-based methods have been implemented and have become state-of-the-art models in the domain. To further improve the performance of CTR models, Knowledge Distillation-based approaches have been widely used. However, most current CTR prediction models do not have particularly complex architectures, so it is hard to call one of them 'cumbersome' and another 'tiny'. The performance gap between complex and simple models is also not very large, so distilling knowledge from one model to another may not be worth the effort. Under these considerations, Mutual Learning could be a better approach, since all the models can be improved mutually. In this paper, we show how useful the mutual learning algorithm can be when it is applied between equals. In our experiments on the Criteo and Avazu datasets, the mutual learning algorithm improved model performance by up to 0.66% (relative).

Paper Structure (17 sections, 5 equations, 2 figures, 4 tables)

Introduction
Methods
Deep CTR Models
Knowledge Distillation
Deep Mutual Learning
Proposed Algorithm
Experiments & Results
Experimental Setup
Datasets
Evaluation Metrics
Test Models
Hyperparameters and Implementation Details
RQ1: Comparison with regular training
RQ2: Mutual Learning as finetuning
RQ3: Same model instances vs. different models
...and 2 more sections

Figures (2)

Figure 1: Architecture of Knowledge Distillation algorithm.
Figure 2: Architecture of proposed algorithm.
