RE-SORT: Removing Spurious Correlation in Multilevel Interaction for CTR Prediction

Song-Li Wu; Liang Du; Jia-Qi Yang; Yu-Ai Wang; De-Chuan Zhan; Shuang Zhao; Zi-Xun Sun

TL;DR

RE-SORT tackles spurious correlations in CTR prediction by combining a multilevel stacked recurrent (MSR) structure with a spurious correlation elimination (SCE) module. MSR learns diverse high-order interactions across hierarchical feature spaces via two streams, a deep one (D-SR) and a shallow one (S-SR). SCE maps features into a high-dimensional Laplacian kernel space, approximated with random Fourier features, and learns sample weights that decorrelate nuisance signals from the CTR task. The model is reported to achieve state-of-the-art accuracy and speed on four public CTR datasets and a production dataset, with ablation studies supporting the contribution of both MSR and SCE. For real-world recommender systems, the practical benefits are better generalization and shorter inference time; the authors state that code, models, and data will be released.
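To make the kernel-mapping step concrete, the following is a minimal sketch of random Fourier features for a Laplacian kernel. It relies on the standard result that the kernel $k(x, y) = \exp(-\gamma \lVert x - y \rVert_1)$ has a Cauchy spectral density; it is not taken from the authors' code. The names `laplacian_kernel`, `make_rff`, `gamma`, and `num_features` are illustrative assumptions, and how RE-SORT actually parameterizes the mapping is not stated on this page.

```python
import math
import random


def laplacian_kernel(x, y, gamma):
    """Exact Laplacian kernel k(x, y) = exp(-gamma * ||x - y||_1)."""
    return math.exp(-gamma * sum(abs(a - b) for a, b in zip(x, y)))


def make_rff(dim, num_features, gamma, seed=0):
    """Build a random feature map z with E[z(x) . z(y)] = k(x, y).

    Hypothetical helper: frequencies are drawn from a Cauchy distribution
    with scale gamma, which is the spectral density of the Laplacian kernel.
    """
    rng = random.Random(seed)
    # Inverse-CDF sampling of a Cauchy variable: gamma * tan(pi * (u - 0.5)).
    freqs = [
        [gamma * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(dim)]
        for _ in range(num_features)
    ]
    # Random phases, uniform on [0, 2*pi].
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def z(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)
        ]

    return z


if __name__ == "__main__":
    z = make_rff(dim=3, num_features=5000, gamma=0.5, seed=1)
    x, y = [0.2, -0.1, 0.4], [0.0, 0.3, 0.1]
    approx = sum(a * b for a, b in zip(z(x), z(y)))
    print("exact :", laplacian_kernel(x, y, 0.5))
    print("approx:", approx)
```

The inner product of two mapped vectors is a Monte Carlo estimate of the kernel value, so the approximation tightens as `num_features` grows.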

Abstract

Click-through rate (CTR) prediction is a critical task in recommendation systems, serving as the ultimate filtering step to sort items for a user. Most recent cutting-edge methods primarily focus on investigating complex implicit and explicit feature interactions; however, these methods neglect the spurious correlation issue caused by confounding factors, thereby diminishing the model's generalization ability. We propose a CTR prediction framework that REmoves Spurious cORrelations in mulTilevel feature interactions, termed RE-SORT, which has two key components. I. A multilevel stacked recurrent (MSR) structure enables the model to efficiently capture diverse nonlinear interactions from feature spaces at different levels. II. A spurious correlation elimination (SCE) module further leverages Laplacian kernel mapping and sample reweighting methods to eliminate the spurious correlations concealed within the multilevel features, allowing the model to focus on the true causal features. Extensive experiments conducted on four challenging CTR datasets and our production dataset demonstrate that the proposed method achieves state-of-the-art performance in both accuracy and speed. The utilized codes, models and dataset will be released at https://github.com/RE-SORT.
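The sample-reweighting idea can be illustrated with a deliberately simplified sketch: learn per-sample weights so that the weighted covariance between two feature columns shrinks toward zero. This is an assumption-laden toy, not the SCE module itself. It works on two raw scalar columns rather than kernel-mapped multilevel features, it omits the CTR loss, and the names `weighted_cov`, `learn_sample_weights`, `steps`, and `lr` are hypothetical.

```python
import math
import random


def weighted_cov(u, v, p):
    """Covariance of u and v under sample probabilities p (p sums to 1)."""
    mu_u = sum(pi * ui for pi, ui in zip(p, u))
    mu_v = sum(pi * vi for pi, vi in zip(p, v))
    return sum(pi * (ui - mu_u) * (vi - mu_v) for pi, ui, vi in zip(p, u, v))


def softmax(theta):
    """Numerically stable softmax, keeps the weights positive and normalized."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def learn_sample_weights(u, v, steps=300, lr=0.05):
    """Gradient descent on the squared weighted covariance of u and v.

    Returns weights with mean 1. Illustrative only: a practical version
    would add a regularizer that keeps the weights close to uniform.
    """
    n = len(u)
    theta = [0.0] * n  # logits; all zeros means uniform weights
    for _ in range(steps):
        p = softmax(theta)
        mu_u = sum(pi * ui for pi, ui in zip(p, u))
        mu_v = sum(pi * vi for pi, vi in zip(p, v))
        cov = weighted_cov(u, v, p)
        # Partial derivative of the weighted covariance w.r.t. each p_i.
        g = [ui * vi - ui * mu_v - vi * mu_u for ui, vi in zip(u, v)]
        g_bar = sum(pi * gi for pi, gi in zip(p, g))
        # Chain rule through softmax for loss = cov**2; step scaled by n
        # because each p_i is of order 1/n.
        theta = [
            t - lr * n * 2.0 * cov * pi * (gi - g_bar)
            for t, pi, gi in zip(theta, p, g)
        ]
    return [n * pi for pi in softmax(theta)]


if __name__ == "__main__":
    rng = random.Random(0)
    u = [rng.gauss(0.0, 1.0) for _ in range(300)]
    v = [0.5 * ui + rng.gauss(0.0, 1.0) for ui in u]  # v is correlated with u
    w = learn_sample_weights(u, v)
    print("before:", weighted_cov(u, v, [1.0 / 300] * 300))
    print("after :", weighted_cov(u, v, [wi / 300 for wi in w]))
```

Parameterizing the weights through a softmax is one simple way to keep them positive and normalized; the paper's own constraint and optimization scheme may differ.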

Paper Structure (22 sections, 4 theorems, 22 equations, 7 figures, 4 tables, 2 algorithms)

Introduction
Related works
Recurrent structure
Spurious correlation in recommendation tasks
Method
Multilevel stacked recurrent (MSR) interaction
Spurious correlation elimination
CTR predictor
Loss function
The theory behind sample reweighting
Experiments
Datasets and evaluation metrics
Baseline
Implementation
Comparison with the SOTA methods
...and 7 more sections

Key Result

Theorem 1

Let $\tilde{\beta}_{\mathrm{FM}_{\mathrm{rob}}}$ and $\tilde{\beta}_{\mathrm{FM}_{\mathrm{fal}}}$ be the components of $\tilde{\beta}$ corresponding to $\mathrm{FM}_{\mathrm{rob}}$ and $\mathrm{FM}_{\mathrm{fal}}$, respectively. Under Assumption 3.5.1, we have $Var(\tilde{\beta}_{\mathrm{FM}_{\mathrm

Figures (7)

Figure 1: RE-SORT exploits multilevel features with spurious correlation elimination for CTR prediction.
Figure 2: Overview of the RE-SORT framework, which contains two key components: I. a multilevel stacked recurrent (MSR) structure and II. a spurious correlation elimination (SCE) module. The deep and shallow stacked recurrent interactions (D-SR and S-SR) in the MSR structure and the SCE module are colored blue, green, and purple, respectively. The D-SR and S-SR streams contain $M$ and $N$ ($M > N$) stacked blocks, and "$Q_i$", "$K_i$", and "$V_i$" denote the query, key, and value, respectively, for the $i$-th block. The attenuation coefficient for the $i$-th block is denoted by $r_i$ (dotted lines). The outputs $F_d$ and $F_s$ are concatenated ("C") as the input of the SCE and the CTR predictor, which is visualized to show the spurious correlation elimination process guided by the SCE module. "RKHS" denotes the reproducing kernel Hilbert space. The SCE procedure is explained in the RE-SORT algorithm.
Figure 3: Performance with different MSR depths.
Figure 4: Spurious correlation elimination of RE-SORT.
Figure 5: Variance comparison of different SOTA models.
...and 2 more figures

Theorems & Definitions (6)

Theorem 1
Definition 1
Definition 2
Lemma 1
Theorem 2
Proposition 1
