
From Feature Interaction to Feature Generation: A Generative Paradigm of CTR Prediction Models

Mingjia Yin, Junwei Pan, Hao Wang, Ximei Wang, Shangyu Zhang, Jie Jiang, Defu Lian, Enhong Chen

TL;DR

This work reframes CTR prediction, moving it from a discriminative feature-interaction paradigm to a generative framework called Supervised Feature Generation (SFG). SFG uses an Encoder that builds a hidden representation for each feature from all features, and a Decoder that generates all feature embeddings from these representations. This All-Predict-All mechanism is trained with a supervised loss on click labels rather than with self-supervised signals. Across multiple base models and large-scale datasets, SFG consistently reduces embedding dimensional collapse and information redundancy, yielding offline gains in AUC and LogLoss and significant online lifts in GMV and CTR in production. The approach is compatible with most existing CTR models, narrows the performance gaps between different base architectures, and has been deployed in Tencent’s advertising platform. Overall, the paper offers a flexible, generalizable paradigm that improves representation quality and CTR performance through supervised feature generation and decorrelated embeddings.

Abstract

Click-Through Rate (CTR) prediction, a core task in recommendation systems, aims to estimate the probability that a user clicks on an item. Existing models predominantly follow a discriminative paradigm that relies heavily on explicit interactions between raw ID embeddings. However, this paradigm inherently makes these models susceptible to two critical issues, embedding dimensional collapse and information redundancy, both stemming from an over-reliance on feature interactions over raw ID embeddings. To address these limitations, we propose a novel Supervised Feature Generation (SFG) framework, shifting the paradigm from discriminative "feature interaction" to generative "feature generation". Specifically, SFG comprises two key components: an Encoder that constructs hidden embeddings for each feature, and a Decoder that regenerates the embeddings of all features from these hidden representations. Unlike existing generative approaches that adopt self-supervised losses, we introduce a supervised loss that exploits the supervised signal of the CTR prediction task, i.e., click or no click. This framework exhibits strong generalizability: it can be seamlessly integrated with most existing CTR models, reformulating them under the generative paradigm. Extensive experiments demonstrate that SFG consistently mitigates embedding collapse and reduces information redundancy, while yielding substantial performance gains across various datasets and base models. The code is available at https://github.com/USTC-StarTeam/GE4Rec.
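To make the Encoder/Decoder split concrete, here is a minimal NumPy sketch of a single-layer, SFG-style forward pass. It is not the authors' implementation (the repository above holds that), and several details are assumptions: the field-wise encoder follows the "field-wise single-layer non-linear MLP" wording of Figure 2's caption with a ReLU we chose; the decoder is a hypothetical DCN V2-style element-wise cross (without the residual term) between each field's raw ID embedding and its hidden embedding; and the prediction head is a single linear layer. The class name `SFGSketch`, the helper `bce`, and all sizes are hypothetical. The only training signal is binary cross-entropy on the click label, reflecting the supervised (rather than reconstruction-based) loss.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class SFGSketch:
    """Toy single-layer SFG-style model; the structure is assumed, not the authors' code."""

    def __init__(self, num_fields, dim, rng):
        # Field-wise encoder (assumed form): one weight matrix per field,
        # each reading the concatenation of ALL field embeddings.
        self.W_enc = rng.normal(0.0, 0.1, size=(num_fields, num_fields * dim, dim))
        self.b_enc = np.zeros((num_fields, dim))
        # Hypothetical DCN V2-style decoder parameters, one matrix per field.
        self.W_dec = rng.normal(0.0, 0.1, size=(num_fields, dim, dim))
        self.b_dec = np.zeros((num_fields, dim))
        # Linear prediction head over the concatenated generated embeddings.
        self.w_out = rng.normal(0.0, 0.1, size=num_fields * dim)

    def encode(self, e):
        # e: (B, F, D) raw ID embeddings -> h: (B, F, D) hidden embeddings.
        x = e.reshape(e.shape[0], -1)  # (B, F*D): every field sees all fields
        return relu(np.einsum("bk,fkd->bfd", x, self.W_enc) + self.b_enc)

    def decode(self, e, h):
        # Generate field f's embedding as e_f * (h_f @ W_f + b_f), an assumed cross.
        return e * (np.einsum("bfd,fde->bfe", h, self.W_dec) + self.b_dec)

    def forward(self, e):
        g = self.decode(e, self.encode(e))  # (B, F, D) generated embeddings
        logits = g.reshape(g.shape[0], -1) @ self.w_out
        return sigmoid(logits)  # predicted click probabilities, shape (B,)


def bce(p, y, eps=1e-7):
    # Supervised loss on the click label y in {0, 1}.
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


# Usage on random data: B samples, F fields, D-dimensional embeddings.
rng = np.random.default_rng(0)
B, F, D = 16, 4, 8
emb = rng.normal(size=(B, F, D))
labels = rng.integers(0, 2, size=B)
model = SFGSketch(F, D, rng)
probs = model.forward(emb)
loss = bce(probs, labels)
print(probs.shape, round(loss, 4))
```

Because each field's encoder reads the concatenation of all fields, every generated embedding depends on every feature, which is the All-Predict-All idea. A real model would learn these weights by backpropagating the BCE loss through an autodiff framework; the sketch only computes the forward pass and the loss.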


Paper Structure

This paper contains 54 sections, 12 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: Different generative paradigms. The general framework employs an (optional) encoder that processes source data of a particular form into an output embedding; a decoder then uses this embedding to generate the target, i.e., data of a different form; finally, a loss function evaluates the generation quality. Because CTR features lack an explicit data structure, our feature generation paradigm adopts an "All-Predict-All" scheme that predicts each feature from all features. Notably, we employ a supervised generative loss, optimizing the cross-entropy with respect to the sample-wise label $y_{\text{sup}}$, rather than a self-supervised loss.
  • Figure 2: The feature generation framework builds an encoder that takes all features as $x_{\text{source}}$ and produces an output embedding, which is then used to predict all features simultaneously as $x_{\text{target}}$. For multi-layer generation, the representations generated in each layer serve as both $x_{\text{source}}$ and $x_{\text{target}}$ for the next layer. The encoder is implemented as a field-wise single-layer non-linear MLP, while the decoder is implemented with the feature interaction functions of existing CTR models.
  • Figure 3: Normalized singular value spectrum of the embeddings that interact with raw ID embeddings. For the discriminative paradigm, this is the concatenation of raw ID embeddings; for the generative paradigm, it is the embedding constructed directly by the encoder.
  • Figure 4: Pearson correlation matrix between two interacted embeddings. Panels (a)$\rightarrow$(b)$\rightarrow$(c) correspond to increasingly complex models, which also show a trend of decreasing redundancy, underscoring the importance of reducing information redundancy when designing CTR models. In (d), the generative DCN V2 produces an almost-zero correlation matrix, closely aligning with the redundancy reduction principle.
  • Figure 5: Ablation study on the feature generation framework design using DCN V2 on Avazu.
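Figures 3 and 4 rely on two diagnostics. The NumPy sketch below shows one plausible way to compute them; the normalization choices are assumptions, not taken from the paper. The singular value spectrum is divided by the largest singular value, so a collapsed embedding shows many values near zero. The correlation matrix is taken here to be the Pearson correlation between every dimension of one embedding and every dimension of another, computed across samples, so near-zero entries indicate low redundancy. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def normalized_singular_spectrum(emb):
    """Singular values of an (N, d) embedding matrix, divided by the largest one."""
    s = np.linalg.svd(emb, compute_uv=False)  # returned in descending order
    return s / s[0]


def cross_pearson(a, b, eps=1e-12):
    """Pearson correlation between each dimension of a (N, d1) and of b (N, d2)."""
    za = (a - a.mean(axis=0)) / (a.std(axis=0) + eps)
    zb = (b - b.mean(axis=0)) / (b.std(axis=0) + eps)
    return za.T @ zb / a.shape[0]  # (d1, d2) matrix of correlations


rng = np.random.default_rng(0)

# A full-rank embedding versus a "collapsed" one confined to a 2-D subspace.
full = rng.normal(size=(1000, 8))
collapsed = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 8))
spec_full = normalized_singular_spectrum(full)
spec_collapsed = normalized_singular_spectrum(collapsed)

# Independent embeddings (low redundancy) versus near-copies (high redundancy).
x = rng.normal(size=(5000, 8))
corr_indep = cross_pearson(x, rng.normal(size=(5000, 8)))
corr_redundant = cross_pearson(x, x + 0.1 * rng.normal(size=(5000, 8)))

print(np.round(spec_full, 2))
print(np.round(spec_collapsed, 2))
print(np.abs(corr_indep).mean(), np.abs(np.diag(corr_redundant)).mean())
```

In this framing, a spectrum like `spec_collapsed`, with most values near zero, signals embedding dimensional collapse, while a correlation matrix far from zero, like `corr_redundant`, signals information redundancy.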