Towards Unifying Feature Interaction Models for Click-Through Rate Prediction

Yu Kang, Junwei Pan, Jipeng Jin, Shudong Huang, Xiaofeng Gao, Lei Xiao

TL;DR

A novel model is introduced that achieves results competitive with state-of-the-art CTR models and delivers a significant GMV lift in online A/B testing on Tencent's advertising platform.

Abstract

Modeling feature interactions plays a crucial role in accurately predicting click-through rates (CTR) in advertising systems. To capture the intricate patterns of interaction, many existing models employ matrix-factorization techniques to represent features as lower-dimensional embedding vectors, enabling the modeling of interactions as products between these embeddings. In this paper, we propose a general framework called IPA to systematically unify these models. Our framework comprises three key components: the Interaction Function, which facilitates feature interaction; the Layer Pooling, which constructs higher-level interaction layers; and the Layer Aggregator, which combines the outputs of all layers to serve as input for the subsequent classifier. We demonstrate that most existing models can be categorized within our framework by making specific choices for these three components. Through extensive experiments and a dimensional collapse analysis, we evaluate the performance of these choices. Furthermore, by leveraging the most powerful components within our framework, we introduce a novel model, PFL, that achieves results competitive with state-of-the-art CTR models. PFL delivers a significant GMV lift in online A/B testing on Tencent's advertising platform and has been deployed as the production model in several primary scenarios.
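
To make the three components concrete, the following is a minimal NumPy sketch of how an IPA-style forward pass could be organized. It is an illustration under stated assumptions, not the authors' implementation: the function names (`interaction`, `layer_pooling`, `layer_aggregator`, `ipa_forward`), the sum-based pooling, the concatenating aggregator, and the optional projection matrix `proj` are all hypothetical choices picked for brevity.

```python
import numpy as np

def interaction(a, b, proj=None):
    # Interaction Function (assumed form): elementwise product of two
    # d-dimensional vectors, optionally after projecting `a` with a d x d matrix.
    if proj is not None:
        a = a @ proj
    return a * b

def layer_pooling(prev, base, proj=None):
    # Layer Pooling (assumed sum variant): term i of the next layer is the sum
    # of the interactions between prev[i] and every base embedding.
    n = base.shape[0]
    return np.stack([
        sum(interaction(prev[i], base[j], proj) for j in range(n))
        for i in range(prev.shape[0])
    ])

def layer_aggregator(layers):
    # Layer Aggregator (assumed variant): sum each layer over its terms,
    # then concatenate the per-layer vectors as input for a classifier.
    return np.concatenate([h.sum(axis=0) for h in layers])

def ipa_forward(base, num_layers=3, proj=None):
    # Layer 1 is the base embeddings; each further layer interacts the
    # previous layer with the base embeddings again.
    layers = [base]
    for _ in range(num_layers - 1):
        layers.append(layer_pooling(layers[-1], base, proj))
    return layer_aggregator(layers)

rng = np.random.default_rng(0)
E = rng.normal(size=(4, 8))          # 4 fields, embedding dimension 8
features = ipa_forward(E, num_layers=3)
print(features.shape)                # (24,)
```

Swapping the body of any one of the three functions while leaving the other two unchanged is how different existing models would correspond to different choices within the framework, under the assumptions above.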

Paper Structure

This paper contains 31 sections, 12 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Illustration of the IPA framework and the common choices of its three components.
  • Figure 2: Performance of various choices within each component in the IPA framework.
  • Figure 3: Comparison of embedding collapse for 2-order interaction models on the Criteo-x1 dataset. The singular value sum and the information abundance are each plotted with fields ordered in two ways: by field cardinality in (a) and (c), and by average feature-pair importance in the FwFM model in (b) and (d). FmFM experiences less collapse than FwFM and FM on both high-order fields and important fields, although it differs from them only in its matrix-projecting Interaction Function.
  • Figure 4: Field-wise singular value spectrum for 2-order interaction models on the Criteo-x1 dataset. The singular values of representative fields (high-order fields in (a) and high-importance fields in (b)) are sorted and displayed. The singular values of FmFM are the largest and the most evenly distributed among all 2-order models.
  • Figure 5: Trends of $\alpha_l$, $\alpha_l * \Vert\bm{W}_{l-1}\Vert_\text{F}$ and model performance during training. Our model learns low $\alpha_l$ and $\Vert\bm{W}_{l-1}\Vert_\text{F}$ for the extra layers (5-10), retaining high and robust performance even when the data order is over-estimated.
  • ...and 1 more figure
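
The collapse metrics named in the captions of Figures 3 and 4 can be computed from a field's embedding table roughly as follows. This is a sketch under an assumption: the captions do not define information abundance, so the code uses a common definition, the sum of the singular values divided by the largest one. The helper names and the toy matrices are hypothetical.

```python
import numpy as np

def singular_values(emb):
    # Singular values of an (num_features x dim) embedding table,
    # returned in descending order.
    return np.linalg.svd(emb, compute_uv=False)

def singular_value_sum(emb):
    return singular_values(emb).sum()

def information_abundance(emb):
    # Assumed definition: sum of singular values over the largest one.
    # Ranges from 1 (fully collapsed, rank 1) up to the embedding dimension.
    s = singular_values(emb)
    return s.sum() / s.max()

collapsed = np.outer(np.ones(6), np.ones(4))   # rank-1 table: every row identical
spread = np.eye(6)[:, :4]                      # orthonormal columns: equal singular values

print(information_abundance(collapsed))
print(information_abundance(spread))
```

With these toy inputs the rank-1 table scores close to 1 and the table with orthonormal columns scores close to its dimension, 4, which matches the reading of the captions that larger and more evenly distributed singular values indicate less collapse.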