FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction

Hangyu Wang, Jianghao Lin, Xiangyang Li, Bo Chen, Chenxu Zhu, Ruiming Tang, Weinan Zhang, Yong Yu

TL;DR

This paper designs a novel jointly masked tabular/language modeling task to learn fine-grained alignment between tabular IDs and word tokens, and proposes to jointly finetune the ID-based model and PLM by adaptively combining the output of both models, thus achieving superior performance in downstream CTR prediction tasks.

Abstract

Click-through rate (CTR) prediction serves as a core functional module in various personalized online services. Traditional ID-based models for CTR prediction take as input the one-hot encoded ID features of the tabular modality and capture collaborative signals via feature interaction modeling. However, one-hot encoding discards the semantic information contained in the textual features. Recently, the emergence of Pretrained Language Models (PLMs) has given rise to another paradigm, which takes as input sentences of the textual modality obtained from hard prompt templates and uses PLMs to extract semantic knowledge. However, PLMs often struggle to capture field-wise collaborative signals and to distinguish features with subtle textual differences. In this paper, to leverage the benefits of both paradigms while overcoming their limitations, we propose Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction. Unlike most methods, which rely solely on global views through instance-level contrastive learning, we design a novel jointly masked tabular/language modeling task to learn fine-grained alignment between tabular IDs and word tokens. Specifically, the masked data of one modality (IDs and tokens) has to be recovered with the help of the other modality, which establishes feature-level interaction and alignment via sufficient mutual information extraction between the two modalities. Moreover, we propose to jointly finetune the ID-based model and the PLM by adaptively combining the outputs of both models, thus achieving superior performance in downstream CTR prediction tasks. Extensive experiments on three real-world datasets demonstrate that FLIP outperforms SOTA baselines and is highly compatible with various ID-based models and PLMs. The code is available at https://github.com/justarter/FLIP.
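
The two ideas in the abstract, recovering masked data of one modality from the other and adaptively combining the two models' outputs, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The template "field is value", the placeholders MASK_TOKEN and MASK_ID, the helper names, and the scalar weight alpha are all assumptions made for illustration, and masking is done per field on both sides for simplicity.

```python
import math

MASK_TOKEN = "[MASK]"  # hypothetical placeholder for a masked text value
MASK_ID = 0            # hypothetical reserved ID for a masked tabular field


def to_text(record):
    """Render a record as a sentence with a simple hard prompt template."""
    return ", ".join(f"{field} is {value}" for field, value in record.items())


def to_ids(record, vocab):
    """Map every (field, value) pair to its tabular ID."""
    return [vocab[(f, v)] for f, v in record.items()]


def mask_tabular(record, vocab, field):
    """Tabular IDs with the chosen field's ID replaced by MASK_ID."""
    return [MASK_ID if f == field else vocab[(f, v)] for f, v in record.items()]


def mask_text(record, field):
    """Sentence with the chosen field's value replaced by MASK_TOKEN."""
    return to_text({f: (MASK_TOKEN if f == field else v) for f, v in record.items()})


def combine(logit_id, logit_plm, alpha):
    """Blend the two models' logits with weight alpha in [0, 1], then apply a sigmoid."""
    z = alpha * logit_id + (1.0 - alpha) * logit_plm
    return 1.0 / (1.0 + math.exp(-z))


record = {"gender": "female", "genre": "comedy"}
vocab = {("gender", "female"): 1, ("genre", "comedy"): 2}

# Masked tabular input paired with the intact sentence: the text side
# carries the information needed to recover the masked ID.
pair_a = (mask_tabular(record, vocab, "genre"), to_text(record))

# Masked sentence paired with the intact tabular IDs: the tabular side
# carries the information needed to recover the masked words.
pair_b = (to_ids(record, vocab), mask_text(record, "genre"))
```

In this sketch alpha is a fixed number passed in by the caller. In a trained model it would be a learned parameter, and the abstract does not specify the exact form of the combination.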

Paper Structure (35 sections, 15 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Three cross-modal pretraining tasks. Task (a) provides coarse-grained instance-level alignment via contrastive learning, while tasks (b) and (c) achieve fine-grained feature-level alignment through jointly masked modality modeling.
  • Figure 2: The overall framework of our proposed FLIP.
  • Figure 3: The hyperparameter study on textual mask ratio $r_{text}$ (left column) and tabular mask ratio $r_{tab}$ (right column) on MovieLens-1M (top) and BookCrossing (bottom) datasets.
  • Figure 4: The hyperparameter study on the temperature $\tau$.
  • Figure 5: Visualization of similarities between the sample representations of masked textual and tabular data. "Text-$f$" and "Tab-$f$" denote that we mask the $f$-th field of the input data of textual or tabular modalities, respectively.
  • ...and 1 more figure
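
The instance-level contrastive alignment named in the Figure 1 caption, and the temperature $\tau$ studied in Figure 4, can be illustrated with a generic one-directional InfoNCE loss. This is a sketch under assumptions, not the paper's exact objective: the function names, the use of cosine similarity, and the text-to-tabular direction are illustrative choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(text_vecs, tab_vecs, tau):
    """Mean text-to-tabular InfoNCE loss; index i in both lists is the same sample."""
    n = len(text_vecs)
    total = 0.0
    for i in range(n):
        # Similarity of text sample i to every tabular sample, scaled by 1/tau.
        logits = [cosine(text_vecs[i], tab_vecs[j]) / tau for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the matching (positive) pair.
        total += log_denom - logits[i]
    return total / n


text_vecs = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each tabular vector matches its text vector
swapped = [[0.0, 1.0], [1.0, 0.0]]   # positives point at the wrong sample

loss_aligned = info_nce(text_vecs, aligned, tau=0.1)
loss_swapped = info_nce(text_vecs, swapped, tau=0.1)
```

A smaller tau sharpens the softmax over the batch, so well-aligned pairs are rewarded more strongly and misaligned pairs are penalized more strongly.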