
Optimizing Feature Set for Click-Through Rate Prediction

Fuyuan Lyu, Xing Tang, Dugang Liu, Liang Chen, Xiuqiang He, Xue Liu

TL;DR

This work addresses the problem of optimizing CTR prediction by selecting an optimal feature set that accounts for both individual features and their interactions. It introduces OptFS, a gate-based, feature-level selection framework that decomposes interaction selection into the product of two feature gates and trains end-to-end via a learning-by-continuation scheme with a two-stage process: searching and retraining. Empirical results on three public datasets show that OptFS improves AUC while substantially reducing feature usage and computation, outperforming field-level baselines and several interaction-selection methods. The findings demonstrate the practical impact of jointly optimizing features and interactions for scalable, accurate CTR models, supported by transferability analyses and case studies. Overall, OptFS advances feature-set optimization by enabling fine-grained, end-to-end pruning of features and interactions in large-scale CTR systems.
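The following is a minimal sketch of the gate decomposition described above. It assumes an FM-style inner-product interaction and one scalar gate per feature. Neither the interaction function nor the function names below come from the paper; they are hypothetical and chosen only for illustration.

The gate on the interaction of features i and j is the product g_i * g_j. For the inner product, this is the same as scaling each embedding by its own gate before the interaction. A feature whose gate is 0 therefore also removes every interaction it takes part in.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def gated_pairwise_interactions(embeddings: List[Vector], gates: List[float]) -> float:
    """Sum of FM-style pairwise interactions, each scaled by an interaction gate.

    The gate for pair (i, j) is the product of the two feature gates, so with
    binary gates an interaction survives only if both features are kept.
    """
    total = 0.0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            g_ij = gates[i] * gates[j]  # interaction gate = product of feature gates
            total += g_ij * dot(embeddings[i], embeddings[j])
    return total


def gate_embeddings(embeddings: List[Vector], gates: List[float]) -> List[List[float]]:
    """Apply each feature gate directly to its embedding (feature-level masking)."""
    return [[g * x for x in e] for e, g in zip(embeddings, gates)]


if __name__ == "__main__":
    emb = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    ones = [1.0, 1.0, 1.0]

    # Hard gates: feature 2 is pruned, so pairs (0, 2) and (1, 2) vanish.
    print(gated_pairwise_interactions(emb, [1.0, 1.0, 0.0]))  # 11.0
    print(gated_pairwise_interactions(emb, ones))             # 67.0

    # Soft gates: gating the interaction equals gating the embeddings first.
    soft = [0.5, 1.0, 0.25]
    a = gated_pairwise_interactions(emb, soft)
    b = gated_pairwise_interactions(gate_embeddings(emb, soft), ones)
    print(a, b)  # 17.375 17.375
```

In this sketch, pruning a feature removes both its embedding and all of its pairwise terms. That is where the storage and computation savings would come from under these assumptions.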

Abstract

Click-through rate (CTR) prediction models transform features into latent vectors and enumerate possible feature interactions to improve performance based on the input feature set. Therefore, when selecting an optimal feature set, we should consider the influence of both features and their interactions. However, most previous works either focus on feature field selection or select only feature interactions based on a fixed feature set. The former restricts the search space to feature fields, which is too coarse to identify subtle features. These field-level methods also do not filter out useless feature interactions, leading to higher computation costs and degraded model performance. The latter identifies useful feature interactions from all available features, resulting in many redundant features in the feature set. In this paper, we propose a novel method named OptFS to address these problems. To unify the selection of features and their interactions, we decompose the selection of each feature interaction into the selection of the two correlated features. Such a decomposition makes the model end-to-end trainable under various feature interaction operations. By adopting a feature-level search space, we set a learnable gate to determine whether each feature should be included in the feature set. Because of the large-scale search space, we develop a learning-by-continuation training scheme to learn these gates. Hence, OptFS generates a feature set containing only features that improve the final prediction results. Experimentally, we evaluate OptFS on three public datasets, demonstrating that OptFS can optimize feature sets that enhance model performance and further reduce both storage and computational costs.
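The abstract says the gates are learned with a learning-by-continuation scheme, and the TL;DR describes a searching stage followed by a retraining stage. Neither specifies the gate parameterization or the schedule. The sketch below therefore assumes one common continuation recipe:

- Each feature has a real-valued score s.
- The searching stage uses a relaxed gate sigmoid(beta * s).
- The temperature beta increases across epochs, so the relaxed gate approaches the step function 1[s > 0].
- The resulting binary mask is fixed for retraining.

The exponential schedule, its defaults (beta0, growth), the example scores, and all function names are hypothetical.

```python
import math
from typing import List


def stable_sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_gate(score: float, beta: float) -> float:
    """Relaxed gate used while searching: sigmoid(beta * score)."""
    return stable_sigmoid(beta * score)


def hard_gate(score: float) -> float:
    """Binary gate used for retraining: keep the feature iff score > 0."""
    return 1.0 if score > 0 else 0.0


def beta_schedule(epoch: int, beta0: float = 1.0, growth: float = 10.0) -> float:
    """Hypothetical exponential continuation schedule for the temperature beta."""
    return beta0 * growth ** epoch


def max_gap(scores: List[float], beta: float) -> float:
    """Largest difference between soft and hard gates at a given beta."""
    return max(abs(soft_gate(s, beta) - hard_gate(s)) for s in scores)


def selected_features(scores: List[float]) -> List[int]:
    """Indices of features kept in the final feature set after searching."""
    return [i for i, s in enumerate(scores) if hard_gate(s) == 1.0]


if __name__ == "__main__":
    # Hypothetical learned gate scores for five features.
    scores = [0.3, -0.1, 2.0, -1.5, 0.05]
    # Searching stage: as beta grows, the soft gates approach the binary gates.
    for epoch in range(4):
        beta = beta_schedule(epoch)
        print(epoch, beta, round(max_gap(scores, beta), 4))
    # Retraining stage: fix the binary mask and keep only the selected features.
    print(selected_features(scores))  # [0, 2, 4]
```

An exponential schedule is only one simple choice. In this sketch, any increasing schedule that makes beta large enough by the end of searching would play the same role, closing the gap between the gates used in searching and those used in retraining.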

Paper Structure

This paper contains 27 sections, 17 equations, 6 figures, 5 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Overview of the general CTR framework.
  • Figure 2: Overview of OptFS.
  • Figure 3: Visualization of the gating vector $g$ during the searching and retraining stages.
  • Figure 4: Inference time on the Criteo and Avazu datasets. The Y-axis shows inference time in milliseconds (ms).
  • Figure 5: Visualization of the efficiency-effectiveness trade-off on the Criteo dataset. Points closer to the top-left are better.
  • ...and 1 more figure