FairFS: Addressing Deep Feature Selection Biases for Recommender System

Xianquan Wang, Zhaocheng Du, Jieming Zhu, Qinglin Jia, Zhenhua Dong, Kai Zhang

TL;DR

This work tackles biases in deep feature selection for recommender systems by identifying layer bias, baseline bias, and approximation bias in gate-based and sensitivity-based methods. It introduces FairFS, which combines aggregated gradient-based feature importance with a smoothing baseline and an aggregated-approximation strategy to produce unbiased, sparse feature selections. The approach yields state-of-the-art performance on three public datasets and demonstrates real-world impact through an online A/B test showing improvements in ECPM and latency. The results suggest FairFS as a practical tool for reducing unnecessary features in industrial recommender pipelines without sacrificing accuracy.

Abstract

Large-scale online marketplaces and recommender systems serve as critical technological support for e-commerce development. In industrial recommender systems, features play vital roles as they carry information for downstream models. Accurate feature importance estimation is critical because it helps identify the most useful feature subsets from thousands of feature candidates for online services. Such selection enables improved online performance while reducing computational cost. To address feature selection problems in deep learning, trainable gate-based and sensitivity-based methods have been proposed and proven effective in industrial practice. However, through the analysis of real-world cases, we identified three bias issues that cause feature importance estimation to rely on partial model layers, samples, or gradients, ultimately leading to inaccurate importance estimation. We refer to these as layer bias, baseline bias, and approximation bias. To mitigate these issues, we propose FairFS, a fair and accurate feature selection algorithm. FairFS regularizes feature importance estimated across all nonlinear transformation layers to address layer bias. It also introduces a smooth baseline feature close to the classifier decision boundary and adopts an aggregated approximation method to alleviate baseline and approximation biases. Extensive experiments demonstrate that FairFS effectively mitigates these biases and achieves state-of-the-art feature selection performance.

Paper Structure

This paper contains 33 sections, 17 equations, 8 figures, 2 tables, and 2 algorithms.

Figures (8)

  • Figure 1: Baseline bias: In imbalanced binary classification, both the mean and zero baselines are biased. The zero baseline skews toward positive samples and the mean baseline toward negative ones, which biases the feature importance estimates.
  • Figure 2: Approximation bias: We examine the importance of feature $x$ to the loss $L=x^2+3$ by measuring the loss change when $x$ is switched from informative to non-informative (zero). SHARK (left) and SFS (bottom right) have larger approximation errors than our aggregated approximation method (top right).
  • Figure 3: Layer bias: The top shows feature gate magnitudes, and the bottom shows their changes after a hidden layer (square matrix to avoid weight mixing). Colors range from red (low) to blue (high).
  • Figure 4: The FairFS framework consists of three stages. Stage 1 sets anchor embedding points between the baseline and original feature embeddings and computes their gradients. Stage 2 combines these anchor points with their gradients to estimate the final feature importance ($M$ applies in the training phase, $n_{ac}$ in the validation phase). Stage 3 shows how feature importance is applied in both phases: in training it serves as a regularizer, while in validation it guides feature selection.
  • Figure 5: Efficiency comparison: feature selection time (top) and inference time (bottom) per sample.
  • ...and 3 more figures
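
The approximation bias in Figure 2 can be illustrated on the caption's toy loss $L=x^2+3$. The sketch below is not the paper's implementation. It assumes that "aggregated approximation" means averaging gradients at evenly spaced anchor points between the baseline and the feature value, and it uses a generic single-point first-order estimate as a stand-in for the comparison methods. All function names and the `n_anchors` parameter are hypothetical.

```python
def loss(x):
    # Toy loss from the Figure 2 caption: L = x^2 + 3.
    return x ** 2 + 3


def grad(x):
    # Analytic gradient of the toy loss.
    return 2 * x


def true_change(x, baseline=0.0):
    # Exact loss change when x is replaced by the baseline.
    return loss(x) - loss(baseline)


def single_point_estimate(x, baseline=0.0, at=None):
    # First-order estimate using one gradient, evaluated at `at`
    # (defaults to the feature value itself).
    at = x if at is None else at
    return grad(at) * (x - baseline)


def aggregated_estimate(x, baseline=0.0, n_anchors=10):
    # Assumption: average the gradient over evenly spaced anchor points
    # (interval midpoints) on the straight line from baseline to x.
    delta = x - baseline
    total = 0.0
    for k in range(n_anchors):
        anchor = baseline + (k + 0.5) / n_anchors * delta
        total += grad(anchor)
    return total / n_anchors * delta


x = 2.0
print(true_change(x))                    # 4.0
print(single_point_estimate(x))          # 8.0 (gradient taken at x)
print(single_point_estimate(x, at=0.0))  # 0.0 (gradient taken at the baseline)
print(aggregated_estimate(x))            # approximately 4.0
```

On this loss, a single gradient either doubles the true change or misses it entirely, depending on where it is evaluated, while the averaged gradient recovers it. The match is exact here only because the gradient is linear; for a general loss the aggregated estimate is still an approximation that improves as more anchors are used.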