Causal Feature Selection Method for Contextual Multi-Armed Bandits in Recommender System

Zhenyu Zhao; Yexi Jiang

TL;DR

The paper tackles feature selection for contextual multi-armed bandits (CMABs) by targeting heterogeneous treatment effects (HTE) rather than mere outcome correlation. It introduces two model-free filter methods, Heterogeneous Incremental Effect (HIE) and Heterogeneous Distribution Divergence (HDD). HIE scores a feature by how much it changes which arm is optimal, and HDD scores it by how much the reward distributions diverge across arms; both rely on bin-based analysis with bootstrap-based significance testing. The methods are designed to be computationally efficient and robust to model mis-specification, enabling rapid offline screening in large-scale systems. Experiments on synthetic data and in a real online recommender deployment show that HIE and HDD reliably identify influential HTE features and that selecting those features improves online CMAB performance.

Abstract

Effective feature selection is essential for optimizing contextual multi-armed bandits (CMABs) in large-scale online systems, where suboptimal features can degrade rewards, interpretability, and efficiency. Traditional feature selection often prioritizes outcome correlation, neglecting the crucial role of heterogeneous treatment effects (HTE) across arms in CMAB decision-making. This paper introduces two novel, model-free filter methods, Heterogeneous Incremental Effect (HIE) and Heterogeneous Distribution Divergence (HDD), specifically designed to identify features driving HTE. HIE quantifies a feature's value based on its ability to induce changes in the optimal arm, while HDD measures its impact on reward distribution divergence across arms. These methods are computationally efficient, robust to model mis-specification, and adaptable to various feature types, making them suitable for rapid screening in dynamic environments where retraining complex models is infeasible. We validate HIE and HDD on synthetic data with known ground truth and in a large-scale commercial recommender system, demonstrating their consistent ability to identify influential HTE features and thereby enhance CMAB performance.
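
To make the bin-based idea behind HIE concrete, here is a minimal sketch in plain Python. The abstract does not state the exact formula, so the score below is an assumed form: the expected reward from picking the best arm separately within each feature bin, minus the expected reward from using one arm everywhere. The function name `hie_like_score`, the equal-frequency binning, and the toy data are all illustrative assumptions, not the authors' implementation, and the bootstrap significance test is omitted.

```python
import random
from collections import defaultdict


def hie_like_score(x, arm, reward, n_bins=10):
    """Illustrative bin-based score (assumed form, not the paper's exact definition)."""
    n = len(x)
    # Equal-frequency binning: rank each observation, then map its rank to a bin.
    order = sorted(range(n), key=lambda i: x[i])
    bin_of = [0] * n
    for rank, i in enumerate(order):
        bin_of[i] = rank * n_bins // n

    # Accumulate reward sums and counts per (arm, bin) cell.
    total = defaultdict(float)
    count = defaultdict(int)
    for i in range(n):
        total[(arm[i], bin_of[i])] += reward[i]
        count[(arm[i], bin_of[i])] += 1

    arms = sorted(set(arm))
    # Keep only bins in which every arm was observed, so the maxima are comparable.
    bins = [b for b in range(n_bins) if all(count[(a, b)] > 0 for a in arms)]
    if not bins:
        return 0.0
    bin_size = {b: sum(count[(a, b)] for a in arms) for b in bins}
    kept = sum(bin_size.values())
    weight = {b: bin_size[b] / kept for b in bins}
    mean = {(a, b): total[(a, b)] / count[(a, b)] for a in arms for b in bins}

    # Expected reward when the best arm is chosen separately in each bin.
    per_bin_best = sum(weight[b] * max(mean[(a, b)] for a in arms) for b in bins)
    # Expected reward when a single arm is used everywhere (same bin weights).
    single_best = max(sum(weight[b] * mean[(a, b)] for b in bins) for a in arms)
    return per_bin_best - single_best


# Toy data (hypothetical): feature x flips which of two arms is better,
# while the feature `noise` is unrelated to the reward.
rng = random.Random(0)
n = 20000
x = [rng.random() for _ in range(n)]
noise = [rng.random() for _ in range(n)]
arm = [rng.randrange(2) for _ in range(n)]
reward = [(xi if a == 1 else 1 - xi) + rng.gauss(0, 0.1) for xi, a in zip(x, arm)]

score_hte = hie_like_score(x, arm, reward)
score_noise = hie_like_score(noise, arm, reward)
print(f"HTE feature: {score_hte:.3f}, noise feature: {score_noise:.3f}")
```

In this toy setup the feature that flips the optimal arm should score well above the unrelated one. The unrelated feature still scores slightly above zero because of sampling noise within bins, which is the kind of effect a significance test is meant to filter out.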

Paper Structure (22 sections, 2 theorems, 17 equations, 5 figures, 3 tables, 2 algorithms)

Introduction
Feature Selection Methods for Heterogeneous Effects in Contextual Multi-Armed Bandits
Heterogeneous Incremental Effect (HIE) Score
Normalized HIE Score
Feature Importance Statistical Significance
Algorithm: Computing Normalized HIE Score and p-value
Heterogeneous Distribution Divergence (HDD) Score
Normalized HDD Score
Statistical Significance of the HDD Score
Algorithm: Computing Normalized HDD Score and p-value
Evaluation
Evaluation with Synthetic Data
Online Experiments
Computational Considerations and Practical Guidance
...and 7 more sections

Key Result

Proposition 1

The HIE score is non-negative: $FI_{HIE}(x|m) \geq 0$.
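
This summary does not reproduce the definition of $FI_{HIE}(x|m)$, so the following is only a sketch of why a non-negativity result of this kind holds under one assumed form of the score. Suppose the feature $x$ is split into $m$ bins with weights $p_j \geq 0$, $\sum_{j=1}^{m} p_j = 1$, and let $\bar{y}_{k,j}$ denote the mean reward of arm $k$ in bin $j$ (the symbols $p_j$ and $\bar{y}_{k,j}$ are assumed notation, not taken from the paper). If the score is the gain from choosing the best arm per bin over choosing one arm everywhere,

$$
FI_{HIE}(x|m) = \sum_{j=1}^{m} p_j \max_{k} \bar{y}_{k,j} - \max_{k} \sum_{j=1}^{m} p_j \bar{y}_{k,j},
$$

then for every fixed arm $k'$ and every bin $j$ we have $\max_{k} \bar{y}_{k,j} \geq \bar{y}_{k',j}$, and therefore

$$
\sum_{j=1}^{m} p_j \max_{k} \bar{y}_{k,j} \geq \sum_{j=1}^{m} p_j \bar{y}_{k',j} \quad \text{for all } k'.
$$

Taking the maximum over $k'$ on the right-hand side gives $FI_{HIE}(x|m) \geq 0$, with equality when a single arm is best in every bin.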

Figures (5)

Figure 1: Illustration of feature patterns in synthetic data (12 features, 4 arms, N=100k, visualized with 10 bins). HTE features (e.g., X5-X9) alter relative arm performance, unlike purely correlational (X1-X4) or random (X11-X12) features.
Figure 2: Sensitivity of normalized HIE and HDD scores to the number of bins ($m$) on synthetic data (N=100,000). Statistical significance of scores is color-coded. (a) HIE scores. (b) HDD scores.
Figure 3: Comparison of feature importance scores (HIE, HDD, Pearson Correlation) and CMAB rewards (LinUCB, NonLinearUCB, CohortMAB) for synthetic data at N=100,000 (20 bins for HIE/HDD). Stars indicate statistical significance (p<0.01) for applicable methods (Correlation, HIE, HDD). True HTE features (X5-X9, with hatch pattern) are consistently highly ranked by HIE/HDD and yield high CMAB rewards.
Figure 4: Feature scores based on online non-contextual MAB data: Features are ordered by the HIE score. Each color represents a content item, and four random benchmark features are indicated with grey dots.
Figure 5: ROC curves for CMAB treatment effect significance, based on feature importance score p-values (feature scores are used to resolve ties).

Theorems & Definitions (4)

Proposition 1
Proposition 2
proof
proof
