Causal Distillation for Alleviating Performance Heterogeneity in Recommender Systems

Shengyu Zhang; Ziqi Jiang; Jiangchao Yao; Fuli Feng; Kun Kuang; Zhou Zhao; Shuo Li; Hongxia Yang; Tat-Seng Chua; Fei Wu

TL;DR

This work tackles performance heterogeneity in recommender systems, which stems from both data imbalance and model bias, and targets the model-bias source by applying front-door adjustment to handle unobserved confounders. It introduces CausalD, a causal multi-teacher distillation framework that samples mediator representations from heterogeneous teachers to estimate the causal effect $P(Y\mid do(X))$, then distills this estimate into a lightweight student model for efficient inference. Empirical results on MovieLens, Amazon, and AliPay show that CausalD improves overall recommendation quality while markedly reducing heterogeneity across user groups, outperforming both single-teacher distillation and traditional debiasing baselines. The approach demonstrates the practical value of combining causal inference with knowledge distillation to debias training without sacrificing natural heterogeneity, which is relevant to building robust, fair, and scalable recommender systems.
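For reference, the standard front-door adjustment formula from causal inference is reproduced below, written with $X$ as the historical behaviors, $Y$ as the next behavior, and $M$ as a mediator that satisfies the front-door criterion. This is the textbook identity, not CausalD's specific estimator; how the paper instantiates each term is left to its Method section.

$$
P(Y \mid do(X = x)) = \sum_{m} P(M = m \mid X = x) \sum_{x'} P(Y \mid X = x', M = m)\, P(X = x')
$$

The outer sum is an expectation over the mediator distribution given $x$, and the inner sum is an expectation of the prediction $P(Y \mid x', m)$ over the prior distribution of historical behaviors $x'$.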

Abstract

Recommendation performance usually exhibits a long-tail distribution over users: a small portion of head users enjoy much more accurate recommendations than the others. We reveal two sources of this performance heterogeneity problem: the uneven distribution of historical interactions (a natural source) and the biased training of recommender models (a model source). Since addressing this problem should not sacrifice overall performance, a wise choice is to eliminate the model bias while maintaining the natural heterogeneity. The key to debiased training lies in eliminating the effect of confounders that influence both the user's historical behaviors and the next behavior. Emerging causal recommendation methods achieve this by modeling the causal effect between user behaviors, but they potentially neglect unobserved confounders (e.g., friend suggestions) that are hard to measure in practice. To address unobserved confounders, we resort to the front-door adjustment (FDA) in causal theory and propose a causal multi-teacher distillation framework (CausalD). FDA requires proper mediators in order to estimate the causal effect of historical behaviors on the next behavior. To this end, we equip CausalD with multiple heterogeneous recommendation models that model the mediator distribution. The causal effect estimated by FDA is then the expectation of the recommendation prediction over the mediator distribution and the prior distribution of historical behaviors, which is technically achieved by a multi-teacher ensemble. To pursue efficient inference, CausalD further distills the multiple teachers into one student model that directly infers the causal effect for making recommendations.
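The ensemble-then-distill idea in the abstract can be illustrated with a minimal sketch. It assumes that each teacher contributes one mediator sample $m_k \sim P(M \mid X=x)$, that the prior $P(X)$ is approximated by histories sampled from other users, and that a toy logistic head stands in for $P(Y \mid x', m)$. The names `fda_soft_label`, `toy_predict`, and `distillation_loss` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def toy_predict(history, mediator):
    """Hypothetical prediction head P(Y=1 | X=x', M=m): a fixed logistic scorer."""
    return float(sigmoid(history.sum() + mediator.sum()))


def fda_soft_label(mediators, prior_histories, predict):
    """Sample-based front-door estimate of P(Y=1 | do(X=x)) for one user history x.

    mediators:       (K, d) array, one mediator sample per teacher (m_k ~ P(M | X=x)).
    prior_histories: (N, h) array, histories x' drawn from the prior P(X),
                     e.g. other users in the same mini-batch (an assumption).
    predict:         callable (x', m) -> P(Y=1 | X=x', M=m).
    """
    scores = np.array([[predict(xp, m) for xp in prior_histories] for m in mediators])
    # Inner mean over x' approximates sum_{x'} P(x') P(Y | x', m);
    # outer mean over teachers approximates sum_m P(m | x).
    return float(scores.mean())


def distillation_loss(student_prob, soft_label, eps=1e-7):
    """Binary cross-entropy of the student's prediction against the FDA soft label."""
    p = np.clip(student_prob, eps, 1.0 - eps)
    return float(-(soft_label * np.log(p) + (1.0 - soft_label) * np.log(1.0 - p)))


# Usage: three hypothetical teachers and four prior histories.
mediators = np.array([[0.2, -0.1], [0.5, 0.3], [-0.4, 0.1]])
prior_histories = np.array([[0.0, 1.0], [1.0, 0.0], [-1.0, 0.5], [0.3, -0.3]])
target = fda_soft_label(mediators, prior_histories, toy_predict)
loss = distillation_loss(0.5, target)
```

Cross-entropy against a soft label is minimized when the student's probability equals that label, so training the student this way pushes it toward the ensemble's front-door estimate. At inference only the student runs, and the teachers are no longer needed.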

Paper Structure (25 sections, 22 equations, 5 figures, 7 tables, 1 algorithm)

Introduction
Preliminaries
Back-door Adjustment
Front-door Adjustment
Method
Problem Formulation
Front-door Adjustment in Recommendation
Causal Multi-teacher Distillation
Constructing Heterogeneous Teachers
Front-door Adjustment for Causal Label Distillation
Feature Distillation
Model Training
Method Analysis
Training Complexity
Inference Complexity
...and 10 more sections

Figures (5)

Figure 1: Recommendation performance over user groups clustered by 1) user activeness and 2) behavior consistency on popular items. DIN (Zhou et al., 2018) and DIN (Group-wise) are trained with unified training and group-wise training, respectively. Each group contains the same number of users.
Figure 2: Causal graph for illustrating the amplified performance heterogeneity due to spurious correlation.
Figure 3: Causal graph for illustrating the back-door adjustment and front-door adjustment.
Figure 4: Performance heterogeneity across user groups w.r.t. two confounders, i.e., user activeness and behavior consistency. The performance heterogeneity over user groups is largely reduced compared to the base model DIN (Zhou et al., 2018).
Figure 5: Recommendation performance with loss coefficients ($\lambda_{BDA}$ and $\lambda_{FDA}$) varying over $\{ 0.01, 0.1, 1, 10 \}$.
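Figures 1 and 4 report performance over user groups formed by a confounder such as user activeness, and Figure 1's caption notes that each group has the same number of users. The sketch below shows that grouping and a per-group evaluation. It assumes activeness is a per-user count of historical interactions and that a per-user metric (e.g., AUC) is already computed. `equal_size_groups`, `groupwise_metric`, and `heterogeneity_gap` are hypothetical helpers, and the max-minus-min gap is only an illustrative summary, not necessarily the measure used in the paper.

```python
import numpy as np


def equal_size_groups(activeness, n_groups):
    """Split user indices into n_groups groups of (near-)equal size, ordered by activeness."""
    order = np.argsort(activeness, kind="stable")  # least to most active
    return np.array_split(order, n_groups)


def groupwise_metric(per_user_metric, groups):
    """Mean of a per-user metric (e.g. per-user AUC) within each group."""
    return [float(np.mean(per_user_metric[g])) for g in groups]


def heterogeneity_gap(group_scores):
    """Hypothetical summary: spread between the best- and worst-served groups."""
    return max(group_scores) - min(group_scores)


# Usage with made-up numbers for six users.
activeness = np.array([3, 50, 7, 120, 15, 1])
per_user_auc = np.array([0.60, 0.80, 0.65, 0.85, 0.70, 0.55])
groups = equal_size_groups(activeness, 3)
scores = groupwise_metric(per_user_auc, groups)
gap = heterogeneity_gap(scores)
```

Sorting by activeness before `np.array_split` keeps group sizes equal (they differ by at most one when the user count is not divisible by the number of groups), which matches the equal-size grouping described for Figure 1.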
