Why is Normalization Necessary for Linear Recommenders?

Seongmin Park, Mincheol Yoon, Hye-young Kim, Jongwuk Lee

TL;DR

This work tackles popularity and neighborhood biases in linear autoencoder (LAE)-based recommenders by analyzing existing normalization methods and introducing Data-Adaptive Normalization (DAN). DAN applies item- and user-adaptive normalization that adjusts to dataset-specific skewness and homophily, with a closed-form solution that preserves eigenstructure while enabling denoising-like benefits. Across six benchmark datasets, DAN-equipped LAEs improve on existing LAE-based models, with gains of up to 128.57% for long-tail items and 12.36% under unbiased evaluation, while maintaining computational efficiency. The approach is model-agnostic within LAEs and opens a path toward extending adaptive normalization to neural models, which makes it practical for scalable recommendation across diverse data regimes.

Abstract

Despite their simplicity, linear autoencoder (LAE)-based models have shown comparable or even better performance than neural recommender models, with faster inference. However, LAEs face two critical challenges: (i) popularity bias, the tendency to over-recommend popular items, and (ii) neighborhood bias, an excessive focus on local item correlations. To address these issues, this paper first analyzes the effect of two existing normalization methods for LAEs, namely random-walk and symmetric normalization. Our theoretical analysis reveals that normalization strongly affects the degree of popularity and neighborhood biases among items. Inspired by this analysis, we propose a versatile normalization solution, called Data-Adaptive Normalization (DAN), which flexibly controls the popularity and neighborhood biases by adjusting item- and user-side normalization to align with each dataset's characteristics. Because it is model-agnostic, DAN can be easily applied to various LAE-based models. Experimental results show that DAN-equipped LAEs consistently improve on existing LAE-based models across six benchmark datasets, with significant gains of up to 128.57% for long-tail items and 12.36% for unbiased evaluations. The code is available at https://github.com/psm1206/DAN.
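As a rough illustration of the two existing normalization schemes named in the abstract, the sketch below applies the textbook random-walk and symmetric normalizations to an item-item Gram matrix built from a toy interaction matrix. The function names, the toy data, and the choice of Gram-matrix row sums as item degrees are assumptions made for this example; the paper may define the degree matrix differently.

```python
import numpy as np

def item_gram(X):
    """Item-item co-occurrence (Gram) matrix from a user-item matrix X."""
    return X.T @ X

def random_walk_normalize(G, eps=1e-12):
    """Row-normalize G as D^{-1} G, so every row sums to 1."""
    # Assumption: the degree of an item is the row sum of G.
    d = np.maximum(G.sum(axis=1), eps)
    return G / d[:, None]

def symmetric_normalize(G, eps=1e-12):
    """Normalize G as D^{-1/2} G D^{-1/2}, which keeps it symmetric."""
    d = np.maximum(G.sum(axis=1), eps)
    s = 1.0 / np.sqrt(d)
    return s[:, None] * G * s[None, :]

# Toy binary user-item matrix (3 users, 3 items); item 0 is the most popular.
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)

G = item_gram(X)
G_rw = random_walk_normalize(G)
G_sym = symmetric_normalize(G)
```

Random-walk normalization divides each row by its own degree only, whereas symmetric normalization splits the correction evenly between the row item and the column item.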

Paper Structure

This paper contains 28 sections, 4 theorems, 27 equations, 9 figures, 8 tables.

Key Result

Theorem 4.1

Item-adaptive normalization (i) provides a denoising effect (Steck, 2020) and (ii) controls popularity bias: a larger $\alpha$ alleviates target items' popularity bias, while a smaller $\alpha$ focuses on source items' popularity bias.
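To make the role of $\alpha$ concrete, here is a minimal sketch under an assumed parameterization: the item-item Gram matrix $\mathbf{G}$ is scaled as $\mathbf{D}^{-(1-\alpha)} \mathbf{G} \mathbf{D}^{-\alpha}$, with rows read as source items and columns as target items. This form is an illustration chosen to match the statement of the theorem; it is not taken from the paper, whose exact definition may differ. The name `alpha_normalize` and the toy data are hypothetical.

```python
import numpy as np

def alpha_normalize(G, item_pop, alpha, eps=1e-12):
    """Scale G as D^{-(1-alpha)} G D^{-alpha} (assumed form, for illustration).

    Rows are treated as source items and columns as target items, so a
    larger alpha divides more strongly by the popularity of the target item.
    """
    d = np.maximum(np.asarray(item_pop, dtype=float), eps)
    row_scale = d ** (alpha - 1.0)   # D^{-(1-alpha)}, applied to source items
    col_scale = d ** (-alpha)        # D^{-alpha}, applied to target items
    return row_scale[:, None] * G * col_scale[None, :]

# Toy binary user-item matrix (3 users, 3 items).
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
G = X.T @ X
item_pop = X.sum(axis=0)  # item popularity: number of interactions per item

G_source = alpha_normalize(G, item_pop, alpha=0.0)  # source-side only
G_sym = alpha_normalize(G, item_pop, alpha=0.5)     # balanced
G_target = alpha_normalize(G, item_pop, alpha=1.0)  # target-side only
```

Under this assumed form, $\alpha = 0.5$ recovers a symmetric normalization, while the endpoints $\alpha = 0$ and $\alpha = 1$ normalize only the source or only the target side.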

Figures (9)

  • Figure 1: Performance of popular and unpopular items on ML-20M and Yelp2018. The x-axis categorizes items into 'Head' (top 20% popular items) and 'Tail' (the remaining items), while the y-axis represents NDCG@20. 'W/O Norm.' and 'W/ Norm.' denote LAE without and with normalization.
  • Figure 2: Performance for six datasets categorized into two groups: High-homophilic group (ML-20M, Netflix, and MSD) with low neighborhood bias and Low-homophilic group (Gowalla, Yelp2018, and Amazon-book) with high neighborhood bias. The x-axis lists the individual datasets.
  • Figure 3: Distribution of the learned weights (Steck, 2019) for different $\alpha$ values on ML-20M. The red and blue lines are the estimated probability density functions (PDFs) for head and tail items. The weights are averaged over $\hat{\mathbf{B}}$ in a column-wise direction. The x-axis is the average weight of items, and the area under the curve corresponds to the probability of having the weights within that range.
  • Figure 4: Eigenvalue distribution of the weight matrix $\mathbf{\hat{B}}$ according to $\beta$ on ML-20M and Yelp2018.
  • Figure 5: NDCG@20 of LAE$_\text{DAN}$ over $\alpha$ for item normalization on four datasets. When adjusting $\alpha$, we keep $\beta$ fixed at the optimal parameter.
  • ...and 4 more figures
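
The head/tail split described in the caption of Figure 1 (the top 20% most popular items versus the rest) can be sketched as follows. The function name `split_head_tail`, the rounding rule (ceiling), and the tie-breaking by item index are assumptions for this example and are not specified in the source.

```python
import numpy as np

def split_head_tail(X, head_ratio=0.2):
    """Return (head, tail) item indices, ranked by interaction count.

    Assumptions: the head size is rounded up, and ties in popularity are
    broken by the lower item index (stable sort).
    """
    pop = np.asarray(X).sum(axis=0)
    n_head = int(np.ceil(head_ratio * pop.shape[0]))
    order = np.argsort(-pop, kind="stable")  # most popular items first
    return np.sort(order[:n_head]), np.sort(order[n_head:])

# Toy user-item matrix with 5 items whose popularities are 1, 5, 2, 3, 4.
counts = [1, 5, 2, 3, 4]
X = np.zeros((5, 5))
for j, c in enumerate(counts):
    X[:c, j] = 1.0  # the first c users interacted with item j

head_items, tail_items = split_head_tail(X)
```

Metrics such as NDCG@20 can then be computed separately over `head_items` and `tail_items` to compare performance on popular and unpopular items.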

Theorems & Definitions (4)

  • Theorem 4.1
  • Lemma 4.1: Eigenvalue Relationship between Weight Matrix and Gram Matrix
  • Lemma 4.2: Monotonicity of Eigenvalues via Rayleigh Quotient
  • Theorem 4.2