Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy

Tingjia Shen, Hao Wang, Chuhan Wu, Jin Yao Chin, Wei Guo, Yong Liu, Huifeng Guo, Defu Lian, Ruiming Tang, Enhong Chen

TL;DR

This work tackles two core challenges in sequential recommendation (SR): the mismatch between loss-based scaling laws and actual SR performance, and the detrimental impact of data redundancy on SR outcomes. By formulating a Performance Law that fits SR performance metrics such as HR@10 and NDCG@10, and by introducing Approximate Entropy as a data-quality measure, the authors enable accurate performance predictions across model sizes and dataset scales. They validate the approach with transformer-based SR models, showing strong correlations between the data proxy $D' = \#\text{Tokens} \cdot \text{ApEn}'$ and performance across diverse datasets, and demonstrate practical applications in global/local parameter optimization and cross-framework scaling. The findings offer a principled way to balance data quantity and quality to achieve near-optimal SR performance under real-world resource constraints.

Abstract

Scaling Laws have emerged as a powerful framework for understanding how model performance evolves as models increase in size, providing valuable insights for optimizing computational resources. In the realm of Sequential Recommendation (SR), which is pivotal for predicting users' sequential preferences, these laws offer a lens through which to address the challenges posed by the scalability of SR models. However, structural and collaborative issues in recommender systems prevent the direct application of the Scaling Law (SL) to these systems. In response, we introduce the Performance Law for SR models, which aims to theoretically investigate and model the relationship between model performance and data quality. Specifically, we first fit the HR and NDCG metrics to transformer-based SR models. Subsequently, we propose Approximate Entropy (ApEn) to assess data quality, presenting a more nuanced approach than traditional data quantity metrics. Our method enables accurate predictions across various dataset scales and model sizes, demonstrating a strong correlation in large SR models and offering insights into achieving optimal performance for any given model configuration.
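
The HR and NDCG metrics that the Performance Law is fitted to can be illustrated with a minimal sketch. It assumes the common leave-one-out setting in which each user has a single held-out target item; the abstract does not spell out the evaluation protocol, so the function names and this setting are illustrative assumptions rather than the authors' code.

```python
import math


def hit_rate_at_k(ranked_items, target, k=10):
    """Return 1.0 if the held-out target is among the top-k ranked items, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, target, k=10):
    """NDCG@k for a single relevant item (assumed setting).

    With one relevant item the ideal DCG is 1, so the score reduces to
    1 / log2(rank + 1) when the target is in the top k, and 0 otherwise.
    """
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position of the target
    return 1.0 / math.log2(rank + 1)


def mean_metric(metric, rankings, targets, k=10):
    """Average a per-user metric over all users (hypothetical helper)."""
    scores = [metric(r, t, k) for r, t in zip(rankings, targets)]
    return sum(scores) / len(scores)
```

Averaging these per-user scores with `mean_metric` gives dataset-level HR@10 and NDCG@10 values of the kind a performance curve would be fitted against.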

Paper Structure

This paper contains 19 sections, 5 theorems, 31 equations, 5 figures, 4 tables.

Key Result

Lemma 3.1

In the first-order stationary Markov chain case (discrete state space $X$), with $r < \min\{|x - y| : x \neq y,\ x, y \in X\}$, a.s. for any $m$, where $\pi(x)$ is the stationary distribution of $x$.
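
To make the role of the tolerance $r$ concrete, here is a minimal sketch of the standard Approximate Entropy estimator ApEn(m, r) for a numeric sequence. It is the textbook definition, not necessarily the ApEn' variant used in the paper, and the function name and defaults are assumptions. When states are integer item IDs, choosing `r = 0.5` satisfies the lemma's condition on $r$, so two windows match only if they are identical.

```python
import math


def approximate_entropy(seq, m=2, r=0.5):
    """Standard ApEn(m, r) estimate; a sketch, not the paper's ApEn' variant."""
    n = len(seq)
    if n < m + 1:
        raise ValueError("sequence must contain at least m + 1 elements")

    def phi(k):
        # All overlapping windows of length k.
        windows = [seq[i:i + k] for i in range(n - k + 1)]
        total = 0.0
        for w in windows:
            # Count windows within Chebyshev distance r of w.
            # The self-match is included, so the count is never zero.
            matches = sum(
                1
                for v in windows
                if max(abs(a - b) for a, b in zip(w, v)) <= r
            )
            total += math.log(matches / len(windows))
        return total / len(windows)

    # ApEn is the drop in average log match frequency from length m to m + 1.
    return phi(m) - phi(m + 1)
```

A constant sequence yields an ApEn of zero and a strictly periodic one stays close to zero, while irregular sequences score higher. That is the sense in which ApEn can act as a redundancy-aware data-quality signal alongside raw token counts.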

Figures (5)

  • Figure 1: Distinction between Performance Law and Scaling Law. Performance typically shows decay as the model size increases.
  • Figure 2: The relationship between model loss and the number of layers (horizontal axis, H), as well as the embedding dimensions (different colored lines, $d_{emb}$). The plot is annotated with the coefficient of determination $R^2$.
  • Figure 3: The relationship between model HR performance and the number of layers (x-axis, N), as well as the embedding dimensions (y-axis, $d_{emb}$). The plot is annotated with the fitted parameter $w$.
  • Figure 4: The linear correlation between parameter $D$ and Tokens/ApEn. The upper figure validates this relationship within the context of the Scaling Law Loss, while the lower figure verifies it within the Performance Law Metric.
  • Figure 5: The relationship between model NDCG performance and the number of layers (x-axis, N), as well as the embedding dimensions (y-axis, $d_{emb}$). The plot is annotated with the fitted parameter $w$.

Theorems & Definitions (9)

  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • Theorem 3.3
  • proof
  • Lemma 3.4
  • Theorem 3.5
  • proof