Inference Computation Scaling for Feature Augmentation in Recommendation Systems

Weihao Liu, Zhaocheng Du, Haiyuan Zhao, Wenbo Zhang, Xiaoyan Zhao, Gang Wang, Zhenhua Dong, Jun Xu

TL;DR

The paper addresses incomplete feature coverage and shallow descriptions in LLM-based feature augmentation for recommendations by applying inference scaling with extended Chain-of-Thought (long-CoT) reasoning. It treats feature generation as a scalable inference task with a policy model, a reward model, and search strategies (e.g., Best-of-N) to produce richer, more diverse features, achieving a 12% improvement in NDCG@10 on benchmark datasets. Key contributions include (1) demonstrating inference scaling for recommendation feature augmentation, (2) linking gains to increased feature quantity and specificity, (3) analyzing how policy-model choice and search strategy affect outcomes, and (4) showing transfer of long-CoT benefits from math and coding to personalized recommendation. The findings suggest that longer reasoning and carefully chosen search procedures can significantly improve personalization by capturing nuanced user preferences, albeit with higher computational costs and with limitations that warrant further theoretical and industrial-scale study.

Abstract

Large language models have become a powerful method for feature augmentation in recommendation systems. However, existing approaches relying on quick inference often suffer from incomplete feature coverage and insufficient specificity in feature descriptions, limiting their ability to capture fine-grained user preferences and undermining overall performance. Motivated by the recent success of inference scaling in math and coding tasks, we explore whether scaling inference can address these limitations and enhance feature quality. Our experiments show that scaling inference leads to significant improvements in recommendation performance, with a 12% increase in NDCG@10. The gains can be attributed to two key factors: feature quantity and specificity. In particular, models using extended Chain-of-Thought (CoT) reasoning generate a greater number of detailed and precise features, offering deeper insights into user preferences and overcoming the limitations of quick inference. We further investigate the factors influencing feature quantity, revealing that model choice and search strategy play critical roles in generating a richer and more diverse feature set. This is the first work to apply inference scaling to feature augmentation in recommendation systems, bridging advances in reasoning tasks to enhance personalized recommendation.
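For readers unfamiliar with the reported metric, the following is a minimal sketch of NDCG@k with k = 10. It assumes the common formulation with linear gain and a log2 rank discount; the abstract does not state which variant the authors use, and the function names are illustrative.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Relevance of the items in ranked order; 1 marks the held-out target item.
    print(ndcg_at_k([0, 1, 0, 0, 0, 0, 0, 0, 0, 0]))
```

With a single relevant item, the score depends only on the rank at which that item appears: rank 1 gives 1.0, and lower ranks are discounted logarithmically.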

Paper Structure

This paper contains 21 sections, 6 figures, 4 tables.

Figures (6)

  • Figure 1: The positive correlation between recommendation performance and the number of unique valid features generated by different LLMs. The red dotted line represents the best-fit line of the data.
  • Figure 2: Win-Tie-Lose Comparisons on specificity of features from gpt-4o-mini and o1-mini.
  • Figure 3: Number of unique features generated by different LLMs compared against the total number of generated features in the Toys dataset.
  • Figure 4: Comparison of different search strategies on the Instruments dataset.
  • Figure 5: An example of a prompt for the policy model and its corresponding response.
  • ...and 1 more figures