Are Longer Prompts Always Better? Prompt Selection in Large Language Models for Recommendation Systems

Genki Kusano, Kosuke Akimoto, Kunihiro Takeoka

TL;DR

The paper tackles prompt selection in LLM‑based recommender systems, showing that no single prompt universally outperforms others across datasets and highlighting the importance of dataset characteristics. It introduces a supervised prompt‑selection approach that uses a small validation set to pick the best prompt and a cost‑efficient exploration strategy that leverages both high‑performance and cheaper LLMs. Key findings indicate that incorporating item categories and descriptions can boost accuracy on certain datasets, and robust test performance is achieved when prompts are selected via validation data and analyzed with relative performance indicators. Practically, the work provides actionable guidelines for building accurate, cost‑efficient LLM‑RSs and discusses trade‑offs in exploration costs when deploying prompts at scale.

Abstract

In large language model (LLM)-based recommendation systems (LLM-RSs), accurately predicting user preferences by leveraging the general knowledge of LLMs is possible without requiring extensive training data. By converting recommendation tasks into natural language inputs called prompts, LLM-RSs can efficiently solve issues that have been difficult to address due to data scarcity but are crucial in applications such as cold-start and cross-domain problems. However, when applying this in practice, selecting the prompt that matches tasks and data is essential. Although numerous prompts have been proposed in LLM-RSs and representing the target user in prompts significantly impacts recommendation accuracy, there are still no clear guidelines for selecting specific prompts. In this paper, we categorize and analyze prompts from previous research to establish practical prompt selection guidelines. Through 450 experiments with 90 prompts and five real-world datasets, we examined the relationship between prompts and dataset characteristics in recommendation accuracy. We found that no single prompt consistently outperforms others; thus, selecting prompts on the basis of dataset characteristics is crucial. Here, we propose a prompt selection method that achieves higher accuracy with minimal validation data. Because increasing the number of prompts to explore raises costs, we also introduce a cost-efficient strategy using high-performance and cost-efficient LLMs, significantly reducing exploration costs while maintaining high prediction accuracy. Our work offers valuable insights into prompt selection, advancing accurate and efficient LLM-RSs.
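
The two ideas in the abstract, choosing a prompt by its score on a small validation set and reducing exploration cost by combining a cheaper LLM with a high-performance one, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how the two LLMs are combined, so the screen-then-refine scheme below is an assumption, and the names `select_prompt`, `select_prompt_two_stage`, `shortlist_size`, and the scorer callables are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical scorer type: takes a prompt identifier and the validation
# examples, returns an accuracy-like score (higher is better). In practice
# this would call an LLM; here it is left abstract.
Scorer = Callable[[str, Sequence[dict]], float]


def select_prompt(prompts: Sequence[str],
                  validation: Sequence[dict],
                  scorer: Scorer) -> str:
    """Return the prompt with the highest score on the validation set."""
    return max(prompts, key=lambda p: scorer(p, validation))


def select_prompt_two_stage(prompts: Sequence[str],
                            validation: Sequence[dict],
                            cheap_scorer: Scorer,
                            strong_scorer: Scorer,
                            shortlist_size: int = 3) -> str:
    """Assumed scheme: screen all prompts with a cheap LLM, then re-score
    only the top candidates with the high-performance LLM."""
    ranked = sorted(prompts,
                    key=lambda p: cheap_scorer(p, validation),
                    reverse=True)
    shortlist = ranked[:shortlist_size]
    return select_prompt(shortlist, validation, strong_scorer)


# Toy scores standing in for real LLM evaluations (made-up numbers).
CHEAP = {"a": 0.1, "b": 0.5, "c": 0.4, "d": 0.3}
STRONG = {"a": 0.9, "b": 0.2, "c": 0.6, "d": 0.8}

best_full = select_prompt(list(STRONG), [], lambda p, v: STRONG[p])
best_two_stage = select_prompt_two_stage(
    list(STRONG), [],
    cheap_scorer=lambda p, v: CHEAP[p],
    strong_scorer=lambda p, v: STRONG[p],
    shortlist_size=2,
)
```

In this toy setup the two-stage search calls the expensive scorer on two prompts instead of four, but it can miss the prompt that the expensive model would have ranked first, which is the kind of trade-off between exploration cost and accuracy that the abstract refers to.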

Paper Structure

This paper contains 12 sections, 3 figures, and 6 tables.

Figures (3)

  • Figure 1: An example of a prompt for LLM-RSs. We conduct experiments by varying the user's information part, where the differences from related works are most noticeable.
  • Figure 2: Summarization prompt, its output text, and its inference prompt.
  • Figure 3: Calculation of the relative performance indicator.