
Improve Temporal Awareness of LLMs for Sequential Recommendation

Zhendong Chu, Zichao Wang, Ruiyi Zhang, Yangfeng Ji, Hongning Wang, Tong Sun

TL;DR

Problem: LLMs struggle to leverage temporal information in sequential recommendation. Approach: Tempura uses training-free prompting with proximal temporal demonstrations (PCL), global interest demonstrations (GCL), explicit temporal-structure analysis via cluster prompts, and a prompt-ensemble to merge results. Findings: On MovieLens-1M and Amazon Review data, Tempura yields significant zero-shot gains in $NDCG@K$ over strong baselines, with GPT-4 providing further boosts. Significance: The framework is domain-agnostic and deployable without fine-tuning, offering a practical path to time-aware recommendations with LLMs.
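
The TL;DR reports gains in NDCG@K. For reference, here is a minimal sketch of NDCG@K under the common leave-one-out protocol. It assumes each test user has exactly one held-out ground-truth item, in which case the ideal DCG is 1 and the metric reduces to a single log-discount term. The function names are hypothetical, and this summary does not state the paper's exact protocol (candidate-set size, values of K).

```python
import math
from typing import Sequence


def ndcg_at_k(ranking: Sequence[str], target: str, k: int) -> float:
    """NDCG@K when exactly one held-out item is relevant.

    With a single relevant item the ideal DCG is 1, so NDCG@K equals
    1 / log2(rank + 1) if the target sits at 1-based position rank <= k,
    and 0 otherwise.
    """
    top_k = list(ranking[:k])
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # convert 0-based index to 1-based rank
    return 1.0 / math.log2(rank + 1)


def mean_ndcg_at_k(rankings, targets, k: int) -> float:
    """Average per-user NDCG@K; returns 0.0 for an empty evaluation set."""
    scores = [ndcg_at_k(r, t, k) for r, t in zip(rankings, targets)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `ndcg_at_k(["m2", "m7", "m5"], "m7", 3)` returns 1/log2(3), roughly 0.631, because the held-out item is ranked second.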

Abstract

Large language models (LLMs) have demonstrated impressive zero-shot abilities across a wide range of general-purpose tasks. However, LLMs have been empirically found to fall short in recognizing and utilizing temporal information, resulting in poor performance on tasks that require an understanding of sequential data, such as sequential recommendation. In this paper, we aim to improve the temporal awareness of LLMs by designing a principled prompting framework inspired by human cognitive processes. Specifically, we propose three prompting strategies that exploit temporal information within historical interactions for LLM-based sequential recommendation. In addition, we emulate divergent thinking by aggregating the LLM ranking results derived from these strategies. Evaluations on the MovieLens-1M and Amazon Review datasets indicate that our proposed method significantly enhances the zero-shot capabilities of LLMs in sequential recommendation tasks.
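
The abstract states that rankings produced by the different prompting strategies are aggregated, but this summary does not say which aggregation rule Tempura uses. As a stand-in, the sketch below uses a plain Borda count, a standard rank-aggregation method; `borda_aggregate` and its tie-breaking rule are assumptions for illustration, not the paper's procedure.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def borda_aggregate(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Merge several rankings with a Borda count (stand-in aggregation rule).

    Each ranking of length n awards (n - position) points, so its top item
    gets n points and its last item gets 1; items absent from a ranking get
    nothing from it. Ties break on the best position an item reached, then
    on first appearance, so the output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    best_pos: Dict[str, int] = {}
    first_seen: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] += n - pos
            best_pos[item] = min(best_pos.get(item, pos), pos)
            if item not in first_seen:
                first_seen[item] = len(first_seen)
    return sorted(scores, key=lambda it: (-scores[it], best_pos[it], first_seen[it]))
```

With three hypothetical strategy outputs `["m3", "m1", "m2", "m4"]`, `["m1", "m3", "m4", "m2"]`, and `["m1", "m2", "m3", "m4"]`, the totals are m1 = 11, m3 = 9, m2 = 6, m4 = 4, so the merged ranking is `["m1", "m3", "m2", "m4"]`. A positional rule like this needs only the ranked lists, which suits LLM outputs that rarely come with calibrated scores.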

Paper Structure

This paper contains 13 sections, 5 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: LLM-based sequential recommendation baselines perform comparably whether historical interactions are given in their original order (Sequential) or in randomized order (Random). Tempura significantly boosts performance by utilizing the order of historical interactions, i.e., temporal information.
  • Figure 2: An illustrative overview of Tempura. We learn sequential recommendation via two kinds of in-context demonstrations. Explicit cluster structure analysis is conducted to improve the temporal understanding of LLMs. Each prompting strategy independently elicits its own ranking from the LLM (marked by different colors), and the rankings from the different strategies are aggregated to form the final ranking.
  • Figure 3: Performance vs. history length $|\mathcal{H}|$ (ML-1M).
  • Figure 4: Impact of the number of in-context examples in PCL. Adding a few more examples can improve performance.
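
Figures 1 and 4 turn on two knobs: whether the history keeps its temporal order, and how many proximal in-context examples PCL uses. The sketch below shows one hypothetical way to build proximal demonstrations from a user's own most recent interactions with a sliding window, with `num_examples` as the Figure 4 knob and a seeded shuffle as the Random-order control from Figure 1. The prompt wording, the window construction, and every name here (`proximal_demos`, `build_prompt`, `window`, `shuffle_seed`) are assumptions, not Tempura's actual templates.

```python
import random
from typing import List, Optional, Sequence, Tuple

# A demonstration pairs a context window of items with the item that followed it.
Demo = Tuple[List[str], str]


def proximal_demos(history: Sequence[str], num_examples: int, window: int) -> List[Demo]:
    """Hypothetical PCL-style demos: slide a window over the most recent history.

    For each of the latest `num_examples` positions t that have `window`
    preceding items, the demo is (history[t - window:t], history[t]).
    Demos are returned in chronological order.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    demos: List[Demo] = []
    for t in range(len(history) - 1, window - 1, -1):  # newest position first
        if len(demos) == num_examples:
            break
        demos.append((list(history[t - window:t]), history[t]))
    demos.reverse()  # oldest demo first, so the newest sits next to the query
    return demos


def build_prompt(
    history: Sequence[str],
    candidates: Sequence[str],
    num_examples: int = 2,
    window: int = 3,
    shuffle_seed: Optional[int] = None,
) -> str:
    """Assemble a ranking prompt; pass `shuffle_seed` for a random-order control."""
    seq = list(history)
    if shuffle_seed is not None:
        # Keeps the same items but destroys their temporal order (the "Random" setting).
        random.Random(shuffle_seed).shuffle(seq)
    lines = [
        f"Watched in order: {', '.join(ctx)} -> next: {ans}"
        for ctx, ans in proximal_demos(seq, num_examples, window)
    ]
    lines.append(f"Watched in order: {', '.join(seq[-window:])} -> next: ?")
    lines.append(f"Rank these candidates: {', '.join(candidates)}")
    return "\n".join(lines)
```

Calling `build_prompt(["a", "b", "c", "d", "e"], ["x", "y"], num_examples=2, window=2)` yields two demonstrations (`b, c -> d` and `c, d -> e`), the query `d, e -> ?`, and the candidate line. Comparing LLM rankings for the ordered and shuffled variants of the same history is one way to check whether a prompt actually exploits temporal order.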