Do We Really Need SFT? Prompt-as-Policy over Knowledge Graphs for Cold-start Next POI Recommendation

Jinze Wang, Lu Zhang, Yiyang Cui, Zhishu Shen, Xingjun Ma, Jiong Jin, Tiehua Zhang

TL;DR

The paper tackles cold-start next POI recommendation, arguing that supervised fine-tuning is costly and generalizes poorly to inactive users, while static prompts cannot adapt to diverse user contexts. It introduces Prompt-as-Policy, a reinforcement-guided prompting framework that learns to construct prompts through contextual-bandit optimization over a knowledge graph, converting relational paths into evidence cards and keeping the LLM frozen as a reasoning engine. A reward-balanced contextual Thompson Sampling-style learner tunes prompt configuration parameters (e.g., evidence count, ordering, and style) to maximize Acc@1 while maintaining diversity and efficiency. Results on the NYC, CAL, and SIN datasets show strong improvements for inactive users. The work reduces reliance on costly annotations and fine-tuning, delivering robust, cross-domain performance without retraining, which matters for scalable, real-world POI recommendation systems.

Abstract

Next point-of-interest (POI) recommendation is crucial for smart urban services such as tourism, dining, and transportation, yet most approaches struggle under cold-start conditions where user-POI interactions are sparse. Recent efforts leveraging large language models (LLMs) address this challenge through either supervised fine-tuning (SFT) or in-context learning (ICL). However, SFT demands costly annotations and fails to generalize to inactive users, while static prompts in ICL cannot adapt to diverse user contexts. To overcome these limitations, we propose Prompt-as-Policy over knowledge graphs, a reinforcement-guided prompting framework that learns to construct prompts dynamically through contextual bandit optimization. Our method treats prompt construction as a learnable policy that adaptively determines (i) which relational evidence to include, (ii) the number of evidence items per candidate, and (iii) their organization and ordering within prompts. More specifically, we construct a knowledge graph (KG) to discover candidates and mine relational paths, which are transformed into evidence cards that summarize rationales for each candidate POI. The frozen LLM then acts as a reasoning engine, generating recommendations from the KG-discovered candidate set based on the policy-optimized prompts. Experiments on three real-world datasets demonstrate that Prompt-as-Policy consistently outperforms state-of-the-art baselines, achieving an average relative improvement of 7.7% in Acc@1 for inactive users while maintaining competitive performance on active users, without requiring model fine-tuning.
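
To make the idea of prompt construction as a bandit policy concrete, here is a minimal sketch. It is not the authors' implementation: it swaps in a tabular Beta-Bernoulli Thompson sampler (one posterior per context bucket and arm) for the paper's contextual learner, uses a binary reward as a stand-in for an Acc@1 hit, and replaces the frozen LLM with a simulated reward. All names (`PromptConfig`, `TabularThompsonSampler`, `render_prompt`, `simulate`) and the arm grid are hypothetical, and the style parameter is omitted for brevity.

```python
import random
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PromptConfig:
    """One bandit arm: a hypothetical prompt configuration."""
    evidence_per_candidate: int
    ordering: str  # "by_score" or "as_mined" (assumed options)


# Assumed arm grid: 4 evidence counts x 3 orderings = 12 arms.
ARMS = [PromptConfig(k, o) for k, o in
        product((1, 2, 3, 4), ("by_score", "as_mined", "reverse"))]


def render_prompt(config, cards_by_candidate):
    """Turn {poi: [(score, text), ...]} evidence cards into prompt text."""
    lines = []
    for poi, cards in cards_by_candidate.items():
        if config.ordering == "by_score":
            cards = sorted(cards, key=lambda c: c[0], reverse=True)
        elif config.ordering == "reverse":
            cards = list(reversed(cards))
        lines.append(f"Candidate: {poi}")
        lines.extend(f"  - {text}"
                     for _, text in cards[:config.evidence_per_candidate])
    return "\n".join(lines)


class TabularThompsonSampler:
    """Beta-Bernoulli Thompson Sampling with one posterior per (context, arm)."""

    def __init__(self, n_arms, seed=0):
        self.n_arms = n_arms
        self.rng = random.Random(seed)
        self.posteriors = {}  # (context, arm index) -> [alpha, beta]

    def _params(self, context, arm):
        return self.posteriors.setdefault((context, arm), [1.0, 1.0])

    def select(self, context):
        # Draw one sample per arm from its posterior and play the largest.
        draws = [self.rng.betavariate(*self._params(context, a))
                 for a in range(self.n_arms)]
        return max(range(self.n_arms), key=draws.__getitem__)

    def update(self, context, arm, reward):
        # reward is 1 for a hit (stand-in for Acc@1) and 0 otherwise.
        params = self._params(context, arm)
        params[0] += reward
        params[1] += 1 - reward


def simulate(rounds=2000, best_index=3, seed=0):
    """Simulated environment: one arm succeeds with p=0.9, the rest with p=0.1."""
    env = random.Random(seed + 1)
    sampler = TabularThompsonSampler(len(ARMS), seed=seed)
    counts = [0] * len(ARMS)
    for _ in range(rounds):
        arm = sampler.select("inactive")
        p_hit = 0.9 if arm == best_index else 0.1
        sampler.update("inactive", arm, 1 if env.random() < p_hit else 0)
        counts[arm] += 1
    return counts
```

In a real pipeline, the simulated reward inside `simulate` would be replaced by rendering the prompt for the selected arm with `render_prompt`, querying the frozen LLM, and scoring its answer against the user's actual next POI. Keeping a separate posterior per context bucket is the simplest way to let inactive and active users converge to different prompt configurations.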

Paper Structure

This paper contains 22 sections, 7 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Illustration of the sensitivity of LLM reasoning to prompt composition and ordering. Each prompt construction presents the same knowledge-graph-derived evidence (Evidence 1–4) with slight variations in content or order. Although the evidence items convey similar relational rationales, the predicted next POI changes from a pizza shop to a coffee shop, demonstrating that both the selection and the ordering of evidence within prompts substantially affect the LLM’s reasoning outcome.
  • Figure 2: Overview of the proposed Prompt-as-Policy framework.
  • Figure 3: Ablation study of Prompt-as-Policy over three datasets.