
Refine-POI: Reinforcement Fine-Tuned Large Language Models for Next Point-of-Interest Recommendation

Peibo Li, Shuang Ao, Hao Xue, Yang Song, Maarten de Rijke, Johan Barthélemy, Tomasz Bednarz, Flora D. Salim

TL;DR

Refine-POI tackles next-POI recommendation by addressing two core issues: the semantic continuity of POI IDs and the need to generate top-$k$ lists with reasoning. It introduces topology-aware semantic IDs (SIDs) generated by a hierarchical self-organizing map (HSOM) to preserve semantic locality, and it employs reinforcement fine-tuning (RFT) with a recommendation-driven reward that optimizes full top-$k$ lists rather than exact ground-truth matches. The framework uses trajectory prompting to incorporate long- and short-term memory into prompts, together with a multifaceted reward that combines format, reciprocal rank, soft accuracy, diversity, and length terms. Experiments on NYC, TKY, and Gowalla-CA show state-of-the-art performance on top-$k$ metrics, improved handling of cold-start users, and evidence of grounded reasoning. These gains come with higher training cost and potential reward hacking, which point to future work on efficiency and process-supervised guidance.

Abstract

Advancing large language models (LLMs) for the next point-of-interest (POI) recommendation task faces two fundamental challenges: (i) although existing methods produce semantic IDs that incorporate semantic information, their topology-blind indexing fails to preserve semantic continuity, meaning that proximity in ID values does not mirror the coherence of the underlying semantics; and (ii) supervised fine-tuning (SFT)-based methods restrict model outputs to top-1 predictions. These approaches suffer from "answer fixation" and neglect the need for top-$k$ ranked lists and reasoning due to the scarcity of supervision. We propose Refine-POI, a framework that addresses these challenges through topology-aware ID generation and reinforcement fine-tuning. First, we introduce a hierarchical self-organizing map (SOM) quantization strategy to generate semantic IDs, ensuring that coordinate proximity in the codebook reflects semantic similarity in the latent space. Second, we employ a policy-gradient framework to optimize the generation of top-$k$ recommendation lists, liberating the model from strict label matching. Extensive experiments on three real-world datasets demonstrate that Refine-POI significantly outperforms state-of-the-art baselines, effectively synthesizing the reasoning capabilities of LLMs with the representational fidelity required for accurate and explainable next-POI recommendation.
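The key property claimed for the hierarchical SOM is that nearby codebook coordinates hold semantically similar prototypes. The sketch below illustrates that idea under our own assumptions and is not the paper's construction: a classic online Kohonen SOM on a small grid, stacked into two levels where the second level quantizes the residual left by the first, with the concatenated grid coordinates used as the semantic ID. The residual stacking, grid size, learning-rate and neighborhood schedules, the toy 2-D "embeddings", and the names `train_som`, `best_matching_unit`, and `semantic_ids` are all hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate (row, col) whose prototype is closest to x."""
    return min(weights, key=lambda rc: sum((w - v) ** 2 for w, v in zip(weights[rc], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Fit a rows x cols online Kohonen SOM; returns {(row, col): prototype}."""
    rng = random.Random(seed)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialize every grid cell with a copy of a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighborhood on the grid: cells near the winner also move
                # toward x, so adjacent coordinates end up with similar prototypes.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def semantic_ids(data, grid=(4, 4), levels=2, **som_kwargs):
    """Hypothetical hierarchical SID: each level quantizes the residual of the previous one."""
    residuals = [list(x) for x in data]
    codes = [[] for _ in data]
    for level in range(levels):
        som = train_som(residuals, *grid, seed=level, **som_kwargs)
        for i, x in enumerate(residuals):
            r, c = best_matching_unit(som, x)
            codes[i].extend((r, c))
            residuals[i] = [v - w for v, w in zip(x, som[(r, c)])]
    return [tuple(code) for code in codes]


# Toy "POI embeddings": two well-separated clusters in 2-D.
pois = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
ids = semantic_ids(pois, grid=(3, 3), epochs=20)
print(ids)  # one (row1, col1, row2, col2) tuple per POI
```

The Gaussian neighborhood update is what makes such a codebook topology-aware: a quantizer that moves only the winning prototype (as in k-means or plain vector quantization) gives no guarantee that neighboring codes are semantically close.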


Paper Structure

This paper contains 29 sections, 16 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Key differences between topology-blind SIDs with SFT and topology-aware SIDs with Refine-POI. The SIDs in (a, left) are produced by existing methods, which do not consider semantic continuity between nearby indices. SFT (b, top) can only perform exact matching, so the model is trained to produce a single item. The SIDs in (a, right) are constructed as coordinates on a map, where nearby coordinates are also semantically similar. Refine-POI trains the model with RFT (b, bottom): the model is encouraged to produce top-$k$ recommendation lists, and the correct item is rewarded according to its position.
  • Figure 2: Overview of the Refine-POI framework, which consists of two modules: (i) topology-aware SID generation with a hierarchical SOM (left) and (ii) RFT (right). For training, we begin with trajectory prompting, which transforms check-in records into prompts and enriches them with additional POI address information. We then adopt GRPO shao2024deepseekmath as the RFT algorithm. In the bottom-right example, the first response has the correct format, the correct item appears in the first position, and all items are distinct. Thus, assuming the output length exceeds the target length, the reward is $1$ (format reward) + $\frac{1}{1}$ (reciprocal rank reward) + $1$ (soft accuracy reward) + $10\times0.1$ (distinction reward) + $1$ (length reward) $= 5$. The second response has the wrong format despite containing the correct item, so its reward is 0. The last response has the correct format, but the correct item appears in the fifth position and only eight items are distinct, so the reciprocal rank reward and the distinction reward are $\frac{1}{5}$ and $8\times 0.1$, respectively. For clarity, all rewards here use a weight of 1, which differs from the weights used in the final model.
  • Figure 3: An example of the model's grounded reasoning trace. We do not show the entire raw prompt and output as it would be too long.
  • Figure 4: Reasoning analysis for Refine-POI on NYC.
  • Figure 5: Comparison of normalized intra-class compactness on NYC between two types of SID.
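Figure 2's caption walks through the reward arithmetic with every weight set to 1. The sketch below reproduces that arithmetic and then applies a GRPO-style group-relative advantage, in which each reward is standardized against the mean and standard deviation of its sampled group. It assumes that the soft accuracy term is 1 whenever the target appears anywhere in the list, that the length term is a binary check on whether the output exceeds the target length, and that each response holds ten items; under these assumptions the three example responses score 5, 0, and 4. The function names `list_reward` and `group_advantages` are hypothetical, and the final model uses different weights.

```python
import statistics


def list_reward(items, target, well_formed, longer_than_target,
                weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Hypothetical reward for one top-k response, mirroring the Figure 2 example."""
    if not well_formed:
        return 0.0  # malformed output earns nothing, even if it contains the target
    w_fmt, w_rr, w_acc, w_div, w_len = weights
    hit = target in items
    rr = 1.0 / (items.index(target) + 1) if hit else 0.0  # reciprocal rank (1-based)
    acc = 1.0 if hit else 0.0                     # assumption: target anywhere in list
    div = 0.1 * len(set(items))                   # 0.1 per distinct item, as in the caption
    length = 1.0 if longer_than_target else 0.0   # assumption: binary length check
    return w_fmt * 1.0 + w_rr * rr + w_acc * acc + w_div * div + w_len * length


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ
    return [(r - mean) / (std + eps) for r in rewards]


target = "p0"
resp_first = [f"p{i}" for i in range(10)]  # target at rank 1, 10 distinct items
resp_fifth = ["p1", "p2", "p3", "p4", "p0", "p5", "p6", "p7", "p7", "p7"]  # rank 5, 8 distinct
rewards = [
    list_reward(resp_first, target, well_formed=True, longer_than_target=True),
    list_reward(resp_first, target, well_formed=False, longer_than_target=True),
    list_reward(resp_fifth, target, well_formed=True, longer_than_target=True),
]
print(rewards)
print(group_advantages(rewards))
```

Because advantages are relative within a group, the fifth-position response still receives a positive signal when it beats the malformed one, while the rank-1 response is pushed up the most.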