Refine-POI: Reinforcement Fine-Tuned Large Language Models for Next Point-of-Interest Recommendation
Peibo Li, Shuang Ao, Hao Xue, Yang Song, Maarten de Rijke, Johan Barthélemy, Tomasz Bednarz, Flora D. Salim
TL;DR
Refine-POI tackles next-POI recommendation by addressing two core issues: preserving semantic continuity in POI IDs, and generating reasoned top-$k$ lists. It introduces topology-aware semantic IDs (SIDs), generated via a hierarchical self-organizing map (HSOM), to preserve semantic locality. It also employs reinforcement fine-tuning with a recommendation-driven reward that optimizes full top-$k$ lists rather than exact ground-truth matches. The framework uses trajectory prompting to incorporate long- and short-term memory into prompts, together with a multifaceted reward that combines format, reciprocal-rank, soft-accuracy, diversity, and length terms. Experiments on NYC, TKY, and Gowalla-CA show state-of-the-art performance on top-$k$ metrics, better handling of cold-start users, and evidence of grounded reasoning. The costs are higher training expense and possible reward hacking, which point to future work on efficiency and process-supervised guidance.
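The multifaceted reward can be illustrated with a minimal sketch. The exact term definitions and weights are not given here, so everything below is an assumption: the `<answer>...</answer>` output format, dash-separated semantic IDs such as `3-1-5`, the prefix-based definition of soft accuracy, and the `WEIGHTS` values are all hypothetical. The sketch is meant to show the key idea of scoring the whole ranked list rather than requiring an exact top-1 match.

```python
import re
from typing import List, Optional

# Hypothetical output format: "<answer>3-1-5, 3-1-2, ...</answer>", where each item
# is a semantic ID written as dash-separated hierarchical codes.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

# Hypothetical weights; not taken from the paper.
WEIGHTS = {"format": 0.1, "rr": 1.0, "soft": 0.5, "div": 0.2, "len": 0.2}


def parse_list(completion: str) -> Optional[List[str]]:
    """Extract the ranked list, or return None if the format is violated."""
    m = ANSWER_RE.search(completion)
    if m is None:
        return None
    items = [x.strip() for x in m.group(1).split(",") if x.strip()]
    return items or None


def reciprocal_rank(ranked: List[str], target: str) -> float:
    """1/rank of the ground-truth ID in the list, or 0 if it is missing."""
    for i, sid in enumerate(ranked, start=1):
        if sid == target:
            return 1.0 / i
    return 0.0


def soft_accuracy(ranked: List[str], target: str) -> float:
    """Partial credit (assumed definition): longest shared code prefix, normalized."""
    t = target.split("-")
    best = 0
    for sid in ranked:
        shared = 0
        for a, b in zip(sid.split("-"), t):
            if a != b:
                break
            shared += 1
        best = max(best, shared)
    return best / len(t)


def diversity(ranked: List[str]) -> float:
    """Fraction of distinct items; penalizes repeated recommendations."""
    return len(set(ranked)) / len(ranked)


def length_score(ranked: List[str], k: int) -> float:
    """1.0 for exactly k items, decaying linearly with the size mismatch."""
    return max(0.0, 1.0 - abs(len(ranked) - k) / k)


def reward(completion: str, target: str, k: int = 10) -> float:
    """Weighted sum of list-level terms; malformed output earns nothing."""
    ranked = parse_list(completion)
    if ranked is None:
        return 0.0
    terms = {
        "format": 1.0,
        "rr": reciprocal_rank(ranked, target),
        "soft": soft_accuracy(ranked, target),
        "div": diversity(ranked),
        "len": length_score(ranked, k),
    }
    return sum(WEIGHTS[name] * value for name, value in terms.items())
```

For example, `reward("<answer>3-1-2, 3-1-5, 7-0-4</answer>", target="3-1-5", k=3)` earns reciprocal-rank credit of 0.5 because the target appears second, plus full soft-accuracy, diversity, and length credit, for a total of about 1.5 under these weights. A malformed completion scores 0, which in a policy-gradient setup pushes the model toward parseable output.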
Abstract
Advancing large language models (LLMs) for next point-of-interest (POI) recommendation faces two fundamental challenges: (i) although existing methods produce semantic IDs that incorporate semantic information, their topology-blind indexing fails to preserve semantic continuity, so proximity in ID values does not reflect the coherence of the underlying semantics; and (ii) supervised fine-tuning (SFT)-based methods restrict model outputs to top-1 predictions. Such methods suffer from "answer fixation" and, because supervision is scarce, neglect the need for top-$k$ ranked lists and reasoning. We propose Refine-POI, a framework that addresses these challenges through topology-aware ID generation and reinforcement fine-tuning. First, we introduce a hierarchical self-organizing map (SOM) quantization strategy to generate semantic IDs, ensuring that coordinate proximity in the codebook reflects semantic similarity in the latent space. Second, we employ a policy-gradient framework to optimize the generation of top-$k$ recommendation lists, freeing the model from strict label matching. Extensive experiments on three real-world datasets show that Refine-POI significantly outperforms state-of-the-art baselines, combining the reasoning capabilities of LLMs with the representational fidelity required for accurate and explainable next-POI recommendation.
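The hierarchical SOM idea can be sketched in plain Python. This is not the authors' implementation. The residual scheme (each level quantizes what the previous level's prototype failed to explain), the 2x2 grids, and the learning-rate and neighborhood schedules are assumptions chosen for illustration, and `SOM`, `hierarchical_sids`, and `sq_dist` are hypothetical names. What the sketch demonstrates is that the SOM neighborhood update pulls the grid neighbors of the best-matching unit toward each input, so adjacent grid coordinates end up holding similar prototypes. A semantic ID built from those coordinates therefore inherits that locality.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]


def sq_dist(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SOM:
    """A small rectangular self-organizing map with a Gaussian neighborhood."""

    def __init__(self, rows: int, cols: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        self.w = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(cols)]
                  for _ in range(rows)]

    def bmu(self, x: Vec) -> Tuple[int, int]:
        """Grid coordinate of the best-matching unit (nearest prototype)."""
        return min(((i, j) for i in range(self.rows) for j in range(self.cols)),
                   key=lambda ij: sq_dist(x, self.w[ij[0]][ij[1]]))

    def fit(self, data: List[Vec], epochs: int = 50, lr0: float = 0.5,
            sigma0: float = 1.0, seed: int = 0) -> "SOM":
        rng = random.Random(seed)
        order = list(range(len(data)))
        for t in range(epochs):
            frac = t / epochs
            lr = lr0 * (1 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1 - frac) + 0.1  # shrinking, but never zero, neighborhood
            rng.shuffle(order)
            for n in order:
                x = data[n]
                bi, bj = self.bmu(x)
                for i in range(self.rows):
                    for j in range(self.cols):
                        # Grid neighbors of the BMU are also pulled toward x,
                        # which makes nearby coordinates hold similar prototypes.
                        h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2 * sigma ** 2))
                        wij = self.w[i][j]
                        for d in range(len(wij)):
                            wij[d] += lr * h * (x[d] - wij[d])
        return self


def hierarchical_sids(data: List[Vec], levels: int = 2,
                      grid: int = 2) -> List[Tuple[Tuple[int, int], ...]]:
    """Semantic ID = BMU coordinates per level; level l+1 quantizes level-l residuals."""
    residuals = [list(x) for x in data]
    codes: List[List[Tuple[int, int]]] = [[] for _ in data]
    for level in range(levels):
        som = SOM(grid, grid, len(data[0]), seed=level).fit(residuals, seed=level)
        for n, r in enumerate(residuals):
            bi, bj = som.bmu(r)
            codes[n].append((bi, bj))
            proto = som.w[bi][bj]
            residuals[n] = [a - b for a, b in zip(r, proto)]
    return [tuple(c) for c in codes]
```

Calling `hierarchical_sids(embeddings, levels=2, grid=2)` on a list of POI embedding vectors returns one tuple of `(row, col)` codes per POI, with one code per level. In a real system the embeddings would come from a POI encoder and the codes would be rendered as tokens in the LLM's vocabulary; both steps are outside this sketch.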
