SynthTRIPs: A Knowledge-Grounded Framework for Benchmark Query Generation for Personalized Tourism Recommenders
Ashmi Banerjee, Adithi Satish, Fitri Nur Aisyah, Wolfgang Wörndl, Yashar Deldjoo
TL;DR
The paper tackles the scarcity of rich, personalized, sustainability-aware data for Tourism Recommender Systems (TRS) by introducing SynthTRIPs, a knowledge-grounded framework that uses LLMs to generate synthetic travel queries conditioned on diverse user personas and explicit sustainability filters. It formalizes a three-module system (Persona Hub, Travel Filters, KB-grounded Context Prompting), grounds outputs in a curated knowledge base (KB), and provides a reproducible pipeline with open data and evaluation protocols. Empirical validation combines automatic LLM-based assessments with human expert evaluations, showing that the synthetic queries can capture complex personalization while maintaining grounding and diversity, though there are trade-offs between personalization and factual alignment. The work yields a scalable resource for TRS training and benchmarking, supports research on off-peak and sustainable travel, and offers a methodology that can be extended to other domains, complemented by an open-source KB and query dataset.
Abstract
Tourism Recommender Systems (TRS) are crucial in personalizing travel experiences by tailoring recommendations to users' preferences, constraints, and contextual factors. However, publicly available travel datasets often lack sufficient breadth and depth, limiting their ability to support advanced personalization strategies, particularly for sustainable travel and off-peak tourism. In this work, we explore using Large Language Models (LLMs) to generate synthetic travel queries that emulate diverse user personas and incorporate structured filters such as budget constraints and sustainability preferences. This paper introduces SynthTRIPs, a novel framework for generating synthetic travel queries using LLMs grounded in a curated knowledge base (KB). Our approach combines persona-based preferences (e.g., budget, travel style) with explicit sustainability filters (e.g., walkability, air quality) to produce realistic and diverse queries. We mitigate hallucination and improve factual correctness by grounding the LLM responses in the KB. We formalize the query generation process and introduce evaluation metrics for assessing realism and alignment. Both human expert evaluations and automatic LLM-based assessments demonstrate the effectiveness of our synthetic dataset in capturing complex personalization aspects underrepresented in existing datasets. While our framework was developed and tested for personalized city trip recommendations, the methodology applies to other recommender system domains. Code and dataset are publicly available at https://bit.ly/synthTRIPs
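The following is a minimal Python sketch of the idea described in the abstract: a persona and a set of structured filters are combined with KB entries that pass those filters to form a grounded prompt. It is an illustration under stated assumptions, not the SynthTRIPs implementation. The `City` fields, the city names and values, the filter names, and the prompt wording are all hypothetical, and no LLM is called.

```python
from dataclasses import dataclass


@dataclass
class City:
    name: str
    budget: str          # "low", "medium", or "high"
    walkability: float   # hypothetical 0-100 score, higher is better
    aqi: float           # hypothetical air-quality index, lower is better


# Toy knowledge base with invented cities and values (illustration only).
KB = [
    City("Alphaville", "low", 82.0, 35.0),
    City("Betatown", "high", 90.0, 20.0),
    City("Gammaburg", "low", 40.0, 70.0),
]


def apply_filters(kb, budget, min_walkability, max_aqi):
    """Keep only KB entries that satisfy every travel filter."""
    return [
        c for c in kb
        if c.budget == budget
        and c.walkability >= min_walkability
        and c.aqi <= max_aqi
    ]


def build_prompt(persona, filters, kb):
    """Assemble a persona- and filter-conditioned prompt grounded in KB facts."""
    matches = apply_filters(kb, **filters)
    context = "\n".join(
        f"- {c.name}: budget={c.budget}, walkability={c.walkability}, aqi={c.aqi}"
        for c in matches
    )
    prompt = (
        f"Persona: {persona}\n"
        f"Filters: {filters}\n"
        f"Knowledge base context:\n{context}\n"
        "Write one travel query this persona might ask, "
        "using only the facts listed above."
    )
    return prompt, matches


# Hypothetical persona and filters, chosen only to exercise the sketch.
persona = "A budget-conscious student who prefers exploring cities on foot."
filters = {"budget": "low", "min_walkability": 60.0, "max_aqi": 50.0}
prompt, matches = build_prompt(persona, filters, KB)
print(prompt)
```

Restricting the prompt context to KB entries that pass the filters is one simple way to keep a generated query tied to known facts; the resulting prompt would then be passed to an LLM of choice.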
