SynthTRIPs: A Knowledge-Grounded Framework for Benchmark Query Generation for Personalized Tourism Recommenders

Ashmi Banerjee, Adithi Satish, Fitri Nur Aisyah, Wolfgang Wörndl, Yashar Deldjoo

TL;DR

The paper tackles the scarcity of rich, personalized, and sustainable data for Tourism Recommender Systems by introducing SynthTRIPs, a knowledge-grounded framework that uses LLMs to generate synthetic travel queries conditioned on diverse user personas and explicit sustainability filters. It formalizes a three-module system (Persona Hub, Travel Filters, KB-grounded Context Prompting), grounds outputs in a curated knowledge base, and provides a reproducible pipeline with open data and evaluation protocols. Empirical validation includes offline judgments and expert evaluations, showing that the synthetic queries can capture complex personalization while maintaining grounding and diversity, though there are trade-offs between personalization and factual alignment. The work yields a scalable resource to improve TRS training and benchmarking, supports off-peak and sustainable travel research, and offers a flexible methodology extendable to other domains, complemented by an extensive open-source KB and query dataset.

Abstract

Tourism Recommender Systems (TRS) are crucial in personalizing travel experiences by tailoring recommendations to users' preferences, constraints, and contextual factors. However, publicly available travel datasets often lack sufficient breadth and depth, limiting their ability to support advanced personalization strategies -- particularly for sustainable travel and off-peak tourism. In this work, we explore using Large Language Models (LLMs) to generate synthetic travel queries that emulate diverse user personas and incorporate structured filters such as budget constraints and sustainability preferences. This paper introduces a novel SynthTRIPs framework for generating synthetic travel queries using LLMs grounded in a curated knowledge base (KB). Our approach combines persona-based preferences (e.g., budget, travel style) with explicit sustainability filters (e.g., walkability, air quality) to produce realistic and diverse queries. We mitigate hallucination and ensure factual correctness by grounding the LLM responses in the KB. We formalize the query generation process and introduce evaluation metrics for assessing realism and alignment. Both human expert evaluations and automatic LLM-based assessments demonstrate the effectiveness of our synthetic dataset in capturing complex personalization aspects underrepresented in existing datasets. While our framework was developed and tested for personalized city trip recommendations, the methodology applies to other recommender system domains. Code and dataset are made public at https://bit.ly/synthTRIPs
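
As a rough illustration of the idea described in the abstract, the sketch below combines a persona and a set of filters with facts drawn from a small knowledge base to assemble a generation prompt. It is not the authors' implementation: the `Persona` fields, the `select_cities` and `build_prompt` helpers, the minimum-threshold filter semantics, and the placeholder cities and scores in `TOY_KB` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    # Hypothetical persona fields; the paper's Persona Hub may store others.
    description: str
    budget: str
    travel_style: str


# Toy knowledge base with made-up placeholder values (not taken from the paper's KB).
TOY_KB = {
    "City A": {"walkability": 82, "air_quality": 75},
    "City B": {"walkability": 55, "air_quality": 90},
}


def select_cities(kb, filters):
    """Return, in sorted order, the cities whose attributes meet every minimum threshold.

    Assumption: each filter is a minimum score and higher values are better.
    """
    return sorted(
        name
        for name, attrs in kb.items()
        if all(attrs.get(key, 0) >= minimum for key, minimum in filters.items())
    )


def build_prompt(persona, filters, kb):
    """Assemble a prompt whose context section contains only facts from the KB."""
    cities = select_cities(kb, filters)
    context = "\n".join(
        f"- {name}: " + ", ".join(f"{k}={v}" for k, v in sorted(kb[name].items()))
        for name in cities
    )
    return (
        f"Persona: {persona.description} "
        f"(budget: {persona.budget}, travel style: {persona.travel_style})\n"
        f"Filters: {filters}\n"
        f"Knowledge base context:\n{context}\n"
        "Write one travel query this persona might ask, using only the context above."
    )


if __name__ == "__main__":
    persona = Persona("A student who enjoys museums", "low", "slow travel")
    print(build_prompt(persona, {"walkability": 70}, TOY_KB))
```

Restricting the context to KB entries that pass the filters is one simple way to keep a generated query tied to known facts; the resulting prompt would then be sent to an LLM of choice, a step left out of this sketch.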

Paper Structure

This paper contains 30 sections, 15 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Our proposed framework for generating synthetic data using LLMs for personalized, sustainable city trips.
  • Figure 2: Radar chart showing the different validation dimensions and the performance of queries generated by Gemini. Llama shows similar performance across the dimensions and is therefore excluded from the paper. L (E) denotes LLM (Expert) validations.
  • Figure 3: Screenshot showing part of the evaluation tool developed for the expert study. The full version can be found at https://huggingface.co/spaces/ashmib/user-feedback.