Tourism Question Answer System in Indian Language using Domain-Adapted Foundation Models

Praveen Gatla, Anushka, Nikita Kanwar, Gouri Sahoo, Rajesh Kumar Mundotiya

TL;DR

The study builds a baseline extractive QA system for Hindi tourism in Varanasi by creating a large, domain-spanning Hindi QA dataset and evaluating BERT/RoBERTa models with supervised fine-tuning and LoRA-based parameter-efficient tuning. It demonstrates that Hindi-specific pretraining (HindiBERT, HindiRoBERTa) generally outperforms multilingual base models, with RoBERTa + SFT delivering strong domain performance and LoRA offering a substantial parameter reduction while maintaining competitive results in several subdomains. The work provides a foundational Hindi tourism QA benchmark and insights into model selection for low-resource, culturally nuanced domains, highlighting the balance between accuracy and efficiency. It also points to future integration with retrieval-augmented generation (RAG) and expansion to additional Indian-language domains to enhance accessibility for visitors and researchers alike.
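The TL;DR credits LoRA with a large cut in trainable parameters. The arithmetic behind such a claim can be sketched as follows: LoRA freezes a weight matrix W and trains only a low-rank update B·A, so each adapted d_out × d_in matrix contributes rank × (d_in + d_out) trainable parameters. Every dimension below (768 hidden units, 12 layers, adapters on the query and value projections only, roughly 110M parameters for a BERT-base encoder) is an assumption for illustration, not the configuration reported in the paper, so the printed percentages need not match the paper's 98% figure.

```python
# Illustrative arithmetic only. All dimensions are ASSUMED (BERT-base-like),
# not the paper's reported configuration.
HIDDEN = 768                 # hidden size of each transformer layer (assumed)
LAYERS = 12                  # number of transformer layers (assumed)
TOTAL_PARAMS = 110_000_000   # approximate size of a BERT-base encoder (assumed)
QA_HEAD = HIDDEN * 2 + 2     # span head: a start and an end logit per token (weights + biases)


def lora_adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA models the update to W (d_out x d_in) as B @ A, with A of shape
    rank x d_in and B of shape d_out x rank; only A and B are trained."""
    return rank * d_in + d_out * rank


def lora_trainable(rank: int, targets_per_layer: int = 2) -> int:
    """Trainable parameters when adapters sit on `targets_per_layer` square
    projections in every layer (assumed: query and value), plus the QA head."""
    adapters = LAYERS * targets_per_layer * lora_adapter_params(HIDDEN, HIDDEN, rank)
    return adapters + QA_HEAD


def reduction_vs_full(rank: int) -> float:
    """Percentage of trainable parameters saved relative to full fine-tuning,
    which updates every encoder weight plus the QA head."""
    full = TOTAL_PARAMS + QA_HEAD
    return 100.0 * (1.0 - lora_trainable(rank) / full)


if __name__ == "__main__":
    for r in (2, 4, 8, 16, 32):
        print(f"rank {r:>2}: {lora_trainable(r):>9,} trainable, "
              f"{reduction_vs_full(r):.2f}% fewer than full fine-tuning")
```

Under these assumptions, rank 4 yields 148,994 trainable parameters (147,456 in the adapters plus 1,538 in the span head). Raising the rank or adapting more projections grows the count linearly, which is one reason the measured reduction depends on the exact setup.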

Abstract

This article presents the first comprehensive study on designing a baseline extractive question-answering (QA) system for the Hindi tourism domain, with a specialized focus on Varanasi, a cultural and spiritual hub renowned for its Bhakti-Bhaav (devotional ethos). Targeting ten tourism-centric subdomains (Ganga Aarti, Cruise, Food Court, Public Toilet, Kund, Museum, General, Ashram, Temple, and Travel), the work addresses the absence of language-specific QA resources in Hindi for culturally nuanced applications. A dataset of 7,715 Hindi QA pairs on Varanasi tourism was constructed and subsequently augmented with 27,455 pairs generated via zero-shot prompting of Llama. We propose a framework that fine-tunes foundation models, BERT and RoBERTa, using Supervised Fine-Tuning (SFT) and Low-Rank Adaptation (LoRA) to optimize both parameter efficiency and task performance. Multiple BERT variants, including language-specific pre-trained models (e.g., Hindi-BERT), are evaluated to assess their suitability for low-resource, domain-specific QA. Evaluation with F1, BLEU, and ROUGE-L highlights trade-offs between answer precision and linguistic fluency. Experiments demonstrate that LoRA-based fine-tuning achieves competitive performance (85.3% F1) while reducing trainable parameters by 98% compared to SFT, striking a balance between efficiency and accuracy. Comparative analysis across models reveals that RoBERTa with SFT outperforms BERT variants in capturing contextual nuances, particularly for culturally embedded terms (e.g., Aarti, Kund). This work establishes a foundational baseline for Hindi tourism QA systems, emphasizing the role of LoRA in low-resource settings and underscoring the need for culturally contextualized NLP frameworks in the tourism domain.
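The abstract scores answers with F1, BLEU, and ROUGE-L. For extractive QA, F1 is commonly the SQuAD-style token-overlap score, and ROUGE-L is an F-measure over the longest common subsequence (LCS) of tokens. A minimal pure-Python sketch of both follows; the whitespace tokenization and punctuation stripping in `normalize` are assumptions, since this summary does not state how Hindi text was tokenized for scoring, and BLEU is left out.

```python
import unicodedata
from collections import Counter


def normalize(text: str) -> list[str]:
    """ASSUMED normalization: drop Unicode punctuation (including the Devanagari
    danda), lowercase any Latin script, and split on whitespace."""
    cleaned = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return cleaned.lower().split()


def token_f1(prediction: str, reference: str) -> float:
    """SQuAD-style token-overlap F1 between a predicted and a gold answer."""
    pred, gold = normalize(prediction), normalize(reference)
    if not pred or not gold:
        # Both empty counts as a match; exactly one empty scores zero.
        return float(pred == gold)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, using a rolling 1-D DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L F-measure (beta = 1) from LCS-based precision and recall."""
    pred, gold = normalize(prediction), normalize(reference)
    lcs = lcs_length(pred, gold)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, with prediction `शाम को गंगा आरती` and reference `गंगा आरती शाम को`, token F1 is 1.0 because every token overlaps, while ROUGE-L is 0.5 because the longest in-order match covers only two of the four tokens. This is one way order-insensitive precision and order-sensitive fluency measures can diverge.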

Paper Structure

This paper contains 22 sections, 14 equations, 6 figures, 14 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Varanasi Tourism QA Dataset: Structure
  • Figure 2: Tourism Domain Example
  • Figure 3: Overview of the complete methodology of the model. The green arrows in the SFT and LoRA modules indicate the back-propagation steps during fine-tuning. At any given time, either SFT or LoRA is activated for modeling the question-answering task.
  • Figure 4: F1 score comparison of the models across 11 domain settings. Configurations with LoRA-based adapters at ranks r8, r16, and r32 consistently underperformed those at r2 and r4, so they are omitted from this figure for clarity.
  • Figure 5: BLEU score comparison of the models across 11 domain settings
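Figure 3 describes the question-answering module that either SFT or LoRA trains. The summary does not spell out how answers are decoded, but in the common BERT-style extractive setup the model emits a start logit and an end logit for every context token, and the answer is the highest-scoring span whose start does not come after its end. A minimal sketch of that decoding step, assuming the logits are already computed; `best_span` and `max_answer_len` are illustrative names, not taken from the paper.

```python
from typing import Sequence


def best_span(start_logits: Sequence[float], end_logits: Sequence[float],
              max_answer_len: int = 30) -> tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring valid answer span.

    A span is valid when start <= end and it covers at most `max_answer_len`
    tokens; its score is start_logits[start] + end_logits[end]."""
    if len(start_logits) != len(end_logits):
        raise ValueError("start and end logits must cover the same tokens")
    if max_answer_len < 1:
        raise ValueError("max_answer_len must be at least 1")
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap.
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best[2]:
                best = (i, j, score)
    return best
```

With `start_logits = [0.1, 2.0, 0.3, 5.0]` and `end_logits = [0.2, 0.1, 4.0, 3.0]`, picking the two argmaxes independently (start 3, end 2) would give an invalid span; `best_span` instead returns `(3, 3, 8.0)`. The quadratic scan is bounded by `max_answer_len`, which keeps it cheap for typical passage lengths.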