VIRAASAT: Traversing Novel Paths for Indian Cultural Reasoning

Harshul Raj Surana; Arijit Maji; Aryan Vats; Akash Ghosh; Sriparna Saha; Amit Sheth

TL;DR

This work introduces VIRAASAT, a novel, semi-automated approach for generating a culture-specific multi-hop question-answering dataset for Indian culture. It also proposes Symbolic Chain-of-Manipulation (SCoM), a framework that adapts the Chain-of-Manipulation paradigm to train a model to simulate atomic Knowledge Graph manipulations internally.

Abstract

Large Language Models (LLMs) have made significant progress in reasoning tasks across domains such as mathematics and coding. However, their performance deteriorates on tasks requiring rich socio-cultural knowledge and diverse local contexts, particularly those involving Indian culture. Existing cultural benchmarks (i) are manually crafted, (ii) contain single-hop questions that test factual recall, and (iii) are prohibitively costly to scale, leaving this deficiency largely unmeasured. To address this, we introduce VIRAASAT, a novel, semi-automated approach for generating a culture-specific multi-hop question-answering dataset for Indian culture. VIRAASAT leverages a Knowledge Graph comprising more than 700 expert-curated cultural artifacts, covering 13 key attributes of Indian culture (history, festivals, etc.). VIRAASAT spans all 28 states and 8 Union Territories, yielding more than 3,200 multi-hop questions that necessitate chained cultural reasoning. We evaluate current State-of-the-Art (SOTA) LLMs on VIRAASAT and identify key limitations in reasoning: fine-tuning on Chain-of-Thought (CoT) traces fails to ground and synthesize low-probability facts. To bridge this gap, we propose a novel framework named Symbolic Chain-of-Manipulation (SCoM). Adapting the Chain-of-Manipulation paradigm, we train the model to simulate atomic Knowledge Graph manipulations internally. SCoM teaches the model to reliably traverse the topological structure of the graph. Experiments with Supervised Fine-Tuning (SFT) demonstrate that SCoM outperforms standard CoT baselines by up to 20%. We release the VIRAASAT dataset along with our findings, laying a strong foundation for building Culturally Aware Reasoning Models.
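To make the idea of a chained, state-constrained question concrete, the following is a minimal sketch of a 2-hop traversal over a toy knowledge graph. It is not the authors' pipeline: the triples, relation names (`celebrated_in`, `originates_in`), and helper functions are assumptions chosen for illustration and are not taken from the VIRAASAT data.

```python
from collections import defaultdict

# Toy (head, relation, tail) triples. Illustrative only; not VIRAASAT data.
TRIPLES = [
    ("Pongal", "celebrated_in", "Tamil Nadu"),
    ("Bharatanatyam", "originates_in", "Tamil Nadu"),
    ("Bihu", "celebrated_in", "Assam"),
    ("Sattriya", "originates_in", "Assam"),
]


def build_index(triples):
    """Index triples in both directions so each hop is a dictionary lookup."""
    forward = defaultdict(list)   # (head, relation) -> tails
    backward = defaultdict(list)  # (tail, relation) -> heads
    for head, relation, tail in triples:
        forward[(head, relation)].append(tail)
        backward[(tail, relation)].append(head)
    return forward, backward


def two_hop(forward, backward, anchor, rel_to_state, rel_from_state):
    """Hop 1: anchor artifact -> state. Hop 2: state -> target artifact."""
    answers = []
    for state in forward.get((anchor, rel_to_state), []):
        for target in backward.get((state, rel_from_state), []):
            answers.append((state, target))
    return answers


def make_question(anchor):
    """Hypothetical template that hides the bridging state from the reader."""
    return f"Which dance form originates in the state where {anchor} is celebrated?"


forward, backward = build_index(TRIPLES)
question = make_question("Pongal")
paths = two_hop(forward, backward, "Pongal", "celebrated_in", "originates_in")
print(question)
print(paths)
```

Answering the question requires resolving the bridging state first and then the target artifact, which is the kind of chained step that a single-hop factual-recall question does not exercise.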

Paper Structure (34 sections, 8 figures, 6 tables)

Introduction
Research Motivation
Related Works
VIRAASAT Dataset Creation
Annotators
Annotator Training
Data Collection
Knowledge Graph Construction
Question Generation
Manual Verification
Question Curation
Compensation
Factual Responsibility
Inter-annotator Agreement
SCoM Reasoning Data Generation
...and 19 more sections

Figures (8)

Figure 1: VIRAASAT dataset construction pipeline: artifact curation and identifier creation, Knowledge Graph construction, template-based 2-hop question generation, and expert verification for semantic and grammatical quality.
Figure 2: State-wise and attribute-wise distribution of questions in VIRAASAT.
Figure 3: SCoM reasoning-trace generation pipeline. The Student Agent (actor) solves the question by producing an explicit, step-wise trace of THOUGHT and ACTION steps, where each ACTION is an atomic manipulation over the cultural knowledge graph (entity grounding, enforcing the state constraint, candidate retrieval and resolution, and target resolution). Each proposed action is checked by the Symbolic Verifier (teacher), which validates graph-topology consistency with the constrained path; if an action violates constraints, the verifier injects a correction signal that redirects the next step, preventing drift from the intended traversal. Valid actions are executed against the parametric environment backed by the cultural knowledge graph, returning OBSERVATION outputs such as candidate entities and textual descriptors that guide subsequent steps. The actor–verifier loop iterates for a bounded number of turns (k=5), producing a faithful, tool-grounded SCoM reasoning trace that is used for downstream SFT, teaching a model to perform structured graph-grounded retrieval and reasoning.
Figure 4: Example SCoM trace (cleaned) illustrating tool-grounded retrieval over a constrained 2-hop path.
Figure 5: Prompt used for SFT, providing the necessary task context and details.
...and 3 more figures
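The actor-verifier loop described in the Figure 3 caption can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the action labels, the `verify` and `execute` stand-ins, and the scripted list of proposals are all hypothetical. Only the four manipulation types and the turn bound k=5 come from the caption.

```python
MAX_TURNS = 5  # the caption states a bound of k=5 turns

# Hypothetical labels for the four atomic manipulations named in the caption,
# in the order a constrained 2-hop path would require them.
EXPECTED_PATH = [
    "ground_entity",
    "enforce_state",
    "retrieve_candidates",
    "resolve_target",
]


def verify(action, step_index):
    """Verifier stand-in: accept only the action expected at this step."""
    expected = EXPECTED_PATH[step_index]
    if action == expected:
        return True, None
    return False, f"expected {expected}"


def execute(action):
    """Environment stand-in: return a placeholder observation string."""
    return f"observation for {action}"


def run_loop(proposals, max_turns=MAX_TURNS):
    """Run a scripted actor against the verifier for at most max_turns turns."""
    trace = []
    step = 0
    for action in proposals[:max_turns]:
        ok, correction = verify(action, step)
        if ok:
            trace.append(("ACTION", action))
            trace.append(("OBSERVATION", execute(action)))
            step += 1
            if step == len(EXPECTED_PATH):
                break
        else:
            # A rejected action is not executed; the correction is recorded
            # so the next proposal can be redirected.
            trace.append(("CORRECTION", correction))
    return trace, step == len(EXPECTED_PATH)


# The actor skips the state constraint once, is corrected, then finishes.
proposals = [
    "ground_entity",
    "retrieve_candidates",
    "enforce_state",
    "retrieve_candidates",
    "resolve_target",
]
trace, completed = run_loop(proposals)
```

In this sketch a rejected action still consumes one of the five turns, so an actor that drifts repeatedly runs out of budget before reaching the target. Whether the original pipeline counts turns this way is not stated in the caption.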
