BASIR: Budget-Assisted Sectoral Impact Ranking -- A Dataset for Sector Identification and Performance Prediction Using Language Models

Sohom Ghosh, Sudip Kumar Naskar

TL;DR

BASIR introduces a dataset and framework for Budget-Assisted Sectoral Impact Ranking, which automatically identifies the sectors referenced in budget text and ranks their expected performance. By combining fine-tuned embeddings for sector identification with transformer-based ranking models, the approach achieves an F1 score of 0.605 in sector classification and an NDCG of 0.997 in predicting post-budget sector performance. The dataset spans Indian Union Budgets from 1947 to 2025, comprising 1,600+ excerpts across 81 sectors and 400+ performance-labeled texts, with data splits spanning pre- and post-2020 periods to support predictive evaluation. This work offers a scalable, data-driven tool for investors and policymakers to quantify fiscal policy impacts, and the annotated BASIR dataset is released under CC-BY-NC-SA-4.0 to foster further research.

Abstract

Government fiscal policies, particularly annual union budgets, exert significant influence on financial markets. However, real-time analysis of budgetary impacts on sector-specific equity performance remains methodologically challenging and largely unexplored. This study proposes a framework to systematically identify and rank sectors poised to benefit from India's Union Budget announcements. The framework addresses two core tasks: (1) multi-label classification of excerpts from budget transcripts into 81 predefined economic sectors, and (2) performance ranking of these sectors. Leveraging a comprehensive corpus of Indian Union Budget transcripts from 1947 to 2025, we introduce BASIR (Budget-Assisted Sectoral Impact Ranking), an annotated dataset mapping excerpts from budgetary transcripts to sectoral impacts. Our architecture incorporates fine-tuned embeddings for sector identification, coupled with language models that rank sectors based on their predicted performance. Our results show an F1-score of 0.605 in sector classification and an NDCG score of 0.997 in ranking sectors by post-budget performance. The methodology enables investors and policymakers to quantify fiscal policy impacts through structured, data-driven insights, addressing critical gaps in manual analysis. The annotated dataset has been released under the CC-BY-NC-SA-4.0 license to advance computational economics research.
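To make the ranking metric concrete, the following is a minimal sketch of how an NDCG score can be computed for a predicted sector ranking. It assumes the common linear-gain, log2-discount formulation; the abstract does not state which NDCG variant the authors use, and the function names, sector relevance values, and model scores below are hypothetical illustrations rather than values from BASIR.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2 discount (assumed variant)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(true_relevance, predicted_scores):
    """NDCG of the ranking induced by predicted_scores against true_relevance.

    true_relevance[i]   : observed post-budget performance grade of sector i
    predicted_scores[i] : model score for sector i (higher means ranked earlier)
    """
    # Order sectors by descending predicted score.
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevance[i] for i in order]
    # The ideal ranking sorts sectors by their true relevance.
    ideal_dcg = dcg(sorted(true_relevance, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: three sectors with graded true performance.
true_relevance = [3, 2, 1]
perfect_scores = [0.9, 0.5, 0.1]   # ranks sectors in the ideal order
reversed_scores = [0.1, 0.5, 0.9]  # ranks sectors in the worst order

perfect = ndcg(true_relevance, perfect_scores)
worst = ndcg(true_relevance, reversed_scores)
```

A score of 1.0 corresponds to a predicted ordering that matches the ideal ordering exactly, so a reported NDCG of 0.997 indicates predicted rankings that are very close to the observed post-budget ordering under whichever variant the authors applied.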

Paper Structure

This paper contains 14 sections, 1 equation, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Identifying and Ranking sectors from transcripts of Indian Union Budgets