IndicParam: Benchmark to evaluate LLMs on low-resource Indic Languages

Ayush Maheshwari; Kaushal Sharma; Vivek Patel; Aditya Maheshwari

TL;DR

IndicParam addresses a critical gap in evaluating LLMs on low-resource and extremely low-resource Indic languages. It provides a human-curated benchmark of over 13,000 questions spanning 11 languages, plus Sanskrit–English code-mixed content. It jointly assesses language understanding (LU) and general knowledge (GK) by labeling each question with one of these two categories, and it covers a diverse set of question types, enabling fine-grained cross-lingual analysis and the study of prompt design. The study shows that current frontier models, even GPT-5, reach only about 45% average accuracy and that LU questions remain substantially harder than GK questions, highlighting the need for richer Indic pretraining and targeted adaptation. The benchmark, data processing pipeline, and zero-shot prompting framework offer a robust resource for researchers and practitioners to diagnose and improve cross-lingual capabilities in underrepresented Indic languages.
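The zero-shot prompting framework is not specified in this part of the paper. The following is a minimal sketch that formats one multiple-choice item as a zero-shot prompt and parses an option letter from a model's reply. It assumes four options labeled A to D and uses a hypothetical English instruction template. `MCQItem`, `build_zero_shot_prompt`, and `extract_choice` are illustrative names, not the IndicParam repository's API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumption: every item has exactly four options labeled A-D.
LETTERS = "ABCD"


@dataclass
class MCQItem:
    """Hypothetical container for one benchmark question."""
    question: str
    options: List[str]  # option texts in display order
    answer: str         # gold option letter, e.g. "B"


def build_zero_shot_prompt(item: MCQItem, language: str) -> str:
    """Format one question as a zero-shot prompt (illustrative template only)."""
    if len(item.options) != len(LETTERS):
        raise ValueError(f"expected {len(LETTERS)} options, got {len(item.options)}")
    lines = [
        f"The following is a multiple-choice question in {language}.",
        "Reply with only the letter of the correct option.",
        "",
        item.question,
    ]
    # Label each option with its letter, in display order.
    lines += [f"{letter}. {option}" for letter, option in zip(LETTERS, item.options)]
    lines.append("Answer:")
    return "\n".join(lines)


def extract_choice(response: str) -> Optional[str]:
    """Return the first standalone capital letter A-D in a reply, or None if absent."""
    match = re.search(r"\b([A-D])\b", response)
    return match.group(1) if match else None


if __name__ == "__main__":
    item = MCQItem("Which option is correct?", ["w", "x", "y", "z"], "B")
    print(build_zero_shot_prompt(item, "Nepali"))
    print(extract_choice("Answer: B"))  # prints B
```

Taking the first standalone letter is a deliberately simple heuristic: a reply such as "A or B" would be scored as A, so a stricter parser (for example, requiring the letter at the start of the reply) may be preferable in practice.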

Abstract

While large language models excel on high-resource multilingual tasks, low- and extremely low-resource Indic languages remain severely under-evaluated. We present IndicParam, a human-curated benchmark of over 13,000 multiple-choice questions covering 11 such languages (Nepali, Gujarati, Marathi, and Odia as low-resource; Dogri, Maithili, Rajasthani, Sanskrit, Bodo, Santali, and Konkani as extremely low-resource), plus a Sanskrit-English code-mixed set. Our evaluation of 19 LLMs, both proprietary and open-weight, reveals that even the top-performing GPT-5 reaches only 45.0% average accuracy, followed by DeepSeek-3.2 (43.1%) and Claude-4.5 (42.7%). We additionally label each question as knowledge-oriented or purely linguistic to separate factual recall from grammatical proficiency. Further, we assess the ability of LLMs to handle diverse question formats, such as list-based matching, assertion-reason pairs, and sequence ordering, alongside conventional multiple-choice questions. IndicParam provides insights into the limitations of cross-lingual transfer and establishes a challenging benchmark for Indic languages. The dataset is available at https://huggingface.co/datasets/bharatgenai/IndicParam. Scripts to run the benchmark are available at https://github.com/ayushbits/IndicParam.
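The abstract reports a single "average accuracy" per model and separates knowledge-oriented from purely linguistic questions, but it does not state how the average is computed. The following sketch, built on hypothetical per-question records, computes both a macro average (the mean of per-language accuracies) and a micro average (pooled over all questions). The two can differ when languages have unequal question counts. It also computes a per-category (LU/GK) breakdown. `Prediction`, `accuracy_by`, and `summarize` are illustrative names and are not taken from the benchmark's scripts.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Optional


@dataclass
class Prediction:
    """Hypothetical per-question record after scoring one model."""
    language: str             # e.g. "Nepali", "Dogri"
    category: str             # "LU" (language understanding) or "GK" (general knowledge)
    gold: str                 # gold option letter
    predicted: Optional[str]  # parsed model answer; None (unparseable) counts as wrong


def accuracy_by(preds: List[Prediction], key: Callable[[Prediction], Hashable]) -> Dict:
    """Accuracy for each group of predictions produced by `key`."""
    correct: Dict = defaultdict(int)
    total: Dict = defaultdict(int)
    for p in preds:
        group = key(p)
        total[group] += 1
        correct[group] += int(p.predicted == p.gold)
    return {g: correct[g] / total[g] for g in total}


def summarize(preds: List[Prediction]) -> Dict:
    """Per-language, macro, micro, and per-category accuracy for one model."""
    per_language = accuracy_by(preds, lambda p: p.language)
    return {
        "per_language": per_language,
        # Macro: every language weighs equally, regardless of its size.
        "macro_avg": sum(per_language.values()) / len(per_language),
        # Micro: every question weighs equally.
        "micro_avg": sum(p.predicted == p.gold for p in preds) / len(preds),
        "per_category": accuracy_by(preds, lambda p: p.category),
    }


if __name__ == "__main__":
    preds = [
        Prediction("Nepali", "LU", "A", "A"),
        Prediction("Nepali", "GK", "B", "C"),
        Prediction("Dogri", "LU", "C", "C"),
        Prediction("Dogri", "LU", "D", None),
        Prediction("Dogri", "GK", "A", "A"),
    ]
    summary = summarize(preds)
    print(round(summary["macro_avg"], 3))  # prints 0.583
    print(summary["micro_avg"])            # prints 0.6
```

In this toy example, Nepali has two questions and Dogri has three, so the macro average (7/12) is lower than the micro average (3/5). Reporting which convention is used makes per-model numbers comparable across papers.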