BNLI: A Linguistically-Refined Bengali Dataset for Natural Language Inference
Farah Binta Haque, Md Yasin, Shishir Saha, Md Shoaib Akhter Rafi, Farig Sadeque
TL;DR
BNLI addresses the lack of reliable Bengali NLI data with a linguistically refined benchmark built through a three-stage pipeline: SNLI premises are translated into Bengali, native speakers write the hypotheses, and the resulting pairs are validated with semantic-similarity checks and expert review. The dataset consists of 23,067 high-quality premise–hypothesis pairs, approximately balanced across the entailment, contradiction, and neutral labels (Entailment=7682, Contradiction=7696, Neutral=7661). Benchmarks across multilingual and Bengali-specific transformers show that transformer models outperform LSTM baselines, with LLaMA-2 achieving the highest F1 at around 79%, which points to strong cross-lingual capabilities. The public release of BNLI aims to accelerate Bengali NLI research and support broader work on inference in low-resource languages.
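To make the semantic-similarity validation stage more concrete, the following is a minimal sketch of how premise–hypothesis pairs might be screened before expert review. It is not the authors' implementation: the function names, the use of cosine similarity over precomputed sentence embeddings, and the `low` and `high` thresholds are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def flag_for_review(pairs, low=0.2, high=0.95):
    """Return (pair_id, similarity) for pairs whose similarity looks suspicious.

    `pairs` is an iterable of (pair_id, premise_vec, hypothesis_vec).
    The thresholds are hypothetical: a very low score may indicate an
    unrelated hypothesis, a very high score a near-copy of the premise.
    """
    flagged = []
    for pair_id, premise_vec, hypothesis_vec in pairs:
        sim = cosine_similarity(premise_vec, hypothesis_vec)
        if sim < low or sim > high:
            flagged.append((pair_id, sim))
    return flagged
```

In a setup like this, flagged pairs would be routed to human reviewers rather than discarded automatically, since contradiction and neutral hypotheses can legitimately sit far from their premises in embedding space.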
Abstract
Despite growing progress in Natural Language Inference (NLI) research, resources for the Bengali language remain extremely limited. Existing Bengali NLI datasets exhibit several inconsistencies, including annotation errors, ambiguous sentence pairs, and inadequate linguistic diversity, which hinder effective model training and evaluation. To address these limitations, we introduce BNLI, a refined and linguistically curated Bengali NLI dataset designed to support robust language understanding and inference modeling. The dataset was constructed through a rigorous annotation pipeline that emphasizes semantic clarity and balance across the entailment, contradiction, and neutral classes. We benchmarked BNLI using a suite of state-of-the-art transformer-based architectures, including multilingual and Bengali-specific models, to assess their ability to capture complex semantic relations in Bengali text. The experimental findings highlight the improved reliability and interpretability achieved with BNLI, establishing it as a strong foundation for advancing research on inference in Bengali and other low-resource languages.
